Trust, Just Culture, and Blameless Post-Mortem KCDC

A wandering meditation on reflective meetings I have run or been part of over the last decade.

Aaron Blythe

August 04, 2017

Transcript

  1. @ablythe
    GALLUP POLL
    1/3 of workers are engaged at work

  2. 1/3 ARE ENGAGED AT WORK (GALLUP)
    • 80,844 adults working for an employer
    • Key Indicators
      • opportunity to do what they do best each day
      • someone at work who encourages their development
      • believing their opinions count at work
    • Chart: Engaged 32%, Not Engaged 51%, Actively Disengaged 17%

  3. GALLUP DEFINITIONS
    • Engaged: Employees are highly involved in and enthusiastic about their work and
    workplace. They are psychological "owners," drive performance and innovation, and move the
    organization forward.
    • Not engaged: Employees are psychologically unattached to their work and company.
    Because their engagement needs are not being fully met, they're putting time -- but not
    energy or passion -- into their work.
    • Actively disengaged: Employees aren't just unhappy at work -- they are resentful
    that their needs aren't being met and are acting out their unhappiness. Every day, these
    workers potentially undermine what their engaged coworkers accomplish.

  4. 40 HOUR WORK WEEK? (GALLUP - 2014)

  5. Aaron Blythe (@ablythe)
    • Lead Organizer: @devopskc, @devopsdayskc

  6. http://aaronblythe.org/

  7. TRUST, JUST CULTURE AND BLAMELESS POST MORTEMS
    Aaron Blythe

  8. Trust

  9. Justice

  10. BLAMELESS POST-MORTEM

  11. Jennifer - Executive

  12. Jennifer - Executive
    Rob - Manager

  13. Jennifer - Executive
    Rob - Manager

  14. Jennifer - Executive
    Rob - Manager
    Ethan - Sr. Engineer

  15. Jennifer - Executive
    Rob - Manager
    Ethan - Sr. Engineer
    Shelly - Sr. Engineer

  16. Jennifer - Executive
    Rob - Manager
    Ethan - Sr. Engineer
    Shelly - Sr. Engineer
    Tabitha - Engineer

  17. Jennifer - Executive
    Rob - Manager
    Ethan - Sr. Engineer
    Shelly - Sr. Engineer
    Tabitha - Engineer
    Michael - Engineering Intern

  18. Shelly - Sr. Engineer
    Rob - Manager
    Shelly: I think the site is down
    Rob: The whole site is down?
    Shelly: I think so
    Rob: Who changed something?
    Shelly: I don’t know
    Rob: Why not? How the hell can we not know? Get it back up.
    Shelly: We don’t know how, and Ethan is not at his desk
    Rob: Well, I am going to come and find out

  19. Shelly - Sr. Engineer
    Tabitha - Engineer
    Michael - Engineering Intern

  20. Rob - Manager
    Shelly - Sr. Engineer
    Tabitha - Engineer
    Michael - Engineering Intern

  21. Rob - Manager
    Ethan - Sr. Engineer
    Shelly - Sr. Engineer
    Tabitha - Engineer
    Michael - Engineering Intern

  22. Ethan - Sr. Engineer
    Shelly - Sr. Engineer
    Tabitha - Engineer
    Michael - Engineering Intern

  23. http://sidneydekker.com/just-culture/

  24. Cheyne Horan

  25. http://sidneydekker.com/just-culture/
    @sidneydekkercom

  26. RETRIBUTIVE JUST CULTURE
    • Which rule is broken?
    • Who did it?
    • How bad was the breach, and what should the consequences be?
    • Who gets to decide this?

  27. RETRIBUTIVE / RESTORATIVE JUST CULTURE
    Retributive
    • Which rule is broken?
    • Who did it?
    • How bad was the breach, and what should the consequences be?
    • Who gets to decide this?
    Restorative
    • Who is hurt?
    • What do they need?
    • Whose obligation is it to meet that need?
    • How do you involve the community in this conversation?

  28. Trust

  29. Justice

  30. THIRD WAY: CULTURE OF CONTINUOUS LEARNING
    Dev Ops
    (Business) (Customers)

  31. CHOOSE ONE
    • Learn
    • Blame

  32. RETRIBUTIVE CULTURE
    • Which rule was broken?
    • Who did it?
    • How bad was the breach?
    • What should the consequences be?
    • Who gets to decide?

  33. RESTORATIVE CULTURE
    • Who is hurt?
    • What are their needs?
    • Whose obligation is it to meet those needs?
    • How do you involve the community in this conversation?

  34. SIDNEY DEKKER – JUST CULTURE

  35. Retributive Culture
    • You pay or settle an account
    • Backward-looking accountability
    • Who is responsible?
    Restorative Culture
    • You tell your account
    • Forward-looking accountability
    • What is responsible?

  36. MAKE IT SAFE TO FAIL

  37. NETFLIX
    “… massive outage… It was caused by, quite frankly, a dumb mistake.
    In fact by an engineer who had taken down Netflix twice in the last 18 months…”

  38. NETFLIX
    “… in the same 18 months that engineer moved …
    … forward not by miles but by light years.”

  39. WHAT HAPPENS WHEN IT IS NOT SAFE TO FAIL?
    • Hiding
    • Secrecy
    • Evasion
    • Self-protection
    • Finger-pointing
    • REPETITION OF ERRORS

  40. Fremont Assembly Plant
    http://en.wikipedia.org/wiki/Fremont_Assembly

  41. NUMMI plant
    http://en.wikipedia.org/wiki/NUMMI

  42. Tesla Factory
    http://en.wikipedia.org/wiki/Tesla_Factory

  43. Netflix Culture Deck
    https://jobs.netflix.com/culture

  44. My Favorite Slides

  45. Adrian Cockcroft - Formerly Netflix

  46. 3 TYPES OF MEETINGS
    • Root Cause Analysis (2007-2010)
    • Team Retrospective Meetings (2010-Now)
    • Post-Mortem (2014-Now)

  47. ME IN 2007/2008 - “5 WHY’S”

  48. Mars Climate Orbiter
    • $125 million loss
    • English-to-metric conversion
    Intel’s Math Error
    • $475 million charge against earnings
    • Math rounding error at 9 significant digits
    Ariane 5 Explosion
    • $370 million loss
    • Integer overflow
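
The integer overflow named on slide 48 can be illustrated with a narrowing conversion. This is a minimal Python sketch, not the actual flight software: the function names and sample values are hypothetical, and it only shows what happens when a number that needs more than 16 bits is squeezed into a signed 16-bit integer, with and without a range check.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Keep only the low 16 bits, the way a silent narrowing conversion would."""
    raw = int(value) & 0xFFFF  # discard everything above bit 15
    # Reinterpret the 16 raw bits as a signed two's-complement integer.
    return struct.unpack("<h", struct.pack("<H", raw))[0]


def to_int16_checked(value: float) -> int:
    """Refuse values that do not fit in a signed 16-bit integer."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result
```

With these definitions, `to_int16_unchecked(40000.0)` wraps around to a negative number, while `to_int16_checked(40000.0)` raises an error that a caller can handle.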

  49. Therac-25

  50. ME IN 2007/2008 - “5 WHY’S”

  51. PARETO’S 80/20 RULE

  52. ME IN 2007/2008 - “5 WHY’S”

  53. “Haters gonna hate, hate, hate, hate”
    –Taylor Swift

  54.
    • Grenade Person
    • Know-it-alls
    • Maybe Person
    • No Person
    • Nothing Person
    • Snipers
    • Tanks
    • Think-they-know-it-alls
    • Whiners
    • Yes Person

  55. From the Introduction:
    “it should in no way be associated with that great body of factual
    information relating to orthodox Zen Buddhist practice. It’s not very
    factual on motorcycles, either.”

  56.
    • Romantic - a friend of the narrator decides not to learn how
    to maintain his expensive new motorcycle. When something on
    the bike breaks, he is frustrated and needs to rely on
    professional mechanics to repair it.
    • Classical - the narrator has an older bike that he is usually
    able to diagnose and repair through rational problem solving.

  57. “Another flaw in the human character is that everybody wants to build
    and nobody wants to do maintenance.”
    – Kurt Vonnegut, Hocus Pocus

  58. 5 WHY’S HAVE FALLEN OUT OF FAVOR
    • https://www.kitchensoap.com/2014/11/14/the-infinite-hows-or-the-dangers-of-the-five-whys/
    • Really asking “How?” and doing this in a group is important
    • Even though the 5 Whys technique is easy to grasp, it is tunnel-visioned

  59. “As a result of these and other changes, Cerner’s KLAS ranking has
    skyrocketed, moving from seventh to second in a four-year period
    (December 2007 to December 2011).”
    –Adam Gale, President, KLAS
    http://healthsystemcio.com/2012/04/09/how-cerner-was-able-to-turn-the-corner/

  60. POST MORTEM MEETING
    • Before meeting:
      • Timeline of incident - facts, assumptions, expectations
    • During meeting:
      • Level-set expectations
      • Discuss without blame
      • Only take action items that can be assigned and completed in the next week
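
The last rule on slide 60 (only take action items that can be assigned and completed in the next week) can be applied mechanically. This is a rough Python sketch under stated assumptions: the `ActionItem` fields, the `accept` helper, and the 7-day window are illustrative choices, not something the talk prescribes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str]  # None means nobody has taken the item yet
    due: Optional[date]   # None means no completion date was agreed


def accept(items: List[ActionItem], meeting_day: date,
           window_days: int = 7) -> List[ActionItem]:
    """Keep only items that have an owner and a due date inside the window."""
    deadline = meeting_day + timedelta(days=window_days)
    return [
        item for item in items
        if item.owner and item.due and meeting_day <= item.due <= deadline
    ]
```

Items that fail the filter are not lost. They simply do not leave the meeting as commitments, which keeps the list short enough to review at the next meeting.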

  61. Machine setup
    Machine admin
    Application
    Zabbix
    Infra Team
    Operations Team
    Development Team **
    SAN (Storage Area Network)

  62. Machine setup
    Machine admin
    Application
    Zabbix
    Infra Team
    Operations Team
    Development Team **
    SAN (Storage Area Network)

  63. POST MORTEM MEETING
    • Before meeting:
      • Timeline of incident - facts, assumptions, expectations
    • During meeting:
      • Level-set expectations
      • Discuss without blame
      • Only take action items that can be assigned and completed in the next week

  64. Machine setup
    /etc/multipath.conf
    Machine admin
    Application
    Zabbix
    Infra Team
    Operations Team
    Development Team **
    SAN (Storage Area Network)

  65. COGNITIVE BIASES
    • Hindsight Bias
    • Outcome Bias
    • Availability Bias (AKA Recency Bias)
    • Sunk Cost Bias
    • Confirmation Bias

  66. RETROSPECTIVE MEETINGS
    • 10 minutes to quietly review the past two weeks
    • Write down 3 biggest accomplishments (team or individual)
    • Discussion and classification
    • Thank you (a chance to formally thank someone on the team in front of everyone)
    • Action Items (to be followed up on at the next meeting)
    • Post publicly

  67. “We Believe”
    What do we want people to view our team as?

  68. “Now, you can say your Daddy is right and the other little child's
    Daddy is wrong, but the universe is an awfully big place. There is room
    enough for an awful lot of people to be right about things and still
    not agree.”
    – Kurt Vonnegut, Sirens of Titan

  69. ALTERNATE UNIVERSE

  70. ONE CHANGE: REGULAR RETROSPECTIVE MEETINGS

  71. Rob - Manager

  72. Rob - Manager

  73. Ethan - Sr. Engineer

  74. Shelly - Sr. Engineer

  75. Tabitha - Engineer

  76. Michael - Engineering Intern

  77. DIFFERENCES THIS TIME AROUND
    • Group Chat (Slack) (last time: one-on-one IM)
    • Alert assigned to rotation (last time: alert just to Shelly’s email)
    • Alert posted to Group Chat (last time: alert just to Shelly’s email)
    • Acknowledgement visible to team (last time: Shelly forwarding email/IM’ing people)
    • Code build in CI/CD (last time: code manually deployed by Ethan)
    • Rollback switch DNS back, 10-min mark (last time: rollback manually removing code)
    • Blameless Post-Mortem scheduled (last time: team left defeated/dejected)
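
Assigning alerts to a rotation rather than to one person's inbox, as slide 77 describes, can be as simple as a weekly round-robin. This is a minimal Python sketch, assuming a hypothetical `on_call` helper and a weekly hand-over on Mondays. Real paging tools have their own scheduling features, which this does not model.

```python
from datetime import date
from typing import Sequence


def on_call(rotation: Sequence[str], day: date) -> str:
    """Return who receives alerts in the week containing `day`.

    Weeks start on Monday. Each new week moves one step through the rotation.
    """
    # date.toordinal() counts days from 0001-01-01, which is a Monday,
    # so integer division by 7 yields a Monday-aligned week index.
    week_index = (day.toordinal() - 1) // 7
    return rotation[week_index % len(rotation)]
```

Because the result depends only on the date, anyone on the team can work out who currently holds the alert, which fits the slide's point about acknowledgement being visible to everyone.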

  78. BLAMELESS POST-MORTEM
    • Make the test environment just like prod (we found differences between the two)
    • Use dns-a and dns-b (as we do today)
    • However, test before making the switch
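
The "test before making the switch" outcome on slide 78 amounts to a guard in front of the DNS flip. This is a minimal Python sketch: `dns-a` and `dns-b` come from the slide, while `switch_active` and the `is_healthy` callback are hypothetical stand-ins for whatever smoke test the team actually runs against the standby side.

```python
from typing import Callable


def switch_active(active: str, standby: str,
                  is_healthy: Callable[[str], bool]) -> str:
    """Return the name that should serve traffic after an attempted switch."""
    if is_healthy(standby):
        return standby  # standby passed its checks, so promote it
    return active       # standby failed, so keep serving from the current side
```

For example, `switch_active("dns-a", "dns-b", check)` returns `"dns-b"` only when `check("dns-b")` passes. Otherwise traffic stays on `dns-a` and nothing needs rolling back.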

  79. Trust

  80. Justice

  81. …BUILDING A HIGH TRUST CULTURE IS LIKELY THE LARGEST MANAGEMENT
    CHALLENGE OF THIS DECADE.
    Gene Kim

  82. TYPES OF MEETINGS
    • Root Cause Analysis Meeting (Monthly)
    • Post Mortem Meeting (per Incident)
    • Retrospective Meeting (Fortnightly)

  83.
    • To Err is Human
    • Blame does NOT do what you think it does
    • Group reflection is key - regardless of what type of meeting you have
    • Justice comes in more forms than Retributive

  84. “I think about my education sometimes. I went to the University of
    Chicago for awhile after the Second World War. I was a student in the
    Department of Anthropology. At that time they were teaching that there
    was absolutely no difference between anybody.
    They may be teaching that still.
    Another thing they taught was that no one was ridiculous or bad or
    disgusting. Shortly before my father died, he said to me, ‘You know – you
    never wrote a story with a villain in it.’
    I told him that was one of the things I learned in college after the war.”
    –Kurt Vonnegut, Slaughterhouse Five

  85.
    • Reed Hastings: Culture Deck
    • Paul Graham: Maker’s Schedule vs. Manager’s Schedule
    • John Allspaw: Blameless PostMortems and a Just Culture
    • Dr. Rick Brinkman and Dr. Rick Kirschner: Dealing with People You Can’t Stand
    • David Zwieback: Human Side of Postmortems
    • Sidney Dekker: Just Culture

  86. HTTP://AARONBLYTHE.ORG/