Blameless Postmortems - Security by Inclusion

josh_robb
December 10, 2014

Transcript

  1. Pushpay: 17 members of technical staff (engineering); continuous delivery; mobile apps; PCI DSS Level 1; DevOps/#chatops (really Slackops); heavy code review culture; METRICS (4,933 or so currently).
  2. Origins: "Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand." Retrospective Prime Directive, Norman Kerth (2001).
  3. Etsy: "Blameless Postmortems and a Just Culture", John Allspaw (2012). Human factors research (in healthcare and aviation).
  4. SR-71 Blackbird: "Then you'd debrief for an hour or more, [...] these [planes] were all hand-built, so you had to go through things with the other pilots and engineers like 'I had this happen, and it's not in the checklist.' Then another pilot would say, 'I saw something like that before,' and go back and try to correlate it."
  5. SR-71 Blackbird: "You were all working on it together, and there were no secrets. You'd say 'I screwed this up' to everyone in order to grow the knowledge base."
  6. Why? "You want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you're asking 'how?' Asking 'why?' too easily gets you to an answer to the question 'who?'" "The Infinite Hows" (not the 5 whys), John Allspaw (again!) (2014).
  7. Why? Continuous improvement. Increased quality. Safety: people are (more) willing to say when they're undertrained for the situations they find themselves in. More secure.
  8. Not "How did this happen?" BUT "What can we do to prevent this happening next time?"
  9. NOT A WHIP: It's tempting to "make" someone write up a postmortem. It's tempting to use postmortems as performance reviews or to pressure people. DON'T.
  10. Lead by example: Your behaviour sets the tone. Be the first to write up postmortems. Watch YOUR tone. Coach others privately on tone. "It's not YOUR fault."
  11. Etsy (again): "An engineer who thinks they're going to be reprimanded is disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure. This lack of understanding of how the accident occurred all but guarantees that it will repeat. If not with the original engineer, another one in the future."
  12. Look at what went well: We try to mention/"celebrate" previous mitigations that reduced the blast radius this time around.
  13. Write it up. We have a template:
      1. Timeline (reconstructed from the Slack chatops #situation-room and #devops channels)
      2. Discussion: what happened, assumptions, other factors
      3. Mitigation: what can we do to stop this next time?
  14. Good mitigations (cont'd):
      • Owned by an individual
      • Tracked
      • Followed through
      • For us, this means a JIRA ticket
  15. Postmortem postmortem: One thing we're going to do next week is a postmortem on our postmortems (well, a retrospective, but that sounds less recursive). What could we do better?