Slide 1

Slide 1 text

Blameless Postmortems Security by Inclusion

Slide 2

Slide 2 text

VP Engineering - Pushpay @josh_robb Me

Slide 3

Slide 3 text

Pushpay
● 17 Members of Technical Staff (Engineering)
● Continuous Delivery
● Mobile Apps
● PCI DSS Level 1
● Devops/#chatops (really Slackops)
● Heavy code review culture
● METRICS (4,933 or so currently)

Slide 4

Slide 4 text

What Why How

Slide 5

Slide 5 text

Origins
"Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand."
Retrospective Prime Directive (2001), Norman Kerth

Slide 6

Slide 6 text

Etsy
Blameless Postmortems and a Just Culture - John Allspaw (2012)
Human Factors Research (in Healthcare and Aviation)

Slide 8

Slide 8 text

SR-71 Blackbird
Then you'd debrief for an hour or more, [...] these [planes] were all hand-built, so you had to go through things with the other pilots and engineers like "I had this happen, and it's not in the checklist." Then another pilot would say, "I saw something like that before," and go back and try to correlate it.

Slide 9

Slide 9 text

SR-71 Blackbird
You were all working on it together, and there were no secrets. You'd say "I screwed this up" to everyone in order to grow the knowledge base.

Slide 10

Slide 10 text

Why?
You want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you’re asking “how?” Asking “why?” too easily gets you to an answer to the question “who?”
The Infinite Hows (not 5 whys) - John Allspaw (again!) (2014)

Slide 11

Slide 11 text

Why -> Who is responsible
How -> What is responsible

Slide 12

Slide 12 text

Why?
● Continuous improvement
● Increased quality
● Safe - people are (more) willing to say if they’re undertrained for the situations they find themselves in
● More secure

Slide 13

Slide 13 text

Not “How did this happen” BUT “What can we do to prevent this happening next time”

Slide 14

Slide 14 text

Not a whip

Slide 15

Slide 15 text

NOT A WHIP
It’s tempting to “make” someone write a postmortem up. It’s tempting to use them as performance reviews or to pressure people.
DON’T

Slide 18

Slide 18 text

Lead by example
● Your behaviour sets the tone
● Be the first to write up postmortems
● Watch YOUR tone
● Coach others privately on tone
● “It’s not YOUR fault”

Slide 19

Slide 19 text

Etsy (again)
An engineer who thinks they’re going to be reprimanded is disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure. This lack of understanding of how the accident occurred all but guarantees that it will repeat. If not with the original engineer, another one in the future.

Slide 20

Slide 20 text

Look at what went well
We try to mention/“celebrate” previous mitigations which reduced the blast radius this time around.

Slide 21

Slide 21 text

How
Slack/Chat - not in person
Alpha geek doesn’t shout the loudest in chat (sometimes)

Slide 22

Slide 22 text

Write it up We have a template

Slide 24

Slide 24 text

Write it up
We have a template
1. Timeline (reconstructed from Slack chatops #situation-room and #devops channels)
2. Discussion. What happened. Assumptions. Other factors.
3. Mitigation - What can we do to stop this next time?
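As a rough illustration only, the three-part template could be captured as a small data structure that renders to plain text. This is a minimal sketch, not the template used at Pushpay: the class names `Postmortem` and `TimelineEntry`, the field names, and the output layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimelineEntry:
    """One reconstructed event (hypothetical structure)."""
    time: str   # timestamp as copied from the chat log, e.g. "14:02"
    event: str  # what happened, described without assigning blame


@dataclass
class Postmortem:
    """Hypothetical container for the three template sections."""
    title: str
    timeline: List[TimelineEntry] = field(default_factory=list)
    discussion: List[str] = field(default_factory=list)
    mitigations: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the write-up as plain text, one section after another."""
        lines = [self.title, "", "1. Timeline"]
        lines += [f"   {e.time} {e.event}" for e in self.timeline]
        lines += ["", "2. Discussion"]
        lines += [f"   {d}" for d in self.discussion]
        lines += ["", "3. Mitigation"]
        lines += [f"   {m}" for m in self.mitigations]
        return "\n".join(lines)
```

Filling in a `Postmortem` and calling `render()` gives a skeleton with the sections in the same order as the template; the content of each section still has to come from the people involved.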

Slide 25

Slide 25 text

Timeline
What happened? When?
● Assumptions
● Workarounds
● Solutions

Slide 26

Slide 26 text

Discussion
● Metrics
● Things which are being investigated
● Possible resolutions/mitigations
● Assumptions

Slide 27

Slide 27 text

Good mitigations
● Anticipate
● Monitor
● Respond
● Learn

Slide 28

Slide 28 text

Good mitigations cont’d
● Owned by an individual
● Tracked
● Followed through
● For us - this means a JIRA ticket
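The "owned" and "tracked" criteria above lend themselves to a simple automated check. The following is a sketch under stated assumptions: `Mitigation` and `untracked` are hypothetical names, the ticket field is just a free-form string, and nothing here calls the JIRA API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mitigation:
    """A follow-up action from a postmortem (hypothetical structure)."""
    description: str
    owner: Optional[str] = None   # a single named individual, not a team
    ticket: Optional[str] = None  # tracker key as free text; format is assumed
    done: bool = False            # set once the action has been followed through


def untracked(mitigations: List[Mitigation]) -> List[Mitigation]:
    """Return mitigations that lack an owner or a ticket."""
    return [m for m in mitigations if not m.owner or not m.ticket]
```

Running `untracked` over the mitigation list before a postmortem is closed would flag any action that nobody owns or that has no ticket yet.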

Slide 29

Slide 29 text

Mitigations
Technical/automatic are STRONGLY preferred over “scar tissue” (i.e. human processes)

Slide 30

Slide 30 text

Postmortem Postmortem
One thing we’re going to do next week is a postmortem on our postmortems. (Well - retrospective - but that sounds less recursive)
What could we do better?

Slide 31

Slide 31 text

Tools
● Etsy Morgue - github.com/etsy/morgue
● Nonviolent Communication - Marshall Rosenberg
● The Human Side of Postmortems - Dave Zwieback