Slide 1

Slide 1 text

Learning from Failure: Finding the ‘second story’

Slide 2

Slide 2 text

Pat Hermens
Development Manager
Coding for ~20 years
Father & husband
Rotterdam, Netherlands
@phermens
hermens.com.au

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Failure? Show of hands please.

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Failure?

Slide 8

Slide 8 text

“99.88% uptime”
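
To make the figure concrete, here is a minimal sketch (not part of the talk) that converts an uptime percentage into the downtime it implies. The function name `downtime_hours` and the 365-day year and 30-day month are assumptions chosen for illustration.

```python
def downtime_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime implied by an uptime percentage over a period.

    Assumes a 365-day year by default; pass another period in hours if needed.
    """
    return (1 - uptime_percent / 100) * period_hours


yearly = downtime_hours(99.88)
monthly = downtime_hours(99.88, period_hours=30 * 24)

print(f"{yearly:.1f} hours of downtime per year")          # 10.5 hours
print(f"{monthly * 60:.0f} minutes of downtime per month")  # 52 minutes
```

Whether roughly ten hours a year counts as failure depends on when those hours fall and who is affected, which is the question the next slide asks.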

Slide 9

Slide 9 text

Failure?

Slide 10

Slide 10 text

“Second Story”

Slide 11

Slide 11 text

The Field Guide to Understanding ‘Human Error’ Sidney Dekker

Slide 12

Slide 12 text

The Field Guide to Understanding ‘Human Error’, Sidney Dekker
“Underneath every simple, obvious story about ‘human error’, there is a deeper, more complex story about the organisation.”

Slide 13

Slide 13 text

So, what is a ‘Second Story’?

Slide 14

Slide 14 text

So, what is a ‘Second Story’?
First Stories:
Second Stories:

Slide 15

Slide 15 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
Second Stories:

Slide 16

Slide 16 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system

Slide 17

Slide 17 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
● Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system

Slide 18

Slide 18 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
● Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system
● Saying what people SHOULD have done, doesn’t explain WHY it made sense for them to do what they did.

Slide 19

Slide 19 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
● Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
● Telling people to be more careful will make the problem go away
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system
● Saying what people SHOULD have done, doesn’t explain WHY it made sense for them to do what they did.

Slide 20

Slide 20 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
● Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
● Telling people to be more careful will make the problem go away
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system
● Saying what people SHOULD have done, doesn’t explain WHY it made sense for them to do what they did.
● Only by constantly seeking out its vulnerabilities can organisations enhance safety

Slide 21

Slide 21 text

So, what is a ‘Second Story’?
First Stories:
● Human error is seen as the cause of failure
● Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
● Telling people to be more careful will make the problem go away
Second Stories:
● Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system
● Saying what people SHOULD have done, doesn’t explain WHY it made sense for them to do what they did.
● Only by constantly seeking out its vulnerabilities can organisations enhance safety

Slide 22

Slide 22 text

So, what is a ‘Second Story’?
It is the real story of the complexity in which people work

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

CC BY 2.0 https://www.flickr.com/photos/nrcgov/28751374767

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Failure?

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people

Slide 34

Slide 34 text

James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people

Slide 35

Slide 35 text

James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people

Slide 36

Slide 36 text

James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people

Slide 37

Slide 37 text

Failure?

Slide 38

Slide 38 text

“99.88% uptime”

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

https://www.ideal.nl/en/latest-news/keyfigures/ideal-availability/

Slide 41

Slide 41 text

Failure?

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

“Just Culture”

Slide 44

Slide 44 text

https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:201:0001:0022:EN:PDF

Slide 45

Slide 45 text

https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:201:0001:0022:EN:PDF

Slide 46

Slide 46 text

https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:201:0001:0022:EN:PDF

Slide 47

Slide 47 text

Sure, but what is a ‘Just Culture’ in Tech?

Slide 48

Slide 48 text

Sure, but what is a ‘Just Culture’ in Tech?
It is a method of investigating mistakes

Slide 49

Slide 49 text

Sure, but what is a ‘Just Culture’ in Tech?
It is a method of investigating mistakes in a way that focuses on the situational aspects of a failure’s mechanism,

Slide 50

Slide 50 text

Sure, but what is a ‘Just Culture’ in Tech?
It is a method of investigating mistakes in a way that focuses on the situational aspects of a failure’s mechanism, as well as the decision-making process of people proximate to the failure
- John Allspaw: https://codeascraft.com/2012/05/22/blameless-postmortems/

Slide 51

Slide 51 text

“Blameless Postmortem”

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

https://landing.google.com/sre/sre-book/chapters/postmortem-culture/

Slide 54

Slide 54 text

https://www.atlassian.com/software/jira/ops/handbook/incident-postmortems

Slide 55

Slide 55 text

https://www.etsy.com/progress-report/2015/blamess-post-mortems

Slide 56

Slide 56 text

https://medium.com/hootsuite-engineering/5-whys-how-we-conduct-blameless-post-mortems-after-something-goes-wrong

Slide 57

Slide 57 text

https://www.pagerduty.com/blog/postmortem-guide-documentation/

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

John Allspaw, May 2012 - https://codeascraft.com/2012/05/22/blameless-postmortems

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

Failing Forward John C. Maxwell, 2010

Slide 65

Slide 65 text

Failing Forward, John C. Maxwell, 2010
“Fail early, fail often, but always fail forward.”

Slide 66

Slide 66 text

Psychological Conditions of Personal Engagement and Disengagement at Work, Kahn, 1990 (JSTOR)

Slide 67

Slide 67 text

Psychological Conditions of Personal Engagement and Disengagement at Work, Kahn, 1990 (JSTOR)
“Psychological safety is being able to show and employ one's self without fear of negative consequences to self-image, status or career.”

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

The 7 Habits of Highly Effective People Stephen R. Covey, 1989

Slide 72

Slide 72 text

The 7 Habits of Highly Effective People, Stephen R. Covey, 1989
“Our behavior is a function of our decisions, not our conditions.”

Slide 73

Slide 73 text

Finding the ‘second story’

Slide 74

Slide 74 text

Finding the ‘second story’
3 questions.

Slide 75

Slide 75 text

Finding the ‘second story’
1. WHAT happened that led to this moment?

Slide 76

Slide 76 text

Finding the ‘second story’
1. WHAT happened that led to this moment?
2. WHY did this make sense to the operators?

Slide 77

Slide 77 text

Finding the ‘second story’
1. WHAT happened that led to this moment?
2. WHY did this make sense to the operators?
3. HOW did the operators manage to do this?

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

WHAT happened? WHY do this? HOW is it possible?

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

James Thomas - April 15, 2019
Death by PowerPoint: the slide that killed seven people
WHAT happened? WHY do this? HOW is it possible?

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

https://www.ideal.nl/en/latest-news/keyfigures/ideal-availability/
WHAT happened? WHY do this? HOW is it possible?

Slide 85

Slide 85 text

3 actions

Slide 86

Slide 86 text

Find the incentivisation
Ask what is responsible, not who.

Slide 87

Slide 87 text

Enable the ‘right’ outcome
Seek forward accountability, not backward.

Slide 88

Slide 88 text

Assume positive intent
No-one comes to work aiming to do a bad job.

Slide 89

Slide 89 text

No content

Slide 90

Slide 90 text

No content

Slide 91

Slide 91 text

Cited references (in order)
● “Who Destroyed 3 Mile Island”, a presentation by Nickolas Means at Lead Developer Conference, London 2018
● 3 Mile Island & 3 Mile Island Accident articles on Wikipedia, plus the related article at the Smithsonian
● Space Shuttle Columbia & Space Shuttle Columbia Disaster articles on Wikipedia
● iDEAL article on Wikipedia, and the Currence ‘Facts & Figures’ site
● “The Field Guide to Understanding ‘Human Error’” by Dr. Sidney Dekker (ISBN: 1472439058)
● Commission Regulation (EU) No 691/2010 of 29 July 2010
● Google’s “SRE Handbook”, Chapter 15: “Postmortem Culture”
● Atlassian’s “JIRA Ops Incident Handbook”: Incident Postmortems section
● Etsy’s Progress Report from 2015: Blameless Postmortems section
● Hootsuite Engineering’s Medium page: An article on using the 5-Why’s exercise in Postmortems
● PagerDuty’s Blog: An article titled “Introducing the PagerDuty Postmortem Guide”
● John Allspaw’s article on “Blameless Postmortems and a Just Culture” (at Etsy)
● “Failing Forward” by John C. Maxwell (ISBN: 0785288570)
● “Psychological Conditions of Personal Engagement and Disengagement at Work” by William A. Kahn (JSTOR)
● “The 7 Habits of Highly Effective People” by Stephen R. Covey (ISBN: 9781451639612)
● “Death by PowerPoint: the slide that killed seven people”, a blog post by James Thomas
● Fundamental Attribution Error article on Wikipedia

Slide 92

Slide 92 text

Credits/disclaimers
● BIG THANKS to all those that have come before me and enabled me to share THEIR knowledge, achievements, and experiences.
---
● All icons & shapes are from Wikimedia Commons: CC BY-SA 3.0.
● All book covers are copyright their respective owners, utilised under “fair use”.
● All photos are either “public domain”, or rights have been granted.
● Any tweets have been obtained publicly, referenced & hyperlinked.
● References and “sources of inspiration” have been linked on the previous slide.

Slide 93

Slide 93 text

Thanks
Questions? Just ask!