Finding the Second Story - Learning from Failure

Behind every classic failure, there is someone to blame, right?

What if this isn’t the case? What happens when we make it explicitly NOT the case?

Join Pat as we try to understand human error, change the focus of our post-mortems, and discover what we can do to make our teams a safer place to deliver better software, faster, while learning from our (inevitable) mistakes.

Pat Hermens

May 16, 2019

Transcript

  1. Learning from Failure: Finding the ‘second story’

  2. Pat Hermens: Development Manager. Coding for ~20 years. Father & husband. Rotterdam, Netherlands. @phermens, hermens.com.au
  3. None
  4. None
  5. Failure? Show of hands please.

  6. None
  7. Failure?

  8. “99.88% uptime”

  9. Failure?

  10. “Second Story”

  11. The Field Guide to Understanding ‘Human Error’ Sidney Dekker

  12. The Field Guide to Understanding ‘Human Error’, Sidney Dekker: “Underneath every simple, obvious story about ‘human error’, there is a deeper, more complex story about the organisation.”
  21. So, what is a ‘Second Story’?

    First Stories:
    • Human error is seen as the cause of failure
    • Saying what people SHOULD have done is a satisfying way to describe THEIR mistake
    • Telling people to be more careful will make the problem go away

    Second Stories:
    • Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system
    • Saying what people SHOULD have done doesn’t explain WHY it made sense for them to do what they did
    • Only by constantly seeking out its vulnerabilities can organisations enhance safety
  22. So, what is a ‘Second Story’? It is the real story of the complexity in which people work.
  23. None
  24. None
  25. None
  26. None
  27. CC BY 2.0 https://www.flickr.com/photos/nrcgov/28751374767

  28. None
  29. Failure?

  30. None
  31. None
  32. None
  33. James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people
  37. Failure?

  38. “99.88% uptime”

  39. None
  40. https://www.ideal.nl/en/latest-news/keyfigures/ideal-availability/

  41. Failure?

  42. None
  43. “Just Culture”

  44. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:201:0001:0022:EN:PDF

  50. Sure, but what is a ‘Just Culture’ in Tech? It is a method of investigating mistakes in a way that focuses on the situational aspects of a failure’s mechanism, as well as the decision-making process of people proximate to the failure. - John Allspaw: https://codeascraft.com/2012/05/22/blameless-postmortems/
  51. “Blameless Postmortem”

  52. None
  53. https://landing.google.com/sre/sre-book/chapters/postmortem-culture/

  54. https://www.atlassian.com/software/jira/ops/handbook/incident-postmortems

  55. https://www.etsy.com/progress-report/2015/blamess-post-mortems

  56. https://medium.com/hootsuite-engineering/5-whys-how-we-conduct-blameless-post-mortems-after-something-goes-wrong

  57. https://www.pagerduty.com/blog/postmortem-guide-documentation/

  58. None
  59. John Allspaw, May 2012 - https://codeascraft.com/2012/05/22/blameless-postmortems

  60. None
  61. None
  62. None
  63. None
  65. Failing Forward, John C. Maxwell, 2010: “Fail early, fail often, but always fail forward.”
  67. Psychological Conditions of Personal Engagement and Disengagement at Work, Kahn, 1990 (JSTOR): “Psychological safety is being able to show and employ one's self without fear of negative consequences to self-image, status or career.”
  68. None
  69. None
  70. None
  72. The 7 Habits of Highly Effective People, Stephen R. Covey, 1989: “Our behavior is a function of our decisions, not our conditions.”
  73. Finding the ‘second story’

  77. Finding the ‘second story’: 3 questions.

    1. WHAT happened that led to this moment?
    2. WHY did this make sense to the operators?
    3. HOW did the operators manage to do this?
  78. None
  79. None
  80. WHAT happened? WHY do this? HOW is it possible?
  81. None
  82. James Thomas - April 15, 2019 Death by PowerPoint: the slide that killed seven people. WHAT happened? WHY do this? HOW is it possible?
  83. None
  84. https://www.ideal.nl/en/latest-news/keyfigures/ideal-availability/ WHAT happened? WHY do this? HOW is it possible?
  85. 3 actions

  86. Find the incentivisation: Ask what is responsible, not who.
  87. Enable the ‘right’ outcome: Seek forward accountability, not backward.
  88. Assume positive intent: No-one comes to work aiming to do a bad job.
  89. None
  90. None
  91. Cited references (in order)

    • “Who Destroyed 3 Mile Island”, a presentation by Nickolas Means at Lead Developer Conference, London 2018
    • 3 Mile Island & 3 Mile Island Accident articles on Wikipedia, plus the related article at the Smithsonian
    • Space Shuttle Columbia & Space Shuttle Columbia Disaster articles on Wikipedia
    • iDEAL article on Wikipedia, and the Currence ‘Facts & Figures’ site
    • “The Field Guide to Understanding ‘Human Error’” by Dr. Sidney Dekker (ISBN: 1472439058)
    • Commission Regulation (EU) No 691/2010 of 29 July 2010
    • Google’s “SRE Handbook”, Chapter 15: “Postmortem Culture”
    • Atlassian’s “JIRA Ops Incident Handbook”: Incident Postmortems section
    • Etsy’s Progress Report from 2015: Blameless Postmortems section
    • Hootsuite Engineering’s Medium page: An article on using the 5-Why’s exercise in Postmortems
    • PagerDuty’s Blog: An article titled “Introducing the PagerDuty Postmortem Guide”
    • John Allspaw’s article on “Blameless Postmortems and a Just Culture” (at Etsy)
    • “Failing Forward” by John C. Maxwell (ISBN: 0785288570)
    • “Psychological Conditions of Personal Engagement and Disengagement at Work” by William A. Kahn (JSTOR)
    • “The 7 Habits of Highly Effective People” by Stephen R. Covey (ISBN: 9781451639612)
    • “Death by PowerPoint: the slide that killed seven people”, a blog post by James Thomas
    • Fundamental Attribution Error article on Wikipedia
  92. Credits/disclaimers

    • BIG THANKS to all those who have come before me and enabled me to share THEIR knowledge, achievements, and experiences.
    • All icons & shapes are from Wikimedia Commons: CC BY-SA 3.0.
    • All book covers are copyright their respective owners, utilised under “fair use”.
    • All photos are either “public domain”, or rights have been granted.
    • Any tweets have been obtained publicly, referenced & hyperlinked.
    • References and “sources of inspiration” have been linked on the previous slide.
  93. Thanks. Questions? Just ask! p@hermens.com.au