Finding the Second Story - Learning from Failure

Behind every classic failure, there is someone to blame, right?

What if this isn’t the case? What happens when we make it explicitly NOT the case?

Join Pat as we try to understand human error, change the focus of our post-mortems, and discover what we can do to make our teams a safer place to deliver better software, faster, while learning from our (inevitable) mistakes.

Pat Hermens

May 16, 2019

Transcript

  1. 2.

    Pat Hermens
    Development Manager
    Coding for ~20 years
    Father & husband
    Rotterdam, Netherlands
    @phermens
    hermens.com.au
  5. 12.

    The Field Guide to Understanding ‘Human Error’, Sidney Dekker

    Underneath every simple, obvious story about ‘human error’, there is a deeper, more complex story about the organisation.
  12. 21.

    So, what is a ‘Second Story’?

    First Stories
    • Human error is seen as the cause of failure.
    • Saying what people SHOULD have done is a satisfying way to describe THEIR mistake.
    • Telling people to be more careful will make the problem go away.

    Second Stories
    • Human error is seen as the effect of systemic vulnerabilities deeper inside the organisation or system.
    • Saying what people SHOULD have done doesn’t explain WHY it made sense for them to do what they did.
    • Only by constantly seeking out their vulnerabilities can organisations enhance safety.
  13. 22.

    So, what is a ‘Second Story’?

    It is the real story of the complexity in which people work.
  30. 50.

    Sure, but what is a ‘Just Culture’ in Tech?

    It is a method of investigating mistakes in a way that focuses on the situational aspects of a failure’s mechanism, as well as the decision-making process of people proximate to the failure.

    - John Allspaw: https://codeascraft.com/2012/05/22/blameless-postmortems/
  37. 67.

    Psychological Conditions of Personal Engagement and Disengagement at Work, Kahn, 1990 (JSTOR)

    Psychological safety is being able to show and employ one's self without fear of negative consequences to self-image, status or career.
  41. 72.

    The 7 Habits of Highly Effective People, Stephen R. Covey, 1989

    Our behavior is a function of our decisions, not our conditions.
  43. 77.

    Finding the ‘second story’

    1. WHAT happened that led to this moment?
    2. WHY did this make sense to the operators?
    3. HOW did the operators manage to do this?
  47. 82.

    James Thomas - April 15, 2019
    Death by PowerPoint: the slide that killed seven people

    WHAT happened?
    WHY do this?
    HOW is it possible?
  52. 91.

    Cited references (in order)

    • “Who Destroyed 3 Mile Island”, a presentation by Nickolas Means at Lead Developer Conference, London 2018
    • 3 Mile Island & 3 Mile Island Accident articles on Wikipedia, plus the related article at the Smithsonian
    • Space Shuttle Columbia & Space Shuttle Columbia Disaster articles on Wikipedia
    • iDEAL article on Wikipedia, and the Currence ‘Facts & Figures’ site
    • “The Field Guide to Understanding ‘Human Error’” by Dr. Sidney Dekker (ISBN: 1472439058)
    • Commission Regulation (EU) No 691/2010 of 29 July 2010
    • Google’s “SRE Handbook”, Chapter 15: “Postmortem Culture”
    • Atlassian’s “JIRA Ops Incident Handbook”: Incident Postmortems section
    • Etsy’s Progress Report from 2015: Blameless Postmortems section
    • Hootsuite Engineering’s Medium page: An article on using the 5-Why’s exercise in Postmortems
    • PagerDuty’s Blog: An article titled “Introducing the PagerDuty Postmortem Guide”
    • John Allspaw’s article on “Blameless Postmortems and a Just Culture” (at Etsy)
    • “Failing Forward” by John C. Maxwell (ISBN: 0785288570)
    • “Psychological Conditions of Personal Engagement and Disengagement at Work” by William A. Kahn (JSTOR)
    • “The 7 Habits of Highly Effective People” by Stephen R. Covey (ISBN: 9781451639612)
    • “Death by PowerPoint: the slide that killed seven people”, a blog post by James Thomas
    • Fundamental Attribution Error article on Wikipedia
  53. 92.

    Credits/disclaimers

    • BIG THANKS to all those that have come before me and enabled me to share THEIR knowledge, achievements, and experiences.
    • All icons & shapes are from Wikimedia Commons: CC BY-SA 3.0.
    • All book covers are copyright their respective owners, utilised under “fair use”.
    • All photos are either “public domain”, or rights have been granted.
    • Any tweets have been obtained publicly, referenced & hyperlinked.
    • References and “sources of inspiration” have been linked on the previous slide.