
Get Your Story Straight

j.hand
November 03, 2015


Gartner has predicted that in 2015, 80% of outages will be caused by people and process issues. Are you considering the human element when revisiting incidents and outages in your infrastructure? If so, are you approaching it with a blameless mindset, one focused on removing bias in its many forms and searching for the absolute truth? Do you believe there is always a root cause to a problem, or is it more accurate to seek out the additional factors that may have contributed to the incident, especially those involving people and processes? Regardless of your approach, the point of a post-mortem is to accurately describe the "story" of what took place in as much detail as possible: the good, the bad, those involved, conversations had, actions taken, related timestamps, who was on call, and so on. You want to know everything that took place and was related to some degree, so that you can review the data and learn from it. How do we ensure that we are asking the right questions and seeking out the relevant information that will help us understand what took place and, ultimately, become a better team, company, and product as a result? I'll introduce best practices for conducting effective post-mortems and illustrate their importance with statistical data to back up the claims, demonstrating that there are measurable benefits from adopting post-mortems, especially those of a "blameless" nature.

  1. Get Your Story Straight
    Jason Hand (@jasonhand | victorops.com)
  2. Outages will happen: What are the underlying contributing factors?
  3. Fail: Fast, Better, Forward. Continuous Experimentation, Learning, & Improvement
  4. Systems are really F#%*ing Complicated & Complex
  5. Cynefin Framework

  6. How Do We Get Better? Learn from incidents, outages, and events
  7. Postmortems, Learning Reviews, Retrospectives: What do they look like?
  8. What? A process intended to inform improvements by determining which aspects were successful or unsuccessful.
    When? As soon as feasible after the incident is resolved.
  9. Who? Everyone involved, plus related stakeholders.
    Why? To communicate with your team, and to understand what happened so you can learn and improve.
  10. How? Basic things to consider for your Learning Review
  11. We are here to Learn, NOT Blame: Learn & Identify Improvements
  12. Avoid the 5 Whys

  13. Describe > Explain

  14. Understand how

  15. Blaming! It disincentivizes anyone from giving details

  16. This is an opportunity to learn ... in a safe environment
  17. The Human Element: Bias

  18. [Image-only slide]

  19. Small, Incremental & Actionable Improvements

  20. Thank You. Jason Hand