Chaos management during a major incident

Aish
April 24, 2017

This is a short talk I gave about incident response and how PagerDuty handles its own incident response.
I gave two versions of this talk: one at dotScale 2017 in Paris and the other at Full Stack Fest in Barcelona.

Transcript

  1. “It was a dark and stormy night; the rain fell in torrents — except at occasional intervals, when it was checked by a violent gust of wind…” - Almost every clueless person on the call
  2. People are your most valuable asset. Don’t burn them out doing something that can be automated.
  3. Appendix
    • The Big Red Button, from Flickr by włodi (CC SA)
    • Upside down, from Flickr by Akimasa Harada (CC SA)
    • Geography, from Flickr (CC SA)
    • Gene Kranz’s image, in the public domain
    • That’s all folks image, in the public domain
  4. References
    • Cook, Richard I. "How complex systems fail." Cognitive Technologies Laboratory, University of Chicago. Chicago IL (1998).
    • Incident Response, PagerDuty Incident Response Docs, https://response.pagerduty.com/
    • Allspaw, John. "Fault injection in production." Communications of the ACM 55.10 (2012): 48-52.
    • Krishnan, Kripa. "Weathering the Unexpected." Communications of the ACM 55.11 (2012): 48-52.
    • Limoncelli, Tom, et al. "Resilience engineering: learning to embrace failure." Communications of the ACM 55.11 (2012): 40-47.