Duck, duck, goose and the Black Swan: Understanding incidents and your response

Thanks to the strides made in monitoring in recent years, it has become ever easier to watch your production systems like a hawk. But your dashboards are often just peacocks: beautiful but not really useful. In critical situations, you need to get the systems up and flying again. The symptoms of underlying issues are all there, just waiting to be plucked. And the good news is that you’re not alone in figuring it out: ducks fly together… as long as they are coordinated.

Broad research across PagerDuty’s diverse customer base, combined with principles from public emergency response, has produced a framework for understanding operational incident response. How do you quickly diagnose the severity of an incident? When is it not an incident? How do workflows differ in who to contact and when? What makes an effective Incident Commander? How does collaboration vary, and what is the role of ChatOps? In this session, we will answer questions like these in practical ways that will make a meaningful impact on how you manage incidents, from the duckiest to the fowl-est of them.



Arup Chakrabarti

May 29, 2015


  1. Duck, Duck, Goose and the Black Swan: Understanding Incidents and your Response (#velocityconf, @arupchak, @cliffehangers, @pagerduty)