Don't Panic!

How to Cope Now You're Responsible for Production


More and more developers are expected to be on-call, provide out-of-hours support, and respond to production outages. Without much experience handling incidents, it can be scary, intimidating, and feel like being dropped in the deep end. But it doesn’t have to be that way!

Over two years on the FT’s Content team, we’ve transformed our incident response – from a number of mildly terrifying multi-hour outages, to a stable platform where team members feel comfortable on-call.

This talk will provide practical tips and advice on:

- setting up an incident response framework
- what to do when Everything Is On Fire™
- improving things afterwards
and some horror stories of our own…

Required audience experience:
Low – aimed at engineers and teams new to supporting production services

Objective of the talk:
Attendees will leave with practical ideas for setting up a standard incident framework.

I’ll cover:
- how we used to handle incidents at the FT
- what we did to improve
- standard processes to follow during an incident
- what to do afterwards to ensure problems don’t happen again



Euan Finlay

May 16, 2018


  1. "All incidents are equal, but some incidents are more equal than others." George Orwell, probably @efinlay24