Incident Response Done Right: From First Page to Postmortem

A rambling sort of thing presented at DevOps ATL April 2014.

Will Farrington

April 17, 2014

Transcript

  1. INCIDENT RESPONSE DONE RIGHT: FROM FIRST PAGE TO POSTMORTEM
  2. WILL FARRINGTON

    @wfarr on the Internet. Ops @ GitHub, 2012-now. Ops @ Rails Machine, 2009-2011.
  3. LET’S TALK ABOUT INCIDENT RESPONSE
  4. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  5. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  6. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  7. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  8. WHAT YOU REALLY WANT TO KNOW IS: WHAT IS THE PROBLEM?
  9. INPUT / OUTPUT

    LB, APP, AUTH, DB, CACHE, APIS
  10. INPUT / OUTPUT

    ARCHITECTURE
  11. HOW WOULD YOU DESCRIBE YOUR PROCESS?
  12. I RECOMMEND THIS BOOK ABOUT CHECKLISTS
  13. “It is common to misconceive how checklists function in complex lines of work. They are not comprehensive how-to guides, whether for building a skyscraper or getting a plane out of trouble. They are quick and simple tools aimed to buttress the skills of expert professionals.” – ATUL GAWANDE
  14. A GOOD CHECKLIST

    PRECISE, EFFICIENT, CONCISE, PRACTICAL, EASY TO USE
  15. ENGINE FAILURE DURING FLIGHT

    • Airspeed: FLY THE AIRPLANE! 68 KIAS
    • Fuel Shutoff Valve: ON (IN)
    • Fuel Selector: BOTH
    • Auxiliary Fuel Pump: ON
    • Mixture: RICH
    • Ignition Switch: BOTH
  16. Checklists transform the process of identifying problems in a rapidly degrading situation from being haphazard and error-prone to methodical and organized.
  17. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  18. “‘That’s not my problem’ is possibly the worst thing people can think.” – ATUL GAWANDE
  19. IT’S TIME TO FIX THE PROBLEM
  20. I RECOMMEND THIS BOOK ABOUT CHECKLISTS (AGAIN)
  21. ENGINE FAILURE DURING FLIGHT

    • Airspeed: FLY THE AIRPLANE! 68 KIAS
    • Fuel Shutoff Valve: ON (IN)
    • Fuel Selector: BOTH
    • Auxiliary Fuel Pump: ON
    • Mixture: RICH
    • Ignition Switch: BOTH
  22. ELASTICSEARCH: SPLIT BRAIN

    • circuit break search OFF
    • disable allocation
    • get cluster state
    • shutdown all nodes w/ API
    • start the cluster
    • wait for all members
    • enable allocation

    UPDATE THE STATUS!
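The split-brain checklist on slide 22 can also be kept as data instead of prose, so the same steps can be printed and ticked off in order during an incident. What follows is only a minimal sketch in Python: the `Step` and `Checklist` classes and their methods are hypothetical names invented for illustration, the step text is copied from the slide, and nothing here talks to a real Elasticsearch cluster.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One line of a checklist: a short instruction and whether it is done."""
    text: str
    done: bool = False


@dataclass
class Checklist:
    """An ordered list of steps plus a standing reminder (hypothetical model)."""
    title: str
    steps: List[Step] = field(default_factory=list)
    reminder: str = ""

    def next_step(self) -> Optional[Step]:
        """Return the first step not yet completed, or None when finished."""
        for step in self.steps:
            if not step.done:
                return step
        return None

    def complete_next(self) -> Optional[Step]:
        """Mark the next open step as done and return it."""
        step = self.next_step()
        if step is not None:
            step.done = True
        return step

    def render(self) -> str:
        """Plain-text view suitable for pasting into a chat room."""
        lines = [self.title]
        for step in self.steps:
            mark = "x" if step.done else " "
            lines.append(f"[{mark}] {step.text}")
        if self.reminder:
            lines.append(self.reminder)
        return "\n".join(lines)


# Step text copied from the slide; the structure around it is an assumption.
split_brain = Checklist(
    title="ELASTICSEARCH: SPLIT BRAIN",
    steps=[Step(text) for text in (
        "circuit break search OFF",
        "disable allocation",
        "get cluster state",
        "shutdown all nodes w/ API",
        "start the cluster",
        "wait for all members",
        "enable allocation",
    )],
    reminder="UPDATE THE STATUS!",
)
```

Calling `split_brain.complete_next()` after each action and then `print(split_brain.render())` shows what is done and what comes next. The steps are forced into order on purpose: the point of the checklist is that nobody has to decide the sequence under pressure.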
  23. MTTR is the name of the game. Reduce it safely, by whatever means.
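To put a number on MTTR: the slide does not spell out the acronym, so this sketch assumes it means mean time to recovery, measured from detection to resolution and averaged over incidents. The `mttr` function and the sample timestamps are made up for illustration and are not taken from the talk.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean of (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so that sum() adds timedeltas, not ints
    )
    return total / len(incidents)


# Illustrative, made-up incidents: (detected, resolved)
incidents = [
    (datetime(2014, 4, 1, 9, 0), datetime(2014, 4, 1, 9, 30)),    # 30 minutes
    (datetime(2014, 4, 8, 14, 0), datetime(2014, 4, 8, 15, 30)),  # 90 minutes
    (datetime(2014, 4, 15, 2, 0), datetime(2014, 4, 15, 3, 0)),   # 60 minutes
]

print(mttr(incidents))  # 1:00:00
```

Tracking this per quarter, or per kind of incident, is one way to see whether checklists and clearer roles are actually shortening recovery.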
  24. Take 30s at the start of the hangout to make sure everyone knows who’s doing what. Make sure you say what your role is.
  25. Atul Gawande found that the simple act of a surgical team introducing themselves to one another before an operation increased the feeling of teamwork and efficacy across the team. It also enabled people to speak up when they see something.
  26. Terrible things happen and if you don’t communicate to your customers, they’ll assume the worst.
  27. INCIDENT

    EVENT, NOTIFICATION, IDENTIFICATION, RESOLUTION, POSTMORTEM
  28. WE’VE FIXED THE PROBLEM. NOW LET’S TALK ABOUT IT.
  29. “Regular postmortems are the closest thing you have to employing a scientific method to the complicated problem of web operations. By gathering real evidence, you can focus your limited resources on solving the issues that are actually causing you problems.” – JESSE ROBBINS
  30. A GOOD POSTMORTEM

    • DESCRIPTION OF THE INCIDENT
    • DESCRIPTION OF THE ROOT CAUSE
    • DESCRIPTION OF THE RESOLUTION PROCESS
    • TIMELINE OF THE INCIDENT
    • HOW THE INCIDENT AFFECTED CUSTOMERS
    • REMEDIATIONS OR CORRECTIVE ACTIONS
  31. A GOOD POSTMORTEM

    • DESCRIPTION OF THE INCIDENT
    • DESCRIPTION OF THE ROOT CAUSE
    • DESCRIPTION OF THE RESOLUTION PROCESS
    • TIMELINE OF THE INCIDENT
    • HOW THE INCIDENT AFFECTED CUSTOMERS
    • REMEDIATIONS OR CORRECTIVE ACTIONS
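The six parts of a good postmortem listed on the slide can be treated as required sections of a record, which makes it easy to see what a draft still lacks. This is a minimal sketch assuming a plain Python dataclass: the `Postmortem` class, its field names, and `missing_sections` are hypothetical, and the sample draft text is invented for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Postmortem:
    """One field per section named on the slide (field names are assumptions)."""
    incident_description: str = ""
    root_cause: str = ""
    resolution_process: str = ""
    timeline: List[str] = field(default_factory=list)
    customer_impact: str = ""
    corrective_actions: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty, in slide order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A made-up, half-written draft.
draft = Postmortem(
    incident_description="Search returned errors for some users.",
    timeline=["10:02 first page", "10:40 resolved"],
)

print(draft.missing_sections())
# ['root_cause', 'resolution_process', 'customer_impact', 'corrective_actions']
```

There is deliberately no field for who was at fault; the record only has room for what happened, why the process allowed it, and what will change.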
  32. A GOOD POSTMORTEM REQUIRES TRUST AND HONESTY
  33. Blame and punitive measures cannot enter the realm of possibility. Otherwise, you create a conflict of interest about honesty.
  34. I RECOMMEND THIS BOOK ABOUT HUMAN ERROR
  35. “Different perspectives on a sequence of events: Looking from the outside and hindsight you have knowledge of the outcome and dangers involved. From the inside, you may have neither.” – SIDNEY DEKKER
  36. Let’s entertain the thought that we don’t hire mindless automatons. We hire people who can and do think, and who care.
  37. Faced with a complex problem in a high-pressure scenario, with a process ill-equipped to effectively help them navigate the situation, their actions were entirely logical and yet doomed to fail.
  38. A GOOD POSTMORTEM

    • DESCRIPTION OF THE INCIDENT
    • DESCRIPTION OF THE ROOT CAUSE
    • DESCRIPTION OF THE RESOLUTION PROCESS
    • TIMELINE OF THE INCIDENT
    • HOW THE INCIDENT AFFECTED CUSTOMERS
    • REMEDIATIONS OR CORRECTIVE ACTIONS
  39. THE MOST COMMON PROBLEM IS SETTING YOURSELF UP FOR FAILURE
  40. Your corrective actions should be aimed at figuring out how your process made the failure possible, and fixing the process.
  41. PUBLIC POSTMORTEMS
  42. I RECOMMEND THIS BLOG POST ABOUT BULLSHIT
  43. “The most important part of saying you’re sorry is to project some real empathy. If you can’t put yourself in your users’ shoes, then it’s going to come out wrong.” – DAVID HEINEMEIER HANSSON