"Incident response at Heroku" by Michael Friis

Opbeat
November 20, 2014

Transcript

  1. My name is Michael
    Ex co-founder at AppHarbor (YC W11)
    Product Manager at Heroku
    I don’t wear a pager anymore
  2. Ops at Heroku
    ~150 people total
    Lots of teams, lots of software, lots of change
    Everything on AWS (except status.heroku.com)
    Total ownership: “You built it, you get the pager”
    SRE team monitors system, tracks uptime, etc.
  3. Some Heroku incidents
    “Bash Security Advisory”
    “Git Push and Build Issue”
    “Network issues within the EU Region”
    [Internal] “Cloudbees shutting down”
  4. Summary
    1. Move to central chat
    2. Designate IC
    3. Update Status Site
    4. Send out sitrep
    5. Assess problem
    6. Mitigate problem
    7. Coordinate response
    8. Manage response
    9. Post incident clean-up
    10. Post-incident follow-up
    https://blog.heroku.com/archives/2014/5/9/incident-response-at-heroku