Four years of breaking things in production, on purpose.

Eric Sigler
November 09, 2017

Presented at Chaos Day Twin Cities, November 2017.

Transcript

  1. Eric Sigler, Head of DevOps, PagerDuty (@esigler): Four years of breaking things in production, on purpose.
  2. Obligatory disclaimer: This is what works for us. Take away ideas, not dogmas.
  4. 2013: Every Friday, 1 hour.
  7. 2014: Expanding Scope
  9. 2015: Automation
  12. 2016: Adding In Randomness
  14. Also 2016: Putting It All Together
  16. 2017: Distributing Knowledge
  18. Failure Friday sessions: 133. Faults injected: 708. Fault injections resulting in a public postmortem: 3.
  19. Simulated full AZ failures: 4. Simulated full Region failures: 3. Simulated partial Disaster Recovery: 2.
  20. Tickets created from Failure Friday: over 225. Distinct services that had faults injected: 49.
  22. Optimized for learning first, tooling second. Built the toolchain to enable other teams. Distributed chaos engineering knowledge.