Chaos Conf
September 26, 2019


Paul Osman, Under Armour
Ana Medina, Gremlin

Practical steps for getting started with Chaos Engineering. Using concrete examples, we'll cover onboarding teams onto a Chaos Engineering platform, identifying teams that are ready to do GameDays, and creating feedback loops to measure resilience.




    @ana_m_medina | @paulosman | #ChaosConf
  2. PAUL OSMAN @paulosman Sr Manager, SRE at Under Armour

    Previous lives: PagerDuty, SoundCloud, 500px, Mozilla
  3. ANA MEDINA @ana_m_medina Chaos Engineer at Gremlin

    Previous: Uber, Google, Quicken Loans, SFEFCU
  4. Definitions

    Chaos Engineering: injecting precise and measured amounts of harm to a system for the purpose of improving the system’s resilience.
    GameDays: Group activity to perform Chaos Engineering. Incidents that didn’t happen.
  5. uDestroy

    - Inspired by Netflix’s Chaos Monkey
    - 1000s Microservices
    - Bare Metal
    - Agent on every host talks to workers
    - UI/CLI support
  6. uDestroy: Uber’s internal Chaos Engineering tool

    On-boarding: Engineering Education classes
    Adoption: Drills to plan for peak days (Hailstorm)
    Production: Teams that needed to deploy to production needed to have run a chaos experiment (Gatekeeper)
  7. Chaos Engineering @ Gremlin: Resilience Company that offers a Chaos Engineering Platform

    - Feature Testing
    - GameDays
    - Scheduled Attacks
    - Community Content
  8. GameDays @ Gremlin: Resilience Company that offers a Chaos Engineering Platform

    Roles: Chaos General, Chaos Commander, Chaos Scribe, Chaos Observer
    1 week before: Calendar invite for entire company, Slack room, Zoom call, prepare GameDay Workbook
    GameDay: 1 hour for 3 experiments
    Post GameDay: High Priority Action Items, One Pager Executive Summary
  9. Key Takeaways: feedback loops to measure resilience

    - Incidents are unplanned investments
    - Chaos Experiments are planned investments
    - ROI is the same: Learning!
    - Turn incidents and chaos into stories
  10. Key Takeaways: On-boarding and identifying teams

    - Focus on key areas of your business
    - Education
    - GameDays
    - On-Call Training
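
The Gatekeeper rule on the uDestroy slide (teams deploying to production needed to have run a chaos experiment) can be pictured as a simple pre-deploy check. The sketch below is only an illustration of that idea, not Uber's implementation: the `ServiceRecord` type, the `may_deploy_to_production` function, and the 90-day freshness window are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ServiceRecord:
    """Hypothetical record of a service's chaos-testing history."""
    name: str
    last_chaos_experiment: Optional[date]  # None if the team has never run one


def may_deploy_to_production(
    service: ServiceRecord,
    today: date,
    max_age: timedelta = timedelta(days=90),  # assumed freshness window, not from the talk
) -> bool:
    """Allow a production deploy only if a recent chaos experiment is on record."""
    if service.last_chaos_experiment is None:
        return False
    return today - service.last_chaos_experiment <= max_age


# Example services (made up for illustration)
today = date(2019, 9, 26)
tested = ServiceRecord("checkout", date(2019, 9, 1))
untested = ServiceRecord("search", None)
stale = ServiceRecord("profile", date(2019, 1, 15))

for svc in (tested, untested, stale):
    print(svc.name, may_deploy_to_production(svc, today))
```

A gate like this would typically sit in the deploy pipeline, so that a team that has never run an experiment (or has not run one recently) is prompted to schedule one before shipping.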