EMBRACING CHAOS!

Chaos Conf
September 26, 2019

Paul Osman, Under Armour
Ana Medina, Gremlin

Practical steps for getting started with Chaos Engineering. Using concrete examples, we'll cover onboarding teams onto a Chaos Engineering platform, identifying teams that are ready to run GameDays, and creating feedback loops to measure resilience.

Transcript

  1. EMBRACING CHAOS!
    PAUL OSMAN, UNDER ARMOUR
    ANA MEDINA, GREMLIN
    @ana_m_medina | @paulosman | #ChaosConf
  2. How did you get started?
  3. 3:50 AM

  6. PAUL OSMAN @paulosman
    Sr Manager, SRE at Under Armour
    Previous lives: PagerDuty, SoundCloud, 500px, Mozilla
  7. ANA MEDINA @ana_m_medina
    Chaos Engineer at Gremlin
    Previous: Uber, Google, Quicken Loans, SFEFCU
  8. Definitions
    Chaos Engineering: injecting precise and measured amounts of harm to a system for the purpose of improving the system’s resilience.
    GameDays: Group activity to perform Chaos Engineering.
    Incidents that didn’t happen.
  9. - Blameless Postmortems
    - Monitoring / Alerting
    - Incident Response
    - Observability
    - Chaos Engineering
  16. uDestroy
    - Inspired by Netflix’s Chaos Monkey
    - 1000s Microservices
    - Bare Metal
    - Agent on every host talks to workers
    - UI/CLI support
  17. * UI might not look like this anymore
  18. Donald Sumbry @ Production Engineering meetup (2019)
  19. uDestroy: Uber’s internal Chaos Engineering tool
    - On-boarding: Engineering Education classes
    - Adoption: Drills to plan for peak days (Hailstorm)
    - Production: Teams had to have run a chaos experiment before they could deploy to production (Gatekeeper)
  20. Gremlin
    - Resilience Company
    - Chaos Engineering Platform
  21. Chaos Engineering @ Gremlin
    Resilience Company that offers a Chaos Engineering Platform
    - Feature Testing
    - GameDays
    - Scheduled Attacks
    - Community Content
  22. GameDays @ Gremlin
    Resilience Company that offers a Chaos Engineering Platform
    - Roles: Chaos General, Chaos Commander, Chaos Scribe, Chaos Observer
    - 1 week before: Calendar invite for entire company, Slack room, Zoom call, prepare GameDay Workbook
    - GameDay: 1 hour for 3 experiments
    - Post GameDay: High Priority Action Items, One Pager Executive Summary
  25. Key Takeaways: feedback loops to measure resilience
    - Incidents are unplanned investments
    - Chaos Experiments are planned investments
    - ROI is the same: Learning!
    - Turn incidents and chaos into stories
  26. Key Takeaways: On-boarding and identifying teams
    - Focus on key areas of your business
    - Education
    - GameDays
    - On-Call Training
  27. THANK YOU! @ana_m_medina @paulosman
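
Slide 19 mentions a Gatekeeper rule: teams had to have run a chaos experiment before deploying to production. The slides do not show how uDestroy enforced this, so the following is only a minimal sketch of what such a gate might look like. Every name here (`Service`, `may_deploy_to_production`, `MAX_EXPERIMENT_AGE`) is hypothetical, and the 90-day recency window is an assumption added for illustration; the talk only says an experiment must have been run.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: experiments older than this no longer count.
# The talk does not state any recency window.
MAX_EXPERIMENT_AGE = timedelta(days=90)


@dataclass
class Service:
    """Hypothetical record of a service and its last chaos experiment."""
    name: str
    last_chaos_experiment: Optional[date] = None  # None means never run


def may_deploy_to_production(service: Service, today: date) -> bool:
    """Allow a production deploy only if a recent chaos experiment exists."""
    if service.last_chaos_experiment is None:
        return False
    return today - service.last_chaos_experiment <= MAX_EXPERIMENT_AGE
```

A deploy pipeline could call `may_deploy_to_production(service, date.today())` and block the release when it returns `False`, which turns the policy into an automatic check rather than a manual review step.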