Slide 1

Slide 1 text

EMBRACING CHAOS!
PAUL OSMAN, UNDER ARMOUR
ANA MEDINA, GREMLIN
@ana_m_medina | @paulosman | #ChaosConf

Slide 2

Slide 2 text

How did you get started?

Slide 3

Slide 3 text

3:50 AM

Slide 6

Slide 6 text

PAUL OSMAN
@paulosman
Sr Manager, SRE at Under Armour
Previous lives: PagerDuty, SoundCloud, 500px, Mozilla

Slide 7

Slide 7 text

ANA MEDINA
@ana_m_medina
Chaos Engineer at Gremlin
Previous: Uber, Google, Quicken Loans, SFEFCU

Slide 8

Slide 8 text

Definitions
Chaos Engineering: injecting precise and measured amounts of harm to a system for the purpose of improving the system’s resilience.
GameDays: Group activity to perform Chaos Engineering.
Incidents that didn’t happen.
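
To make "precise and measured amounts of harm" concrete, here is a minimal Python sketch of one kind of fault: added latency. It is an illustration only, not something shown in the talk. The helper name `with_injected_latency` and its parameters are hypothetical. The `sleep` argument is injectable so the sketch can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_injected_latency(
    func: Callable[..., T],
    delay_seconds: float,
    probability: float,
    rng: random.Random,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., T]:
    """Wrap func so a chosen fraction of calls is delayed by a fixed amount.

    Hypothetical helper for illustration; not part of any real chaos tool.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")

    def wrapper(*args, **kwargs):
        # "Measured": only the chosen fraction of calls is affected.
        if rng.random() < probability:
            # "Precise": the harm is a known, bounded delay.
            sleep(delay_seconds)
        return func(*args, **kwargs)

    return wrapper
```

A possible use is to wrap a single client call with a small probability, watch the dashboards, and widen the blast radius only once the system copes.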

Slide 9

Slide 9 text

- Blameless Postmortems
- Monitoring / Alerting
- Incident Response
- Observability
- Chaos Engineering

Slide 16

Slide 16 text

uDestroy
- Inspired by Netflix’s Chaos Monkey
- 1000s Microservices
- Bare Metal
- Agent on every host talks to workers
- UI/CLI support

Slide 17

Slide 17 text

* UI might not look like this anymore

Slide 18

Slide 18 text

Donald Sumbry @ Production Engineering meetup (2019)

Slide 19

Slide 19 text

uDestroy
Uber’s internal Chaos Engineering tool
On-boarding: Engineering Education classes
Adoption: Drills to plan for peak days (Hailstorm)
Production: Teams that wanted to deploy to production first had to have run a chaos experiment (Gatekeeper)
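
The production rule on this slide (no deploy without a chaos experiment) can be sketched as a simple gate. This is a hedged illustration and not how Gatekeeper works internally, which the slides do not describe. The function `may_deploy`, the record format, the service names, and the 90-day freshness window are all assumptions.

```python
from datetime import date, timedelta
from typing import Mapping


def may_deploy(
    service: str,
    last_experiment: Mapping[str, date],
    today: date,
    max_age_days: int = 90,  # assumed freshness window, not from the talk
) -> bool:
    """Return True only if the service has a recent chaos experiment on record."""
    ran_on = last_experiment.get(service)
    if ran_on is None:
        # No experiment recorded: block the deploy.
        return False
    # An old experiment says little about the current system.
    return today - ran_on <= timedelta(days=max_age_days)
```

For example, with `{"checkout": date(2019, 9, 1)}` as the record, a deploy of the hypothetical `checkout` service on 2019-09-26 would pass, while a service with no record would be blocked.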

Slide 20

Slide 20 text

Gremlin
- Resilience Company
- Chaos Engineering Platform

Slide 21

Slide 21 text

Chaos Engineering @ Gremlin
Resilience Company that offers a Chaos Engineering Platform
- Feature Testing
- GameDays
- Scheduled Attacks
- Community Content

Slide 22

Slide 22 text

GameDays @ Gremlin
Resilience Company that offers a Chaos Engineering Platform
Roles: Chaos General, Chaos Commander, Chaos Scribe, Chaos Observer
1 week before: Calendar invite for entire company, Slack room, Zoom call, prepare GameDay Workbook
GameDay: 1 hour for 3 experiments
Post GameDay: High Priority Action Items, One Pager Executive Summary
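
The GameDay structure above (four roles, one hour, three experiments) could be captured in a small planning record. The sketch below is illustrative and is not Gremlin's GameDay Workbook. The class `GameDayPlan`, its fields, and the even split of time across experiments are assumptions. Only the role names and the one-hour budget come from the slide.

```python
from dataclasses import dataclass, field

# Role names as listed on the slide.
ROLES = ("Chaos General", "Chaos Commander", "Chaos Scribe", "Chaos Observer")


@dataclass
class GameDayPlan:
    """Hypothetical planning record for one GameDay."""

    roles: dict[str, str]      # role name -> person assigned
    experiments: list[str]     # short description of each experiment
    total_minutes: int = 60    # the slide budgets 1 hour
    action_items: list[str] = field(default_factory=list)

    def missing_roles(self) -> list[str]:
        # Roles from the slide that nobody has been assigned to yet.
        return [role for role in ROLES if role not in self.roles]

    def minutes_per_experiment(self) -> float:
        # Assumes time is split evenly; the slide only gives the total.
        if not self.experiments:
            raise ValueError("a GameDay needs at least one experiment")
        return self.total_minutes / len(self.experiments)
```

With three experiments in a 60-minute session, an even split leaves about 20 minutes each for setup, running the attack, observing, and rolling back.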

Slide 25

Slide 25 text

Key Takeaways
- Incidents are unplanned investments
- Chaos Experiments are planned investments
- ROI is the same: Learning!
- Turn incidents and chaos into stories
- Feedback loops to measure resilience

Slide 26

Slide 26 text

Key Takeaways
On-boarding and identifying teams
- Focus on key areas of your business
- Education
- GameDays
- On-Call Training

Slide 27

Slide 27 text

THANK YOU!
@ana_m_medina @paulosman