
Chaos Engineering, Community, and AWS

Adrian Hornsby

February 08, 2020

Transcript

  1. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Chaos Engineering, Community and AWS
    Adrian Hornsby, Principal Technical Evangelist, Amazon Web Services
  2. What if…?
    • “What if this load balancer breaks?”
    • “What if Redis becomes slow?”
    • “What if a host on Cassandra goes away?”
    • “What if latency increases by 300ms?”
    • “What if the master database stops?”
    Make it everyone’s problem!
  3. Failure injection
    • Start small & build confidence
    • Application level (exceptions, errors, …)
    • Host level (services, processes, …)
    • Resource attacks (CPU, memory, IO, …)
    • Network attacks (dependencies, latency, packet loss, …)
    • AZ attack
    • Region attack
    • People attack
  4. Postmortems – COE (Correction of Errors)
    • What happened?
    • What was the impact on customers and your business?
    • What were the contributing factors?
    • What data do you have to support this? (especially metrics and graphs)
    • What lessons did you learn?
    • What corrective actions are you taking?
    • Action items
    • Related items (trouble tickets etc.)
  5. Thank you!
    Adrian Hornsby https://medium.com/@adhorn adhorn
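
The "Application level (exceptions, errors, …)" item on the failure injection slide, together with the "What if latency increases by 300ms?" question, could be sketched roughly as below. This is a minimal illustration and not the tooling used in the talk: the `inject_failure` decorator, the `InjectedFailure` exception, and the `get_user` function are all hypothetical names introduced here, and the 300 ms delay and 10% error rate are assumed values chosen only to start small.

```python
import functools
import random
import time


class InjectedFailure(Exception):
    """Hypothetical exception type marking a deliberately injected error."""


def inject_failure(latency_s=0.0, error_rate=0.0, rng=None, sleep=time.sleep):
    """Hypothetical decorator: add latency and/or random errors to a call.

    latency_s  -- extra delay in seconds added before every call
    error_rate -- probability (0.0 to 1.0) of raising InjectedFailure
    rng, sleep -- injectable for deterministic tests
    """
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                sleep(latency_s)  # simulate a slow dependency
            if rng.random() < error_rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Assumed example: 300 ms of added latency and a 10% chance of an error.
@inject_failure(latency_s=0.3, error_rate=0.1)
def get_user(user_id):
    return {"id": user_id}
```

Calling `get_user(1)` would then take at least 300 ms and occasionally raise `InjectedFailure`, which makes it possible to observe how callers, timeouts, and retries behave before moving on to host, network, or AZ-level experiments.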