Easy Recipes for Building Resilience with Chaos Engineering

Yury Nino
August 22, 2020

Transcript

  1. Agenda
    • Why are you speaking about cooking?
    • Cooking & Science & Chaos.
    • Scientific method.
    • Cooking chaos recipes.
    • Ingredients: Cloud, Chaos Tools & Observability.
    • Learn for the next dinner!
  2. Chaos experimentation and cooking are both a combination of art and science! Chaos Engineering and cooking have many things in common ...
  3. A Cookbook is a hands-on guide to exploring a technology! A Cookbook is a guide for learning how to practice Chaos Engineering using recipes.
  4. The infrastructure required by a software system can be as complex as the software itself. We need a hands-on guide to exploring the world of Chaos! (Netflix, Twitter)
  5. Chaos Engineering: the discipline of experimenting with failures in production in order to reveal a system's weaknesses and to build confidence in its resilience. https://principlesofchaos.org/
  6. Chaos Engineering: a scientific method that consists of specifying and evaluating resilience hypotheses by 1) injecting faults in production, 2) observing the impact, and 3) building resilience. Source: Long Zhang. A Chaos Engineering System
  7. History: 2008 Chaos Engineering was born at Netflix; 2010 Chaos Monkey & Simian Army were launched; 2016 Gremlin was born; 2019 Chaos Massification; 2017 SRE USENIX, Chaos IQ, ChaosConf; 2018 Book Chaos Eng; 2020 Book Chaos Eng
  8. To Cook
    1. Pick a Hypothesis: Recipe!
    2. Choose the tools: Ingredients!
    3. Launch an attack: Cook!
    4. Notify the Org: Invite!
    5. Run the Experiment: Enjoy!
    6. Analyze the Results
    7. Automate
  9. Recipe 1
    Tools/Ingredients: Gremlin, AWS
    Hypothesis: Cloud can fail :O
    Environment: My Home
    Duration: 2 minutes
    Load: 1 request
    Observability: AWS Console
    Results: ???
  10. Recipe 2
    Tools/Ingredients: Gremlin, Local
    Hypothesis: Local can fail :O
    Environment: My Home
    Duration: 2 minutes
    Load: 1 request
    Observability: Local Console
    Results: ???
    https://www.youtube.com/watch?v=PcwdZB_blLc
  11. More Recipes
    • Introduce latency on security controls.
    • Disable service event logging.
    • API gateway shutdown.
    • Unencrypted S3 Bucket.
    • Disable MFA.
    • Permission collision in a shared IAM role policy.
  12. Disasterpiece Theater: Whenever we launch features or make changes, we test the fault tolerance of that new code! In January of 2018, we started a rigorous process of identifying failures that are likely to happen and that we must be able to tolerate, and then purposely causing them to happen in production. This isn’t Chaos Engineering as practiced and evangelized by Netflix. It’s the first step; we call it Disasterpiece Theater. Taken from Chaos Engineering Book 2020
  13. Security Chaos Engineering: the identification of security control failures through proactive experimentation to build confidence in the system’s ability to defend against malicious conditions in production. Source: Security Chaos Engineering Book
  14. There is an ancient proverb that says: It's very difficult to find a black cat in a dark room, especially when there is no cat!
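
The recipe fields on slides 9 and 10 and the inject/observe steps on slide 6 can be sketched as a small data structure plus one function. This is a minimal illustration, not the speaker's implementation: the names `Recipe` and `cook`, and the `inject_fault`, `rollback` and `probe` callables, are hypothetical stand-ins for a real chaos tool (such as Gremlin) and a real observability console, neither of which is called here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Recipe:
    """One chaos recipe, using the fields shown on the recipe slides."""
    name: str
    tools: List[str]
    hypothesis: str
    environment: str
    duration_seconds: int  # recorded for documentation only in this sketch
    load_requests: int
    observability: str
    results: Optional[str] = None  # unknown ("???") until the experiment runs


def cook(recipe: Recipe,
         inject_fault: Callable[[], None],
         rollback: Callable[[], None],
         probe: Callable[[], bool]) -> Recipe:
    """Inject a fault, observe the impact, and record the result.

    All three callables are hypothetical and supplied by the caller.
    """
    # Do not start an experiment on a system that is already unhealthy.
    if not probe():
        recipe.results = "aborted: system unhealthy before the fault"
        return recipe
    inject_fault()
    try:
        # Send the configured load and collect one observation per request.
        observations = [probe() for _ in range(recipe.load_requests)]
    finally:
        rollback()  # always undo the fault, even if a probe raises
    if all(observations):
        recipe.results = "system tolerated the fault"
    else:
        recipe.results = "weakness found"
    return recipe


# Toy usage: an in-memory "service" that the injected fault switches off.
state = {"up": True}
recipe_1 = Recipe(
    name="Recipe 1",
    tools=["Gremlin", "AWS"],
    hypothesis="Cloud can fail",
    environment="My Home",
    duration_seconds=120,
    load_requests=1,
    observability="AWS Console",
)
cook(
    recipe_1,
    inject_fault=lambda: state.update(up=False),
    rollback=lambda: state.update(up=True),
    probe=lambda: state["up"],
)
print(recipe_1.results)  # weakness found
```

Keeping the recipe as plain data makes it easy to share before the run (slide 8, "Notify the Org") and to rerun later (slide 8, "Automate"); the rollback in the `finally` branch is a design choice of this sketch so that a failed probe never leaves the fault in place.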