Yury Niño - Chaos Engineering: Building Immunity in Distributed Systems

DevOps Days GDL 2020 - February 20th

DevOpsDays GDL

February 21, 2020

Transcript

  1. Agenda: Why do some survive pandemics? What is the Immune System? Artificial Immune Software Systems. Injecting Chaos to build Immunity. Chaos Engineering: What, Why, Who and How? Chaos Example.
  2. Studies suggest that viral factors and medicines most likely contributed to reducing the number of deaths; however, survival is most likely associated with the host's immune system.
  3. The immune system is a complex adaptive network of cells and proteins that defends the body against infection. It keeps a record of every germ it has ever defeated, so it can recognise and destroy that germ again. Vaccines!
  4. An artificial immune system is an intelligent system that learns to recognize relevant patterns that have been seen previously. Using computational techniques, these systems are able to construct pattern detectors and defend the system against similar attacks.
  5. An artificial immune system is an intelligent system that recognizes and learns from previously injected faults. Using computational techniques, these systems are able to use resilience patterns to build confidence in the system's capability to withstand turbulence.
  6. The World is Chaotic! Face it with Resilience! Circuit Breaker Pattern, Bulkhead Pattern, Compensating Transactions, Health Endpoint Monitoring.
  7. What is Chaos Engineering? It is the discipline of experimenting with failures in production in order to reveal a system's weaknesses and to build confidence in its resilience. https://principlesofchaos.org/
  8. What is Chaos Engineering? It is a tool that we use to build immunity in our software systems by injecting harm, like latency, CPU failure, or network black holes, to find and mitigate potential weaknesses. Gremlin
  9. Why Chaos Engineering? Because testing on DEV/STG is not enough. Because unpredictable events are bound to happen ON PROD. You need to know the unknown!
  10. Why Chaos Engineering? History of Chaos Engineering. 2008: Chaos Engineering began at Netflix. 2010: Chaos Monkey was launched. 2014: Role of Chaos Engineer was created. 2018: A lot of resources for Chaos Engineering. Kolton Andrus
  11. Who is a Chaos Engineer? What my mom thinks I do. What my friends think I do. What software engineers think I do. What I really do: help service owners to increase their resilience through education, tools and encouragement.
  12. How Chaos Engineering? Applying Chaos Principles: Hypothesize about Steady State, Run Experiments, Vary Real-World Events, Automate Experiments.
  13. Chaos Days are dedicated days for your entire company to focus on building resilience instead of new products.
  14. How Chaos Engineering? Running Gamedays! Master of Disaster (MoD): decides the failure and declares the start of the incident and the attack! First On-Call member: sees, triages, and tries to mitigate whatever failure the MoD has caused. The team will find and solve the issue in less than 75% of the allocated time. Finally they write up a Postmortem! Inspired by James Burns's work.
  15. That is why I work for a Lab! • We practice Engineering • We practice Science • We practice Methods • We practice Chaos Engineering
  16. Screws fall out all the time!!! The world is an imperfect place. We cannot control the environment! But we can control how we face viruses, bacteria and failures. Mine :)
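
The ideas on slides 6, 8 and 12 (a circuit breaker, injected latency, and a steady-state hypothesis) can be combined into a small experiment. The sketch below is not from the talk and does not use Gremlin or any real chaos tool: every name in it (`CircuitBreaker`, `make_dependency`, `serve`, `success_ratio`, `hypothesis_holds`) is hypothetical, latency is simulated as a number instead of being slept, and the breaker is simplified (it never half-opens to retry). It assumes the steady state is "the share of requests that get an answer".

```python
class CircuitBreaker:
    """Simplified breaker: opens after `max_failures` consecutive timeouts."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def call(self, func, fallback):
        if self.is_open:
            return fallback()  # fail fast without touching the dependency
        try:
            result = func()
        except TimeoutError:
            self.failures += 1
            return fallback()
        self.failures = 0  # a success resets the failure count
        return result


def make_dependency(injected_latency_s, timeout_s):
    """Hypothetical downstream call with simulated (not slept) latency."""
    def dependency():
        if injected_latency_s > timeout_s:
            raise TimeoutError("dependency too slow")
        return "fresh"
    return dependency


def serve(dependency, breaker=None):
    """Handle one request; return True if the user got an answer."""
    try:
        if breaker is not None:
            breaker.call(dependency, fallback=lambda: "cached")
        else:
            dependency()
        return True
    except TimeoutError:
        return False


def success_ratio(requests, latency_s, use_breaker, timeout_s=0.5):
    """Steady-state metric: fraction of requests that were answered."""
    dependency = make_dependency(latency_s, timeout_s)
    breaker = CircuitBreaker() if use_breaker else None
    served = sum(serve(dependency, breaker) for _ in range(requests))
    return served / requests


def hypothesis_holds(control, experiment, tolerance=0.01):
    """The hypothesis holds if the fault barely moves the metric."""
    return abs(control - experiment) <= tolerance


# Control group: no fault. Experimental groups: 2 s of injected latency.
control = success_ratio(100, latency_s=0.1, use_breaker=True)
protected = success_ratio(100, latency_s=2.0, use_breaker=True)
unprotected = success_ratio(100, latency_s=2.0, use_breaker=False)
```

With these assumptions, `hypothesis_holds(control, protected)` is true because the breaker serves cached answers, while `hypothesis_holds(control, unprotected)` is false: the same fault reveals a weakness once the resilience pattern is removed.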