Yury Niño - Chaos Engineering: Building Immunity in Distributed Systems

DevOps Days GDL 2020 - February 20th

DevOpsDays GDL

February 21, 2020

Transcript

  1. Chaos Engineering
    Building Immunity in Distributed Systems
    DevOpsDays Guadalajara, México, February 20th

  2. YURY NIÑO
    DevOps Engineer
    Chaos Engineering Advocate
    @yurynino

  3. Agenda
    Why do some survive pandemics?
    What is the Immune System?
    Artificial Immune Software Systems.
    Injecting Chaos to build Immunity.
    Chaos Engineering: What, Why, Who and How?
    Chaos Example.

  5. Why do some survive pandemics?

  6. Studies suggest that viral factors
    and medicines likely contributed
    to reducing the number of deaths;
    however, survival was most likely
    associated with the host's immune
    system.

  7. The immune system is a complex
    adaptive network of cells and proteins that
    defends the body against infection.
    The immune system keeps a record of
    every germ it has ever defeated so it can
    recognise and destroy it again.
    Vaccines!

  8. Why are you talking about this at a
    Software Conference?

  10. What is an Artificial Immune
    Software System?

  11. An artificial immune system
    is an intelligent system that learns
    to recognize relevant patterns that
    have been seen previously.
    Using computational techniques,
    these systems are able to
    construct pattern detectors and
    defend the system against similar
    attacks.
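
One classic technique from the artificial immune systems literature is negative selection: generate random detectors, discard any that match normal ("self") behaviour, and treat a match by a surviving detector as an anomaly. The sketch below is a minimal illustration under assumed choices (8-bit binary signatures, r-contiguous matching, and the hypothetical names `matches`, `train_detectors` and `is_anomalous`); it is not presented as the method used in the talk.

```python
import random
from typing import List


def matches(detector: str, sample: str, r: int) -> bool:
    """r-contiguous matching: True if both strings agree on at least r consecutive positions."""
    run = 0
    for a, b in zip(detector, sample):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def train_detectors(self_set: List[str], n_detectors: int, length: int, r: int,
                    seed: int = 0, max_tries: int = 100_000) -> List[str]:
    """Generate random candidates and keep only those that never match 'self'."""
    rng = random.Random(seed)
    detectors: List[str] = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Negative selection: reject candidates that react to normal behaviour
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_anomalous(sample: str, detectors: List[str], r: int) -> bool:
    """A sample is flagged when any surviving detector matches it."""
    return any(matches(d, sample, r) for d in detectors)
```

By construction, no sample from the self set is ever flagged, so `is_anomalous("00000000", detectors, 4)` is False when `"00000000"` was part of the training self set; unseen patterns may or may not be caught, depending on how many detectors survive.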

  12. An artificial immune system
    is an intelligent system that
    recognizes and learns from
    previously injected faults.
    Using computational techniques,
    these systems are able to
    use resilience patterns to build
    confidence in the system's
    capability to withstand turbulence.

  13. The World is Chaotic!
    Face it with Resilience!
    Circuit Breaker Pattern
    Bulkhead Pattern
    Compensating Transactions
    Health Endpoint Monitoring
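
The Circuit Breaker pattern stops calling a dependency that keeps failing, fails fast during a cool-down period, and then lets a trial call through ("half-open"). A minimal, single-threaded sketch follows; the class name, the default thresholds and the injectable `clock` parameter are illustrative assumptions, not the API of any specific library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        # After the cool-down, allow a trial call through
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip on too many failures, or re-trip if the half-open trial failed
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

Passing a fake clock makes the open to half-open transition testable without sleeping. A production breaker would also need locking and would limit how many trial calls run while half-open.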

  14. Chaos Engineering
    What, Why, Who and How

  15. What is Chaos Engineering?
    It is the discipline of experimenting with failures
    in production in order to reveal a system's
    weaknesses and build confidence in its
    resilience.
    https://principlesofchaos.org/

  16. What is Chaos Engineering?
    It is a tool that we use to build immunity in
    our software systems by injecting harm, like
    latency, CPU failure, or network black holes,
    to find and mitigate potential weaknesses.
    Gremlin
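
Injecting harm can be as simple as wrapping a call so that a fraction of requests is delayed. The decorator below is a hypothetical, in-process sketch of latency injection (the `inject_latency` name and its parameters are assumptions); dedicated tools such as Gremlin typically inject faults at the host, container or network level instead.

```python
import functools
import random
import time
from typing import Callable, Optional


def inject_latency(delay_s: float,
                   probability: float = 1.0,
                   rng: Optional[random.Random] = None,
                   sleep: Callable[[float], None] = time.sleep):
    """Hypothetical decorator: delay a fraction of calls to simulate a slow dependency."""
    if rng is None:
        rng = random.Random()

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Inject the fault on roughly `probability` of the calls
            if rng.random() < probability:
                sleep(delay_s)
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

For example, decorating a client call with `@inject_latency(delay_s=0.2, probability=0.1)` would slow about one call in ten by 200 ms, which is enough to check whether timeouts and circuit breakers behave as expected.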

  17. Why Chaos Engineering?
    Because testing on DEV/STG is not
    enough.
    Because unpredictable events are
    bound to happen IN PROD.
    You need to know the unknown!

  18. History of Chaos Engineering
    2008: Chaos Engineering began at Netflix.
    2010: Chaos Monkey was launched.
    2014: The role of Chaos Engineer was created.
    2018: Many resources for Chaos Engineering.
    Kolton Andrus
    Why Chaos Engineering?

  19. Who is a Chaos Engineer?
    What my mom thinks I do / What my friends think I do
    What software engineers think I do / What I really do
    Helps service owners
    increase their resilience
    through education, tools and
    encouragement.

  20. Conclusion
    Who is doing Chaos Engineering?

  21. How to do Chaos Engineering?
    Applying Chaos Principles
    ● Hypothesize about Steady State
    ● Run Experiments
    ● Vary Real-World Events
    ● Automate Experiments
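
The principles above map onto a simple experiment loop: measure a steady-state metric, inject a real-world event, keep checking the hypothesis while the fault is active, and always roll the fault back. This is a hypothetical sketch; `run_experiment`, the tolerance check and the callback names are assumptions, and "automating" it would mean calling it from a scheduler or CI pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExperimentResult:
    hypothesis_held: bool
    samples: List[float]


def run_experiment(measure: Callable[[], float],
                   inject: Callable[[], None],
                   rollback: Callable[[], None],
                   tolerance: float,
                   baseline_samples: int = 5,
                   experiment_samples: int = 5) -> ExperimentResult:
    # 1. Hypothesize about steady state: record a baseline for the metric
    baseline = sum(measure() for _ in range(baseline_samples)) / baseline_samples
    samples: List[float] = []
    # 2. Vary real-world events: introduce the fault
    inject()
    try:
        # 3. Run the experiment, checking the hypothesis on every sample
        for _ in range(experiment_samples):
            value = measure()
            samples.append(value)
            if abs(value - baseline) > tolerance:
                # Deviation from steady state: abort early to limit the blast radius
                return ExperimentResult(False, samples)
    finally:
        # Always remove the fault, whether the hypothesis held or not
        rollback()
    return ExperimentResult(True, samples)
```

Here `measure` could return, for instance, a success rate from your monitoring system; the function is agnostic about where the number comes from.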

  22. Chaos Days are dedicated days for
    your entire company to focus on
    building resilience instead of new
    products.

  23. How to do Chaos Engineering?
    Running Gamedays!
    Master of Disaster (MoD)
    Decides the failure, declares the
    start of the incident and
    attacks!!!
    First on Call
    Sees, triages, and tries to mitigate
    whatever failure the MoD has caused.
    Team
    Finds and solves the issue in
    less than 75% of the allocated
    time. Finally, they write up a
    Postmortem!
    Inspired by James Burns's work
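
The 75% target on this slide is simple arithmetic: for a 60-minute Gameday, the team should resolve the incident in under 45 minutes. A tiny sketch of that check follows; the `GameDay` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GameDay:
    allocated_minutes: float
    target_ratio: float = 0.75  # from the slide: solve in under 75% of the allocated time

    def deadline(self) -> float:
        """Minutes by which the team should have resolved the injected failure."""
        return self.allocated_minutes * self.target_ratio

    def met_target(self, minutes_to_resolve: float) -> bool:
        return minutes_to_resolve < self.deadline()
```

Keeping a buffer of 25% leaves time for the postmortem write-up within the same session.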

  24. Demo Time!

  25. Chaos Example

  26. Chaos Example

  27. Chaos Example

  28. Configuration

  29. Configuration

  30. Configuration

  31. Observability

  32. That is why I work for a Lab!
    ● We practice Engineering
    ● We practice Science
    ● We practice Methods
    ● We practice Chaos Engineering

  33. How to start with
    Chaos Engineering?

  34. How to start?
    https://chaosengineering.slack.com
    https://github.com/dastergon/awesome-chaos-engineering
    https://www.infoq.com/chaos-engineering

  36. Screws fall out all the time!!!
    The world is an imperfect place.
    We cannot control the environment!
    But we can control how we face
    viruses, bacteria and failures.
    Mine :)

  37. Thanks for coming!!!
    @yurynino
