2019-02 Inject Some Chaos Into Your Applications

We live in an era where technology evolves fast, and Cloud platforms and tools allow us to deliver more applications and more features faster than ever. How resilient is your distributed system? How do you prevent failures? What about a cascade of failures? How do you keep them from happening again and again?

One approach to answering these questions is Chaos Engineering. In this session, we will review the principles of chaos engineering, look at several tools, and use them to inject random failures into a demo application and Cloud infrastructure. Our goal is to detect and identify potential issues and to make our application resilient and tolerant of failures.

About the speaker:

Christophe has over 20 years of experience as an enterprise application developer. He is a co-founder of Phlyt, a cloud native development company. Before Phlyt, he was a software architect, consultant, and developer at Helpful, Pivotal and other companies. His hobbies include hiking, traveling and scuba diving.

Toronto Java Users Group

February 28, 2019

Transcript

  1. Intro
    Christophe Fargette, Co-founder of Phlyt & Cloud Native Developer
    Phlyt Inc. [email protected] www.phlyt.io Github: christophe-f Twitter: _christophe_f
  2. Grafana news
    • Grafana v6.0 release
    • New graph components
    • Improved existing graph
    • React plugin
    • New data sources (Azure, GCP, Loki)
  3. Agenda
    • What is Chaos Engineering?
    • How to inject failures safely?
    • Types of chaos
    • Review Architecture
    • Demo
      ◦ Experiment 1 - Inject Latency
      ◦ Experiment 2 - Inject Exceptions
    • Questions
  4. What is Chaos Engineering?
    • Art of breaking things on purpose
    • Prevent/minimise downtime
    • Reproduce outage
    • Test new infrastructure
    • Test partial deletion of Kafka topics
    • DNS unavailability
    • Random I/O errors
    • Maxing out CPU cores
  5. Best practices. How to experiment safely?
    • Having a resilient system
    • Monitoring & alerts
    • Start in a non-prod environment
    • Plan your experiment (small scope)
    • Have a rollback plan
    • Communication
    • & Communication
  6. Types of chaos
    • Chaos Monkey (Instances)
    • Chaos Lemur (Bosh VMs)
    • Chaos Gorilla (AZ)
    • Chaos Kong (Region)
    • Security Monkey
    • More ...
  7. Tooling
    • Chaos Monkey for Spring Boot
    • Chaos Toolkit (https://chaostoolkit.org/)
    • Gremlin (https://www.gremlin.com/)
    • More ...
  8. Experiment 1 - Inject latency
    • Test the system under normal circumstances
    • Inject latency into Beer Service
    • Make sure client service is not timing out
    • Fix it if the experiment fails
  9. Experiment 2 - Inject exceptions
    • Test the system under normal circumstances
    • Inject exceptions into Beer Service
    • Make sure client service is resilient
    • Fix it if the experiment fails
  10. Thanks!
    Phlyt Inc. [email protected] www.phlyt.io Github: christophe-f Twitter: _christophe_f
    Github Demo: https://github.com/christophe-f/chaos-monkey
    “It’s helpful to think of a vaccine or a flu shot. While seemingly counterintuitive, you inject yourself with something harmful in order to prevent a future issue” - From Kolton Andrus, CEO of Gremlin
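
The two experiments in slides 8 and 9 can be sketched in plain Java without any chaos tooling. This is only an illustration under stated assumptions, not the code from the demo repository: `ChaosDemo`, `Assault`, `withChaos`, `callWithFallback`, and the lambda standing in for the Beer Service are all hypothetical names. The wrapper injects either latency or an exception before delegating, and the client bounds its wait and falls back to a default value, which is one possible way to keep a client resilient.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the talk's demo project.
final class ChaosDemo {

    /** Kinds of failure to inject, mirroring Experiments 1 and 2. */
    enum Assault { NONE, LATENCY, EXCEPTION }

    /** Wraps a call and injects the chosen failure before delegating to it. */
    static <T> Supplier<T> withChaos(Supplier<T> call, Assault assault, long latencyMillis) {
        return () -> {
            if (assault == Assault.LATENCY) {
                try {
                    Thread.sleep(latencyMillis); // Experiment 1: slow the service down
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during injected latency", e);
                }
            } else if (assault == Assault.EXCEPTION) {
                throw new IllegalStateException("injected failure"); // Experiment 2
            }
            return call.get();
        };
    }

    /** Client side: bound the wait, and fall back on timeout or on failure. */
    static <T> T callWithFallback(Supplier<T> call, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(call)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }

    public static void main(String[] args) {
        Supplier<String> beerService = () -> "IPA"; // stand-in for the Beer Service

        // Baseline: the system under normal circumstances.
        System.out.println(callWithFallback(
                withChaos(beerService, Assault.NONE, 0), 2000, "no beer"));
        // Experiment 1: injected latency exceeds the client's time budget.
        System.out.println(callWithFallback(
                withChaos(beerService, Assault.LATENCY, 1000), 100, "no beer"));
        // Experiment 2: the service throws on every call.
        System.out.println(callWithFallback(
                withChaos(beerService, Assault.EXCEPTION, 0), 2000, "no beer"));
    }
}
```

In this sketch the baseline call returns the real value, while both experiments end with the client receiving the fallback instead of hanging or propagating the error. `completeOnTimeout` requires Java 9 or later. A production setup would more likely use a resilience library and one of the tools listed in slide 7 rather than a hand-rolled wrapper.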