[NGINX Meetup] Practicing Thoughtful Controlled Chaos Engineering

Ho Ming Li
February 20, 2019

Why practice Chaos Engineering?
How to practice thoughtful controlled Chaos Engineering?
Also sharing a few findings from GameDays with customers.
#ChaosEngineering

Transcript

  1. “Computers aren’t the thing. They’re the thing that gets us to the thing.” - Halt and Catch Fire
     Chaos Engineering isn’t the thing. It’s the thing that gets us to Resilience.
  2. Prime Down: Amazon’s sale day turns into fail day (TechCrunch)
     Delta Outage: Computer malfunction results in nationwide ground stop (NBC)
     Slack Outage: Connectivity issues hit workplaces (WSJ)
  3. GameDay: Anatomy of a GameDay
     - Experiment #1: Attack (Inject Failure), Attack, Attack, ...
     - Experiment #2: Attack, Attack, ...
     - Experiment #3: Attack, Attack, ...
  4. Observability
     Get rid of the Fog of War so you can clearly see the map and strategize accordingly.
     Gain Deep Insight with:
     - Metrics
     - Logging
     - Request Tracing
  5. Hypothesis | Results   | Next Step
     Resilient  | Resilient | Automate
     Fail       | Fail      | Improve
     Resilient  | Fail      | Dig Deeper
     Fail       | Resilient | Dig Deeper
  6. Sort of Expected
     Attack
     - Can’t connect to DynamoDB.
     Expectation
     - Frontend gets a 5XX Error from Backend.
  7. Magnified Wait
     Attack
     - Inject a small amount of latency between app and database.
     Expectation
     - Users experience a delay roughly the same as the injected latency.
  8. I can see this, but I can’t see that
     Attack
     - Consumer cannot connect to Database.
     Expectation
     - Consumer can no longer process messages.
  9. “An Application”
     - “Edge”: DNS, CDN
     - “Front End”: LB, API
     - “Back End”: App/Web Server, Queue, RDB, KV DB, Search Index
     - “Infrastructure”: Container, Kubernetes, Virtual Machine, Physical Server, Storage, Network, Data Center, Geography
  10. What a GameDay isn’t:
     • A simple exercise or “box to check”
     • An opportunity to maliciously expose faulty services
     • A one-time event
     • A high-risk endeavor
  11. What a GameDay is and can be
     • A dedicated time to come together to gain insights
     • The execution of one or more experiments
     • The proof or disproof of a hypothesis
     • A time to test, sometimes destructively, the resilience of your application and architecture
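
Slide 5 pairs each hypothesis with an observed result and names a next step. One way to read that table is as a small lookup: if the result matches the hypothesis, either automate (resilient) or improve (fail); if they disagree, dig deeper. The sketch below encodes only that reading. The names `Outcome` and `next_step` are hypothetical and do not come from the deck.

```python
from enum import Enum


class Outcome(Enum):
    """The two states used on slide 5 for both hypothesis and result."""
    RESILIENT = "Resilient"
    FAIL = "Fail"


def next_step(hypothesis: Outcome, result: Outcome) -> str:
    """Return the next step for a (hypothesis, result) pair, per slide 5."""
    if hypothesis is not result:
        # The system surprised us in either direction: investigate why.
        return "Dig Deeper"
    if result is Outcome.RESILIENT:
        # Expected resilience and observed it: make the experiment repeatable.
        return "Automate"
    # Expected a failure and observed it: fix the known weakness.
    return "Improve"


if __name__ == "__main__":
    for hypothesis in Outcome:
        for result in Outcome:
            step = next_step(hypothesis, result)
            print(f"{hypothesis.value:9} | {result.value:9} | {step}")
```

Note that a resilient result against a "Fail" hypothesis still leads to "Dig Deeper": an unexpected pass means the team's mental model of the system was wrong, which is itself a finding.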
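
The title of slide 7, "Magnified Wait", suggests the delay users saw was larger than the injected latency, but the transcript does not say why. One common cause, assumed here purely for illustration, is that a single user request makes several sequential database calls, so each call pays the injected latency. The function name and the numbers below are hypothetical and are not findings from the GameDays.

```python
def user_visible_delay_ms(injected_ms: float, sequential_db_calls: int) -> float:
    """Estimate added delay per request when every DB call pays the injected latency.

    Assumes the calls run one after another (no parallelism, caching, or retries).
    """
    if sequential_db_calls < 0:
        raise ValueError("sequential_db_calls must be non-negative")
    return injected_ms * sequential_db_calls


if __name__ == "__main__":
    injected = 100.0  # hypothetical latency injected between app and database
    for calls in (1, 5, 20):
        added = user_visible_delay_ms(injected, calls)
        print(f"{calls:2d} sequential calls -> about {added:.0f} ms added per request")
```

Under this assumption, the expectation on slide 7 holds only when a request makes exactly one database call; comparing the injected value against request traces is one way to check how many calls are actually on the path.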