[Chaos Meetup] Running a Successful GameDay

Ho Ming Li
January 24, 2019

Delivered during the meetup "Silicon Valley Chaos Engineering Community > Chaos Engineering at Twilio" at Microsoft Reactor.


Transcript

  1. None
  2. Ho Ming Li @HoReaL @GremlinInc

  3. None
  4. Chaos Engineering: Thoughtful, planned experiments designed to reveal the weakness in our systems.
  5. immunity.

  6. GameDay: Dedicated time for teams to collaboratively run Chaos Experiments to reveal weaknesses in your systems.
  7. None
  8. OBJECTIVE

  9. None
  10. CHAOS you resilience

  11. Chaos Engineering isn’t the thing. It’s the thing that gets us to Resilience.
  12. Prime Down: Amazon’s sale day turns into fail day (TechCrunch)
    Delta Outage: Computer malfunction results in nationwide ground stop (NBC)
    Slack Outage: Connectivity issues hit workplaces (WSJ)
  13. $$$$$$$$$$$$$$$$$$$$, Reputation, CX, Employee Burnout

  14. RESILIENCE

  15. Start your practice now: run a GameDay.

  16. ACTION

  17. None
  18. None
  19. None
  20. None
  21. GameDay
    Experiment #1: Attack (Inject Failure), Attack, ...
    Experiment #2: Attack, Attack, ...
    Experiment #3: Attack, Attack, ...
  22. None
  23. Get rid of the Fog of War so you can clearly see the map and strategize accordingly. Gain Deep Insight with:
    - Metrics
    - Logging
    - Request Tracing
  24. None
  25. GameDay is not just a one-time event. Think about the next GameDay. Track and measure success over time.
  26. None
  27. Attack - Can’t connect to DynamoDB.
    Expectation - Frontend gets a 5XX Error from Backend.
  28. Attack - Inject small amount of latency between app and database.
    Expectation - Users experience delay roughly same as injected latency.
  29. Attack - Consumer cannot connect to Database.
    Expectation - Consumer can no longer process messages.
  30. Attack - Container dies.
    Expectation - Orchestrator will spawn new container.
  31. “Edge”: DNS, CDN
    “Front End”: LB, API
    “Back End”: App/Web Server, Queue, RDB, KV DB, Search Index
    “Infrastructure”: Container, Kubernetes, Virtual Machine, Physical Server, Storage, Network, Data Center, Geography
  32. Don’t Forget the Human

  33. Reliably Yours tinyurl.com/chaoseng meetup.com/pro/chaos
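
Slide 21 shows a GameDay as a set of experiments, each made up of attacks, and slides 27 to 30 pair each attack with an expectation. One way to write that plan down and tally the results afterwards is sketched below in Python. This is only an illustration, not something shown in the talk: the class names, the summary() helper, the grouping of attacks into experiments, and the recorded outcomes are all assumptions. Only the attack and expectation wording comes from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attack:
    """One injected failure plus the outcome the team expects to see."""
    description: str
    expectation: str
    # Filled in during the GameDay: True if the expectation held,
    # False if the system surprised the team, None if the attack was not run.
    expectation_held: Optional[bool] = None
    notes: str = ""


@dataclass
class Experiment:
    name: str
    attacks: List[Attack] = field(default_factory=list)


@dataclass
class GameDay:
    title: str
    experiments: List[Experiment] = field(default_factory=list)

    def summary(self) -> dict:
        """Count attacks by outcome so GameDays can be compared over time."""
        counts = {"held": 0, "surprise": 0, "not_run": 0}
        for experiment in self.experiments:
            for attack in experiment.attacks:
                if attack.expectation_held is None:
                    counts["not_run"] += 1
                elif attack.expectation_held:
                    counts["held"] += 1
                else:
                    counts["surprise"] += 1
        return counts


# Attack/expectation text is taken from slides 27 to 30. The grouping into
# two experiments is hypothetical.
gameday = GameDay(
    title="Example GameDay",
    experiments=[
        Experiment("Experiment #1", [
            Attack("Can't connect to DynamoDB",
                   "Frontend gets a 5XX error from backend"),
            Attack("Inject small amount of latency between app and database",
                   "Users experience delay roughly same as injected latency"),
        ]),
        Experiment("Experiment #2", [
            Attack("Consumer cannot connect to database",
                   "Consumer can no longer process messages"),
            Attack("Container dies",
                   "Orchestrator will spawn new container"),
        ]),
    ],
)

# Hypothetical outcomes, recorded by hand during the session.
gameday.experiments[0].attacks[0].expectation_held = True
gameday.experiments[0].attacks[1].expectation_held = False
gameday.experiments[0].attacks[1].notes = "Delay was larger than the injected latency"
gameday.experiments[1].attacks[1].expectation_held = True

print(gameday.summary())
```

Keeping the summary from each session gives a simple record to compare against at the next GameDay, in the spirit of slide 25. An attack whose expectation did not hold is the interesting result, since it marks a weakness the team did not know about.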