[PD Summit 2019] Best Practices to Kickstart your Chaos Engineering Journey

Ho Ming Li
September 25, 2019


Debunk Myths:
Chaos is in the name, but it can be controlled.
It doesn’t have to be dangerous; it can be thoughtful and safe.
It’s useful not only in production, but also in staging and other earlier environments.
It doesn’t have to be a giant, cross-team, company-wide exercise.
You can start small.

Experiments:
Shut down your stateless hosts and verify that they come back up healthy.
What happens when you drop the connection to your data store? Verify the retry and timeout behaviors.
Auto-scaling may sound easy, but it has nuances that you only learn by experiencing them.
Ensure alerts are triggered appropriately and that the receiver has sufficient information to work with. Take signal-to-noise into consideration.
Past incidents, whether within your team or with a third-party dependency, are good lessons to share with your team. Re-run those scenarios.
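
As an illustration of the data store experiment above, here is a minimal Python sketch of starting small: an in-process fault injector plus a retry helper with exponential backoff. Everything in it is hypothetical and not taken from the talk: the names FlakyStore and get_with_retry, the choice to drop the first N calls, and the default attempt count and delays are all assumptions made for the example.

```python
import time


class FlakyStore:
    """Hypothetical stand-in for a data store client.

    Drops the first `drop_first` calls to simulate a lost connection,
    then serves reads normally.
    """

    def __init__(self, data, drop_first):
        self._data = dict(data)
        self._drops_left = drop_first
        self.calls = 0  # total calls seen, including dropped ones

    def get(self, key):
        self.calls += 1
        if self._drops_left > 0:
            self._drops_left -= 1
            raise ConnectionError("injected fault: connection dropped")
        return self._data[key]


def get_with_retry(store, key, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Read `key`, retrying on ConnectionError with exponential backoff.

    `sleep` is injectable so an experiment can record the delays
    instead of actually waiting. Re-raises after the final attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return store.get(key)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Experiment: two dropped connections should be absorbed by the retries.
    store = FlakyStore({"user:1": "alice"}, drop_first=2)
    delays = []
    value = get_with_retry(store, "user:1", sleep=delays.append)
    print(value, store.calls, delays)
```

Raising drop_first above max_attempts lets you check the other half of the hypothesis: that the caller sees a clear error instead of hanging or retrying forever.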
