During this talk, Gremlin's Jacob Plicque (Chaos Engineer & Resilience Coach, former Senior SRE @ Fanatics) will answer the 5 W's (Who, What, When, Where & Why) of Chaos Engineering.
You’ll Learn:
* The systematic way to begin Chaos Engineering
* The value of running chaos experiments to build more reliable systems and confidence in your remediation processes.
* How other companies are using Chaos Engineering—and the positive results they’ve seen creating reliable distributed systems with CE
Chaos Engineering is NOT:
* Applying failure modes randomly
* Applying failures to your entire infrastructure straight away
* Applying failure on systems without communication
* Creating a one-off fix to be run once and then abandoned
Chaos Engineering IS:
* Applying failures carefully, and with an explicit hypothesis
* Starting small and growing the blast radius
* Communicating plans clearly with all stakeholders
* Designing a well-defined practice that requires constant attention
Kubernetes:
How to improve the availability and reliability of Kubernetes clusters using the discipline of Chaos Engineering
How to use Chaos Engineering to safely inject failure into your applications and nodes in order to detect weaknesses.
Specific Chaos Experiments for you to run on Kubernetes to ensure you’ve designed a reliable system