Training Teams with Chaos Engineering On AWS Fargate

Yury Nino

July 05, 2020

Transcript

  1. AGENDA: Topics to be covered
    • Motivations: Incident Management; Training Engineering Teams.
    • Chaos Engineering: Definition and Tools; History and References.
    • Chaos GameDays with AWS: Chaos Monkey for Spring Boot; Use case with AWS ECR & Fargate.
  2. Resilience Engineering: 4 essential capabilities, i.e. 4 sets of answers used to construct a resilience profile:
    • Respond (the actual).
    • Learn (the factual).
    • Monitor (the critical).
    • Anticipate (the potential).
  3. Training Engineering Teams
    • To be able to construct a mental representation.
    • To be able to assess risks and threats as relevant.
    • To be able to switch from a situation under control.
    • To be able to maintain a relevant level of confidence.
    • To be able to make a decision in a complex situation.
  4. Training Engineering Teams
    • To be able to make intelligent use of procedures.
    • To be able to use available technical and human resources.
    • To be able to manage time and time pressure.
    • To be able to cooperate with crew members and other staff.
    • To be able to properly use and manage information.
  5. What is Chaos Engineering?
    It is the discipline of experimenting with failures in production in order to reveal a system's weaknesses and to build confidence in its resilience. https://principlesofchaos.org/
  6. Chaos Engineering History
    • 2008: Chaos Engineering began at Netflix.
    • 2010: Chaos Monkey & Simian Army were launched.
    • 2016: Gremlin was born.
    • 2017: SRE (Usenix); ChaosIQ was born.
    • 2018: ChaosConf; 1 book; Chaos Monkey for Spring Boot.
    • 2019: 1 book; chaos massification.
    • 2020: 1 book was published.
  7. "You can measure your success with Chaos Engineering by counting the number of vulnerabilities." (Nora Jones)
  8. Chaos GameDays
    GameDays are interactive, real-world learning exercises. They are designed to give players a chance to put their skills with a technology to the test. GameDays were created by Jesse Robbins, inspired by his experience and training as a firefighter.
  9. Chaos GameDays
    • First on Call: monitors, triages, and tries to mitigate the failures caused by the Master of Disaster.
    • Master of Disaster: decides on the failure, declares the start of the incident, and attacks!
    • Team: finds and solves the exhibited issues, and writes up the postmortem.
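To make the GameDay roles on slide 9 and the Chaos Engineering definition on slide 5 more concrete, here is a minimal, hypothetical Python simulation of a single GameDay round. It is not taken from the deck and does not use Chaos Monkey for Spring Boot or any AWS API: the `Service` model, the 99% steady-state threshold, the 20% injected failure rate, the "checkout" service and every function name are assumptions chosen for illustration. The Master of Disaster injects a failure, the First on Call checks a steady-state hypothesis (request success rate) and mitigates, and the Team's postmortem counts the vulnerabilities found, in the spirit of the Nora Jones quote on slide 7.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """Hypothetical service: each request fails with probability failure_rate."""
    name: str
    failure_rate: float = 0.0

    def handle(self, rng: random.Random) -> bool:
        # rng.random() is in [0.0, 1.0), so a failure_rate of 0.0 never fails.
        return rng.random() >= self.failure_rate


def success_rate(service: Service, requests: int, rng: random.Random) -> float:
    """Fraction of simulated requests that succeed."""
    ok = sum(service.handle(rng) for _ in range(requests))
    return ok / requests


@dataclass
class GameDay:
    # Assumed steady-state hypothesis: at least 99% of requests succeed.
    steady_state_threshold: float = 0.99
    findings: List[str] = field(default_factory=list)

    def master_of_disaster(self, service: Service, injected_failure_rate: float) -> None:
        # Decides the failure and declares the start of the incident.
        print(f"Incident declared: injecting {injected_failure_rate:.0%} failures into {service.name}")
        service.failure_rate = injected_failure_rate

    def first_on_call(self, service: Service, rng: random.Random, requests: int = 1000) -> bool:
        # Monitors and triages by checking the steady-state hypothesis.
        rate = success_rate(service, requests, rng)
        if rate >= self.steady_state_threshold:
            return True
        self.findings.append(
            f"{service.name}: success rate {rate:.1%} below {self.steady_state_threshold:.0%}"
        )
        # Stand-in for a real mitigation (rollback, failover, ...): remove the fault.
        service.failure_rate = 0.0
        return False

    def team_postmortem(self) -> dict:
        # The team writes up the postmortem, counting the vulnerabilities found.
        return {"vulnerabilities_found": len(self.findings), "findings": list(self.findings)}


def run_gameday(seed: int = 42, injected_failure_rate: float = 0.2) -> dict:
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    service = Service("checkout")  # hypothetical service name
    game = GameDay()
    baseline_ok = game.first_on_call(service, rng)
    game.master_of_disaster(service, injected_failure_rate)
    held_under_attack = game.first_on_call(service, rng)
    report = game.team_postmortem()
    report.update(baseline_ok=baseline_ok, held_under_attack=held_under_attack)
    return report


if __name__ == "__main__":
    print(run_gameday())
```

With the defaults, `run_gameday()` should record one vulnerability, because a 20% injected failure rate pushes the success rate far below the assumed 99% threshold; with `injected_failure_rate=0.0` the steady state holds and nothing is recorded. In a real GameDay the injection would come from a chaos tool and the steady-state check from production monitoring rather than from a random-number model.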