Chaos Engineering on AWS

How to perform chaos engineering on your applications in order to find faults and improve performance and resiliency

Veliswa Boya

July 22, 2021

Transcript

  1. Chaos Engineering on AWS: Improve resiliency and performance with controlled chaos engineering
  2. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. Improve resiliency and performance with controlled chaos engineering. Tech Talk: Veliswa Boya, Senior Developer Advocate, EMEA (Sub-Saharan Africa), Amazon Web Services
  3. Agenda: Challenges with distributed systems; What is chaos engineering and why it is hard; Introducing AWS Fault Injection Simulator (FIS); Key features and use cases; AWS FIS demo walk-through
  4. What’s your story?
  5. Chaos Engineering
  6. Challenges with distributed systems
  7. Distributed systems are complex [Diagram labels: Client, Network, Server, Message, Reply]
  8. Traditional testing is not enough. Unit testing of components: tested in isolation to ensure function meets expectations. Functional testing of integrations: each execution path tested to assure expected results. TESTING = VERIFYING A KNOWN CONDITION
  9. Chaos: Stress, Observe, Improve. Improve resilience and performance; Uncover hidden issues; Expose blind spots; Monitoring, observability, and alarms; And more
  10. Phases of chaos engineering: Steady state; Hypothesis; Run experiment; Verify; Improve. Source: https://docs.aws.amazon.com/fis/latest/userguide/getting-started-planning.html
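
The five phases listed on this slide can be read as a loop. The following is a minimal Python sketch of that loop, not something taken from the deck: the `Experiment` class, the `measure` and `inject_fault` callables, and the 5% tolerance are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """One pass through the phases; every name here is illustrative."""
    name: str
    measure: Callable[[], float]       # returns the steady-state metric, e.g. success rate
    inject_fault: Callable[[], None]   # the controlled failure to introduce
    tolerance: float = 0.05            # assumed hypothesis: metric stays within 5% of baseline


def run(experiment: Experiment) -> dict:
    # 1. Steady state: record the baseline before touching anything.
    baseline = experiment.measure()
    # 2. Hypothesis: the metric will not drop by more than `tolerance`.
    lowest_acceptable = baseline * (1 - experiment.tolerance)
    # 3. Run experiment: inject the fault.
    experiment.inject_fault()
    # 4. Verify: measure again and compare against the hypothesis.
    observed = experiment.measure()
    hypothesis_held = observed >= lowest_acceptable
    # 5. Improve: a failed hypothesis is a finding to fix before rerunning.
    return {
        "experiment": experiment.name,
        "baseline": baseline,
        "observed": observed,
        "hypothesis_held": hypothesis_held,
    }


# Toy system: the success rate falls from 0.99 to 0.80 once the fault is injected.
state = {"success_rate": 0.99}
result = run(Experiment(
    name="stop-one-instance",
    measure=lambda: state["success_rate"],
    inject_fault=lambda: state.update(success_rate=0.80),
))
```

In this toy run the hypothesis does not hold, which is the useful outcome: the experiment exposed a weakness to address in the "Improve" phase.
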
  11. Why is chaos engineering difficult? (1) Stitch together different tools and homemade scripts; (2) Agents or libraries required to get started; (3) Difficult to ensure safety; (4) Difficult to reproduce “real-world” events (multiple failures at once)
  12. AWS Fault Injection Simulator
  13. Chaos engineering: Safeguards; Real-world conditions; Easy to get started
  14. Easy to get started: No need to integrate multiple tools and homemade scripts or install agents; Use the AWS Management Console, AWS API, or the AWS CLI; Use pre-existing experiment templates and get started in minutes; Easily share templates with others
  15. Real-world conditions: Run experiments as a sequence of events or in parallel; Target all levels of the system (host, infrastructure, network, etc.); Real faults injected at the service control plane level!
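
As a rough illustration of "in sequence or in parallel": in an FIS experiment template, an action can name other actions in a `startAfter` list, and actions without one begin when the experiment starts. The sketch below shows only the `actions` section of a template as a Python dictionary. The action names, target names, and durations are placeholders, and the `start_waves` helper is a hypothetical utility for visualizing the ordering, not part of FIS.

```python
# "actions" section of an FIS experiment template (placeholder names and values).
actions = {
    # No "startAfter": these two begin together when the experiment starts.
    "stopWebServers": {
        "actionId": "aws:ec2:stop-instances",
        "parameters": {"startInstancesAfterDuration": "PT5M"},
        "targets": {"Instances": "webServers"},
    },
    "rebootWorkers": {
        "actionId": "aws:ec2:reboot-instances",
        "targets": {"Instances": "workers"},
    },
    # Waits for both earlier actions, then pauses for one minute.
    "settle": {
        "actionId": "aws:fis:wait",
        "parameters": {"duration": "PT1M"},
        "startAfter": ["stopWebServers", "rebootWorkers"],
    },
    # Runs last, after the pause.
    "terminateOneWorker": {
        "actionId": "aws:ec2:terminate-instances",
        "targets": {"Instances": "oneWorker"},
        "startAfter": ["settle"],
    },
}


def start_waves(actions: dict) -> list:
    """Group action names by dependency depth, as a rough picture of the ordering."""
    remaining = dict(actions)
    done = set()
    waves = []
    while remaining:
        # An action is ready once everything in its startAfter list is done.
        ready = sorted(
            name for name, spec in remaining.items()
            if set(spec.get("startAfter", [])) <= done
        )
        if not ready:
            raise ValueError("startAfter forms a cycle or names an unknown action")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


waves = start_waves(actions)
```

Here `waves` groups the two parallel actions first, then the wait, then the final termination.
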
  16. Safeguards: “Stop conditions” alarms; Integration with Amazon CloudWatch; Built-in rollbacks; Fine-grained IAM controls
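
To make the "stop conditions" safeguard concrete: an experiment template carries a `stopConditions` list, where each entry points at a CloudWatch alarm, and FIS stops the experiment if one of those alarms fires. The helper below is a small sketch that builds that list. The function name and the example ARN (with a placeholder account ID and alarm name) are assumptions for illustration.

```python
def stop_conditions(alarm_arns):
    """Build the stopConditions section of an FIS experiment template."""
    if not alarm_arns:
        # A template with no alarm uses an explicit "none" source.
        return [{"source": "none"}]
    for arn in alarm_arns:
        if not arn.startswith("arn:aws:cloudwatch:"):
            raise ValueError(f"not a CloudWatch alarm ARN: {arn}")
    return [{"source": "aws:cloudwatch:alarm", "value": arn} for arn in alarm_arns]


# Placeholder ARN; substitute a real alarm that tracks the steady-state metric.
guarded = stop_conditions(
    ["arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighErrorRate"]
)
unguarded = stop_conditions([])
```

Running without any alarm is possible but gives up the safeguard, so the `unguarded` form is better kept to low-risk test environments.
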
  17. Supported targets for fault injections: Amazon EC2 instances; Amazon ECS; API throttling; Amazon EKS; Amazon RDS; And more to come
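
Each supported target is addressed in a template through an action identifier of the form `aws:<service>:<action-name>`. The mapping below is a sketch for orientation only: the identifiers are believed to match the FIS actions reference from around the time of this talk, but the list is not exhaustive and should be checked against the current documentation. The `service_of` helper is a hypothetical convenience function.

```python
# Example FIS action identifiers per target (assumed, not exhaustive).
FIS_ACTIONS = {
    "Amazon EC2": [
        "aws:ec2:reboot-instances",
        "aws:ec2:stop-instances",
        "aws:ec2:terminate-instances",
    ],
    "Amazon ECS": ["aws:ecs:drain-container-instances"],
    "Amazon EKS": ["aws:eks:terminate-nodegroup-instances"],
    "Amazon RDS": ["aws:rds:failover-db-cluster", "aws:rds:reboot-db-instances"],
    "API throttling": ["aws:fis:inject-api-throttle-error"],
}


def service_of(action_id: str) -> str:
    """Return the service segment, e.g. 'ec2' from 'aws:ec2:stop-instances'."""
    parts = action_id.split(":")
    if len(parts) != 3 or parts[0] != "aws":
        raise ValueError(f"unexpected action ID format: {action_id}")
    return parts[1]


# Collect the distinct service segments used above.
services = sorted({service_of(a) for ids in FIS_ACTIONS.values() for a in ids})
```
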
  18. AWS Fault Injection Simulator: Overview [Diagram labels: AWS Management Console; AWS Command Line Interface; AWS Identity and Access Management; Experiment template; FIS engine (Start experiment, Stop experiment); FIS safeguards; AWS resources (Compute, Databases, Networking, Storage); Monitoring (Amazon CloudWatch alarms, Amazon EventBridge, Third party)]
  19. Creating an experiment template ❶
  20. Creating an experiment template ❷
  21. Stop instance
  22. Demo: Stop instance with alarms
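
The demo itself is not captured in the transcript. As a stand-in, here is a minimal sketch of what an experiment template for "stop instance with alarms" could look like, built as a Python dictionary and serialized to JSON. It is not the template used in the talk: the function name, tag key and value, two-minute restart delay, and the role and alarm ARNs are all placeholders.

```python
import json


def stop_instance_template(role_arn, alarm_arn, tag_key="Env", tag_value="test"):
    """Build an FIS template that stops one tagged EC2 instance, guarded by an alarm."""
    return {
        "description": "Stop one tagged EC2 instance, restart it after 2 minutes",
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",   # pick a single matching instance
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT2M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
        # The experiment halts if this CloudWatch alarm goes into ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        # Role that FIS assumes to act on the target resources.
        "roleArn": role_arn,
    }


# Placeholder ARNs with a dummy account ID.
template = stop_instance_template(
    role_arn="arn:aws:iam::111122223333:role/ExampleFisRole",
    alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighErrorRate",
)
template_json = json.dumps(template, indent=2)
```

The resulting JSON could then be used to create the template through the console, the AWS CLI, or an SDK, subject to the IAM permissions described in the FIS user guide.
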
  23. AWS Fault Injection Simulator use case: as part of a CI/CD pipeline
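
One way to picture the CI/CD use case is a pipeline step that starts an experiment from an existing template, polls until it finishes, and fails the build unless the experiment completed. The sketch below assumes a client object with the `start_experiment`, `get_experiment`, and `stop_experiment` operations of the FIS API (in practice, for example, `boto3.client("fis")`). The function name, timings, and the `FakeFisClient` stand-in are hypothetical and exist only so the example runs without AWS access.

```python
import time
import uuid

TERMINAL_STATES = {"completed", "stopped", "failed"}


def run_fis_gate(fis_client, template_id, poll_seconds=15.0,
                 timeout_seconds=1800.0, sleep=time.sleep):
    """Start an experiment and return True only if it reaches 'completed'."""
    started = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),          # idempotency token
        experimentTemplateId=template_id,
    )
    experiment_id = started["experiment"]["id"]
    waited = 0.0
    while True:
        response = fis_client.get_experiment(id=experiment_id)
        status = response["experiment"]["state"]["status"]
        if status in TERMINAL_STATES:
            # 'stopped' usually means a stop condition fired; treat it as a failed gate.
            return status == "completed"
        if waited >= timeout_seconds:
            fis_client.stop_experiment(id=experiment_id)
            return False
        sleep(poll_seconds)
        waited += poll_seconds


class FakeFisClient:
    """Stand-in for a real FIS client; replays a scripted list of statuses."""

    def __init__(self, statuses):
        self.statuses = list(statuses)
        self.stopped = False

    def start_experiment(self, clientToken, experimentTemplateId):
        return {"experiment": {"id": "EXP-demo", "state": {"status": "pending"}}}

    def get_experiment(self, id):
        return {"experiment": {"id": id, "state": {"status": self.statuses.pop(0)}}}

    def stop_experiment(self, id):
        self.stopped = True
        return {"experiment": {"id": id, "state": {"status": "stopping"}}}


# Dry run against the stand-in client; no real sleeping.
passed = run_fis_gate(FakeFisClient(["running", "completed"]),
                      "EXT-placeholder", sleep=lambda _seconds: None)
```

A pipeline step would exit non-zero when the function returns `False`, so a triggered stop condition blocks the deployment.
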
  24. References:
    https://aws.amazon.com/builders-library/challenges-with-distributed-systems/
    https://docs.aws.amazon.com/fis/latest/userguide/what-is.html
    https://docs.aws.amazon.com/fis/latest/userguide/getting-started-tutorial.html
    https://docs.aws.amazon.com/fis/latest/userguide/getting-started-iam.html
    https://medium.com/the-cloud-architect/chaos-engineering-ab0cc9fbd12a