Performing chaos engineering in a serverless world (CMY301) - AWS re:Invent Las Vegas December 2 2019

Gunnar Grosch
December 02, 2019


Presented at AWS re:Invent on December 2, 2019.

@gunnargrosch

Link to session recording: https://youtu.be/vbyjpMeYitA

The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services? In this session, we cover the motivations behind chaos engineering, how we perform chaos experiments, and what some of the common weaknesses are that we can test for in our serverless applications. We also run some actual experiments in a serverless AWS environment. Join us as we move from talking about principles to performing real chaos-engineering experiments for serverless.


Transcript

  1. 1.

    © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

    Performing chaos engineering in a serverless world (CMY301). Gunnar Grosch, Evangelist and Cofounder, @gunnargrosch
  2. 2.

    The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services?
  3. 3.

    Agenda: What is chaos engineering? Motivations behind chaos engineering. Running chaos experiments. Challenges with serverless. Serverless chaos experiments.
  4. 4.

    About me: Evangelist and cofounder, Opsio. Background in development, operations, and management. Organizer of user groups and conferences. Advocate for serverless and chaos engineering. Father of three.
  5. 10.

    “Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” (principlesofchaos.org)
  6. 12.

    What is chaos engineering? Chaos engineering is about finding the weaknesses in a system and fixing them before they break.
  7. 15.

    “Nines don’t matter if users aren’t happy” (Charity Majors, CTO, Honeycomb)
  8. 16.

    Motivations behind chaos engineering: Are your customers getting the experience they should? Are downtime or other issues costing you money? Are you confident in your monitoring and alerting? Is your organization ready to handle outages? Are you learning from incidents?
  9. 17.
  10. 18.

    “Everything fails, all the time!” (Werner Vogels, CTO, Amazon)
  11. 19.

    Motivations behind chaos engineering: Don’t ask what happens if a system fails; ask what happens when it fails.
  12. 21.

    Step 1: Define steady state. Steady state is the normal behavior of a system over time, described with system metrics and business metrics; business metrics are usually more useful. Steady state is not necessarily continuous.
  13. 22.

    Step 2: Form your hypothesis. Use “what ifs” to find it, following the scientific “If… then…” method. Always fix known problems first. Chaos can be injected at any layer of the stack.
  14. 23.

    Step 3: Plan and run your experiment. Whiteboard the experiment in detail. Notify the organization. Have a “stop” button ready. Contain the blast radius.
  15. 24.

    Step 4: Measure and learn. Use metrics to prove or disprove the hypothesis. Was the system resilient to the injected failure? Did anything unexpected happen? Share your progress and success.
  16. 25.

    Step 5: Scale up or abort and fix. Use the learnings to improve. Increased scope can reveal new effects. With confidence, you can scale up.
  17. 27.

    Challenges with serverless: “Serverless allows you to build and run applications and services without thinking about servers”
  18. 29.

    Challenges with serverless: No servers to manage. Less heavy lifting. Lots of services to choose from. Per-function and per-service configuration. More granular architectures.
  19. 30.

    “Chaos engineering is a perfect fit for serverless” (Gunnar Grosch, Thought leader)
  20. 33.

    Serverless chaos experiments: Inject errors into your code. Remove downstream services. Alter the concurrency of functions. Restrict the capacity of tables.

    [Architecture diagram: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda]
  21. 34.

    Serverless chaos experiments: Security policy errors. CORS configuration errors. Service configuration errors. Function disk space failure.

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda]
  22. 35.

    Serverless chaos experiments: Add latency to your functions (cold starts, cloud provider issues, runtime or code issues, integration issues, timeouts).

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda]
  23. 38.

    Serverless chaos demo

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
  24. 39.

    Serverless chaos demo: What if my function takes an extra 300 ms for each invocation? What if my function returns an error code? What if there is an exception in the code? Hypothesis: If we inject failures into our functions, then the application will degrade gracefully.

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
  25. 41.

    Summary: Everything fails, all the time. Serverless doesn’t make your application resilient. Chaos engineering helps us find weaknesses and fix them. Chaos engineering is about building confidence. It’s not rocket science; you can do it!
  26. 42.

    Do you want more?
    Follow @gunnargrosch and @serverlesschaos on Twitter.
    Try the Serverless Chaos Demo app: https://demo.serverlesschaos.com
    YouTube videos and repositories: https://grosch.se
    Join the chaos engineering Slack: http://bit.ly/chaos-eng-slack
    Visit chaos engineering meetups.
  27. 43.

    Thank you! Gunnar Grosch @gunnargrosch
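
Step 1 ("Define steady state") can be made concrete with a small Python sketch. This is not code from the talk. It assumes steady state is summarized as a band of the mean plus or minus three sample standard deviations over baseline samples of one business metric. The metric and the numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

Band = Tuple[float, float]


def steady_state_band(samples: Sequence[float], k: float = 3.0) -> Band:
    """Return (low, high): the sample mean plus/minus k sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    m = mean(samples)
    s = stdev(samples)
    return (m - k * s, m + k * s)


def in_steady_state(value: float, band: Band) -> bool:
    """True if an observation falls inside the steady-state band."""
    low, high = band
    return low <= value <= high


# Hypothetical business metric: successful checkouts per minute
baseline = [118, 121, 119, 122, 120, 120]
band = steady_state_band(baseline)
print(in_steady_state(119, band))  # True
print(in_steady_state(80, band))   # False
```

Steady state is not necessarily continuous. A metric with daily or weekly cycles would therefore need its own band per time window rather than a single global band.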
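
Step 2 frames the hypothesis with the "If… then…" method. One way to keep a hypothesis testable is to store the then-clause together with a check against an observable metric. The sketch below assumes a success-rate metric and a 99% threshold, both hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    """An 'If ... then ...' statement plus a check on an observed metric."""
    if_clause: str                  # the injected failure
    then_clause: str                # the expected, observable outcome
    holds: Callable[[float], bool]  # True when the observation supports the hypothesis

    def statement(self) -> str:
        return f"If {self.if_clause}, then {self.then_clause}."


h = Hypothesis(
    if_clause="every function invocation takes an extra 300 ms",
    then_clause="at least 99% of requests still succeed",  # hypothetical threshold
    holds=lambda success_rate: success_rate >= 0.99,
)
print(h.statement())
print(h.holds(0.995))  # True
```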
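
Step 3 asks for a "stop" button and a contained blast radius. The minimal Python sketch below rests on two assumptions:

- **Stop button.** It is an in-process flag. A real setup would more likely use shared configuration that every function reads.
- **Blast radius.** It is a percentage of requests, selected by hashing a hypothetical request ID. Because the selection is deterministic, the same requests stay in scope across retries.

```python
import hashlib
import threading


class StopButton:
    """Shared abort flag: once pressed, no further faults are injected."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def press(self) -> None:
        self._event.set()

    def pressed(self) -> bool:
        return self._event.is_set()


def in_blast_radius(request_id: str, percent: float) -> bool:
    """Deterministically place roughly `percent`% of request IDs in the experiment."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


def should_inject(request_id: str, percent: float, stop: StopButton) -> bool:
    """Inject only if the stop button is not pressed and the request is in scope."""
    return not stop.pressed() and in_blast_radius(request_id, percent)


stop = StopButton()
request_ids = [f"req-{i}" for i in range(10_000)]
affected = sum(should_inject(r, 5, stop) for r in request_ids)
print(f"{affected} of {len(request_ids)} requests in scope")
stop.press()
print(any(should_inject(r, 5, stop) for r in request_ids))  # False after pressing stop
```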
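
Step 4 uses metrics to prove or disprove the hypothesis. The sketch below shows one hedged way to do that. It counts how many observations taken during the experiment fell outside a steady-state band, and it treats the hypothesis as held if at most 5% did. The band, the samples, and the 5% tolerance are all hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def evaluate(band: Tuple[float, float], samples: Sequence[float], tolerance: float = 0.05):
    """Compare experiment observations against the steady-state band.

    Returns (hypothesis_held, violations, observed_mean); the hypothesis is
    considered to hold if at most `tolerance` of the samples left the band.
    """
    if not samples:
        raise ValueError("no samples were collected during the experiment")
    low, high = band
    violations = sum(1 for v in samples if not low <= v <= high)
    held = violations / len(samples) <= tolerance
    return held, violations, mean(samples)


band = (115.8, 124.2)              # hypothetical steady-state band
during = [119, 121, 120, 96, 118]  # hypothetical metric values during the experiment
held, violations, observed = evaluate(band, during)
print(f"hypothesis held: {held}, {violations} of {len(during)} samples outside steady state, mean {observed:.1f}")
```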
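
Step 5 is to scale up, or to abort and fix. Below is a sketch of that loop. It assumes a hypothetical `run_at(percent)` callback that runs one experiment at a given blast radius and reports whether the hypothesis held. The radius steps are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple


def scale_up(
    run_at: Callable[[float], bool],
    radii: Sequence[float] = (1, 5, 25, 50, 100),
) -> Tuple[Optional[float], Optional[float]]:
    """Run the experiment at increasing blast radii, stopping at the first failure.

    Returns (largest radius that passed or None, radius that failed or None).
    """
    last_passed: Optional[float] = None
    for percent in radii:
        if not run_at(percent):
            return last_passed, percent  # abort: fix the weakness before scaling further
        last_passed = percent
    return last_passed, None  # hypothesis held at every step


# Hypothetical experiment that only breaks once half of the traffic is affected
print(scale_up(lambda percent: percent < 50))  # (25, 50)
```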
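
The first serverless experiment on slide 33, injecting errors into your code, can be sketched as a Python decorator around a Lambda-style `handler(event, context)` function. This is an illustration, not the tooling used in the talk. The `InjectedFailure` exception, the `rate` and `enabled` parameters, and the seeded random generator are all assumptions made for the example.

```python
import functools
import random
from typing import Optional


class InjectedFailure(Exception):
    """Raised deliberately by the chaos wrapper, never by the business logic."""


def inject_errors(rate: float, enabled: bool = True, rng: Optional[random.Random] = None):
    """Decorator: make a Lambda-style handler raise InjectedFailure with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = rng if rng is not None else random.Random()

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            # `enabled` acts as a kill switch; rng.random() is in [0, 1)
            if enabled and rng.random() < rate:
                raise InjectedFailure("chaos: injected error")
            return handler(event, context)
        return wrapper
    return decorator


@inject_errors(rate=0.5, rng=random.Random(42))  # seeded so runs are repeatable
def handler(event, context):
    # Hypothetical business logic
    return {"statusCode": 200, "body": "ok"}
```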
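
Function disk space failure (slide 34) can be simulated by filling the function's writable scratch space before the real work runs. In AWS Lambda that space is `/tmp`. The helper names and sizes below are hypothetical. The directory is a parameter so the sketch can run anywhere.

```python
import errno
import os
import tempfile

CHUNK = 1024 * 1024  # write in 1 MiB chunks


def fill_disk(directory: str, megabytes: int) -> str:
    """Write a filler file of up to `megabytes` MiB into `directory`; return its path.

    Running out of space part-way is not treated as an error, because a full
    disk is exactly the condition the experiment wants to create.
    """
    fd, path = tempfile.mkstemp(prefix="chaos-fill-", dir=directory)
    block = b"\0" * CHUNK
    with os.fdopen(fd, "wb", buffering=0) as f:  # unbuffered: nothing left to flush on close
        try:
            for _ in range(megabytes):
                f.write(block)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:  # re-raise anything other than "disk full"
                raise
    return path


def release_disk(path: str) -> None:
    """Remove the filler file to end the experiment."""
    os.remove(path)
```

A call such as `fill_disk("/tmp", 400)` at the start of an invocation would leave the function little room to write. Choose the size relative to the function's configured ephemeral storage. Because `/tmp` survives between invocations that reuse the same execution environment, call `release_disk` when the experiment ends.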
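
Adding latency to your functions (slide 35) can reproduce the symptoms of cold starts, slow integrations, and timeouts. Below is a hedged sketch of another decorator. The parameter names are assumptions. The `sleep` function is injectable only so that the delay can be observed without actually waiting.

```python
import functools
import random
import time
from typing import Callable, Optional, Tuple


def inject_latency(
    delay_ms: Tuple[int, int],
    rate: float = 1.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
):
    """Decorator: with probability `rate`, sleep a random delay in
    [delay_ms[0], delay_ms[1]] milliseconds before running the handler."""
    low, high = delay_ms
    if low < 0 or high < low:
        raise ValueError("delay_ms must be (min, max) with 0 <= min <= max")
    rng = rng if rng is not None else random.Random()

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            if rng.random() < rate:
                sleep(rng.randint(low, high) / 1000.0)  # sleep() takes seconds
            return handler(event, context)
        return wrapper
    return decorator


# The demo's "extra 300 ms for each invocation" as a fixed delay
@inject_latency((300, 300))
def handler(event, context):
    return {"statusCode": 200, "body": "ok"}
```

Any injected delay counts toward the function's configured timeout. A delay close to that limit is therefore one way to exercise the timeout case listed on the slide.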
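
The demo's hypothesis is that the application degrades gracefully when a function takes an extra 300 ms, returns an error code, or throws an exception. On the calling side, graceful degradation could look like the sketch below. The downstream call is modelled as a plain Python callable. The 0.1 s budget and the cached fallback response are hypothetical.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict

Response = Dict[str, Any]

# Worker pool for downstream calls; a timed-out call keeps running in its
# thread, but the caller stops waiting for it.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(
    primary: Callable[[], Response],
    fallback: Callable[[str], Response],
    timeout_s: float,
) -> Response:
    """Call primary(); on a timeout, an exception, or a 5xx status, return fallback(reason)."""
    future = _pool.submit(primary)
    try:
        response = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback("timeout")
    except Exception:
        return fallback("exception")
    if response.get("statusCode", 500) >= 500:
        return fallback("error status")
    return response


def cached_recommendations(reason: str) -> Response:
    # Hypothetical degraded response: stale but usable data instead of an error
    return {"statusCode": 200, "body": "cached", "degraded": reason}


def slow_downstream() -> Response:
    time.sleep(0.3)  # the demo's extra 300 ms
    return {"statusCode": 200, "body": "fresh"}


print(call_with_fallback(slow_downstream, cached_recommendations, timeout_s=0.1))
```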