Performing chaos in a serverless world - Agile India 2020 October 16 2020

Presented at Agile India 2020, October 16th, 2020.

@gunnargrosch
Serverless Chaos Demo
failure-lambda
aws-lambda-chaos-injection
circuitbreaker-lambda

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. The principles of chaos engineering have been around for years, and chaos engineering has now gone from being a buzzword, and a practice used by a few large organizations in very specific fields, to being put into use by companies of all sizes and industries.

Planning and performing chaos experiments on traditional infrastructure with virtual machines, and on microservices running in containers, has been battle-tested by many large organizations, but serverless functions and managed services present different failure modes and levels of abstraction.

In this talk we focus on how to apply the principles of chaos engineering to serverless, both for serverless functions and managed services. This covers how hypotheses can be formed to fit serverless, what the experiments can achieve, and how to perform them in practice.

Join as we move from talking about the principles to performing real chaos in a serverless world!

Gunnar Grosch

October 16, 2020

Transcript

  1. Performing chaos in a serverless world

    Agile India 2020
    Gunnar Grosch, @gunnargrosch
    October 16, 2020
    © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Abstract

    The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services?
  3. Agenda

    • What is chaos engineering?
    • Motivations behind chaos engineering
    • Serverless chaos experiments
    • Demos, demos, demos
  4. About me

    Senior Developer Advocate
    Background in development, operations, and management
    Community builder
    Father of three
  5. What is chaos engineering?
  6. What is chaos engineering?

    Chaos engineering is not about breaking things
  7. What is chaos engineering?

    Chaos engineering is about finding the weaknesses in a system and fixing them before they break
  8. What is chaos engineering?

    Chaos engineering is about building confidence in your system and in your organization
  9. Motivations behind chaos engineering
  10. Motivations behind chaos engineering

    Are your customers getting the experience they should?
    Is downtime or issues costing you money?
    Are you confident in your monitoring and alerting?
    Is your organization ready to handle outages?
    Are you learning from incidents?
  11. Motivations behind chaos engineering

    Don’t ask what happens if a system fails; ask what happens when it fails
  12. Motivations behind chaos engineering

    “Chaos engineering should be done regularly”
    Reliability Pillar, AWS Well-Architected Framework
  13. Serverless chaos experiments
  14. Serverless chaos experiments

    Errors
    Failovers
    Fallbacks
    Timeouts
    Events
  15. Serverless chaos experiments

    Inject errors into your code
    Remove downstream services
    Alter the concurrency of functions
    Restrict the capacity of tables

    [Architecture diagram: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
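
    Two of the experiments on this slide, altering the concurrency of functions and restricting the capacity of tables, correspond to request parameters of the Lambda PutFunctionConcurrency and DynamoDB UpdateTable API operations. The sketch below only builds those parameter objects, so it runs without credentials or an SDK. The function and table names are hypothetical, sending the requests is left out, and the table is assumed to use provisioned capacity mode.

```javascript
// Builds request parameters for two serverless chaos experiments.
// 'orders-handler' and 'orders-table' are hypothetical resource names.

// Parameters for Lambda PutFunctionConcurrency. A low reserved concurrency
// throttles the function; 0 throttles every invocation.
function concurrencyExperiment(functionName, reservedConcurrency) {
  if (!Number.isInteger(reservedConcurrency) || reservedConcurrency < 0) {
    throw new RangeError('reservedConcurrency must be a non-negative integer');
  }
  return {
    FunctionName: functionName,
    ReservedConcurrentExecutions: reservedConcurrency,
  };
}

// Parameters for DynamoDB UpdateTable on a provisioned-capacity table.
// Lowering the capacity units makes throttling more likely under load.
function capacityExperiment(tableName, readUnits, writeUnits) {
  if (readUnits < 1 || writeUnits < 1) {
    throw new RangeError('capacity units must be at least 1');
  }
  return {
    TableName: tableName,
    ProvisionedThroughput: {
      ReadCapacityUnits: readUnits,
      WriteCapacityUnits: writeUnits,
    },
  };
}

const throttleAll = concurrencyExperiment('orders-handler', 0);
const starveTable = capacityExperiment('orders-table', 1, 1);
console.log(JSON.stringify({ throttleAll, starveTable }, null, 2));
```

    Recording the original concurrency and capacity values before the experiment starts makes it possible to restore them as the rollback step.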
  16. Serverless chaos experiments

    Security policy errors
    CORS configuration errors
    Service configuration errors
    Function disk space failure

    [Architecture diagram: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
  17. Serverless chaos experiments

    Add latency to your functions
    • Cold starts
    • Runtime or code issues
    • Integration issues
    • Timeouts

    [Architecture diagram: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)]
  18. Failure-lambda

    Node.js npm package for Node.js Lambda functions
    https://github.com/gunnargrosch/failure-lambda
    Configuration using Parameter Store
    Several failure modes
    • Latency
    • Status code
    • Exception
    • Disk space
    • Denylist

        const failureLambda = require('failure-lambda')
        exports.handler = failureLambda(async (event, context) => { ... })

        {
          "isEnabled": false,
          "failureMode": "latency",
          "rate": 1,
          "minLatency": 100,
          "maxLatency": 400,
          "exceptionMsg": "Exception message!",
          "statusCode": 404,
          "diskSpace": 100,
          "denylist": [ "s3.*.amazonaws.com", "dynamodb.*.amazonaws.com" ]
        }
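
    To make the idea of wrapping a handler concrete, here is a minimal sketch of a fault-injecting wrapper. It is not failure-lambda's source code: withFailure, getConfig, and the injectable random function are hypothetical names, and the 'exception' and 'statuscode' mode strings are assumptions (the slide only shows "latency"). The configuration keys are the ones from the slide, and getConfig stands in for reading the configuration from Parameter Store.

```javascript
// Hypothetical sketch of a fault-injecting Lambda handler wrapper.
// Configuration keys follow the slide; the logic itself is an assumption.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function withFailure(handler, getConfig, random = Math.random) {
  return async (event, context) => {
    const config = await getConfig();
    // Inject only when enabled, and only for roughly `rate` of invocations.
    if (config.isEnabled && random() < config.rate) {
      if (config.failureMode === 'latency') {
        // Wait between minLatency and maxLatency ms, then run the handler.
        const span = config.maxLatency - config.minLatency + 1;
        await sleep(config.minLatency + Math.floor(random() * span));
      } else if (config.failureMode === 'exception') {
        throw new Error(config.exceptionMsg);
      } else if (config.failureMode === 'statuscode') {
        // Short-circuit: the wrapped handler never runs.
        return { statusCode: config.statusCode };
      }
    }
    return handler(event, context);
  };
}

// Example wiring with an in-memory configuration instead of Parameter Store.
const exampleConfig = {
  isEnabled: true,
  failureMode: 'latency',
  rate: 1,
  minLatency: 100,
  maxLatency: 400,
  exceptionMsg: 'Exception message!',
  statusCode: 404,
};
const handler = withFailure(
  async () => ({ statusCode: 200, body: 'ok' }),
  async () => exampleConfig
);
```

    Reading the configuration on every invocation is what allows an experiment to be switched on and off without redeploying the function, at the cost of one extra lookup per call.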
  19. Serverless chaos demo
  20. Serverless chaos demo
  21. Serverless chaos demo

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
  22. Serverless chaos demo

    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?

    [Architecture diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda]
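
    Each of these three questions can be expressed as one configuration, using the keys from the failure-lambda slide. The following is a sketch under assumptions: the 'statuscode' and 'denylist' mode strings and the 500 status code are not taken from the slides, and treating denylist entries as regular expressions matched against the host name is an assumption as well.

```javascript
// One configuration per question. Keys come from the failure-lambda slide;
// the values marked "assumed" do not appear in the talk.
const base = { isEnabled: true, rate: 1 };

const experiments = {
  // What if my function takes an extra 300 ms for each invocation?
  extraLatency: { ...base, failureMode: 'latency', minLatency: 300, maxLatency: 300 },
  // What if my function returns an error code? (mode string and 500 assumed)
  errorCode: { ...base, failureMode: 'statuscode', statusCode: 500 },
  // What if I can't get data from DynamoDB? (mode string assumed)
  noDynamoDb: { ...base, failureMode: 'denylist', denylist: ['dynamodb.*.amazonaws.com'] },
};

// Assumed matching rule: each denylist entry is a regular expression
// tested against the host the function tries to reach.
function isDenied(host, config) {
  return (config.denylist || []).some((pattern) => new RegExp(pattern).test(host));
}

console.log(isDenied('dynamodb.eu-west-1.amazonaws.com', experiments.noDynamoDb));
console.log(isDenied('s3.eu-west-1.amazonaws.com', experiments.noDynamoDb));
```

    Setting minLatency and maxLatency to the same value gives a fixed delay, which makes the effect on response times easier to attribute to the experiment.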
  23. Serverless chaos demo
  24. Serverless chaos demo
  25. Serverless chaos demo
  26. What’s next?

    “Chaos engineering should be done regularly”
    Reliability Pillar, AWS Well-Architected Framework
  27. What’s next?

    “Chaos engineering should be done regularly, and be part of your CI/CD cycle”
    Reliability Pillar, AWS Well-Architected Framework
  28. Serverless chaos CI/CD demo
  29. Serverless chaos CI/CD demo

    Default deploy
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
  30. Serverless chaos CI/CD demo

    Canary deploy
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
  31. Serverless chaos CI/CD demo

    Feature flag
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
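
    Running an experiment as part of a CI/CD cycle needs an automated verdict on whether the hypothesis held. The sketch below shows one way a pipeline step could decide that from measured responses while a failure is injected. The sample shape (latencyMs, statusCode), the function names, and the thresholds are all hypothetical and are not taken from the demo.

```javascript
// Hypothetical pipeline gate: does the steady-state hypothesis still hold
// for the responses measured while the failure was injected?

// Nearest-rank percentile of a non-empty array of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function evaluateHypothesis(samples, { maxP95Ms, maxErrorRate }) {
  if (samples.length === 0) {
    throw new Error('no samples collected, cannot evaluate the hypothesis');
  }
  const p95 = percentile(samples.map((s) => s.latencyMs), 95);
  const errors = samples.filter((s) => s.statusCode >= 500).length;
  const errorRate = errors / samples.length;
  return { p95, errorRate, holds: p95 <= maxP95Ms && errorRate <= maxErrorRate };
}

// Example: 20 measured responses, one of them a server error.
const samples = Array.from({ length: 20 }, (_, i) => ({
  latencyMs: (i + 1) * 10,
  statusCode: i === 0 ? 500 : 200,
}));
const verdict = evaluateHypothesis(samples, { maxP95Ms: 200, maxErrorRate: 0.05 });
console.log(verdict);
```

    A pipeline step could fail the build, or trigger a rollback of the canary, whenever holds is false.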
  32. Summary

    Chaos engineering helps us find weaknesses and fix them
    Chaos engineering is about building confidence
    Chaos engineering should be done regularly
    Chaos engineering should be part of your CI/CD
    It’s not rocket science; you can do it!
  33. Do you want more?

    Reliability pillar: https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/welcome.html
    Serverless Chaos Demo app: https://demo.serverlesschaos.com
    Failure-lambda: https://github.com/gunnargrosch/failure-lambda
    Chaos-lambda: https://github.com/adhorn/aws-lambda-chaos-injection/
    Circuitbreaker-lambda: https://github.com/gunnargrosch/circuitbreaker-lambda
    Serverless chaos lab: https://github.com/jpbarto/serverless-chaos-lab
  34. Thank you!

    Gunnar Grosch, @gunnargrosch