Performing chaos in a serverless world - ServerlessDays Hamburg October 2 2020

Presented at ServerlessDays Hamburg, October 2nd, 2020.

@gunnargrosch
Serverless Chaos Demo
failure-lambda
failure-azurefunctions
failure-cloudfunctions

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. The principles of chaos engineering have been around for years, and the practice has now moved from being a buzzword used by a few large organizations in very specific fields to being put into use by companies of all sizes and industries.

Planning and performing chaos experiments on traditional infrastructure with virtual machines, and on microservices running in containers, has been battle-tested by many large organizations, but serverless functions and managed services present different failure modes and levels of abstraction.

In this talk we focus on how to apply the principles of chaos engineering to serverless, both for serverless functions and managed services. This covers how hypotheses can be formed to fit serverless, what the experiments can achieve, and how to perform them in practice.

Tools for chaos engineering, both commercial and open-source, are maturing, but most of them still focus primarily on virtual machines and containers. We’ll look at what tools are out there to help with chaos experiments for serverless and managed services, and also at how you can build your own.

Join as we move from talking about the principles to performing real chaos in a serverless world!

Gunnar Grosch

October 02, 2020

Transcript

  1. @gunnargrosch Gunnar Grosch, October 2, 2020: Performing chaos in a serverless world, ServerlessDays Hamburg
  2. Abstract: The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices. But how do they work with serverless functions and managed services?
  3. Agenda • What is chaos engineering? • Motivations behind chaos engineering • Running chaos experiments • Challenges with serverless • Serverless chaos experiments
  4. About me: Background in development, operations, and management; organizer of user groups and conferences; AWS Serverless Hero; father of three
  5. What is chaos engineering?
  6. What is chaos engineering? Chaos engineering is not about breaking things
  7. What is chaos engineering? Chaos engineering is not only for production
  8. What is chaos engineering? “Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production”, principlesofchaos.org
  9. What is chaos engineering? Chaos engineering is about performing controlled experiments to inject failures
  10. What is chaos engineering? Chaos engineering is about finding the weaknesses in a system and fixing them before they break
  11. What is chaos engineering? Chaos engineering is about building confidence in your system and in your organization
  12. Motivations behind chaos engineering
  13. Motivations behind chaos engineering: Are your customers getting the experience they should? Are downtime or issues costing you money? Are you confident in your monitoring and alerting? Is your organization ready to handle outages? Are you learning from incidents?
  14. Motivations behind chaos engineering: Don’t ask what happens if a system fails; ask what happens when it fails
  15. Motivations behind chaos engineering: “Chaos engineering should be done regularly”, Reliability Pillar, AWS Well-Architected Framework
  16. Running chaos experiments
  17. Running chaos experiments: Define steady state; form your hypothesis; plan and run your experiment; measure and learn
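The four steps on slide 17 can be sketched as a small experiment runner. This is a minimal illustration, not code from the talk: `isSteadyState`, `runExperiment`, the metric names (`errorRate`, `p95LatencyMs`), and the callback names (`measure`, `inject`, `rollback`) are all hypothetical, and a real experiment would read its metrics from a monitoring system.

```javascript
// Hypothetical sketch: every name below is an assumption made for illustration.

// Step 1: define steady state as thresholds on measurable metrics.
function isSteadyState(metrics, thresholds) {
  return (
    metrics.errorRate <= thresholds.maxErrorRate &&
    metrics.p95LatencyMs <= thresholds.maxP95LatencyMs
  );
}

// Steps 2-4: the hypothesis is "steady state still holds while the failure
// is injected"; run the experiment, measure, and report what was learned.
async function runExperiment({ name, thresholds, measure, inject, rollback }) {
  const before = await measure();
  if (!isSteadyState(before, thresholds)) {
    // Never start an experiment on a system that is already unhealthy.
    return { name, status: 'aborted', reason: 'system not in steady state', before };
  }

  await inject();
  let during;
  try {
    during = await measure();
  } finally {
    // Always remove the injected failure, even if measuring throws.
    await rollback();
  }

  const hypothesisHolds = isSteadyState(during, thresholds);
  return { name, status: hypothesisHolds ? 'passed' : 'failed', before, during };
}
```

A `failed` result is not a failed experiment in the sense of the talk: it is the weakness the experiment was designed to find.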
  18. Challenges with serverless
  19. Challenges with serverless: “Serverless allows you to build and run applications and services without thinking about servers”, Amazon Web Services (AWS)
  20. Challenges with serverless: “There are still servers in serverless”, Serverhuggers
  21. Challenges with serverless: No servers to manage; less heavy lifting; lots of services to choose from; per function and service configuration; more granular architectures
  22. Serverless chaos experiments
  23. Serverless chaos experiments: Inject errors into your code; remove downstream services; alter the concurrency of functions; restrict the capacity of tables. (Diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon S3)
  24. Serverless chaos experiments: Security policy errors; CORS configuration errors; service configuration errors; function disk space failure. (Diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon S3)
  25. Serverless chaos experiments: Add latency to your functions; cold starts; cloud provider issues; runtime or code issues; integration issues; timeouts. (Diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon S3)
  26. Failure-lambda: NPM package for Node.js Lambdas, https://github.com/gunnargrosch/failure-lambda. Configuration using Parameter Store. Several failure modes: latency, status code, exception, disk space, denylist.

    const failureLambda = require('failure-lambda')
    exports.handler = failureLambda(async (event, context) => { ... })

    {
      "isEnabled": false,
      "failureMode": "latency",
      "rate": 1,
      "minLatency": 100,
      "maxLatency": 400,
      "exceptionMsg": "Exception message!",
      "statusCode": 404,
      "diskSpace": 100,
      "denylist": ["s3.*.amazonaws.com", "dynamodb.*.amazonaws.com"]
    }
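To make the idea on slide 26 concrete, here is a minimal sketch of how a failure-injecting handler wrapper could work. It is not the failure-lambda source code. The config fields mirror the JSON on the slide, but `withFailure`, `shouldInject`, `pickLatency`, and the `getConfig` loader are hypothetical names, the mode strings `'exception'` and `'statuscode'` are assumptions, and the disk space and denylist modes are left out.

```javascript
// Illustrative sketch only: NOT the failure-lambda implementation.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Decide whether this invocation is affected. A rate of 1 hits every
// invocation, because random() returns a value in [0, 1).
function shouldInject(config, random = Math.random) {
  return config.isEnabled === true && random() < config.rate;
}

// Pick a delay in milliseconds between minLatency and maxLatency.
function pickLatency(config, random = Math.random) {
  const span = config.maxLatency - config.minLatency;
  return Math.floor(config.minLatency + random() * span);
}

// Wrap a handler. getConfig is a hypothetical async loader; in a setup like
// the one on the slide it would read the JSON from Parameter Store.
function withFailure(handler, getConfig, random = Math.random) {
  return async (event, context) => {
    const config = await getConfig();
    if (shouldInject(config, random)) {
      if (config.failureMode === 'latency') {
        await sleep(pickLatency(config, random)); // delay, then run normally
      } else if (config.failureMode === 'exception') {
        throw new Error(config.exceptionMsg);
      } else if (config.failureMode === 'statuscode') {
        return { statusCode: config.statusCode }; // skip the real handler
      }
    }
    return handler(event, context);
  };
}
```

Reading the configuration on every invocation is what lets an experiment be switched on and off without redeploying the function, at the cost of one extra lookup per call.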
  27. Serverless chaos demo
  28. Serverless chaos demo
  29. Serverless chaos demo. (Diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda)
  30. Serverless chaos demo. (Diagram: Client, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, AWS Lambda) What if my function takes an extra 300 ms for each invocation? What if my function returns an error code? What if I can’t get data from DynamoDB? Hypothesis: If we inject failure into functions, then my application will use graceful degradation.
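The hypothesis on slide 30 depends on the application degrading gracefully when a dependency is slow or unavailable. The sketch below shows one way that could look in a function. It is an assumption about the pattern, not the demo application's code: `withTimeout`, `getWithFallback`, and `fetchItem` (a stand-in for a DynamoDB read) are hypothetical names, and the 1000 ms default is arbitrary.

```javascript
// Hypothetical sketch of graceful degradation; names are illustrative only.

// Reject if the promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Return live data when possible, otherwise a degraded fallback response.
// fetchItem stands in for a call to a downstream service such as DynamoDB.
async function getWithFallback(fetchItem, fallback, timeoutMs = 1000) {
  try {
    const item = await withTimeout(fetchItem(), timeoutMs);
    return { statusCode: 200, degraded: false, body: item };
  } catch (err) {
    // The dependency failed or was too slow: serve the fallback instead of
    // passing the error on to the client.
    return { statusCode: 200, degraded: true, body: fallback };
  }
}
```

Injecting latency or exceptions into the downstream call is then a direct test of the hypothesis: the client should keep receiving responses, flagged as degraded, rather than errors.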
  31. Demo
  32. What’s next? “Chaos engineering should be done regularly”, Reliability Pillar, AWS Well-Architected Framework
  33. What’s next? “Chaos engineering should be done regularly, and be part of your CI/CD cycle”, Reliability Pillar, AWS Well-Architected Framework
  34. Demo
  35. What’s next? “Chaos engineering should be done regularly”, Reliability Pillar, AWS Well-Architected Framework
  36. Summary: Serverless doesn’t make your application resilient. Chaos engineering helps us find weaknesses and fix them. Chaos engineering is about building confidence. Chaos engineering should be done regularly. It’s not rocket science; you can do it!
  37. Do you want more? Follow @serverlesschaos on Twitter.
    Serverless Chaos Demo app: https://demo.serverlesschaos.com
    Failure-lambda: https://github.com/gunnargrosch/failure-lambda
    Failure-cloudfunctions: https://github.com/gunnargrosch/failure-cloudfunctions
    Failure-azurefunctions: https://github.com/gunnargrosch/failure-azurefunctions
    Chaos-lambda: https://github.com/adhorn/aws-lambda-chaos-injection/
    Serverless chaos lab: https://github.com/jpbarto/serverless-chaos-lab
    YouTube videos and repositories: https://grosch.se
  38. @gunnargrosch Thank you! Gunnar Grosch @gunnargrosch