Slide 1

Slide 1 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gunnar Grosch @gunnargrosch October 27-28, 2020 Chaos engineering for serverless applications AWS Resiliency and Chaos Engineering

Slide 2

Slide 2 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Motivations behind chaos engineering

Slide 3

Slide 3 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Motivations behind chaos engineering Downtime or issues Happy users Learning from incidents Monitoring and alerting Organizational confidence

Slide 4

Slide 4 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos experiments

Slide 5

Slide 5 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What is serverless? No infrastructure provisioning, no management Automatic scaling Pay for value Highly available and secure

Slide 6

Slide 6 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos experiments Errors Fallbacks Timeouts Events

Slide 7

Slide 7 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos experiments Inject errors into your code Remove downstream services Alter the concurrency of functions Restrict the capacity of tables Client Amazon Simple Storage Service (Amazon S3) Amazon API Gateway AWS Lambda Amazon DynamoDB AWS Lambda Amazon Simple Storage Service (Amazon S3)

Slide 8

Slide 8 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos experiments Security policy errors CORS configuration errors Service configuration errors Function disk space failure Client Amazon Simple Storage Service (Amazon S3) Amazon API Gateway AWS Lambda Amazon DynamoDB AWS Lambda Amazon Simple Storage Service (Amazon S3)

Slide 9

Slide 9 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos experiments Add latency to your functions • Cold starts • Runtime or code issues • Integration issues • Timeouts Client Amazon Simple Storage Service (Amazon S3) Amazon API Gateway AWS Lambda Amazon DynamoDB AWS Lambda Amazon Simple Storage Service (Amazon S3)

Slide 10

Slide 10 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Chaos engineering for serverless tools Chaos-lambda Python Failure-lambda NodeJS

Slide 11

Slide 11 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Failure-lambda NodeJS NPM package for NodeJS Lambdas Configuration using Parameter Store or AWS AppConfig Several failure modes • Latency • Status code • Exception • Disk space • Denylist const failureLambda = require('failure-lambda’) exports.handler = failureLambda(async (event, context) => { ... }) { "isEnabled": false, "failureMode": "latency", "rate": 1, "minLatency": 100, "maxLatency": 400, "exceptionMsg": "Exception message!", "statusCode": 404, "diskSpace": 100, “denylist": [ "s3.*.amazonaws.com", "dynamodb.*.amazonaws.com" ] }

Slide 12

Slide 12 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos demo

Slide 13

Slide 13 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos demo

Slide 14

Slide 14 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos demo Client Amazon S3 Amazon API Gateway AWS Lambda Amazon DynamoDB AWS Lambda AWS Lambda

Slide 15

Slide 15 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Client Amazon S3 Amazon API Gateway AWS Lambda Amazon DynamoDB AWS Lambda AWS Lambda Serverless chaos demo • What if my function takes an extra 300 ms for each invocation? • What if my function returns an error code? • What if I can’t get data from DynamoDB?

Slide 16

Slide 16 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Demo

Slide 17

Slide 17 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s next?

Slide 18

Slide 18 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos CI/CD demo

Slide 19

Slide 19 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos CI/CD demo • What if my function takes an extra 300 ms for each invocation? • What if my function returns an error code? • What if I can’t get data from DynamoDB? Default deploy

Slide 20

Slide 20 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Feature flag Serverless chaos CI/CD demo • What if my function takes an extra 300 ms for each invocation? • What if my function returns an error code? • What if I can’t get data from DynamoDB?

Slide 21

Slide 21 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Canary deploy Serverless chaos CI/CD demo • What if my function takes an extra 300 ms for each invocation? • What if my function returns an error code? • What if I can’t get data from DynamoDB?

Slide 22

Slide 22 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Demo

Slide 23

Slide 23 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Application configuration control

Slide 24

Slide 24 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Adding application configuration control AWS AppConfig allows us to create, manage, and quickly deploy config Features: • Validate failure configuration • Deploy configuration using gradual or non- gradual deploy strategy • Monitor deployed configuration with automatical rollback

Slide 25

Slide 25 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless chaos AppConfig demo

Slide 26

Slide 26 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Demo

Slide 27

Slide 27 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Summary Use chaos engineering to find weaknesses and fix them Use chaos engineering to verify that your system behaves as expected Use chaos engineering to build confidence Chaos engineering should be done regularly It’s easy to get started.

Slide 28

Slide 28 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! Gunnar Grosch @gunnargrosch