Slide 1

Slide 1 text

Building reliable serverless applications
Gunnar Grosch @gunnargrosch
AWS Community Day Dublin, October 13, 2020
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Slide 2

Slide 2 text

Building reliable serverless applications

Slide 3

Slide 3 text

Building reliable serverless applications

Slide 4

Slide 4 text

Building reliable serverless applications

Slide 5

Slide 5 text

Building reliable serverless applications

Slide 6

Slide 6 text

About me
Senior Developer Advocate
Background in development, operations, and management
Community builder
Father of three

Slide 7

Slide 7 text

Building reliable serverless applications

Slide 8

Slide 8 text

Building reliable serverless applications

Slide 9

Slide 9 text

Serverless
“Serverless allows you to build and run applications and services without thinking about servers”
Amazon Web Services (AWS)

Slide 10

Slide 10 text

Serverless
“without thinking about servers”

Slide 11

Slide 11 text

Serverless
A distributed system has multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end user.

Slide 12

Slide 12 text

Serverless – Execution models
Synchronous (push): Amazon API Gateway, AWS Lambda, /login
Asynchronous (event): Amazon SNS, Amazon S3, AWS Lambda
Stream-based: Amazon DynamoDB, Amazon Kinesis, AWS Lambda
Poll-based: Amazon SQS, AWS Lambda

Slide 13

Slide 13 text

Serverless – Simple web service
Diagram: Client, Amazon API Gateway, AWS Lambda, Amazon DynamoDB

Slide 14

Slide 14 text

Building reliable serverless applications

Slide 15

Slide 15 text

Reliability
Reliability is the probability that a product, system, or service will perform its intended function adequately for a specified period.

Slide 16

Slide 16 text

Reliability
Reliability is not the same as quality.

Slide 17

Slide 17 text

Robustness
Robustness is the ability of a computer system to cope with errors during execution.

Slide 18

Slide 18 text

Resilience
A resilient system can adjust its functioning prior to, during, or following events, and thereby sustain required operations under both expected and unexpected conditions.

Slide 19

Slide 19 text

Resilience
“Resilience isn’t something you have, it’s something you do”
Dr. David Woods

Slide 20

Slide 20 text

Building reliable serverless applications

Slide 21

Slide 21 text

Building
Don’t ask what happens if a system fails; ask what happens when it fails

Slide 22

Slide 22 text

Building
Decouple architectures

Slide 23

Slide 23 text

Building – Decouple architectures
Diagram: Client, Amazon Simple Storage Service, AWS Lambda, Amazon SNS, Amazon Simple Queue Service, AWS Lambda

Slide 24

Slide 24 text

Building – Decouple architectures
Diagram: SaaS Provider, Amazon EventBridge, AWS Step Functions, AWS Lambda, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB

Slide 25

Slide 25 text

Building
Let AWS handle errors and retries

Slide 26

Slide 26 text

Building – Error handling
Diagram: Amazon Simple Storage Service, AWS Lambda, Amazon Kinesis
Fail up the stack
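One way to read "fail up the stack" is that a function should not swallow errors it cannot handle, so that the invoking service can apply its own error handling and retries. The following is a minimal sketch under that reading; `ProcessingError` and `process` are hypothetical names used only for illustration and do not come from the talk.

```python
import json


class ProcessingError(Exception):
    """Hypothetical error type for a record that cannot be processed."""


def process(record: dict) -> str:
    # Hypothetical business logic: every record must carry an "id" field.
    if "id" not in record:
        raise ProcessingError("record has no id")
    return f"processed {record['id']}"


def handler(event: dict, context: object) -> dict:
    # No broad try/except here: an unhandled exception marks the invocation
    # as failed, so the caller or event source can apply its retry behaviour.
    results = [process(record) for record in event.get("Records", [])]
    return {"statusCode": 200, "body": json.dumps(results)}
```

Catching and logging the exception inside the handler would make the invocation look successful, and the failed event would then never be retried or routed elsewhere.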

Slide 27

Slide 27 text

Building
Aim for single-purpose functions

Slide 28

Slide 28 text

Building
Retries are good, right?

Slide 29

Slide 29 text

Building – Retries
• Synchronous Lambda invocations return failures to the invoker, which decides whether to retry
• Asynchronous Lambda invocations are retried up to 2 times, with a maximum event age of 6 hours
• Stream-based Lambda invocations keep events for up to 7 days, with up to 10 000 retry attempts
• Poll-based Lambda invocations return the message to the queue up to 1 000 times for retry
• SDK retries differ between runtimes and services
• Beware the retry storm
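To illustrate how a client can retry without contributing to a retry storm, here is a minimal sketch of capped exponential backoff with full jitter. The function name, the default values, and the injectable `sleep` parameter are assumptions made for this example; they are not taken from the talk or from any AWS SDK.

```python
import random
import time


def call_with_retries(fn, max_retries=2, base=0.1, cap=5.0, sleep=time.sleep):
    """Call fn(); on failure retry up to max_retries times with jittered backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retry budget spent: surface the failure to the caller
            # Full jitter: wait a random time between 0 and the capped
            # exponential bound, so many clients do not retry in lockstep.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
            attempt += 1
```

A bounded retry count plus randomised delays spreads the retries of many clients over time instead of sending them all at once to a service that is already struggling.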

Slide 30

Slide 30 text

Building
Make use of the circuit breaker pattern

Slide 31

Slide 31 text

Building – Circuit breaker pattern
Diagram: AWS Lambda, Amazon DynamoDB, Downstream service, Status check
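The pattern can be sketched as a small state machine: closed while calls succeed, open after repeated failures, and half-open once a timeout has passed. The class below is a minimal in-memory sketch; all names and defaults are assumptions for illustration. In a Lambda setup like the one in the diagram, the state would presumably live in a shared store such as DynamoDB rather than in process memory.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the downstream call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping downstream call")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests and retries onto a downstream service that is already failing.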

Slide 32

Slide 32 text

Building – Circuit breaker pattern

Slide 33

Slide 33 text

Building
Use dead letter queues to capture events that fail to process

Slide 34

Slide 34 text

Building – Dead letter queues
Diagram: Client, Amazon Simple Storage Service, Amazon SNS, AWS Lambda, Amazon SQS (DLQ)

Slide 35

Slide 35 text

Building – Dead letter queues
• You can add dead letter queues to
  • SQS
  • SNS
  • Lambda
  • EventBridge
• SQS uses SQS as DLQ
• SNS uses SQS as DLQ
• Lambda uses SQS or SNS as DLQ
• EventBridge uses SQS as DLQ
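As an illustration of the SQS case, a dead letter queue is attached to a source queue through the queue's `RedrivePolicy` attribute, a JSON string naming the DLQ ARN and a `maxReceiveCount`. The helper below is a hedged sketch that only builds that attribute; the function name, the default count, and the ARN in the usage note are made up for this example.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the SQS queue attributes that attach a dead letter queue."""
    # SQS accepts a maxReceiveCount between 1 and 1000.
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("max_receive_count must be between 1 and 1000")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # The attribute value is itself a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result could be passed as `Attributes` to `set_queue_attributes` on an SQS client, for example `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=redrive_attributes("arn:aws:sqs:eu-west-1:123456789012:my-dlq"))`, where the queue URL and ARN are placeholders.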

Slide 36

Slide 36 text

Building
Use Lambda destinations to capture function failures

Slide 37

Slide 37 text

Building – Lambda destinations
• For asynchronous Lambda executions
• Route execution record based on function result
  • Success
  • Failure
• Use another Lambda function, SQS, SNS or EventBridge as destination
• Record contains details about request and response in JSON
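Destinations are configured per function as part of its asynchronous invocation settings. The sketch below builds the keyword arguments for the Lambda API operation `PutFunctionEventInvokeConfig` (exposed in boto3 as `put_function_event_invoke_config`); the helper name and its defaults are assumptions for illustration, and the ARNs are supplied by the caller.

```python
def event_invoke_config(function_name, on_failure_arn, on_success_arn=None,
                        max_retries=2, max_event_age_seconds=21600):
    """Build kwargs for the Lambda PutFunctionEventInvokeConfig operation."""
    # Asynchronous invocations allow 0 to 2 retries and an event age of
    # 60 seconds up to 6 hours (21600 seconds).
    if not 0 <= max_retries <= 2:
        raise ValueError("max_retries must be 0, 1 or 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("max_event_age_seconds must be between 60 and 21600")
    destinations = {"OnFailure": {"Destination": on_failure_arn}}
    if on_success_arn is not None:
        destinations["OnSuccess"] = {"Destination": on_success_arn}
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "DestinationConfig": destinations,
    }
```

With a boto3 Lambda client, the result could be applied as `client.put_function_event_invoke_config(**event_invoke_config(...))`. Unlike a dead letter queue, the destination receives an execution record with request and response details, not only the failed event.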

Slide 38

Slide 38 text

Building – Lambda destinations
Diagram: Amazon Simple Storage Service, Amazon Simple Storage Service, AWS Lambda, Amazon SQS
On failure

Slide 39

Slide 39 text

Building – Lambda destinations
Diagram: Amazon Simple Storage Service, Amazon Simple Storage Service, AWS Lambda, AWS Lambda
On success

Slide 40

Slide 40 text

Building
Fallbacks are great – when used often

Slide 41

Slide 41 text

Building – Fallbacks
“Such attempts at a completely different mechanism to try to achieve the same result are called fallback behavior, and are an anti-pattern to be avoided”
Reliability Pillar, AWS Well-Architected Framework

Slide 42

Slide 42 text

Building
Use chaos engineering to find weaknesses in your system

Slide 43

Slide 43 text

Building – Chaos engineering
“Chaos engineering should be done regularly”
Reliability Pillar, AWS Well-Architected Framework

Slide 44

Slide 44 text

Building – Chaos engineering
What if my function takes an extra 300 ms for each invocation?
What if my function returns an error code?
What if I can’t get data from DynamoDB?
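Questions like these can be explored by wrapping a handler so that failures are injected on demand. The decorator below is a minimal sketch, not the tooling used in the talk: the `config` keys (`enabled`, `rate`, `mode`, `latency_ms`, `status_code`, `message`) are assumptions chosen for this example, and in practice the configuration would typically be read from an external store so it can be toggled without redeploying.

```python
import functools
import random
import time


def inject_failure(config, sleep=time.sleep, rng=random.random):
    """Wrap a handler so it injects latency, an exception or a status code."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            # Inject only when enabled, and only for a fraction of invocations.
            if config.get("enabled") and rng() < config.get("rate", 1.0):
                mode = config.get("mode")
                if mode == "latency":
                    sleep(config.get("latency_ms", 300) / 1000)
                elif mode == "exception":
                    raise RuntimeError(config.get("message", "injected failure"))
                elif mode == "statuscode":
                    return {"statusCode": config.get("status_code", 500)}
            return handler(event, context)
        return wrapper
    return decorator
```

Usage would look like `@inject_failure(config)` above a handler definition. Setting `mode` to `latency` answers the first question, `statuscode` the second, and `exception` approximates a dependency such as DynamoDB being unreachable.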

Slide 45

Slide 45 text

Building – Chaos engineering

Slide 46

Slide 46 text

Building – Chaos engineering
“Chaos engineering should be done regularly, and be part of your CI/CD cycle”
Reliability Pillar, AWS Well-Architected Framework

Slide 47

Slide 47 text

Building – Chaos engineering
What if my function takes an extra 300 ms for each invocation?
What if my function returns an error code?
What if I can’t get data from DynamoDB?
Failure injected through CI/CD

Slide 48

Slide 48 text

Summary
• Resilience isn’t something you have, it’s something you do
• Decouple architectures
• Let AWS handle errors and retries
• Use the circuit breaker pattern to avoid retry storms
• Use dead letter queues to capture events that fail to process
• Use Lambda destinations for function failures
• Fallbacks are great – when used, tested and verified often
• Chaos engineering helps us find weaknesses and fix them

Slide 49

Slide 49 text

Do you want more?
• Patterns and practices for building resilient Serverless applications (Yan Cui)
  https://www.slideshare.net/theburningmonk/patterns-and-practices-for-building-resilient-serverless-applications
• Serverless Microservice Patterns for AWS (Jeremy Daly)
  https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
• The Amazon Builders' Library
  https://aws.amazon.com/builders-library/
• Timeouts, retries and backoff with jitter (Yan Cui)
  https://lumigo.io/blog/amazon-builders-library-in-focus-1-timeouts-retries-and-backoff-with-jitter/
• Serverless Chaos Demo app
  https://demo.serverlesschaos.com
• Serverless Chaos Demo Circuit Breaker app
  https://circuitbreaker.serverlesschaos.com

Slide 50

Slide 50 text

Thank you!
Gunnar Grosch @gunnargrosch