Slide 1

Slide 1 text

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gunnar Grosch @gunnargrosch Building reliable serverless applications AWS Portsmouth User Group

Slide 2

Slide 2 text

Building reliable serverless applications

Slide 3

Slide 3 text

Building reliable serverless applications

Slide 4

Slide 4 text

Building reliable serverless applications

Slide 5

Slide 5 text

Building reliable serverless applications

Slide 6

Slide 6 text

About me
Senior Developer Advocate
Background in development, operations, and management
Builder of communities
Father of three

Slide 7

Slide 7 text

Building reliable serverless applications

Slide 8

Slide 8 text

Building reliable serverless applications

Slide 9

Slide 9 text

What is serverless?
No infrastructure provisioning, no management
Automatic scaling
Pay for value
Highly available and secure

Slide 10

Slide 10 text

Serverless applications with Lambda
Event source: changes in data state, requests to endpoints, changes in resource state
Function: Node.js, Python, Java, C#, Go, Ruby, Runtime API
Services

Slide 11

Slide 11 text

Serverless – Simple web service
Client, Amazon API Gateway, AWS Lambda, Amazon DynamoDB

Slide 12

Slide 12 text

Serverless
A distributed system has multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end user.

Slide 13

Slide 13 text

Lambda can be invoked via three different methods
• Asynchronous: Amazon SNS, Amazon S3 (reqs) → Lambda function
• Synchronous: Amazon API Gateway (/order) → Lambda function
• Poll-based: Amazon DynamoDB, Amazon Kinesis (changes) → AWS Lambda service → Lambda function

Slide 14

Slide 14 text

Consider how you integrate applications and services
Inter/intra-service
• Synchronous (API-based): common for communication between apps
• Asynchronous (event-driven): common for communication within apps
Scalability
• Synchronous (API-based): tools required to manage point-to-point connections
• Asynchronous (event-driven): nearly infinitely scalable
Latency
• Synchronous (API-based): can be very low
• Asynchronous (event-driven): higher in theory, but latency requirements are rarely as low as you think (think about P50, P99, etc.)
Agility
• Synchronous (API-based): easy to get started; hard to use point-to-point at large scale
• Asynchronous (event-driven): decoupled systems increase agility dramatically
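
The P50/P99 remark can be made concrete with a small calculation. The following is a minimal sketch, assuming a nearest-rank percentile definition and made-up latency samples; the function name and the numbers are illustrative and do not come from the deck.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: most requests are fast, a few are slow.
latencies_ms = [20] * 95 + [40] * 3 + [800, 1200]
p50 = percentile(latencies_ms, 50)  # what a typical request sees
p99 = percentile(latencies_ms, 99)  # what the slowest 1% of requests see
```

With samples like these the median looks healthy while the tail is far slower, which is why a latency requirement is more useful when it names the percentile it applies to.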

Slide 15

Slide 15 text

Building reliable serverless applications

Slide 16

Slide 16 text

Reliability
Reliability is the probability that a product, system, or service will perform its intended function adequately for a specified period.

Slide 17

Slide 17 text

Reliability
Reliability is not the same as quality.

Slide 18

Slide 18 text

Robustness
Robustness is the ability of a computer system to cope with errors during execution.

Slide 19

Slide 19 text

Resilience
A resilient system can adjust its functioning prior to, during, or following events, and thereby sustain required operations under both expected and unexpected conditions.

Slide 20

Slide 20 text

Resilience
“Resilience isn’t something you have, it’s something you do”
Dr. David Woods

Slide 21

Slide 21 text

Building reliable serverless applications

Slide 22

Slide 22 text

Building
Don’t ask what happens if a system fails; ask what happens when it fails

Slide 23

Slide 23 text

Building
Decouple architectures

Slide 24

Slide 24 text

Decoupling
Contract decoupling
• Decouple through a contract (API, message)
• Allow changes in implementation
Runtime decoupling
• Decouple through asynchronous invocations
• Reduce the risk of cascading failures
Diagram labels: Service A and Service B connected through an API; Service A and Service B exchanging messages

Slide 25

Slide 25 text

Building – Decouple architectures
Upstream, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB

Slide 26

Slide 26 text

Building – Decouple architectures
AWS Step Functions, Amazon EventBridge, AWS Lambda, Upstream, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB

Slide 27

Slide 27 text

Building
Let AWS handle errors and retries

Slide 28

Slide 28 text

Building – Error handling
Amazon Simple Storage Service, AWS Lambda, Amazon Kinesis
Fail up the stack

Slide 29

Slide 29 text

Building
Aim for single-purpose functions

Slide 30

Slide 30 text

Building
Choose wisely between Standard and FIFO

Slide 31

Slide 31 text

Building – Standard and FIFO
• Choose between Standard and FIFO for:
  • SQS
  • SNS
• Use FIFO when you need strict message ordering and/or exactly-once message processing.
• Message throughput is nearly unlimited for SQS Standard queues and up to 3,000 transactions per second for FIFO queues with batching.
• The Publish API supports 300 transactions per second or 10 MB per second for SNS FIFO topics, and up to 30,000 transactions per second (depending on Region) for Standard topics.
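
To illustrate what ordering and deduplication look like from the producer side, here is a minimal sketch that builds the arguments for sending one message to an SQS FIFO queue. `MessageGroupId` and `MessageDeduplicationId` are real SQS parameters; the function name, the `customer_id` field, and the choice to group by customer and deduplicate on a content hash are assumptions made for the example.

```python
import hashlib
import json


def build_fifo_message(queue_url, order):
    """Build keyword arguments for sending one order to an SQS FIFO queue.

    Messages that share a MessageGroupId are delivered in order; a repeated
    MessageDeduplicationId is dropped within the deduplication interval.
    """
    body = json.dumps(order, sort_keys=True)
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Assumption: ordering only matters per customer.
        "MessageGroupId": str(order["customer_id"]),
        # Assumption: identical content means the same logical message.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
```

The resulting dictionary could be passed to an SQS client, for example `sqs.send_message(**build_fifo_message(url, order))` with boto3. A narrower message group (per customer rather than one group for everything) keeps ordering where it matters while allowing more parallel consumers.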

Slide 32

Slide 32 text

Building
Take notice of retry behavior

Slide 33

Slide 33 text

Building – Retries
• Synchronous invocation – Lambda returns the failure to the invoker, which decides whether to retry
• Asynchronous invocation – Lambda retries function errors twice
• Event source mappings that read from streams – Lambda retries the entire batch of items
• Event source mappings that read from queues – Lambda retries the entire batch of items
• SDK retry behavior differs between runtimes and services
• Beware the retry storm
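
One common way to keep client-side retries from turning into a retry storm is capped exponential backoff with jitter, the topic of one of the articles linked at the end of this deck. The following is a minimal sketch, assuming the "full jitter" variant; the function name and default values are illustrative, and real SDKs ship their own retry logic that should be preferred where available.

```python
import random
import time


def call_with_retries(operation, max_attempts=3, base=0.1, cap=5.0, sleep=time.sleep):
    """Call operation(), retrying failures with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: fail up the stack
            # Full jitter: wait a random time between 0 and the capped exponential delay,
            # so that many clients failing together do not all retry at the same moment.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Bounding the number of attempts matters as much as the delay: when every layer of a call chain retries, the retries multiply.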

Slide 34

Slide 34 text

Building
Make use of the circuit breaker pattern

Slide 35

Slide 35 text

Building – Circuit breaker pattern

Slide 36

Slide 36 text

Building – Circuit breaker pattern
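
The two circuit breaker slides are diagrams, so here is a minimal sketch of the pattern itself. It assumes the usual three states (closed, open, half-open) and keeps its state in memory; all names and thresholds are illustrative. In Lambda, state held in memory lives only as long as one execution environment, so a real implementation would likely need to keep it somewhere shared.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"  # allow a trial call through
        return "OPEN"

    def call(self, operation, fallback=None):
        if self.state == "OPEN":
            # Fail fast instead of adding load to a struggling dependency.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate answer rather than waiting on timeouts, which is what stops retries from piling up on a dependency that is already failing.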

Slide 37

Slide 37 text

Building
Use dead letter queues to capture events that fail to process

Slide 38

Slide 38 text

Building – Dead letter queues
Client, Amazon Simple Storage Service, Amazon SNS, AWS Lambda, Amazon SQS (DLQ)
Reprocess

Slide 39

Slide 39 text

Building – Dead letter queues
• You can add dead letter queues to:
  • SQS
  • SNS
  • Lambda
  • EventBridge
• SQS uses SQS as DLQ
• SNS uses SQS as DLQ
• Lambda uses SQS or SNS as DLQ
• EventBridge uses SQS as DLQ
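
For the SQS case, a dead letter queue is attached to a source queue through the queue's `RedrivePolicy` attribute, a JSON string with `deadLetterTargetArn` and `maxReceiveCount`. A minimal sketch of building that attribute follows; the function name, the default count, and the ARN in the usage note are placeholders.

```python
import json


def redrive_policy_attributes(dlq_arn, max_receive_count=5):
    """Build the SQS queue attributes that attach a dead letter queue to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many failed receives, SQS moves the message to the DLQ.
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON string, not a nested structure.
    return {"RedrivePolicy": json.dumps(policy)}
```

The result could be applied with boto3 as `sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=redrive_policy_attributes(dlq_arn))`, where `source_queue_url` and `dlq_arn` are values from your own account.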

Slide 40

Slide 40 text

Building
Use Lambda destinations to capture function failures

Slide 41

Slide 41 text

AWS Lambda Event Destinations
• For asynchronous Lambda executions
• Record contains details about request and response in JSON
• Route execution record based on function result:
  • Success
  • Failure
• Use another Lambda function, SQS, SNS, or EventBridge as destination
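
Destinations are configured per function as part of its asynchronous invocation settings. The following is a minimal sketch that builds the request for that configuration; the keys match the Lambda `PutFunctionEventInvokeConfig` API, while the function name and the ARNs used with it are placeholders.

```python
def build_event_invoke_config(function_name, on_failure_arn, on_success_arn=None, max_retries=2):
    """Build arguments for configuring asynchronous-invocation destinations on a function."""
    if not 0 <= max_retries <= 2:
        raise ValueError("Lambda allows 0 to 2 retries for asynchronous invocations")
    destinations = {"OnFailure": {"Destination": on_failure_arn}}
    if on_success_arn is not None:
        destinations["OnSuccess"] = {"Destination": on_success_arn}
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "DestinationConfig": destinations,
    }
```

With boto3 the result could be passed as `lambda_client.put_function_event_invoke_config(**config)`. The failure destination only receives a record after the configured retries are exhausted, so the retry count and the destination are worth choosing together.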

Slide 42

Slide 42 text

Building – Lambda destinations
Amazon Simple Storage Service, AWS Lambda
On failure: Amazon SQS
On success: AWS Lambda, Amazon Simple Storage Service

Slide 43

Slide 43 text

Building
Separate configuration from your code

Slide 44

Slide 44 text

Building – Separate configuration from your code
• Use AWS AppConfig to separate your configuration from your code
• Allows you to:
  • Validate the configuration before deployment to make sure it is syntactically valid and semantically correct
  • Deploy using a gradual or non-gradual deployment strategy
  • Monitor the most recently deployed configuration and automatically roll back if you configure CloudWatch alarms
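
The difference between "syntactically valid" and "semantically correct" can be shown with a small validator. This is a plain-Python sketch of the idea, not the AppConfig API; the configuration keys (`timeout_ms`, `feature_enabled`) and their limits are invented for the example.

```python
import json


def validate_config(raw):
    """Return a list of problems found in a JSON configuration document (empty if none)."""
    # Syntactic check: is it well-formed JSON at all?
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]

    # Semantic checks: do the values make sense for the application?
    errors = []
    timeout = config.get("timeout_ms")
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 30000:
        errors.append("timeout_ms must be an integer between 1 and 30000")
    if not isinstance(config.get("feature_enabled"), bool):
        errors.append("feature_enabled must be true or false")
    return errors
```

Running checks like these before a configuration is rolled out means a typo is rejected at deployment time instead of surfacing as a function failure.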

Slide 45

Slide 45 text

Building
Archive and replay events with EventBridge

Slide 46

Slide 46 text

Building – Archive and replay events with EventBridge
• Create an encrypted archive of the events published to an event bus
• Archive all events, or filter them using pattern matching
• Store events indefinitely or set up a retention period
• Replay the events stored in an archive
• Events are replayed to all rules defined for the event bus or to the rules you specify
• Replayed events contain an extra replay-name field
• Currently, you can only replay events to the same event bus
• Works with all events processed by EventBridge, including events from the AWS platform, from SaaS integrations, and your own custom events
• During replays, your current event throughput is unaffected
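
Because replayed events carry the extra replay-name field, a consumer can tell a replay from a first delivery and skip side effects that should not happen twice. A minimal sketch follows, assuming a hypothetical order event and a notification callback supplied by the caller; only the `replay-name` key comes from the slide.

```python
def handle_event(event, send_notification):
    """Process an EventBridge event, skipping customer notifications for replayed events."""
    replay_name = event.get("replay-name")  # present only on replayed events
    order = event.get("detail", {})
    if replay_name is None:
        # First delivery: safe to trigger side effects such as emails.
        send_notification(order)
    return {
        "order_id": order.get("order_id"),
        "replayed": replay_name is not None,
        "replay_name": replay_name,
    }
```

Whether a replay should repeat a side effect depends on why the replay is being run, so this branch is a design decision rather than a rule.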

Slide 47

Slide 47 text

Building
Fallbacks are great – when used often

Slide 48

Slide 48 text

Building – Fallbacks
“Such attempts at a completely different mechanism to try to achieve the same result are called fallback behavior, and are an anti-pattern to be avoided”
Reliability Pillar, AWS Well-Architected Framework

Slide 49

Slide 49 text

Building
Use chaos engineering to find weaknesses in your system

Slide 50

Slide 50 text

Building – Chaos engineering
“Chaos engineering should be done regularly”
Reliability Pillar, AWS Well-Architected Framework

Slide 51

Slide 51 text

Building – Chaos engineering
Add latency
Inject errors
Create exceptions
Fill disk space
Block network connections
Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)

Slide 52

Slide 52 text

Building – Chaos engineering

Slide 53

Slide 53 text

Building – Chaos engineering
“Chaos engineering should be done regularly, and be part of your CI/CD cycle”
Reliability Pillar, AWS Well-Architected Framework

Slide 54

Slide 54 text

Building – Chaos engineering
What if my function takes an extra 300 ms for each invocation?
What if my function returns an error code?
What if I can’t get data from DynamoDB?
Failure injected through CI/CD
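
These "what if" questions can be asked of a function by wrapping its handler with a small failure injector. The following is a minimal sketch, assuming the injection settings come from a configuration lookup supplied by the caller; the decorator name and configuration keys are invented for the example and are not the API of the failure-lambda project linked at the end of this deck.

```python
import functools
import random
import time


def inject_failure(get_config, sleep=time.sleep, rng=random.random):
    """Wrap a Lambda-style handler so that latency, exceptions, or error codes can be injected."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            config = get_config()  # read on every invocation so it can change without a deploy
            if config.get("enabled") and rng() < config.get("rate", 1.0):
                mode = config.get("mode")
                if mode == "latency":
                    # "What if my function takes an extra 300 ms?"
                    sleep(config.get("latency_ms", 300) / 1000)
                elif mode == "exception":
                    raise RuntimeError(config.get("message", "Injected failure"))
                elif mode == "status_code":
                    # "What if my function returns an error code?"
                    return {"statusCode": config.get("status_code", 500)}
            return handler(event, context)
        return wrapper
    return decorator
```

Keeping the settings outside the code means an experiment can be switched on by a pipeline step and switched off again without redeploying the function.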

Slide 55

Slide 55 text

Summary
• Resilience isn’t something you have, it’s something you do
• Decouple architectures
• Let AWS handle errors and retries
• Use the circuit breaker pattern to avoid retry storms
• Use dead letter queues to capture events that fail to process
• Use Lambda destinations for function failures
• Separate configuration from your code
• Fallbacks are great – when used, tested, and verified often
• Chaos engineering helps us find weaknesses and fix them

Slide 56

Slide 56 text

Do you want more?
• AWS Well-Architected Framework Reliability Pillar
  https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf
• Patterns and practices for building resilient Serverless applications (Yan Cui)
  https://www.slideshare.net/theburningmonk/patterns-and-practices-for-building-resilient-serverless-applications
• Serverless Microservice Patterns for AWS (Jeremy Daly)
  https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
• The Amazon Builders' Library
  https://aws.amazon.com/builders-library/
• Timeouts, retries and backoff with jitter (Yan Cui)
  https://lumigo.io/blog/amazon-builders-library-in-focus-1-timeouts-retries-and-backoff-with-jitter/
• Failure-lambda
  https://github.com/gunnargrosch/failure-lambda
• Circuitbreaker-lambda
  https://github.com/gunnargrosch/circuitbreaker-lambda

Slide 57

Slide 57 text

Thank you!
Gunnar Grosch
@gunnargrosch