Building reliable serverless applications - AWS Portsmouth User Group November 10 2020

Presented at AWS Portsmouth User Group, November 10th, 2020.

@gunnargrosch
failure-lambda
circuitbreaker-lambda

Serverless and fully managed services give you high availability and robustness out of the box. But even if every piece of your architecture is resilient to failure, you still need well-architected patterns and practices to make your application reliable. In this session we'll dive headfirst into the world of robustness, reliability and resilience to examine some of the patterns and practices we use to build battle-tested serverless applications.

Gunnar Grosch

November 10, 2020

Transcript

  1. Building reliable serverless applications. Gunnar Grosch (@gunnargrosch), AWS Portsmouth User Group. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Building reliable serverless applications
  3. Building reliable serverless applications
  4. Building reliable serverless applications
  5. Building reliable serverless applications
  6. About me
    • Senior Developer Advocate
    • Background in development, operations, and management
    • Builder of communities
    • Father of three
  7. Building reliable serverless applications
  8. Building reliable serverless applications
  9. What is serverless?
    • No infrastructure provisioning, no management
    • Automatic scaling
    • Pay for value
    • Highly available and secure
  10. Serverless applications with Lambda
    • Event source: changes in data state, requests to endpoints, changes in resource state
    • Function: Node.js, Python, Java, C#, Go, Ruby, Runtime API
    • Services
  11. Serverless – Simple web service
    Diagram labels: Client, Amazon API Gateway, AWS Lambda, Amazon DynamoDB
  12. Serverless
    A distributed system has multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end user.
  13. Lambda can be invoked via three different methods
    • Asynchronous: Amazon SNS, Amazon S3 (reqs), Lambda function
    • Synchronous: Amazon API Gateway (/order), Lambda function
    • Poll-based: Amazon DynamoDB, Amazon Kinesis (changes), AWS Lambda service, Lambda function
  14. Consider how you integrate applications and services
    Comparison of synchronous (API-based) and asynchronous (event-driven) integration:
    • Inter/intra-service
        Synchronous: common for communication between apps
        Asynchronous: common for communication within apps
    • Scalability
        Synchronous: tools required to manage point-to-point connections
        Asynchronous: nearly infinitely scalable
    • Latency
        Synchronous: can be very low
        Asynchronous: higher in theory, but latency requirements are rarely as low as you think (think about P50, P99, etc.)
    • Agility
        Synchronous: easy to get started; hard to use point-to-point at large scale
        Asynchronous: decoupled systems increase agility dramatically
  15. Building reliable serverless applications
  16. Reliability
    Reliability is the probability that a product, system, or service will perform its intended function adequately for a specified period.
  17. Reliability
    Reliability is not the same as quality.
  18. Robustness
    Robustness is the ability of a computer system to cope with errors during execution.
  19. Resilience
    A resilient system can adjust its functioning prior to, during, or following events, and thereby sustain required operations under both expected and unexpected conditions.
  20. Resilience
    “Resilience isn’t something you have, it’s something you do” (Dr. David Woods)
  21. Building reliable serverless applications
  22. Building: Don’t ask what happens if a system fails; ask what happens when it fails
  23. Building: Decouple architectures
  24. Decoupling
    • Contract decoupling (Service A and Service B, each behind an API)
        • Decouple through a contract (API, message)
        • Allow changes in implementation
    • Runtime decoupling (Service A and Service B exchanging messages)
        • Decouple through asynchronous invocations
        • Reduce the risk of cascading failures
  25. Building – Decouple architectures
    Diagram labels: Upstream, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB
  26. Building – Decouple architectures
    Diagram labels: Upstream, AWS Step Functions, Amazon EventBridge, AWS Lambda, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB
  27. Building: Let AWS handle errors and retries
  28. Building – Error handling
    Fail up the stack
    Diagram labels: Amazon Simple Storage Service, AWS Lambda, Amazon Kinesis
  29. Building: Aim for single-purpose functions
  30. Building: Choose wisely between Standard and FIFO
  31. Building – Standard and FIFO
    • Choose between Standard and FIFO for SQS and SNS.
    • Use FIFO when you need strict message ordering and/or exactly-once message processing.
    • Message throughput is nearly unlimited for SQS Standard queues and up to 3000 transactions per second for FIFO queues with batching.
    • The Publish API supports 300 transactions per second or 10 MB per second for SNS FIFO topics, and up to 30000 transactions per second (depending on Region) for Standard topics.
  32. Building: Take notice of retry behavior
  33. Building – Retries
    • Synchronous invocation: Lambda returns the failure to the invoker, which decides whether to retry
    • Asynchronous invocation: Lambda retries function errors twice
    • Event source mappings that read from streams: Lambda retries the entire batch of items
    • Event source mappings that read from queues: Lambda retries the entire batch of items
    • SDK retries differ between runtimes and services
    • Beware the retry storm
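A common way to soften the retry storm mentioned on slide 33 is to cap the number of attempts and wait a randomized, exponentially growing delay between them. The following is a minimal sketch of that idea, not code from the talk: the function names, the `base` and `cap` defaults, and the injectable `sleep` and `rng` parameters are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one 'full jitter' delay (in seconds) before each retry."""
    for attempt in range(max_attempts - 1):
        # Exponential ceiling, capped so delays never grow without bound.
        ceiling = min(cap, base * (2 ** attempt))
        # Pick a random point below the ceiling so callers do not retry in lockstep.
        yield rng() * ceiling


def call_with_retries(operation, max_attempts=3, sleep=time.sleep, **backoff):
    """Call operation(); on failure wait a jittered delay and try again."""
    delays = backoff_delays(max_attempts, **backoff)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted: fail up the stack
            sleep(delay)
```

With `max_attempts=3` the operation runs at most three times, which mirrors the "retries function errors twice" behavior listed for asynchronous invocations. The jitter matters more than the exact numbers: it spreads retries from many callers over time.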
  34. Building: Make use of the circuit breaker pattern
  35. Building – Circuit breaker pattern
  36. Building – Circuit breaker pattern
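The two circuit breaker slides carry only diagrams, so here is a rough in-memory sketch of the pattern. It is not the implementation of circuitbreaker-lambda: the class name, the thresholds, and the injectable `clock` are hypothetical, and a real Lambda function would need to keep the state somewhere that survives across invocations.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"  # timeout elapsed: allow one trial call
        return "OPEN"

    def call(self, operation, fallback=None):
        if self.state == "OPEN":
            # Fail fast instead of adding load to a struggling dependency.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate error (or the optional fallback) rather than queuing more retries against the failing dependency, which is how the pattern helps avoid retry storms.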
  37. Building: Use dead letter queues to capture events that fail to process
  38. Building – Dead letter queues
    Diagram labels: Client, Amazon Simple Storage Service, Amazon SNS, AWS Lambda, Amazon SQS (DLQ), Reprocess
  39. Building – Dead letter queues
    • You can add dead letter queues to SQS, SNS, Lambda, and EventBridge
    • SQS uses SQS as DLQ
    • SNS uses SQS as DLQ
    • Lambda uses SQS or SNS as DLQ
    • EventBridge uses SQS as DLQ
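To illustrate what a dead letter queue buys you, the sketch below simulates a source queue whose messages are moved aside after a fixed number of failed receives. It is an in-memory stand-in that makes no AWS calls; `process_with_dlq` and the shape of the dead-letter entries are hypothetical, and `max_receive_count` is only modeled loosely on the redrive setting of the same name in SQS.

```python
from collections import deque


def process_with_dlq(messages, handler, max_receive_count=3):
    """Deliver each message until handler succeeds or receives are exhausted.

    Returns (processed, dead_letters).
    """
    queue = deque((msg, 0) for msg in messages)
    processed, dead_letters = [], []
    while queue:
        msg, receives = queue.popleft()
        receives += 1
        try:
            handler(msg)
        except Exception as exc:
            if receives >= max_receive_count:
                # Give up on this message but keep it for inspection and reprocessing.
                dead_letters.append(
                    {"message": msg, "error": repr(exc), "receives": receives}
                )
            else:
                queue.append((msg, receives))  # becomes visible again for a retry
            continue
        processed.append(msg)
    return processed, dead_letters
```

A poison message ends up in `dead_letters` instead of blocking or being silently dropped, so the healthy messages still get processed and the failed one can be reprocessed later.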
  40. Building: Use Lambda destinations to capture function failures
  41. AWS Lambda Event Destinations
    • For asynchronous Lambda executions
    • Record contains details about request and response in JSON
    • Route execution record based on function result: Success or Failure
    • Use another Lambda function, SQS, SNS or EventBridge as destination
  42. Building – Lambda destinations
    Diagram labels: Amazon Simple Storage Service, AWS Lambda, Amazon SQS (on failure), AWS Lambda (on success), Amazon Simple Storage Service
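The routing idea behind destinations can be sketched locally as follows. This is a simplified simulation, not the Lambda service's behavior or its exact record format: `invoke_async`, the record fields, and the destination callables are assumptions for illustration, and the default of two retries simply echoes the asynchronous retry behavior listed on slide 33.

```python
def invoke_async(handler, event, on_success, on_failure, retries=2):
    """Run handler(event) with retries, then route one record to a destination."""
    attempts = 0
    while True:
        attempts += 1
        try:
            response = handler(event)
        except Exception as exc:
            if attempts <= retries:
                continue  # try again before giving up
            record = {
                "condition": "RetriesExhausted",
                "attempts": attempts,
                "requestPayload": event,
                "responsePayload": {"errorMessage": str(exc)},
            }
            on_failure(record)  # e.g. a queue that holds failures for reprocessing
            return record
        record = {
            "condition": "Success",
            "attempts": attempts,
            "requestPayload": event,
            "responsePayload": response,
        }
        on_success(record)  # e.g. the next function in the flow
        return record
```

Because the record carries both the request and the response, the failure destination has what it needs to diagnose or replay the event without the function itself containing any routing code.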
  43. Building: Separate configuration from your code
  44. Building – Separate configuration from your code
    • Use AWS AppConfig to separate your configuration from your code
    • Allows you to:
        • Validate the configuration prior to deployment to make sure it is syntactically valid and semantically correct
        • Deploy using a gradual or non-gradual deployment strategy
        • Monitor the most recently deployed configuration and automatically roll back if you configure CloudWatch alarms
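The "validate before deploy" step can be illustrated with a small local sketch. It does not use AppConfig or its validators; the configuration keys `timeoutMs` and `featureEnabled`, their limits, and both function names are hypothetical and chosen only to show a syntactic check followed by a semantic one.

```python
import json


def validate_config(raw):
    """Return the parsed configuration or raise ValueError."""
    try:
        config = json.loads(raw)  # syntactic check: is it valid JSON?
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    # Semantic checks: values must make sense for the application.
    timeout = config.get("timeoutMs")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(timeout, int) or isinstance(timeout, bool) or not 1 <= timeout <= 30000:
        raise ValueError("timeoutMs must be an integer between 1 and 30000")
    if not isinstance(config.get("featureEnabled"), bool):
        raise ValueError("featureEnabled must be true or false")
    return config


def deploy(current, candidate_raw):
    """Return the config to run with: the candidate if valid, else the current one."""
    try:
        return validate_config(candidate_raw)
    except ValueError:
        return current  # keep the known-good configuration
```

Rejecting a bad candidate before it reaches the running code is the cheap half of the safety net; the gradual deployment and alarm-based rollback on the slide cover the cases validation cannot predict.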
  45. Building: Archive and replay events with EventBridge
  46. Building – Archive and replay events with EventBridge
    • Create an encrypted archive of the events published to an event bus
    • Archive all events, or filter them using pattern matching
    • Store events indefinitely or set up a retention period
    • Replay the events stored in an archive
    • Events are replayed to all rules defined for the event bus or to the rules you specify
    • Replayed events contain an extra replay-name field
    • Currently, you can only replay events to the same event bus
    • Works with all events processed by EventBridge, including events from the AWS platform, from SaaS integrations, and your own custom events
    • During replays, your current event throughput is unaffected
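As a rough mental model of archive and replay, the sketch below keeps matching events in memory and re-sends copies of them tagged with a replay name. It is not the EventBridge API: `EventArchive`, `matches`, and the one-level pattern format are hypothetical simplifications, and only the extra `replay-name` field is taken from the slide.

```python
import copy


def matches(pattern, event):
    """Simplified pattern match: every pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


class EventArchive:
    def __init__(self, pattern=None):
        self.pattern = pattern  # None archives every event
        self.events = []

    def record(self, event):
        """Store a copy of the event if it passes the archive's filter."""
        if self.pattern is None or matches(self.pattern, event):
            self.events.append(copy.deepcopy(event))

    def replay(self, replay_name, rules):
        """Send archived events to each rule (a callable), tagged with the replay name."""
        for event in self.events:
            replayed = copy.deepcopy(event)  # leave the archived copy untouched
            replayed["replay-name"] = replay_name
            for rule in rules:
                rule(replayed)
```

The tag lets a consumer tell a replayed event from a live one, which is useful when a handler is not idempotent and needs to treat replays differently.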
  47. Building: Fallbacks are great – when used often
  48. Building – Fallbacks
    “Such attempts at a completely different mechanism to try to achieve the same result are called fallback behavior, and are an anti-pattern to be avoided” (Reliability Pillar, AWS Well-Architected Framework)
  49. Building: Use chaos engineering to find weaknesses in your system
  50. Building – Chaos engineering
    “Chaos engineering should be done regularly” (Reliability Pillar, AWS Well-Architected Framework)
  51. Building – Chaos engineering
    • Add latency
    • Inject errors
    • Create exceptions
    • Fill disk space
    • Block network connections
    Diagram labels: Client, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon Simple Storage Service (Amazon S3)
  52. Building – Chaos engineering
  53. Building – Chaos engineering
    “Chaos engineering should be done regularly, and be part of your CI/CD cycle” (Reliability Pillar, AWS Well-Architected Framework)
  54. Building – Chaos engineering
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
    Failure injected through CI/CD
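The questions on slide 54 can be explored with a thin wrapper around a handler that injects a failure when a configuration switch is on. The sketch below is not the failure-lambda API: the decorator name, the configuration keys (`enabled`, `rate`, `mode`, `latency_ms`, `status_code`, `message`), and the `get_config` callable are assumptions made for illustration.

```python
import functools
import random
import time


def inject_failure(get_config, sleep=time.sleep, rng=random.random):
    """Decorator that injects a configured failure before calling the handler."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context=None):
            config = get_config()  # read on every call so it can change at runtime
            if config.get("enabled") and rng() < config.get("rate", 1.0):
                mode = config.get("mode")
                if mode == "latency":
                    # "What if my function takes an extra 300 ms?"
                    sleep(config.get("latency_ms", 300) / 1000.0)
                elif mode == "exception":
                    raise RuntimeError(config.get("message", "injected failure"))
                elif mode == "statuscode":
                    # "What if my function returns an error code?"
                    return {"statusCode": config.get("status_code", 500)}
            return handler(event, context)
        return wrapper
    return decorator
```

Because the configuration is fetched per invocation, a CI/CD stage could switch an experiment on, run its checks against the degraded function, and switch it off again without redeploying the code.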
  55. Summary
    • Resilience isn’t something you have, it’s something you do
    • Decouple architectures
    • Let AWS handle errors and retries
    • Use the circuit breaker pattern to avoid retry storms
    • Use dead letter queues to capture events that fail to process
    • Use Lambda destinations for function failures
    • Separate configuration from your code
    • Fallbacks are great – when used, tested and verified often
    • Chaos engineering helps us find weaknesses and fix them
  56. Do you want more?
    • AWS Well-Architected Framework Reliability Pillar: https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf
    • Patterns and practices for building resilient Serverless applications (Yan Cui): https://www.slideshare.net/theburningmonk/patterns-and-practices-for-building-resilient-serverless-applications
    • Serverless Microservice Patterns for AWS (Jeremy Daly): https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
    • The Amazon Builders' Library: https://aws.amazon.com/builders-library/
    • Timeouts, retries and backoff with jitter (Yan Cui): https://lumigo.io/blog/amazon-builders-library-in-focus-1-timeouts-retries-and-backoff-with-jitter/
    • Failure-lambda: https://github.com/gunnargrosch/failure-lambda
    • Circuitbreaker-lambda: https://github.com/gunnargrosch/circuitbreaker-lambda
  57. Thank you! Gunnar Grosch @gunnargrosch