Building reliable serverless applications - AWS Community Day Dublin October 13 2020

Presented at AWS Community Day Dublin, October 13th, 2020.

@gunnargrosch
Serverless Chaos Demo
failure-lambda
Serverless Chaos Circuit Breaker Demo
circuitbreaker-lambda

Serverless and fully managed services give you high availability and robustness out of the box. But even if every piece of your architecture is resilient to failure, you still need well-architected patterns and practices to make the application as a whole reliable. In this session we'll dive head first into the world of robustness, reliability and resilience, and examine some of the patterns and practices we use to build battle-tested serverless applications. We'll also look at how to verify the output of the system through chaos engineering, and at the advantages CI/CD brings.

Gunnar Grosch

October 13, 2020

Transcript

  1. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gunnar Grosch, @gunnargrosch, October 13, 2020. Building reliable serverless applications. AWS Community Day Dublin
  2. Building reliable serverless applications
  3. Building reliable serverless applications
  4. Building reliable serverless applications
  5. Building reliable serverless applications
  6. About me: Senior Developer Advocate; background in development, operations, and management; community builder; father of three
  7. Building reliable serverless applications
  8. Building reliable serverless applications
  9. Serverless: “Serverless allows you to build and run applications and services without thinking about servers” (Amazon Web Services (AWS))
  10. Serverless: “without thinking about servers”
  11. Serverless: A distributed system has multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end-user
  12. Serverless – Execution models (diagram)
    • Synchronous (push): Amazon API Gateway, AWS Lambda (/login)
    • Asynchronous (event): Amazon SNS, Amazon S3, AWS Lambda
    • Stream-based: Amazon DynamoDB, Amazon Kinesis, AWS Lambda
    • Poll-based: Amazon SQS, AWS Lambda
  13. Serverless – Simple web service (diagram): Amazon API Gateway, AWS Lambda, Client, Amazon DynamoDB
  14. Building reliable serverless applications
  15. Reliability: Reliability is the probability that a product, system, or service will perform its intended function adequately for a specified period
  16. Reliability: Reliability is not the same as quality
  17. Robustness: Robustness is the ability of a computer system to cope with errors during execution
  18. Resilience: A resilient system can adjust its functioning prior to, during, or following events, and thereby sustain required operations under both expected and unexpected conditions
  19. Resilience: “Resilience isn’t something you have, it’s something you do” (Dr. David Woods)
  20. Building reliable serverless applications
  21. Building: Don’t ask what happens if a system fails; ask what happens when it fails
  22. Building: Decouple architectures
  23. Building – Decouple architectures (diagram): Amazon Simple Storage Service, AWS Lambda, Client, AWS Lambda, Amazon Simple Queue Service, Amazon SNS
  24. Building – Decouple architectures (diagram): AWS Step Functions, Amazon EventBridge, AWS Lambda, SaaS Provider, Amazon Simple Queue Service, AWS Lambda, Amazon DynamoDB
  25. Building: Let AWS handle errors and retries
  26. Building – Error handling (diagram): Amazon Simple Storage Service, AWS Lambda, Amazon Kinesis. Fail up the stack
  27. Building: Aim for single-purpose functions
  28. Building: Retries are good, right?
  29. Building – Retries
    • Synchronous Lambdas return failures to the invoker for retry
    • Asynchronous Lambdas will retry up to 2 times and 6 hours
    • Stream-based Lambdas will store events up to 7 days for up to 10 000 retry attempts
    • Poll-based Lambdas will return the message to the queue up to 1 000 times for retry
    • SDK retries differ between runtimes and services
    • Beware the retry storm
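
The retry bullets on slide 29 warn about retry storms, and the resources on slide 49 point to "backoff with jitter" as the usual countermeasure. The following is a minimal sketch of that idea, not code from the talk: `full_jitter_delay` and `call_with_retries` are hypothetical names, and the default values are arbitrary assumptions.

```python
import random
import time


def full_jitter_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Delay before the next retry: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(fn, max_attempts=3, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on an exception wait a jittered delay and retry.

    max_attempts is the total number of calls, so the retry budget is bounded.
    `sleep` and `rng` are injectable (an assumption made here for testability).
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: fail up the stack
            sleep(full_jitter_delay(attempt, base, cap, rng))
```

Randomizing the delay spreads retries from many clients over time instead of having them all hit the recovering service at the same instant.
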
  30. Building: Make use of the circuit breaker pattern
  31. Building – Circuit breaker pattern (diagram): AWS Lambda, Amazon DynamoDB, Downstream service, Status check
  32. Building – Circuit breaker pattern
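
A minimal sketch of the circuit breaker pattern named on slides 30 to 32. It is not the implementation of `circuitbreaker-lambda`; the class, its parameters, and the state field names are hypothetical. The diagram on slide 31 lists Amazon DynamoDB next to a status check, so the sketch assumes the breaker state lives in a shared store, represented here by a plain dict.

```python
import time


class CircuitBreaker:
    """CLOSED -> OPEN after repeated failures, HALF_OPEN after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=10.0,
                 store=None, clock=time.time):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        # Hypothetical stand-in for a shared state item (e.g. a DynamoDB record).
        self.store = store if store is not None else {
            "state": "CLOSED", "failures": 0, "opened_at": 0.0}
        self.clock = clock

    def call(self, fn, fallback):
        s = self.store
        if s["state"] == "OPEN":
            if self.clock() - s["opened_at"] < self.reset_timeout:
                return fallback()  # short-circuit: do not touch the downstream service
            s["state"] = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            s["failures"] += 1
            if s["state"] == "HALF_OPEN" or s["failures"] >= self.failure_threshold:
                s["state"] = "OPEN"
                s["opened_at"] = self.clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        s["state"] = "CLOSED"
        s["failures"] = 0
        return result
```

While the circuit is open the downstream call is skipped entirely, which is what keeps retries from piling onto a service that is already struggling.
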
  33. Building: Use dead letter queues to capture events that fail to process
  34. Building – Dead letter queues (diagram): Amazon SNS, AWS Lambda, Amazon SQS (DLQ), Client, Amazon Simple Storage Service
  35. Building – Dead letter queues
    • You can add dead letter queues to:
        • SQS
        • SNS
        • Lambda
        • EventBridge
    • SQS uses SQS as DLQ
    • SNS uses SQS as DLQ
    • Lambda uses SQS or SNS as DLQ
    • EventBridge uses SQS as DLQ
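
For the "SQS uses SQS as DLQ" case, the link between a source queue and its dead letter queue is expressed through the queue's `RedrivePolicy` attribute. The sketch below only builds that attribute value; the helper name `redrive_policy` and the example ARN are hypothetical, and the queue itself would still have to be configured through the AWS console, CLI, or an SDK.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=5):
    """Build the SQS queue attributes that route failed messages to a DLQ.

    A message is moved to the dead letter queue once it has been received
    more than max_receive_count times without being deleted.
    """
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("max_receive_count must be between 1 and 1000")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    # SQS expects the policy as a JSON string stored under the RedrivePolicy attribute.
    return {"RedrivePolicy": json.dumps(policy)}


# Example with a made-up ARN:
attributes = redrive_policy("arn:aws:sqs:eu-west-1:123456789012:orders-dlq", 5)
```

The upper bound of 1 000 matches the "up to 1 000 times" limit quoted for poll-based Lambdas on slide 29.
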
  36. Building: Use Lambda destinations to capture function failures
  37. Building – Lambda destinations
    • For asynchronous Lambda executions
    • Route execution record based on function result
        • Success
        • Failure
    • Use another Lambda function, SQS, SNS or EventBridge as destination
    • Record contains details about request and response in JSON
  38. Building – Lambda destinations (diagram, on failure): Amazon Simple Storage Service, Amazon Simple Storage Service, AWS Lambda, Amazon SQS
  39. Building – Lambda destinations (diagram, on success): Amazon Simple Storage Service, Amazon Simple Storage Service, AWS Lambda, AWS Lambda
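
Slide 37 notes that the destination record carries details about the request and the response as JSON. The sketch below shows how a consumer of an on-failure SQS destination might pull out the useful parts. The field names (`requestContext`, `requestPayload`, `responseContext`, `condition`, `approximateInvokeCount`, `functionError`) are given as recalled from the AWS Lambda documentation and should be verified there; the function names are hypothetical.

```python
import json


def summarize_failure(record):
    """Reduce one destination record (a dict) to the fields needed for triage."""
    request_ctx = record.get("requestContext", {})
    response_ctx = record.get("responseContext", {})
    return {
        "request_id": request_ctx.get("requestId"),
        "condition": request_ctx.get("condition"),
        "attempts": request_ctx.get("approximateInvokeCount"),
        "error_type": response_ctx.get("functionError"),
        # The original event, kept so the invocation can be replayed later.
        "original_event": record.get("requestPayload"),
    }


def handler(event, context=None):
    """Hypothetical Lambda handler fed by the SQS queue used as the destination.

    Each SQS message body is assumed to hold one destination record as JSON.
    """
    return [summarize_failure(json.loads(message["body"]))
            for message in event.get("Records", [])]
```

Missing fields come back as `None` rather than raising, so a record with an unexpected shape still produces a summary.
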
  40. Building: Fallbacks are great – when used often
  41. Building – Fallbacks: “Such attempts at a completely different mechanism to try to achieve the same result are called fallback behavior, and are an anti-pattern to be avoided” (Reliability Pillar, AWS Well-Architected Framework)
  42. Building: Use chaos engineering to find weaknesses in your system
  43. Building – Chaos engineering: “Chaos engineering should be done regularly” (Reliability Pillar, AWS Well-Architected Framework)
  44. Building – Chaos engineering
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
  45. Building – Chaos engineering
  46. Building – Chaos engineering: “Chaos engineering should be done regularly, and be part of your CI/CD cycle” (Reliability Pillar, AWS Well-Architected Framework)
  47. Building – Chaos engineering
    • What if my function takes an extra 300 ms for each invocation?
    • What if my function returns an error code?
    • What if I can’t get data from DynamoDB?
    • Failure injected through CI/CD
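
The three "what if" questions map onto three kinds of injected failure: added latency, a forced error status, and an exception. The following is a minimal sketch of a handler wrapper that can inject each of them. It is not the API of `failure-lambda`; the decorator `inject_failure` and every configuration key (`enabled`, `rate`, `mode`, `latency_ms`, `status_code`, `message`) are hypothetical.

```python
import functools
import random
import time


def inject_failure(get_config, sleep=time.sleep, rng=random.random):
    """Wrap a handler so failures can be switched on through configuration.

    get_config is any callable returning a dict; in a real setup it might read
    a parameter store or environment variable (an assumption, not shown here).
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context=None):
            cfg = get_config()
            if cfg.get("enabled") and rng() < cfg.get("rate", 1.0):
                mode = cfg.get("mode")
                if mode == "latency":
                    # Add delay, then still run the real handler.
                    sleep(cfg.get("latency_ms", 300) / 1000.0)
                elif mode == "statuscode":
                    # Skip the handler and return an error status instead.
                    return {"statusCode": cfg.get("status_code", 500)}
                elif mode == "exception":
                    raise RuntimeError(cfg.get("message", "injected failure"))
            return handler(event, context)
        return wrapper
    return decorator
```

Because the configuration is read on every invocation, a CI/CD pipeline step could switch an experiment on, run its checks, and switch it off again without redeploying the function.
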
  48. Summary
    • Resilience isn’t something you have, it’s something you do
    • Decouple architectures
    • Let AWS handle errors and retries
    • Use the circuit breaker pattern to avoid retry storms
    • Use dead letter queues to capture events that fail to process
    • Use Lambda destinations for function failures
    • Fallbacks are great – when used, tested and verified often
    • Chaos engineering helps us find weaknesses and fix them
  49. Do you want more?
    • Patterns and practices for building resilient Serverless applications (Yan Cui): https://www.slideshare.net/theburningmonk/patterns-and-practices-for-building-resilient-serverless-applications
    • Serverless Microservice Patterns for AWS (Jeremy Daly): https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
    • The Amazon Builders' Library: https://aws.amazon.com/builders-library/
    • Timeouts, retries and backoff with jitter (Yan Cui): https://lumigo.io/blog/amazon-builders-library-in-focus-1-timeouts-retries-and-backoff-with-jitter/
    • Serverless Chaos Demo app: https://demo.serverlesschaos.com
    • Serverless Chaos Demo Circuit Breaker app: https://circuitbreaker.serverlesschaos.com
  50. Thank you! Gunnar Grosch @gunnargrosch