Public talk presented at the AWS Community Summit Online 2020 conference.
Speakers:
Sara Gerion, @sarutule
Yan Cui, @theburningmonk
Lambda gives you a lot of scalability and multi-AZ redundancy out of the box, but things can still go wrong in production.
Region-wide outages happen, and performance degradation in services your function depends on can cause it to time out or error. And what if you're dealing with downstream systems that just aren't as scalable and can't handle the load you put on them?
The bottom line is that many things can go wrong, and they often do at the worst times. The goal of building resilient systems is not to prevent failures, but to build systems that can withstand them. In this talk, we will look at a number of practices and architectural patterns that can help you build more resilient serverless applications, such as going multi-region active-active, employing DLQs and surge queues, and using chaos experiments to identify failure modes before they manifest in production.
Recording available here:
https://www.youtube.com/watch?v=elVeOYYtLM0