Performing chaos in a serverless world - Chaos Community Day London September 13 2019

Gunnar Grosch
September 13, 2019

The principles of chaos engineering have been battle-tested for years using traditional infrastructure and containerized microservices, but how do they work with serverless functions and managed services? In this session we'll cover what differs, how we perform chaos experiments, and some of the common weaknesses we can test for in our serverless applications.

Presented at Chaos Community Day London September 13, 2019.

Transcript

  1. Chaos Community Day London 2019 @gunnargrosch
    Chaos Engineering has been battle-tested for years using traditional infrastructure and containerized microservices, but how does it work with serverless functions and managed services?
  2. What we’ll cover
    • What is serverless?
    • Challenges for serverless
    • Serverless chaos experiments
  3. A resilient system is a highly available and durable system that can maintain an acceptable level of service in the face of failure.
  4. About me
    • Evangelist and co-founder at Opsio
    • Background in development and operations
    • Organizer of AWS User Groups and AWS Community Day Nordics
    • ServerlessDays Stockholm and Serverless Meetups organizer
    • Father of three chaos monkeys
  5. “Serverless allows you to build and run applications and services without thinking about servers. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning.”
    Amazon Web Services
  6. “Everything fails, all the time!”
    Werner Vogels, CTO Amazon
  7. Don’t ask what happens if a system fails, but ask what happens when it fails.
  8. Pros are also cons
    • No servers to manage
    • Less heavy lifting
    • Lots of services to choose from
    • Per function configuration and security
    • More granular architectures
  9. Common serverless weaknesses
    • Missing error handling
    • Wrong timeout values
    • Missing fallback
    • Missing regional failover
  10. Serverless chaos experiments
    • Inject errors in your code
        • One in X requests throws an error
        • Turn on and off using parameter or variable
    • Remove downstream services
    • Alter the concurrency of your functions
    • Restrict the capacity of your DynamoDB table
    • Add configuration errors
        • Security policies
        • CORS configuration
    • Function disk space failures
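A minimal sketch of the "one in X requests throws an error" idea, switched on and off through a variable. The decorator name `inject_exception`, the environment variable `CHAOS_ENABLED`, and the `rng` parameter are hypothetical names chosen for illustration; this is not the API of any of the tools mentioned in the talk.

```python
import os
import random
from functools import wraps


def inject_exception(rate, flag_var="CHAOS_ENABLED", rng=random.random):
    """Raise an exception in roughly `rate` of invocations while the flag is on.

    `flag_var` and `rng` are illustrative assumptions, not part of any real library.
    """

    def decorator(handler):
        @wraps(handler)
        def wrapper(event, context):
            # Read the switch on every call so the experiment can be turned
            # on and off without touching the function code.
            enabled = os.environ.get(flag_var, "false").lower() == "true"
            if enabled and rng() < rate:
                raise RuntimeError("chaos: injected exception")
            return handler(event, context)

        return wrapper

    return decorator


@inject_exception(rate=0.2)  # roughly one in five requests while enabled
def handler(event, context):
    return {"statusCode": 200, "body": "ok"}
```

With `CHAOS_ENABLED` unset or set to `false` the handler behaves normally, so the wrapper can stay in place between experiments.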
  11. Serverless chaos experiments
    • Add latency to your functions
        • Cold starts
        • Cloud provider issues
        • Runtime or code issues
        • Integration issues
        • Timeouts
    • Yan Cui wrote an article and published sample code.
    • Adrian Hornsby built a Lambda Layer around these ideas (and now a Python library).
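Adding latency to a function can be sketched as a small wrapper that sleeps before the real handler runs. The name `inject_latency` and the `sleep` parameter are hypothetical and only for illustration; the 300 ms value is an arbitrary example, and this is not the interface of the Lambda Layer or Python library mentioned above.

```python
import time
from functools import wraps


def inject_latency(delay_ms, sleep=time.sleep):
    """Delay every invocation by `delay_ms` milliseconds before running the handler.

    `sleep` is injectable (an assumption for this sketch) so the delay can be
    replaced by a fake in tests.
    """

    def decorator(handler):
        @wraps(handler)
        def wrapper(event, context):
            sleep(delay_ms / 1000.0)  # time.sleep expects seconds
            return handler(event, context)

        return wrapper

    return decorator


@inject_latency(delay_ms=300)
def handler(event, context):
    return {"statusCode": 200, "body": "ok"}
```

Running the wrapped function against its configured timeout, and against the timeouts of its callers, is one way to surface wrong timeout values.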
  12. Tools for serverless chaos experiments
    • Gremlin: gremlin.com
    • Chaos Toolkit: chaostoolkit.org
    • Thundra: thundra.io
    • Build Your Own: 127.0.0.1
  13. “This is cool but now on to what we normally do.”
    Most serverless devs
  14. Failure injection
    • What if my function takes 300 ms extra for each invocation?
    • What if my function returns an error code?
    • What if there is an exception in the code?
    • Hypothesis: My app can handle failure injected at the function level.
    • Let’s do it!
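The hypothesis on this slide can be checked with a very small experiment: force a function to return an error code and see whether its caller still gives an acceptable answer. Everything in the sketch below (`inject_status_code`, `get_recommendations`, `render_page`, the 502 status) is a hypothetical example, not code from the talk's demo.

```python
def inject_status_code(status_code, enabled=True):
    """Wrap a handler so it returns `status_code` instead of doing its work."""

    def decorator(handler):
        def wrapper(event, context):
            if enabled:
                return {"statusCode": status_code, "body": "chaos: injected error"}
            return handler(event, context)

        return wrapper

    return decorator


def get_recommendations(event, context):
    # Hypothetical downstream function used only for this illustration.
    return {"statusCode": 200, "body": ["item-1", "item-2"]}


def render_page(fetch):
    """Caller with a fallback: degrade to an empty list when the call fails."""
    response = fetch({}, None)
    if response["statusCode"] != 200:
        return {"recommendations": [], "degraded": True}
    return {"recommendations": response["body"], "degraded": False}


# Experiment: inject a 502 and verify the app still responds, in degraded mode.
failing = inject_status_code(502)(get_recommendations)
result = render_page(failing)
assert result == {"recommendations": [], "degraded": True}
```

If the final assertion fails, the hypothesis is disproved and the missing error handling or fallback is the weakness to fix.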
  15. Summary
    • Everything fails, all the time.
    • A resilient system maintains an acceptable level of service in the face of failure.
    • Serverless has challenges but is a perfect fit for chaos engineering.
    • Focus your experiments closer to the user.
    • Use graceful degradation.
    • You can do it!
  16. Do you want more?
    • Follow @serverlesschaos on Twitter
    • Try the Serverless Chaos Demo app: https://demo.serverlesschaos.com
    • YouTube videos and repositories: https://grosch.se
    • Yan Cui’s article on chaos engineering for Lambda: https://hackernoon.com/how-can-we-apply-the-principles-of-chaos-engineering-to-aws-lambda-80f87e3237e2
    • Adrian Hornsby’s article on failure injection: https://medium.com/@adhorn/failure-injection-gain-confidence-in-your-serverless-application-ce6c0060f586