Performing chaos in a serverless world - ServerlessDays Milano June 21 2019

The principles of chaos engineering have been battle-tested for years on traditional infrastructure and containerized microservices, but how do they work with serverless functions and managed services? Join us as we move from talking about principles to performing real chaos in a serverless world!

Gunnar Grosch

June 21, 2019

Transcript

  1. ServerlessDays Milano 2019 @gunnargrosch
    Chaos Engineering has been battle-tested for years using traditional infrastructure and containerized microservices, but how does it work with serverless functions and managed services?
  2. What we’ll cover
    • What is Chaos Engineering?
    • Running chaos experiments
    • Challenges when using Chaos Engineering for serverless
    • Serverless chaos experiments
  3. A resilient system is a highly available and durable system that can maintain an acceptable level of service in the face of failure.
  4. About me
    • Evangelist and co-founder at Opsio
    • Background in development and operations
    • Organizer of AWS User Groups and Serverless Meetups
    • ServerlessDays Stockholm and AWS Community Day Nordics organizer
    • Father of three chaos monkeys
  5. Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
    Source: principlesofchaos.org
  6. Chaos Engineering is about finding the weaknesses in a system and fixing them before they break.
  7. “Everything fails, all the time!” Werner Vogels, CTO, Amazon
  8. Don’t ask what happens if a system fails, but ask what happens when it fails.
  9. Why run experiments?
    • Are your customers getting the experience they should?
    • Are downtime or other issues costing you money?
    • Are you confident in your monitoring and alerting?
    • Is your organization ready to handle outages?
  10. Step 1: Define steady state
    • The normal behavior of a system over time
    • System metrics and business metrics
    • Steady state is not necessarily continuous
    • Business metrics are usually more useful
  11. Step 2: Form your hypothesis
    • Chaos can be injected at any layer in the stack
    • Use “what if” questions
    • Always fix known problems first!
  12. Step 3: Plan and run your experiment
    • Whiteboard the experiment in detail
    • Contain the blast radius
    • Notify the organization
    • Make sure to have a “stop” button
  13. Step 4: Measure and learn
    • Use metrics to prove or disprove the hypothesis
    • Was the system resilient to the injected failure?
    • Did anything unexpected happen?
    • Share your progress and success!
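Steps 1 and 4 can be illustrated with a small sketch. It is not from the talk: it assumes steady state is summarized as the mean of one business metric (a hypothetical orders-per-minute series) with a relative tolerance band, and that the hypothesis holds if the mean measured during the experiment stays inside that band. The names `SteadyState`, `define_steady_state`, `hypothesis_holds` and the 10 % tolerance are all illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SteadyState:
    """Baseline for one metric, with an accepted relative deviation."""
    baseline: float
    tolerance: float  # e.g. 0.10 means +/- 10 % around the baseline

    def contains(self, value: float) -> bool:
        return abs(value - self.baseline) <= self.tolerance * self.baseline


def define_steady_state(samples, tolerance=0.10):
    """Step 1: derive the steady state from metric samples taken over time."""
    return SteadyState(baseline=mean(samples), tolerance=tolerance)


def hypothesis_holds(steady_state, experiment_samples):
    """Step 4: the hypothesis holds if the metric stayed within steady state."""
    return steady_state.contains(mean(experiment_samples))


# Hypothetical business metric: orders per minute (made-up numbers).
normal = [100, 104, 98, 102, 96]
during_experiment = [97, 95, 99, 96, 98]

steady = define_steady_state(normal)
print("hypothesis holds:", hypothesis_holds(steady, during_experiment))
```

A real experiment would likely pull these samples from a monitoring system and use a more careful comparison than a single mean, since steady state is not necessarily continuous.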
  14. Step 5: Scale up or abort and fix
    • With confidence you can scale up
    • Increased scope can reveal new effects
  15. Serverless means new challenges
    • No servers to manage
    • Less heavy lifting
    • Lots of services
    • Per function configuration
    • More granular architectures
  16. Common serverless weaknesses
    • Missing error handling
    • Wrong timeout values
    • Missing fallback
    • Missing regional failover
  17. Error injection
    • Inject errors in your code
    • One in X requests throws an error
    • Turn on and off using parameter or variable
    • Alter the concurrency of your functions
    • Restrict the capacity of your DynamoDB table
    • Add configuration errors
    • Security policies
    • CORS configuration
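The first three bullets of this slide (inject errors in code, fail one in X requests, switch it on and off with a variable) can be sketched as a decorator around a Lambda-style handler. This is a minimal illustration, not code from the talk. The environment variable names `CHAOS_ERROR_ENABLED` and `CHAOS_ERROR_ONE_IN`, the `InjectedError` class, and the sample `handler` are all hypothetical.

```python
import os
import random
from functools import wraps


class InjectedError(Exception):
    """Raised on purpose by the chaos wrapper."""


def inject_error(handler):
    """Fail roughly one in CHAOS_ERROR_ONE_IN invocations while enabled."""
    @wraps(handler)
    def wrapper(event, context):
        # Read the switch on every invocation so the experiment can be
        # stopped by changing a variable instead of changing the code.
        enabled = os.environ.get("CHAOS_ERROR_ENABLED", "false") == "true"
        one_in = max(1, int(os.environ.get("CHAOS_ERROR_ONE_IN", "10")))
        if enabled and random.randrange(one_in) == 0:
            raise InjectedError("chaos: injected failure")
        return handler(event, context)
    return wrapper


@inject_error
def handler(event, context):
    # Hypothetical business logic.
    return {"statusCode": 200, "body": "ok"}
```

Keeping the switch outside the code doubles as the “stop” button from step 3, and a large `CHAOS_ERROR_ONE_IN` value is one way to contain the blast radius.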
  18. Latency injection
    • Add latency to your functions
    • Cold starts
    • Cloud provider issues
    • Runtime or code issues
    • Integration issues
    • Timeouts
    • Yan Cui wrote an article and published sample code.
    • Adrian Hornsby built a Lambda Layer around these ideas.
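Adding latency to a function can be sketched as a wrapper that sleeps before calling the real handler. This is a minimal illustration only; it is not Yan Cui’s sample code or Adrian Hornsby’s layer. The `CHAOS_LATENCY_MS` environment variable and the sample `handler` are hypothetical.

```python
import os
import time
from functools import wraps


def inject_latency(handler):
    """Delay each invocation by CHAOS_LATENCY_MS milliseconds (0 = off)."""
    @wraps(handler)
    def wrapper(event, context):
        # Read the value per invocation so it can be changed or set
        # back to 0 without touching the code.
        delay_ms = int(os.environ.get("CHAOS_LATENCY_MS", "0"))
        if delay_ms > 0:
            time.sleep(delay_ms / 1000)
        return handler(event, context)
    return wrapper


@inject_latency
def handler(event, context):
    # Hypothetical business logic.
    return {"statusCode": 200, "body": "ok"}
```

Setting the delay close to or above the function's configured timeout is one way to explore the “Timeouts” bullet as well.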
  19. Latency injection
    • What if my functions take X ms extra for each invocation?
    • What if timeouts occur?
    • Hypothesis: My app can handle latency injected at the function level.
    • Let’s do it!
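One way to make this hypothesis testable from the caller's side is to give every downstream call a timeout and a fallback, then check which path is taken as latency grows. The sketch below is an assumption-laden illustration, not the demo from the talk: `slow_dependency` is a local stand-in for a function with injected latency, and the 100 ms timeout and the fallback value are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_dependency(extra_ms):
    """Stand-in for a downstream function with injected latency."""
    time.sleep(extra_ms / 1000)
    return "fresh result"


def call_with_fallback(extra_ms, timeout_s=0.1, fallback="cached result"):
    """Return the dependency's result, or the fallback if it times out."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_dependency, extra_ms)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block the caller while the slow call finishes.
        pool.shutdown(wait=False)


print(call_with_fallback(extra_ms=10))
print(call_with_fallback(extra_ms=300))
```

If the second call hung or raised instead of returning the fallback, that would point at two of the common weaknesses listed earlier: wrong timeout values and missing fallback.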
  20. Sample tools
    • Gremlin: gremlin.com
    • Chaos Toolkit: chaostoolkit.org
    • Thundra: thundra.io
    • Build Your Own: 127.0.0.1
  21. Summary
    • Chaos Engineering is not about breaking things.
    • Chaos Engineering is about building confidence in your system and your organization.
    • Serverless introduces new challenges for Chaos Engineering.
    • You can do it!
  22. Do you want more?
    • Follow @serverlesschaos on Twitter
    • Chaos Engineering Slack Community: bit.ly/chaos-eng-slack
    • Chaos Engineering Google Group: https://groups.google.com/forum/#!forum/chaos-community
    • List of awesome Chaos Engineering resources: https://github.com/dastergon/awesome-chaos-engineering/
    • Yan Cui’s latency injection demo: https://github.com/theburningmonk/lambda-latency-injection-demo
    • Adrian Hornsby’s latency injection layer: https://github.com/adhorn/LatencyInjectionLayer