Slide 1

Performing chaos in a serverless world
Gunnar Grosch (@gunnargrosch)
ServerlessDays Milano 2019

Slide 2

Chaos Engineering has been battle-tested for years using traditional infrastructure and containerized microservices, but how does it work with serverless functions and managed services?

Slide 3

What we’ll cover
• What is Chaos Engineering?
• Running chaos experiments
• Challenges when using Chaos Engineering for serverless
• Serverless chaos experiments

Slide 4

A resilient system is a highly available and durable system that can maintain an acceptable level of service in the face of failure.

Slide 5

About me
• Evangelist and co-founder at Opsio
• Background in development and operations
• Organizer of AWS User Groups and Serverless Meetups
• Organizer of ServerlessDays Stockholm and AWS Community Day Nordics
• Father of three chaos monkeys

Slide 6

What is Chaos Engineering?

Slide 7

“Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.”
principlesofchaos.org

Slide 8

Chaos Engineering is not about breaking things

Slide 9

Chaos Engineering is about performing controlled experiments to inject failures

Slide 10

Chaos Engineering is about finding the weaknesses in a system and fixing them before they break

Slide 11

Chaos Engineering is about building confidence in your system and in your organization

Slide 12

[Image: “HDMI No Signal” screen]

Slide 13

“Everything fails, all the time!”
Werner Vogels, CTO, Amazon

Slide 14

Don’t ask what happens if a system fails, but ask what happens when it fails.

Slide 15

Running chaos experiments

Slide 16

Why run experiments?
• Are your customers getting the experience they should?
• Are downtime or other issues costing you money?
• Are you confident in your monitoring and alerting?
• Is your organization ready to handle outages?

Slide 17

Step 1: Define steady state
• The normal behavior of a system over time
• System metrics and business metrics
• Steady state is not necessarily continuous
• Business metrics are usually more useful than system metrics
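One way to make “steady state” concrete is to record a metric during normal operation and derive an acceptable band from it. The sketch below assumes a band of mean ± k sample standard deviations (a common choice, not one prescribed by the talk) and uses hypothetical orders-per-minute numbers. Because steady state is not necessarily continuous, in practice you would compare against a baseline from a comparable time window (for example, the same hour on a previous day).

```python
from statistics import mean, stdev

def steady_state_band(baseline, k=3.0):
    """Return (low, high): mean plus/minus k sample standard deviations of the baseline."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def in_steady_state(value, band):
    """True if an observed value falls inside the steady-state band."""
    low, high = band
    return low <= value <= high

# Hypothetical business metric: orders per minute sampled before the experiment.
baseline_orders = [118, 122, 120, 119, 121, 120]
band = steady_state_band(baseline_orders)
print(in_steady_state(120, band))  # True
print(in_steady_state(80, band))   # False
```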

Slide 18

Step 2: Form your hypothesis
• Chaos can be injected at any layer in the stack
• Ask “what if?” questions
• Always fix known problems first!
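Writing the hypothesis down as data forces each “what if?” to name the injected failure, the layer it targets, and the measurable outcome that would confirm it. This is a minimal sketch; the `Hypothesis` class, its fields and the example values are hypothetical and only illustrate the shape of a testable hypothesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    """A 'what if?' question plus the measurable outcome we expect."""
    what_if: str           # the failure we plan to inject
    layer: str             # where in the stack it is injected
    metric: str            # steady-state metric to watch
    min_acceptable: float  # lowest observed value we still accept

    def statement(self) -> str:
        return (f"What if {self.what_if} (layer: {self.layer})? "
                f"We expect {self.metric} to stay at or above {self.min_acceptable}.")

    def holds(self, observed: float) -> bool:
        return observed >= self.min_acceptable

# Hypothetical example for a checkout flow.
h = Hypothesis(
    what_if="the product lookup function adds 300 ms of latency",
    layer="function",
    metric="checkout success rate (%)",
    min_acceptable=99.0,
)
print(h.statement())
print(h.holds(99.4))  # True
```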

Slide 19

Step 3: Plan and run your experiment
• Whiteboard the experiment in detail
• Contain the blast radius
• Notify the organization
• Make sure to have a “stop” button
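Containing the blast radius and having a “stop” button can both be expressed in the loop that drives the experiment: only a fraction of requests get the fault, and a stop condition is checked before every request. The sketch below is illustrative; `run_experiment`, the handler and the stop rule are hypothetical stand-ins for your own traffic and abort criteria.

```python
import random
from typing import Callable

def run_experiment(requests, handle: Callable[[object, bool], bool],
                   blast_radius: float, stop: Callable[[int, int], bool],
                   rng: random.Random) -> dict:
    """Send roughly blast_radius of the requests through the injected fault.

    handle(request, inject) processes one request and returns True on success.
    stop(failures, processed) is the "stop button": returning True aborts the run.
    """
    if not 0.0 <= blast_radius <= 1.0:
        raise ValueError("blast_radius must be between 0 and 1")
    processed = failures = injected = 0
    for request in requests:
        if stop(failures, processed):
            return {"aborted": True, "processed": processed,
                    "failures": failures, "injected": injected}
        inject = rng.random() < blast_radius  # only this share of traffic is affected
        injected += int(inject)
        processed += 1
        failures += int(not handle(request, inject))
    return {"aborted": False, "processed": processed,
            "failures": failures, "injected": injected}

def fragile_handler(request, inject):
    return not inject  # hypothetical handler: every injected fault causes a failure

def stop_after_three(failures, processed):
    return failures >= 3  # hypothetical abort rule

print(run_experiment(range(100), fragile_handler, 1.0, stop_after_three, random.Random(42)))
# {'aborted': True, 'processed': 3, 'failures': 3, 'injected': 3}
```

Passing the random generator in explicitly keeps runs reproducible, which makes it easier to compare one experiment with the next.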

Slide 20

Step 4: Measure and learn
• Use metrics to prove or disprove the hypothesis
• Was the system resilient to the injected failure?
• Did anything unexpected happen?
• Share your progress and success!
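A simple way to use metrics to prove or disprove the hypothesis is to compare a control group (no fault) with the experiment group (fault injected). The sketch assumes error rate as the metric and a fixed tolerance; both the threshold and the request counts are hypothetical.

```python
def error_rate(successes: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when nothing was measured."""
    return 0.0 if total == 0 else 1 - successes / total

def hypothesis_holds(control, experiment, tolerance):
    """control and experiment are (successes, total) tuples.

    The hypothesis holds if the experiment group's error rate is at most
    `tolerance` higher than the control group's.
    """
    delta = error_rate(*experiment) - error_rate(*control)
    return delta <= tolerance

# Hypothetical numbers: 1,000 requests per group, tolerance of one percentage point.
print(hypothesis_holds((995, 1000), (990, 1000), 0.01))  # True
print(hypothesis_holds((995, 1000), (950, 1000), 0.01))  # False
```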

Slide 21

Step 5: Scale up or abort and fix
• With confidence, you can scale up
• Increased scope can reveal new effects
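Scaling up can be done in stages: rerun the experiment with a larger share of traffic only after the previous stage passed, and stop at the first stage that disproves the hypothesis. The stage sizes and the `run_stage` callback below are hypothetical; this is a sketch of the control flow, not a prescribed schedule.

```python
def ramp_up(stages, run_stage):
    """Run the experiment at growing blast radii, stopping at the first failure.

    stages: strictly increasing fractions of traffic, e.g. [0.01, 0.05, 0.25, 1.0]
    run_stage(fraction): runs one experiment, returns True if the hypothesis held
    Returns (largest fraction that passed or None, fraction that failed or None).
    """
    if any(later <= earlier for earlier, later in zip(stages, stages[1:])):
        raise ValueError("stages must be strictly increasing")
    passed = None
    for fraction in stages:
        if not run_stage(fraction):
            return passed, fraction  # abort and fix before scaling further
        passed = fraction
    return passed, None

# Hypothetical outcome: the system copes until a quarter of traffic is affected.
print(ramp_up([0.01, 0.05, 0.25, 1.0], lambda fraction: fraction < 0.25))  # (0.05, 0.25)
```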

Slide 22

When do we get to the serverless part?

Slide 23

Serverless means new challenges
• No servers to manage
• Less heavy lifting
• Lots of services
• Per-function configuration
• More granular architectures

Slide 24

Common serverless weaknesses
• Missing error handling
• Wrong timeout values
• Missing fallback
• Missing regional failover
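Two of these weaknesses, wrong timeout values and a missing fallback, often show up together when a function calls a slow dependency. A minimal sketch, assuming the dependency can be wrapped in a plain Python callable: the call gets its own timeout and returns a fallback value on timeout or error. `call_with_fallback` and the recommendation example are hypothetical.

```python
import concurrent.futures
import time

# Shared worker pool; a timed-out call keeps running in its thread, it is just no longer awaited.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_fallback(fn, timeout_s, fallback):
    """Return fn() if it finishes within timeout_s seconds, otherwise fallback.

    Errors raised by fn are also turned into the fallback value.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # concurrent.futures.TimeoutError or any error raised by fn
        return fallback

def slow_recommendations():
    time.sleep(0.5)  # hypothetical slow downstream dependency
    return ["personalized"]

print(call_with_fallback(slow_recommendations, 0.1, ["bestsellers"]))  # ['bestsellers']
```

The timeout passed here should be well below the calling function’s own configured timeout, so the fallback has time to run before the platform stops the invocation.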

Slide 25

Serverless chaos experiments

Slide 26

Serverless Chaos Demo app

Slide 27

Error injection
• Inject errors in your code
  • One in X requests throws an error
  • Turn on and off using a parameter or variable
• Alter the concurrency of your functions
• Restrict the capacity of your DynamoDB table
• Add configuration errors
  • Security policies
  • CORS configuration
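Injecting errors in code, with one in X requests failing and a switch to turn it on and off, can be done with a decorator around a Lambda-style `handler(event, context)`. This is a sketch under assumptions: the environment variable names `CHAOS_ENABLED` and `CHAOS_ERROR_RATE` and the `InjectedError` class are hypothetical, and the variables are read inside the wrapper so the switch is checked on every invocation.

```python
import functools
import os
import random

class InjectedError(Exception):
    """Raised deliberately by the chaos wrapper."""

def inject_errors(rng=random):
    """Decorator for a Lambda-style handler(event, context).

    Hypothetical environment variables, read on every invocation:
      CHAOS_ENABLED     "true" turns injection on
      CHAOS_ERROR_RATE  X, meaning roughly one in X requests fails
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            enabled = os.environ.get("CHAOS_ENABLED", "false").lower() == "true"
            rate = int(os.environ.get("CHAOS_ERROR_RATE", "0"))
            if enabled and rate > 0 and rng.random() < 1 / rate:
                raise InjectedError(f"chaos: injected error (1 in {rate})")
            return handler(event, context)
        return wrapper
    return decorator

@inject_errors()
def handler(event, context):
    return {"statusCode": 200, "body": "ok"}
```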

Slide 28

Latency injection
• Add latency to your functions
  • Cold starts
  • Cloud provider issues
  • Runtime or code issues
  • Integration issues
  • Timeouts
• Yan Cui wrote an article and published sample code.
• Adrian Hornsby built a Lambda Layer around these ideas.
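Latency can be added with the same decorator pattern: sleep for a random delay before the real handler runs. The sketch below is written in the spirit of Yan Cui’s demo and Adrian Hornsby’s layer but is not their code; the environment variable names are hypothetical, and `sleep` is injectable so the delay can be checked without waiting.

```python
import functools
import os
import random
import time

def inject_latency(rng=random, sleep=time.sleep):
    """Decorator that delays a Lambda-style handler(event, context) before it runs.

    Hypothetical environment variables, read on every invocation:
      CHAOS_LATENCY_ENABLED  "true" turns injection on
      CHAOS_LATENCY_MIN_MS   lower bound of the random delay in milliseconds
      CHAOS_LATENCY_MAX_MS   upper bound of the random delay in milliseconds
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            if os.environ.get("CHAOS_LATENCY_ENABLED", "false").lower() == "true":
                low = float(os.environ.get("CHAOS_LATENCY_MIN_MS", "0"))
                high = float(os.environ.get("CHAOS_LATENCY_MAX_MS", "0"))
                delay_ms = rng.uniform(low, high)
                sleep(delay_ms / 1000)  # the injected latency, in seconds
            return handler(event, context)
        return wrapper
    return decorator

@inject_latency()
def handler(event, context):
    return {"statusCode": 200, "body": "ok"}
```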

Slide 29

Latency injection
• What if my functions take X ms extra for each invocation?
• What if timeouts occur?
• Hypothesis: My app can handle latency injected at the function level.
• Let’s do it!
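Before running the experiment, a quick back-of-the-envelope calculation shows when “X ms extra per invocation” turns into a timeout. The sketch assumes a chain of functions called synchronously one after another, with the caller enforcing a single overall timeout; the chain and the 1,000 ms timeout are hypothetical numbers.

```python
def chain_latency_ms(base_ms, injected_ms):
    """End-to-end latency of a synchronous chain when every function gets injected_ms extra."""
    return sum(ms + injected_ms for ms in base_ms)

def survives(base_ms, injected_ms, timeout_ms):
    """True if the chain still completes within the caller's timeout."""
    return chain_latency_ms(base_ms, injected_ms) <= timeout_ms

def max_tolerable_injection_ms(base_ms, timeout_ms):
    """Largest per-function extra latency the chain absorbs before the timeout is hit."""
    return (timeout_ms - sum(base_ms)) / len(base_ms)

# Hypothetical chain of three functions and a 1,000 ms client timeout.
chain = [120, 250, 80]
print(survives(chain, 100, 1000))  # True  (750 ms)
print(survives(chain, 300, 1000))  # False (1350 ms)
print(round(max_tolerable_injection_ms(chain, 1000), 1))  # 183.3
```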

Slide 30

Sample tools
• Gremlin: gremlin.com
• Chaos Toolkit: chaostoolkit.org
• Thundra: thundra.io
• Build Your Own: 127.0.0.1
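If you build your own, “alter the concurrency of your functions” is a small experiment to start with: setting a Lambda function’s reserved concurrency to 0 makes every invocation throttle. The sketch uses the boto3 Lambda client calls `get_function_concurrency`, `put_function_concurrency` and `delete_function_concurrency`; the `throttled` context manager is hypothetical, and its `finally` block acts as the “stop” button by restoring the previous setting even if the experiment raises.

```python
from contextlib import contextmanager

@contextmanager
def throttled(lambda_client, function_name):
    """Throttle a Lambda function (reserved concurrency 0) for the duration of the block.

    lambda_client: a boto3 Lambda client, e.g. boto3.client("lambda").
    On exit, the previous reserved concurrency is restored, or removed if none was set.
    """
    previous = lambda_client.get_function_concurrency(
        FunctionName=function_name
    ).get("ReservedConcurrentExecutions")
    lambda_client.put_function_concurrency(
        FunctionName=function_name, ReservedConcurrentExecutions=0
    )
    try:
        yield
    finally:
        if previous is None:
            lambda_client.delete_function_concurrency(FunctionName=function_name)
        else:
            lambda_client.put_function_concurrency(
                FunctionName=function_name, ReservedConcurrentExecutions=previous
            )

# Hypothetical usage against a real account:
# import boto3
# with throttled(boto3.client("lambda"), "my-demo-function"):
#     ...  # exercise the app and watch your steady-state metrics
```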

Slide 31

Summary
• Chaos Engineering is not about breaking things.
• Chaos Engineering is about building confidence in your system and your organization.
• Serverless introduces new challenges for Chaos Engineering.
• You can do it!

Slide 32

Do you want more?
• Follow @serverlesschaos on Twitter
• Chaos Engineering Slack Community: bit.ly/chaos-eng-slack
• Chaos Engineering Google Group: https://groups.google.com/forum/#!forum/chaos-community
• List of awesome Chaos Engineering resources: https://github.com/dastergon/awesome-chaos-engineering/
• Yan Cui’s latency injection demo: https://github.com/theburningmonk/lambda-latency-injection-demo
• Adrian Hornsby’s latency injection layer: https://github.com/adhorn/LatencyInjectionLayer

Slide 33

Inspiration

Slide 34

@serverlesschaos #chaosengineering