Slide 1

Slide 1 text

@gunnargrosch Performing chaos in a serverless world Gunnar Grosch Stockholm Serverless Meetup

Slide 2

Slide 2 text

Chaos Engineering has been battle-tested for years using traditional infrastructure and containerized microservices, but how does it work with serverless functions and managed services?

Slide 3

Slide 3 text

What we’ll cover
• What is Chaos Engineering?
• Running chaos experiments
• Challenges when using Chaos Engineering for serverless
• Serverless chaos experiments

Slide 4

Slide 4 text

A resilient system is a highly available and durable system that can maintain an acceptable level of service in the face of failure.

Slide 5

Slide 5 text

About me
• Evangelist and co-founder at Opsio
• Background in development and operations
• Organizer of AWS User Groups and AWS Community Day Nordics
• ServerlessDays Stockholm and Serverless Meetups organizer
• Father of three chaos monkeys

Slide 6

Slide 6 text

What is Chaos Engineering?

Slide 7

Slide 7 text

“Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.”
principlesofchaos.org

Slide 8

Slide 8 text

Chaos Engineering is not about breaking things

Slide 9

Slide 9 text

Chaos Engineering is about performing controlled experiments to inject failures

Slide 10

Slide 10 text

Chaos Engineering is about finding the weaknesses in a system and fixing them before they break

Slide 11

Slide 11 text

Chaos Engineering is about building confidence in your system and in your organization

Slide 12

Slide 12 text

[Image: display showing “Source: HDMI. No Signal.”]

Slide 13

Slide 13 text

“Everything fails, all the time!”
Werner Vogels, CTO, Amazon

Slide 14

Slide 14 text

Don’t ask what happens if a system fails; ask what happens when it fails.

Slide 15

Slide 15 text

Running chaos experiments

Slide 16

Slide 16 text

Why run experiments?
• Are your customers getting the experience they should?
• Are downtime or issues costing you money?
• Are you confident in your monitoring and alerting?
• Is your organization ready to handle outages?

Slide 17

Slide 17 text

Step 1: Define steady state
• The normal behavior of a system over time
• System metrics and business metrics
• Steady state is not necessarily continuous
• Business metrics are usually more useful
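
One way to make steady state concrete is as a tolerance band around a baseline metric. The Python sketch below is only an illustration: the orders-per-minute samples and the three-standard-deviation tolerance are assumptions, not values from the talk.

```python
from statistics import mean, stdev


def steady_state_band(samples, tolerance=3.0):
    """Return (low, high): the baseline mean plus/minus tolerance * stdev."""
    baseline = mean(samples)
    spread = stdev(samples)
    return baseline - tolerance * spread, baseline + tolerance * spread


def within_steady_state(value, band):
    """True if an observed value falls inside the steady-state band."""
    low, high = band
    return low <= value <= high


# Hypothetical orders-per-minute readings taken before any experiment.
baseline_samples = [100, 104, 98, 101, 97, 103, 99, 102]
band = steady_state_band(baseline_samples)
```

During an experiment, readings checked with `within_steady_state` indicate whether the business metric has left its normal range.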

Slide 18

Slide 18 text

Step 2: Form your hypothesis
• Chaos can be injected at any layer in the stack
• Use “what if” questions
• Always fix known problems first!

Slide 19

Slide 19 text

Step 3: Plan and run your experiment
• Whiteboard the experiment in detail
• Contain the blast radius
• Notify the organization
• Make sure to have a “stop” button
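
Containing the blast radius and having a “stop” button can be sketched as a small runner that widens the experiment step by step and aborts when errors climb too high. This is a minimal Python sketch under assumptions: `inject` and `observe_error_rate` are hypothetical hooks you would supply, and the 5% abort threshold is arbitrary.

```python
def run_experiment(blast_radii, inject, observe_error_rate, abort_threshold=0.05):
    """Widen the blast radius step by step; abort if the error rate gets too high.

    blast_radii: fractions of traffic to affect, smallest first.
    inject: callable(fraction) that applies the failure (hypothetical hook).
    observe_error_rate: callable() returning the current error rate (hypothetical hook).
    Returns (completed_steps, aborted).
    """
    completed = []
    for fraction in blast_radii:
        inject(fraction)
        if observe_error_rate() > abort_threshold:
            inject(0.0)  # the "stop" button: remove the injected failure
            return completed, True
        completed.append(fraction)
    inject(0.0)  # always clean up when the experiment ends
    return completed, False
```

Starting with the smallest fraction keeps the first run cheap if the hypothesis turns out to be wrong.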

Slide 20

Slide 20 text

Step 4: Measure and learn
• Use metrics to prove or disprove the hypothesis
• Was the system resilient to the injected failure?
• Did anything unexpected happen?
• Share your progress and success!

Slide 21

Slide 21 text

Step 5: Scale up or abort and fix
• With confidence you can scale up
• Increased scope can reveal new effects

Slide 22

Slide 22 text

When do we get to the serverless part?

Slide 23

Slide 23 text

Serverless means new challenges
• No servers to manage
• Less heavy lifting
• Lots of services
• Per-function configuration
• More granular architectures

Slide 24

Slide 24 text

Common serverless weaknesses
• Missing error handling
• Wrong timeout values
• Missing fallback
• Missing regional failover
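
Three of these weaknesses (error handling, timeouts, fallback) can be addressed with one small wrapper around a downstream call. The Python sketch below is illustrative only; `slow_downstream` and `cached_response` are hypothetical stand-ins, and the timeout values are assumptions.

```python
import concurrent.futures
import time


def call_with_fallback(call, fallback, timeout_seconds=1.0):
    """Return call() if it finishes in time, otherwise the result of fallback()."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Wrong or missing timeouts: bound how long we wait for the dependency.
        return fallback()
    except Exception:
        # Missing error handling: treat any downstream error as "use the fallback".
        return fallback()
    finally:
        # Do not block on a call that is still running in the background.
        pool.shutdown(wait=False)


def slow_downstream():
    """Hypothetical downstream call that takes too long."""
    time.sleep(0.3)
    return "live data"


def cached_response():
    """Hypothetical fallback, for example a cached or default value."""
    return "cached data"
```

A chaos experiment that adds latency or errors to the dependency then shows whether the fallback path really works.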

Slide 25

Slide 25 text

Serverless chaos experiments

Slide 26

Slide 26 text

Serverless chaos experiments
• Inject errors in your code
• One in X requests throws an error
• Turn on and off using parameter or variable
• Remove downstream services
• Alter the concurrency of your functions
• Restrict the capacity of your DynamoDB table
• Add configuration errors
• Security policies
• CORS configuration
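
The first idea, an error thrown for one in X requests and switched on or off with a variable, might look roughly like this in Python. The environment variable names `CHAOS_ENABLED` and `CHAOS_ONE_IN_X` are hypothetical, chosen only for this sketch.

```python
import os
import random


class InjectedError(Exception):
    """Raised on purpose by the chaos experiment."""


def maybe_fail(rng=random):
    """Raise InjectedError for roughly one in X calls when chaos is enabled.

    CHAOS_ENABLED and CHAOS_ONE_IN_X are hypothetical environment variable names.
    """
    if os.environ.get("CHAOS_ENABLED", "false").lower() != "true":
        return
    one_in_x = max(1, int(os.environ.get("CHAOS_ONE_IN_X", "10")))
    if rng.randrange(one_in_x) == 0:
        raise InjectedError(f"injected failure (1 in {one_in_x})")


def handler(event, context):
    """Lambda-style handler that may fail before doing its real work."""
    maybe_fail()
    return {"statusCode": 200, "body": "ok"}
```

Because the switch is read on every invocation, changing the variable turns the experiment off without a code change.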

Slide 27

Slide 27 text

Serverless chaos experiments
• Add latency to your functions
• Cold starts
• Cloud provider issues
• Runtime or code issues
• Integration issues
• Timeouts
• Yan Cui wrote an article and published sample code.
• Adrian Hornsby built a Lambda Layer around these ideas.

Slide 28

Slide 28 text

Tools for serverless chaos experiments
• Gremlin: gremlin.com
• Chaos Toolkit: chaostoolkit.org
• Thundra: thundra.io
• Build Your Own: 127.0.0.1

Slide 29

Slide 29 text

Latency injection
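
A minimal sketch of latency injection, assuming a Python Lambda-style handler: a decorator sleeps before the wrapped function runs. The decorator name and its parameters are hypothetical and are not taken from any particular library.

```python
import functools
import time


def inject_latency(delay_ms, enabled=True, sleep=time.sleep):
    """Decorator that delays each invocation by delay_ms milliseconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if enabled:
                sleep(delay_ms / 1000.0)  # the injected latency
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_ms=300)
def handler(event, context):
    """Lambda-style handler; every call now takes about 300 ms longer."""
    return {"statusCode": 200, "body": "ok"}
```

The `sleep` argument exists only so that the delay can be replaced in tests.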

Slide 30

Slide 30 text

Status code injection
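
Status code injection can be sketched the same way: the function runs normally, and the status code in its response is then overwritten. This Python sketch assumes the handler returns a dict with a `statusCode` key, as in an API Gateway proxy-style response; the decorator name is hypothetical.

```python
import functools


def inject_status_code(status_code, enabled=True):
    """Decorator that overrides the statusCode of a dict-shaped response."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(event, context):
            response = func(event, context)
            if enabled and isinstance(response, dict):
                # Copy the response so the original dict is left untouched.
                response = {**response, "statusCode": status_code}
            return response
        return wrapper
    return decorator


@inject_status_code(404)
def handler(event, context):
    """Lambda-style handler whose successful response is reported as 404."""
    return {"statusCode": 200, "body": "ok"}
```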

Slide 31

Slide 31 text

Exception injection

Slide 32

Slide 32 text

Parameter control
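
Controlling an experiment through a parameter usually means reading a small configuration document at invocation time. The Python sketch below parses such a document from a JSON string and falls back to “disabled” on any problem. The field names are hypothetical, and fetching the value from a parameter store is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChaosConfig:
    """Hypothetical experiment settings; the defaults mean 'do nothing'."""
    enabled: bool = False
    failure_mode: str = "latency"
    rate: float = 1.0
    delay_ms: int = 0


def parse_chaos_config(raw):
    """Parse a JSON parameter value; return a disabled config on any problem."""
    try:
        data = json.loads(raw)
        return ChaosConfig(
            enabled=bool(data.get("enabled", False)),
            failure_mode=str(data.get("failure_mode", "latency")),
            rate=float(data.get("rate", 1.0)),
            delay_ms=int(data.get("delay_ms", 0)),
        )
    except (TypeError, ValueError, AttributeError):
        # Missing, malformed, or unexpected input must never break the function.
        return ChaosConfig()
```

Failing closed matters here: a broken parameter should switch the experiment off, not take the function down.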

Slide 33

Slide 33 text

Serverless Chaos Demo app

Slide 34

Slide 34 text

Serverless Chaos Demo app

Slide 35

Slide 35 text

Failure injection
• What if my function takes an extra 300 ms for each invocation?
• What if my function returns an error code?
• What if there is an exception in the code?
• Hypothesis: My app can handle failure injected at the function level.
• Let’s do it!

Slide 36

Slide 36 text

Summary
• Everything fails, all the time.
• A resilient system maintains an acceptable level of service in the face of failure.
• Chaos Engineering is about building confidence in your system and your organization.
• Serverless introduces new challenges for Chaos Engineering.
• Design the smallest possible experiment to test the system without causing an outage.
• Understand how failure plays out, then scale it up as confidence in the system grows.
• You can do it!

Slide 37

Slide 37 text

Inspiration

Slide 38

Slide 38 text

Do you want more?
• Follow @serverlesschaos on Twitter
• Try the Serverless Chaos Demo app: https://demo.serverlesschaos.com
• YouTube videos and repositories: https://grosch.se
• Chaos Engineering Slack Community: bit.ly/chaos-eng-slack
• Chaos Engineering Google Group: https://groups.google.com/forum/#!forum/chaos-community
• List of awesome Chaos Engineering resources: https://github.com/dastergon/awesome-chaos-engineering/

Slide 39

Slide 39 text

@serverlesschaos #chaosengineering