Slide 1

Slide 1 text

@gunnargrosch Performing chaos in a serverless world Gunnar Grosch Chaos Community Day London 2019

Slide 2

Slide 2 text

Chaos Engineering has been battle-tested for years using traditional infrastructure and containerized microservices, but how does it work with serverless functions and managed services?

Slide 3

Slide 3 text

What we’ll cover
• What is serverless?
• Challenges for serverless
• Serverless chaos experiments

Slide 4

Slide 4 text

A resilient system is a highly available and durable system that can maintain an acceptable level of service in the face of failure.

Slide 5

Slide 5 text

About me
• Evangelist and co-founder at Opsio
• Background in development and operations
• Organizer of AWS User Groups and AWS Community Day Nordics
• ServerlessDays Stockholm and Serverless Meetups organizer
• Father of three chaos monkeys

Slide 6

Slide 6 text

What is serverless?

Slide 7

Slide 7 text

“Serverless allows you to build and run applications and services without thinking about servers. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning.” Amazon Web Services

Slide 8

Slide 8 text

“Stop calling everything serverless” Jeremy Daly

Slide 9

Slide 9 text

Serverless is not an execution model

Slide 10

Slide 10 text

Managed services are not serverless

Slide 11

Slide 11 text

Serverless is not an operational construct or a spectrum

Slide 12

Slide 12 text

Serverless is not a technology

Slide 13

Slide 13 text

Serverless is a methodology

Slide 14

Slide 14 text


Slide 15

Slide 15 text

Source: HDMI. No Signal. To display Help, press the ? button.

Slide 16

Slide 16 text

“Everything fails, all the time!” Werner Vogels, CTO Amazon

Slide 17

Slide 17 text

Don’t ask what happens if a system fails, but ask what happens when it fails.

Slide 18

Slide 18 text

Challenges for serverless

Slide 19

Slide 19 text

“There are still servers in serverless” Bob Dylan

Slide 20

Slide 20 text

Serverless is a higher-level abstraction

Slide 21

Slide 21 text

“… without thinking about servers.” Amazon Web Services

Slide 22

Slide 22 text

Serverless doesn’t make your application resilient

Slide 23

Slide 23 text

Pros are also cons
• No servers to manage
• Less heavy lifting
• Lots of services to choose from
• Per-function configuration and security
• More granular architectures

Slide 24

Slide 24 text

Common serverless weaknesses
• Missing error handling
• Wrong timeout values
• Missing fallback
• Missing regional failover
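Two of the weaknesses above, wrong timeout values and missing fallback, can be illustrated with a minimal Python sketch. The helper name `call_with_fallback`, the stand-in downstream functions, and the time values are all hypothetical and not from the talk; the sketch only shows one way to bound a downstream call and degrade to a default value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_fallback(primary, fallback_value, timeout_s):
    """Run primary() with a time budget; return fallback_value on timeout or error."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(primary)
        # result() raises a timeout error if the call exceeds timeout_s,
        # and re-raises any exception thrown inside primary().
        return future.result(timeout=timeout_s)
    except Exception:
        # Broad on purpose for this sketch: any failure degrades gracefully.
        return fallback_value
    finally:
        # Do not block on a call that may still be running in the background.
        executor.shutdown(wait=False)


# Hypothetical downstream calls, used only to exercise the helper.
def healthy_downstream():
    return ["live recommendation"]


def slow_downstream():
    time.sleep(0.3)  # simulates a dependency that answers too late
    return ["late recommendation"]


def broken_downstream():
    raise ConnectionError("downstream unavailable")
```

Note that the worker thread is not cancelled on timeout; the caller simply stops waiting for it. In a real function the timeout would also need to be shorter than the function's own configured timeout for the fallback to be of any use.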

Slide 25

Slide 25 text

Serverless chaos experiments

Slide 26

Slide 26 text

Serverless chaos experiments
• Inject errors in your code
  • One in X requests throws an error
  • Turn on and off using parameter or variable
• Remove downstream services
• Alter the concurrency of your functions
• Restrict the capacity of your DynamoDB table
• Add configuration errors
  • Security policies
  • CORS configuration
• Function disk space failures
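The first experiment in the list, throwing an error for one in X requests and switching it on and off with a variable, might look roughly like this in Python. The environment variable names `CHAOS_ENABLED` and `CHAOS_ONE_IN`, the `InjectedError` class, and the handler shape are assumptions made for illustration, not names from the talk.

```python
import os
import random


class InjectedError(Exception):
    """Raised on purpose by the experiment, never by real code paths."""


def should_inject(env=os.environ, rng=random):
    """Return True for roughly one in CHAOS_ONE_IN calls while the toggle is on."""
    # Hypothetical variable names; the toggle defaults to off.
    if env.get("CHAOS_ENABLED", "false").lower() != "true":
        return False
    one_in = int(env.get("CHAOS_ONE_IN", "10"))
    return one_in >= 1 and rng.randrange(one_in) == 0


def handler(event, context=None, env=os.environ, rng=random):
    """Stand-in function handler that fails when the experiment says so."""
    if should_inject(env, rng):
        raise InjectedError("chaos experiment: injected error")
    return {"statusCode": 200, "body": "ok"}
```

Passing `env` and `rng` as parameters keeps the sketch testable; with the toggle unset the handler behaves exactly as it would without the experiment.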

Slide 27

Slide 27 text

Serverless chaos experiments
• Add latency to your functions
  • Cold starts
  • Cloud provider issues
  • Runtime or code issues
  • Integration issues
  • Timeouts
• Yan Cui wrote an article and published sample code.
• Adrian Hornsby built a Lambda Layer around these ideas (and now a Python library).

Slide 28

Slide 28 text

Tools for serverless chaos experiments
• Gremlin (gremlin.com)
• Chaos Toolkit (chaostoolkit.org)
• Thundra (thundra.io)
• Build Your Own (127.0.0.1)

Slide 29

Slide 29 text

Latency injection
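As a rough illustration of latency injection, the Python sketch below wraps a function in a decorator that sleeps before the real code runs. The decorator name `inject_latency` and its parameters are hypothetical; this is not the API of any of the tools mentioned in the talk.

```python
import functools
import time


def inject_latency(delay_ms, enabled=True, sleep=time.sleep):
    """Decorator factory: delay each call by delay_ms milliseconds when enabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if enabled and delay_ms > 0:
                # sleep is injectable so the delay can be observed in tests.
                sleep(delay_ms / 1000.0)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_ms=300)
def handler(event, context):
    """Stand-in function handler; responds 300 ms later than normal."""
    return {"statusCode": 200, "body": "ok"}
```

The interesting observation is usually not the delay itself but what the caller does with it: whether its own timeout fires, and whether it retries or falls back.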

Slide 30

Slide 30 text

Status code injection
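Status code injection can be sketched in the same spirit: let the real handler run, then overwrite the status code in its response. The decorator name `inject_status_code` and the dictionary response shape with a `statusCode` key are assumptions for this example.

```python
import functools


def inject_status_code(status_code, enabled=True):
    """Decorator factory: replace the response's statusCode when enabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(event, context):
            response = func(event, context)
            if not enabled:
                return response
            # Copy the response so the original dictionary is left untouched.
            return {**response, "statusCode": status_code}
        return wrapper
    return decorator


@inject_status_code(404)
def handler(event, context):
    """Stand-in function handler that would normally return 200."""
    return {"statusCode": 200, "body": "ok"}
```

Calling `handler({}, None)` here yields the original body with status code 404, which is enough to see how a client reacts to an error response from an otherwise healthy function.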

Slide 31

Slide 31 text

Exception injection
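For exception injection, a minimal Python sketch could raise a configurable exception before the real code runs and let the caller show whether it copes. `ChaosException`, `inject_exception`, and `safe_invoke` are hypothetical names introduced for this example only.

```python
import functools


class ChaosException(Exception):
    """Marks failures raised deliberately by the experiment."""


def inject_exception(message, exception_type=ChaosException, enabled=True):
    """Decorator factory: raise exception_type(message) instead of running the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if enabled:
                raise exception_type(message)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_exception("chaos: injected exception")
def handler(event, context):
    """Stand-in function handler; never reached while injection is enabled."""
    return {"statusCode": 200, "body": "ok"}


def safe_invoke(event):
    """Caller-side handling: degrade to a 503 instead of crashing."""
    try:
        return handler(event, None)
    except ChaosException:
        return {"statusCode": 503, "body": "temporarily unavailable"}
```

Using a dedicated exception type makes injected failures easy to tell apart from real ones in logs and in error handling.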

Slide 32

Slide 32 text

Parameter control
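Parameter control means the experiment is driven by a configuration value that can be changed without redeploying the function. The Python sketch below assumes the value arrives as a JSON string, for example from a parameter store or an environment variable. The field names `enabled`, `mode`, `rate`, and `delay_ms` are invented for this example and do not reflect the format of any specific tool.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChaosConfig:
    """Experiment settings; the defaults mean 'no failure injection'."""
    enabled: bool = False
    mode: str = "latency"
    rate: float = 1.0
    delay_ms: int = 0


def load_config(raw):
    """Parse a JSON string into ChaosConfig, falling back to safe defaults."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            return ChaosConfig()
        return ChaosConfig(
            enabled=bool(data.get("enabled", False)),
            mode=str(data.get("mode", "latency")),
            # Clamp the rate to the 0..1 range and the delay to non-negative.
            rate=min(1.0, max(0.0, float(data.get("rate", 1.0)))),
            delay_ms=max(0, int(data.get("delay_ms", 0))),
        )
    except (TypeError, ValueError):
        # Missing or malformed configuration must never break the function.
        return ChaosConfig()
```

The design choice worth noting is the failure mode of the control itself: if the parameter is missing or unreadable, the sketch disables injection rather than raising.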

Slide 33

Slide 33 text

“This is cool but now on to what we normally do.” Most serverless devs

Slide 34

Slide 34 text

Serverless Chaos Demo app

Slide 35

Slide 35 text

Serverless Chaos Demo app

Slide 36

Slide 36 text

Failure injection
• What if my function takes 300 ms extra for each invocation?
• What if my function returns an error code?
• What if there is an exception in the code?
• Hypothesis: My app can handle failure injected at the function level.
• Let’s do it!
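The hypothesis above can be turned into a small, repeatable check. The Python sketch below is a hypothetical harness, not the demo app from the talk: it invokes a function many times while failures are injected and compares the observed success ratio with a threshold. The 95% threshold and the "fail every fourth call" pattern are arbitrary values chosen for illustration.

```python
import itertools


def run_experiment(invoke, attempts, min_success_ratio):
    """Invoke repeatedly; return (success ratio, whether the hypothesis held)."""
    successes = 0
    for _ in range(attempts):
        try:
            if invoke().get("statusCode") == 200:
                successes += 1
        except Exception:
            pass  # an unhandled exception counts as a failed request
    ratio = successes / attempts
    return ratio, ratio >= min_success_ratio


def make_flaky_function(fail_every):
    """Stand-in for a function with an exception injected on every Nth call."""
    counter = itertools.count(1)

    def invoke():
        if next(counter) % fail_every == 0:
            raise RuntimeError("chaos: injected exception")
        return {"statusCode": 200, "body": "live content"}

    return invoke


def with_fallback(invoke):
    """Wrap a call so injected failures are served from fallback content."""
    def guarded():
        try:
            return invoke()
        except RuntimeError:
            return {"statusCode": 200, "body": "fallback content"}

    return guarded
```

Run against `make_flaky_function(4)` with 100 attempts, the unguarded call succeeds 75% of the time and the hypothesis fails at a 95% threshold; wrapped in `with_fallback`, every request succeeds and the hypothesis holds.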

Slide 37

Slide 37 text

“Serverless is a perfect fit for chaos engineering.” Gunnar Grosch

Slide 38

Slide 38 text

Summary
• Everything fails, all the time.
• A resilient system maintains an acceptable level of service in the face of failure.
• Serverless has challenges but is a perfect fit for chaos engineering.
• Focus your experiments closer to the user.
• Use graceful degradation.
• You can do it!

Slide 39

Slide 39 text

Inspiration

Slide 40

Slide 40 text

Do you want more?
• Follow @serverlesschaos on Twitter
• Try the Serverless Chaos Demo app: https://demo.serverlesschaos.com
• YouTube videos and repositories: https://grosch.se
• Yan Cui’s article on chaos engineering for Lambda: https://hackernoon.com/how-can-we-apply-the-principles-of-chaos-engineering-to-aws-lambda-80f87e3237e2
• Adrian Hornsby’s article on failure injection: https://medium.com/@adhorn/failure-injection-gain-confidence-in-your-serverless-application-ce6c0060f586

Slide 41

Slide 41 text

@gunnargrosch grosch.se