Slide 1

Slide 1 text

Monitoring serverless applications - should you worry?
Nitzan Shapira, Epsagon

Slide 2

Slide 2 text

> whoami
•Nitzan Shapira (@nitzanshapira)
•Co-Founder, CEO @ Epsagon
•Tel Aviv
•Software Engineer > 12 years

Slide 3

Slide 3 text

Monitoring/Observability - why do we need it?
•Track service health
•Troubleshoot and fix
•Optimize performance/cost

Slide 4

Slide 4 text

From a monolith…

Slide 5

Slide 5 text

To microservices

Slide 6

Slide 6 text

Let’s talk Serverless

Slide 7

Slide 7 text

Serverless is great!
•Pay per use
•Autoscaling
•Development velocity

Slide 8

Slide 8 text

The era of APIs
•We want managed resources
•Applications become highly distributed and highly event-driven
•Without access to any server!

Slide 9

Slide 9 text

Back to Monitoring/Observability
•Track service health
•Troubleshoot and fix
•Optimize performance/cost

Slide 10

Slide 10 text

Slow down! Let’s go one by one…

Slide 11

Slide 11 text

Track system health
System == Functions?

Slide 12

Slide 12 text

Functions are important
•Timeout
•Out of memory
•Cold start
Unique challenges to Serverless
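A minimal sketch of how a function could report two of these signals itself. It assumes an AWS Lambda-style Python handler; the `_COLD_START` flag, the 500 ms threshold and the returned field names are hypothetical choices for illustration, not something from the talk.

```python
# Module-level code runs once per container, so this flag is True only
# for the first invocation after a cold start.
_COLD_START = True

# Hypothetical threshold: flag invocations that are close to timing out.
NEAR_TIMEOUT_MS = 500


def handler(event, context):
    """Lambda-style handler that reports cold starts and remaining time."""
    global _COLD_START
    was_cold = _COLD_START
    _COLD_START = False

    # get_remaining_time_in_millis() is provided by the AWS Lambda
    # Python context object.
    remaining_ms = context.get_remaining_time_in_millis()

    return {
        "cold_start": was_cold,
        "remaining_ms": remaining_ms,
        "near_timeout": remaining_ms < NEAR_TIMEOUT_MS,
    }
```

Out-of-memory failures kill the process, so they generally have to be detected from the outside (for example from platform logs) rather than from inside the handler.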

Slide 13

Slide 13 text

Functions are important

Slide 14

Slide 14 text

Track system health
System > Functions!
•Functions
•APIs
•Transactions

Slide 15

Slide 15 text

Serverless != Functions
theburningmonk.com

Slide 16

Slide 16 text

Troubleshoot and fix
•Functions are not enough
•Need: track asynchronous events

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Transactions

Slide 19

Slide 19 text

Tracing Asynchronous Invocations

Slide 20

Slide 20 text

Tracing Asynchronous Invocations

Slide 21

Slide 21 text

Tracing Asynchronous Invocations
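One common way to trace asynchronous invocations is to carry a trace ID inside the message itself, so the consumer can continue the producer's trace. The sketch below assumes messages are plain dictionaries with an `attributes` field; `TRACE_KEY`, `inject_trace`, `extract_trace`, `producer` and `consumer` are hypothetical names, not an API from the talk.

```python
import uuid

TRACE_KEY = "trace_id"  # hypothetical attribute name


def inject_trace(message, trace_id=None):
    """Return a copy of the message carrying a trace ID in its attributes."""
    attrs = dict(message.get("attributes", {}))
    attrs[TRACE_KEY] = trace_id or uuid.uuid4().hex
    return {**message, "attributes": attrs}


def extract_trace(message):
    """Read the trace ID from a message, or None if it has none."""
    return message.get("attributes", {}).get(TRACE_KEY)


def producer(payload):
    """First function in the flow: starts a new trace."""
    return inject_trace({"body": payload})


def consumer(message):
    """Asynchronously triggered function: continues the incoming trace."""
    trace_id = extract_trace(message) or uuid.uuid4().hex
    result = {"body": "processed:" + message["body"]}
    # Pass the same trace ID on to whatever is invoked next.
    return inject_trace(result, trace_id)
```

Because every hop copies the same ID forward, events from different functions can later be joined into one transaction.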

Slide 22

Slide 22 text

Distributed Tracing

Slide 23

Slide 23 text

Distributed Tracing

Slide 24

Slide 24 text

Implementing Distributed Tracing
Manual instrumentation
•Before/after calls
•At the end of each microservice
•High maintenance
•High potential for errors
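To make the maintenance cost concrete, here is a minimal sketch of manual instrumentation: every call has to be wrapped by hand in a timing block. `SPANS`, `span`, `fetch_user` and `handle_request` are hypothetical names, and the in-memory list stands in for a real tracing backend.

```python
import time
from contextlib import contextmanager

SPANS = []  # hypothetical in-memory sink instead of a tracing backend


@contextmanager
def span(trace_id, name):
    """Record the duration and any error of the wrapped block."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
            "error": error,
        })


def fetch_user(user_id):
    """Stand-in for a call to another service."""
    return {"id": user_id}


def handle_request(trace_id, user_id):
    # Each call site needs its own wrapper; forgetting one leaves a gap
    # in the trace.
    with span(trace_id, "handle_request"):
        with span(trace_id, "fetch_user"):
            user = fetch_user(user_id)
        return user
```

Inner spans finish first, so `fetch_user` is recorded before `handle_request`.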

Slide 25

Slide 25 text

Serverless apps are very distributed
•Complex systems have thousands of functions
•What about the developer velocity?

Slide 26

Slide 26 text

Can it be done differently in serverless?

Slide 27

Slide 27 text

Automation can help to keep up with the development speed of serverless
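One possible form of such automation is to wrap a client's methods once, at load time, so individual call sites need no changes. This is only a sketch of the general idea, not a description of any specific product; `traced`, `auto_instrument`, `RECORDS` and `FakeQueueClient` are hypothetical names.

```python
import functools
import time

RECORDS = []  # hypothetical sink: (qualified name, duration in ms)


def traced(fn):
    """Wrap a function so every call is timed and recorded."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            RECORDS.append((fn.__qualname__, elapsed_ms))
    return wrapper


def auto_instrument(cls, method_names):
    """Replace the named methods on a class with traced versions."""
    for name in method_names:
        setattr(cls, name, traced(getattr(cls, name)))


class FakeQueueClient:
    """Stand-in for a third-party client library."""

    def send(self, body):
        return {"ok": True, "body": body}


# Instrument once; application code keeps calling client.send() as before.
auto_instrument(FakeQueueClient, ["send"])
```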

Slide 28

Slide 28 text

Example

Slide 29

Slide 29 text

Troubleshoot and fix

Slide 30

Slide 30 text

Monitoring serverless
•Stateless
•Limited running time
•Limited memory
•Cold starts

Slide 31

Slide 31 text

In Serverless Time is Money

Slide 32

Slide 32 text

How much time do you really spend?
•Our own code
•API calls
•Infrastructure overhead
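The three categories above can be separated with simple arithmetic once the total duration and the time spent in API calls are measured. A small sketch; the `breakdown` function and the numbers in the usage note are hypothetical, chosen only to illustrate the calculation.

```python
def breakdown(total_ms, api_call_ms, overhead_ms):
    """Split an invocation's duration into own code, API calls and overhead."""
    api_ms = sum(api_call_ms)
    own_ms = total_ms - api_ms - overhead_ms
    if own_ms < 0:
        raise ValueError("measured parts exceed the total duration")
    return {
        "own_code_ms": own_ms,
        "api_calls_ms": api_ms,
        "overhead_ms": overhead_ms,
    }
```

For instance, `breakdown(1000, [400, 350], 50)` attributes 750 ms to API calls, 50 ms to overhead and the remaining 200 ms to our own code.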

Slide 33

Slide 33 text

Let’s have a quick look
702 ms

Slide 34

Slide 34 text

A real-life example
$$$$$$$$$$$$$$$$
How it started…

Slide 35

Slide 35 text

Scanning functions – the easy way
•Scanning CloudWatch logs using AWS Lambda – every 5 minutes, save to RDS
•A new Lambda is spawned for every customer’s function (async)
•Sounds simple and fun!
[Diagram: Poll, Spawn (async), CloudWatch]

Slide 36

Slide 36 text

As time flies…
•CloudWatch became highly throttled
➔ Requests took a very long time
➔ 5K concurrent Lambdas, for 5 minutes, every 5 minutes!!!!
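A rough sketch of why this pattern gets expensive. Duration-based serverless billing is typically proportional to memory multiplied by running time, so 5K functions running 5 minutes out of every 5 minutes are effectively billed around the clock. The memory size and the price per GB-second used below are assumptions for illustration and do not come from the talk.

```python
def gb_seconds_per_cycle(concurrency, duration_s, memory_mb):
    """Billed GB-seconds for one polling cycle."""
    return concurrency * duration_s * (memory_mb / 1024)


def cycles_per_month(period_s, days=30):
    """How many polling cycles start in a month of the given length."""
    return (days * 24 * 3600) // period_s


def monthly_cost(concurrency, duration_s, period_s, memory_mb, price_per_gb_s):
    """Duration cost per month; request charges are ignored."""
    per_cycle = gb_seconds_per_cycle(concurrency, duration_s, memory_mb)
    return per_cycle * cycles_per_month(period_s) * price_per_gb_s


# Figures from the slide: 5K concurrent Lambdas, 5 minutes, every 5 minutes.
# Assumed values: 128 MB of memory and an illustrative price per GB-second.
estimate = monthly_cost(
    concurrency=5000,
    duration_s=300,
    period_s=300,
    memory_mb=128,
    price_per_gb_s=0.0000166667,
)
```

Under these assumptions each cycle bills 187,500 GB-seconds and there are 8,640 cycles in a 30-day month, so the total scales linearly with how long the throttled requests take.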

Slide 37

Slide 37 text

Track service health

Slide 38

Slide 38 text

Is that all we need?
Probably not

Slide 39

Slide 39 text

Observability

Slide 40

Slide 40 text

Dynamic Service Map

Slide 41

Slide 41 text

Business Flows
•Subscribe
•Transfer
•Payment

Slide 42

Slide 42 text

Business Flows

Slide 43

Slide 43 text

Thank you!
[email protected]
@nitzanshapira