Monitoring serverless applications - should you worry?
Nitzan Shapira, Epsagon
Slide 2
Slide 2 text
> whoami
Nitzan Shapira (@nitzanshapira)
Co-Founder, CEO @ Epsagon
Tel Aviv
Software Engineer > 12 years
Slide 3
Slide 3 text
Monitoring/Observability - why do we need it?
Track service health
Troubleshoot and fix
Optimize performance/cost
Slide 4
Slide 4 text
From a monolith…
Slide 5
Slide 5 text
To microservices
Slide 6
Slide 6 text
Let’s talk Serverless
Slide 7
Slide 7 text
Serverless is great!
Pay per use
Autoscaling
Development velocity
Slide 8
Slide 8 text
The era of APIs
We want managed resources
Applications become
Highly distributed
Highly event-driven
Without access to any server!
Slide 9
Slide 9 text
Back to Monitoring/Observability
Track service health
Troubleshoot and fix
Optimize performance/cost
Slide 10
Slide 10 text
Slow down!
Let’s go one by one…
Slide 11
Slide 11 text
Track system health
System == Functions ?
Slide 12
Slide 12 text
Functions are important
Timeout
Out of memory
Cold start
Unique challenges to Serverless
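A minimal sketch of how a function could report two of these signals itself, cold starts and the remaining time budget. The handler name and the placeholder business logic are assumptions for illustration; `get_remaining_time_in_millis()` is the method exposed by the AWS Lambda Python context object.

```python
import time

# Module-level state is initialised once per container, so it is still
# True only on the first invocation after a cold start.
_is_cold = True


def handler(event, context):
    """Hypothetical Lambda handler that reports cold starts and time budget."""
    global _is_cold
    cold_start, _is_cold = _is_cold, False

    started = time.perf_counter()
    result = {"echo": event}  # placeholder for the real business logic
    duration_ms = (time.perf_counter() - started) * 1000

    # A small remaining budget is an early warning for an upcoming timeout.
    remaining_ms = context.get_remaining_time_in_millis()

    # Structured log line that a monitoring tool could pick up later.
    print({"cold_start": cold_start,
           "duration_ms": round(duration_ms, 3),
           "remaining_ms": remaining_ms})
    return {"cold_start": cold_start,
            "remaining_ms": remaining_ms,
            "result": result}
```

Out-of-memory failures kill the process, so they cannot be reported this way and have to be detected from the platform's own logs.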
Slide 13
Slide 13 text
Functions are important
Slide 14
Slide 14 text
Track system health
System > Functions !
Functions
APIs
Transactions
Slide 15
Slide 15 text
Serverless != Functions
theburningmonk.com
Slide 16
Slide 16 text
Troubleshoot and fix
Functions are not enough
Need: track asynchronous events
Slide 18
Slide 18 text
Transactions
Slide 19
Slide 19 text
Tracing Asynchronous Invocations
Slide 20
Slide 20 text
Tracing Asynchronous Invocations
Slide 21
Slide 21 text
Tracing Asynchronous Invocations
Slide 22
Slide 22 text
Distributed Tracing
Slide 23
Slide 23 text
Distributed Tracing
Slide 24
Slide 24 text
Implementing Distributed Tracing
Manual instrumentation
•Before/after calls
•At the end of each microservice
•High maintenance
•High potential of errors
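To make the maintenance burden concrete, here is a rough sketch of manual instrumentation across an asynchronous hop. Everything in it is hypothetical (the in-memory `SPANS` list standing in for a tracing backend, the list used as a queue, the function names); it only illustrates that every producer and consumer has to start spans, end them, and pass the trace context along by hand.

```python
import time
import uuid

SPANS = []  # stand-in for a tracing backend (assumption)


def start_span(name, trace_id, parent_id=None):
    """Record the 'before' side of a call."""
    return {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent_id, "name": name, "start": time.time()}


def end_span(span):
    """Record the 'after' side of a call and ship the span."""
    span["end"] = time.time()
    SPANS.append(span)


def producer(queue):
    span = start_span("producer", trace_id=uuid.uuid4().hex)
    # The trace context must be injected into every outgoing message by hand.
    queue.append({"payload": {"order": 1},
                  "trace": {"trace_id": span["trace_id"],
                            "parent_id": span["span_id"]}})
    end_span(span)
    return span["trace_id"]


def consumer(queue):
    message = queue.pop(0)
    ctx = message["trace"]  # ...and extracted again in every consumer
    span = start_span("consumer", ctx["trace_id"], ctx["parent_id"])
    # real message handling would go here
    end_span(span)


queue = []
trace_id = producer(queue)
consumer(queue)
```

Forgetting the injection or extraction step in a single function silently breaks the trace, which is where the error potential comes from.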
Slide 25
Slide 25 text
Serverless apps are very distributed
•Complex systems have thousands of functions
•What about developer velocity?
Slide 26
Slide 26 text
Can it be done differently in serverless?
Slide 27
Slide 27 text
Automation can help to keep up with the development speed of serverless
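One way such automation is commonly done is by wrapping client libraries once, so that application code stays untouched. The sketch below is only an illustration of that idea, not the approach of any particular product: `QueueClient` is a made-up client class, and `CALLS` is an in-memory stand-in for a tracing backend.

```python
import functools
import time

CALLS = []  # stand-in for a tracing backend (assumption)


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        started = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Recorded even if the call raises, so failures are traced too.
            CALLS.append({
                "operation": f"{cls.__name__}.{method_name}",
                "duration_ms": (time.perf_counter() - started) * 1000,
            })

    setattr(cls, method_name, wrapper)


class QueueClient:
    """Hypothetical client for a managed queue service."""

    def send(self, message):
        return {"status": "sent", "message": message}


# Applied once at startup; application code keeps calling send() as before.
instrument(QueueClient, "send")
response = QueueClient().send("hello")
```

Because the wrapper is installed in one place, new functions that use the client are traced without any extra code from their authors.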
Slide 28
Slide 28 text
Example
Slide 29
Slide 29 text
Troubleshoot and fix
Slide 30
Slide 30 text
Monitoring serverless
Stateless
Limited running time
Limited memory
Cold starts
Slide 31
Slide 31 text
In Serverless Time is Money
Slide 32
Slide 32 text
How much time do you really spend?
Our own code
API calls
Infrastructure overhead
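A small sketch of how this breakdown could be measured from inside a function. The `Breakdown` class and the category names are assumptions for illustration; infrastructure overhead cannot be timed from user code, so it is approximated here as whatever part of the total duration no timer accounted for.

```python
import time
from contextlib import contextmanager


class Breakdown:
    """Accumulates wall-clock time per category, in milliseconds."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def measure(self, category):
        started = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            self.totals[category] = self.totals.get(category, 0.0) + elapsed_ms

    def unaccounted_ms(self, total_ms):
        # Approximation of infrastructure overhead: time nobody measured.
        return max(total_ms - sum(self.totals.values()), 0.0)


breakdown = Breakdown()
with breakdown.measure("own_code"):
    sum(range(10_000))      # placeholder for business logic
with breakdown.measure("api_calls"):
    time.sleep(0.01)        # placeholder for a call to a managed service
```

The total passed to `unaccounted_ms` would come from the platform's reported duration for the invocation.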
Slide 33
Slide 33 text
Let’s have a quick look
702ms
Slide 34
Slide 34 text
A real-life example
$$$$$$$$$$$$$$$$
How it started…
Slide 35
Slide 35 text
Scanning functions – the easy way
Scanning CloudWatch logs using AWS Lambda – every 5 minutes, save to RDS
A new Lambda is spawned for every customer’s function (async)
Sounds simple and fun!
Poll
Spawn (async)
CloudWatch
Slide 36
Slide 36 text
As time flies…
CloudWatch became highly throttled ➔
requests took a very long time ➔
5K concurrent Lambdas, for 5 minutes, every 5 minutes
!!!!
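A back-of-the-envelope sketch of why this pattern gets expensive. The slide gives only the concurrency and timing; the 128 MB memory size and the price of $0.0000166667 per GB-second are assumptions chosen for illustration, and per-request charges are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_compute_cost(concurrent, busy_fraction, memory_mb, price_per_gb_second):
    """Duration-based Lambda cost per day (request charges not included)."""
    gb_seconds = concurrent * SECONDS_PER_DAY * busy_fraction * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second


# From the slide: each Lambda runs for 5 minutes, every 5 minutes,
# so every one of the 5K functions is effectively running all the time.
run_seconds = 5 * 60
interval_seconds = 5 * 60
busy_fraction = run_seconds / interval_seconds

cost = daily_compute_cost(
    concurrent=5000,
    busy_fraction=busy_fraction,
    memory_mb=128,                      # assumption, not stated in the talk
    price_per_gb_second=0.0000166667,   # assumption, check current pricing
)
print(f"${cost:,.2f} per day")
```

Under these assumptions the functions are billed for waiting on throttled requests around the clock, which comes to roughly $900 per day; the talk does not state the actual memory size or bill.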