KubeCon + CloudNativeCon
Seattle
Distributed Tracing in
Serverless Systems
Nitzan Shapira, Epsagon
Slide 2
> whoami
Nitzan Shapira (@nitzanshapira)
Software engineer for 12+ years
Co-Founder, CEO at Epsagon
Tel Aviv
Slide 3
Things to discuss
What is serverless? How is it different?
What is observability for serverless?
How can distributed tracing help?
How will it help my job?
Slide 4
What is serverless?
[Compute-as-a-Service]
FaaS: Function-as-a-Service
CaaS: Container-as-a-Service
+
Managed services (APIs)
=
Don’t manage infrastructure
Focus on business logic
Slide 5
Why serverless?
Pay-per-use: can reduce cloud compute cost by up to 90%
Out-of-the-box auto-scaling
DevOps → LowOps
++Developer velocity
Focus on business logic – iterate faster
Server Utilization
Slide 6
The limitations of FaaS
Limited memory
Limited running time
Cold starts
Stateless
+ concurrency limit
+ some others…
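A minimal sketch of working within the running-time limit: check the remaining time budget before each unit of work and hand back whatever is left. FakeContext is a hypothetical stand-in written for this example; its get_remaining_time_in_millis method is modeled on the method of the same name on the AWS Lambda Python context object. The doubling step is a placeholder for real work.

```python
import time


class FakeContext:
    """Hypothetical stand-in for the context object a FaaS runtime passes in."""

    def __init__(self, budget_ms):
        self._deadline = time.monotonic() + budget_ms / 1000.0

    def get_remaining_time_in_millis(self):
        return max(0, int((self._deadline - time.monotonic()) * 1000))


def process_batch(items, context, safety_margin_ms=50):
    """Process items until the time budget is nearly used up.

    Returns (done, leftover) so the caller can re-queue the leftover work
    instead of being killed by the platform timeout.
    """
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return done, items[index:]
        done.append(item * 2)  # placeholder for real work
    return done, []
```

Returning the leftover items keeps the function stateless: the caller decides where unfinished work goes next.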
Slide 7
The properties of serverless applications
Serverless is micro-services
Serverless applications are
- Highly distributed
- Highly event-driven
Utilizing managed services via APIs is key
Slide 8
A real example – HSBC
Source: re:Invent 2018
Slide 9
The challenge in serverless
SIMPLE COMPLEX
Yan Cui
Slide 10
What the community thinks
2018 Serverless Community Survey, serverless.com, July 2018
2017 results
Slide 11
Observability – why do we need it?
Track system health
Troubleshoot and fix
Optimize performance and cost
Slide 12
Observability in serverless
Let’s go one by one
Slide 13
Track system health
System == Functions ?
Slide 14
Functions are important
- Errors
- Timeout
- Out-of-memory
- Cold start
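Of the four signals above, cold starts are the easiest to detect from inside the function. One common pattern, sketched here with hypothetical names, relies on module-level code running once per container: the first invocation sees the flag set, later warm invocations do not. Timeouts and out-of-memory kills usually end the process, so they generally have to be observed from outside (platform logs or metrics) rather than in the handler.

```python
# Module scope runs once per container, so this flag is True only
# for the first invocation handled by a freshly started container.
_is_cold_start = True


def handler(event, context=None):
    """Hypothetical handler that reports whether it ran on a cold container."""
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False
    return {"cold_start": was_cold, "echo": event}
```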
Slide 15
Track system health
System > Functions !
Serverless != Functions
Slide 16
Track system health
System > Functions !
Functions
APIs
Transactions
Slide 17
Troubleshoot and fix
Functions are not enough
Need: track asynchronous events
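One way to track asynchronous events is to carry a trace ID inside every message, so each function that handles the event can log against the same ID. The sketch below uses plain lists as stand-ins for queues; the function names, queue names, and message shape are all assumptions made for illustration, not any specific service's API.

```python
import json
import uuid


def publish(queue, body, trace_id=None):
    """Enqueue a message, starting a new trace unless one is passed in."""
    trace_id = trace_id or uuid.uuid4().hex
    queue.append(json.dumps({"trace_id": trace_id, "body": body}))
    return trace_id


def order_function(event, orders_queue, log):
    """First hop: starts the trace and emits an asynchronous event."""
    trace_id = publish(orders_queue, event["order"])
    log.append((trace_id, "order_function"))
    return trace_id


def billing_function(orders_queue, emails_queue, log):
    """Second hop: reads the event and forwards the same trace ID."""
    message = json.loads(orders_queue.pop(0))
    log.append((message["trace_id"], "billing_function"))
    publish(emails_queue, "receipt for " + message["body"],
            trace_id=message["trace_id"])
```

Because the ID travels with the payload, the two functions never call each other directly, yet their log entries can still be joined into one transaction.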
Slide 18
Transactions
Slide 19
Tracing asynchronous invocations
Slide 20
Tracing asynchronous invocations
Slide 21
Tracing asynchronous invocations
Slide 22
Distributed tracing
…a trace tells the story of a transaction or
workflow as it propagates through a
(potentially distributed) system. Distributed
tracing is a method used to profile and
monitor applications.
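To make the definition concrete, here is a minimal sketch of the data behind a trace: spans that share a trace ID and point at a parent span. The Span fields and the render helper are simplified assumptions for illustration, not the data model of any particular tracer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: float
    end_ms: float


def render(spans: List[Span], parent_id: Optional[str] = None,
           depth: int = 0) -> List[str]:
    """Return the trace as indented lines, children nested under parents."""
    lines = []
    children = sorted((s for s in spans if s.parent_id == parent_id),
                      key=lambda s: s.start_ms)
    for span in children:
        duration = span.end_ms - span.start_ms
        lines.append("  " * depth + f"{span.name} ({duration:.0f} ms)")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines
```

Calling render on the spans of one transaction gives the "story" of that transaction: which operation triggered which, and how long each one took.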
Slide 23
Distributed tracing
Jaeger
Slide 24
Implementing distributed tracing
Manual tracing/instrumentation
Before/after calls
At the end of each micro-service
High maintenance
High potential of errors
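A sketch of what manual before/after instrumentation tends to look like. charge_card is a hypothetical placeholder for an external call, and TRACE is a plain list standing in for a tracing backend. Every call site needs its own copy of the timing lines, which is where the maintenance cost and the room for mistakes come from.

```python
import time

TRACE = []  # stand-in for a tracing backend


def charge_card(amount):
    """Hypothetical external call; sleeps briefly to simulate latency."""
    time.sleep(0.01)
    return {"charged": amount}


def handler(event):
    start = time.perf_counter()                      # before the call
    result = charge_card(event["amount"])
    elapsed_ms = (time.perf_counter() - start) * 1000  # after the call
    TRACE.append({"operation": "charge_card", "duration_ms": elapsed_ms})
    return result
```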
Slide 25
Serverless apps are very distributed
Complex systems have thousands of functions
What about the developer velocity?
Slide 26
Can it be done differently in serverless?
Slide 27
Automation can help to keep up with the
development speed of serverless
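One possible form of automation, sketched under assumptions: wrap every public method of a client object once, so calls are recorded without touching application code. auto_instrument, FakeStorageClient, and the TRACE list are hypothetical names for this example; real tools may work quite differently.

```python
import functools
import time

TRACE = []  # stand-in for a tracing backend


def _traced(func, operation):
    """Return a wrapper that records duration and error type of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            TRACE.append({
                "operation": operation,
                "duration_ms": (time.perf_counter() - start) * 1000,
                "error": error,
            })
    return wrapper


def auto_instrument(obj, service):
    """Wrap every public method of obj so each call lands in TRACE."""
    for name in dir(obj):
        attr = getattr(obj, name)
        if name.startswith("_") or not callable(attr):
            continue
        setattr(obj, name, _traced(attr, f"{service}.{name}"))
    return obj


class FakeStorageClient:
    """Hypothetical client used only to demonstrate the wrapper."""

    def put(self, key, value):
        return True

    def get(self, key):
        raise KeyError(key)
```

The application keeps calling client.put(...) as before; the instrumentation is applied in one place, which is what lets it keep pace as functions are added.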
Slide 32
Where do we spend the most time?
Our own code
API calls
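Given a function's total duration and the durations of the API calls it made, the split between the two can be sketched as below. The field names are assumptions, and the calculation assumes the API calls ran one after another without overlapping.

```python
def time_breakdown(total_ms, api_calls):
    """Split an invocation's time into own code vs. external API calls.

    Assumes the API calls were sequential, so their durations can be summed.
    """
    api_ms = sum(call["duration_ms"] for call in api_calls)
    return {
        "own_code_ms": total_ms - api_ms,
        "api_calls_ms": api_ms,
        "api_share": api_ms / total_ms,
    }
```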
Slide 33
Serverless cost crisis
A real-life example
$$$$$$$$$$$$
Slide 34
Scanning functions
Scanning CloudWatch using AWS Lambda
Every 5 minutes, save to RDS
A new Lambda is spawned for every customer’s function
Poll
Spawn (async)
CloudWatch
Slide 35
As time flies…
CloudWatch became highly throttled
Requests took too much time
5K concurrent Lambdas, for 5 minutes,
timing out, every 5 minutes
!!!!
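A rough sketch of why this pattern gets expensive. The 5K concurrency, the 5-minute duration, and the 5-minute schedule come from the slide; the memory size, the per-GB-second price, and the 3-second "healthy" duration are assumptions chosen only for illustration and should be replaced with real figures.

```python
def monthly_compute_cost(concurrency, seconds_per_run, runs_per_day,
                         memory_gb, price_per_gb_second, days=30):
    """Estimate monthly compute cost for functions billed per GB-second."""
    gb_seconds = (concurrency * seconds_per_run * runs_per_day
                  * days * memory_gb)
    return gb_seconds * price_per_gb_second


ASSUMED_MEMORY_GB = 0.5                      # assumption, not from the talk
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667   # assumption, check current pricing
RUNS_PER_DAY = 24 * 12                       # one run every 5 minutes

# Every function runs the full 5 minutes (300 s) until it times out.
timing_out = monthly_compute_cost(5000, 300, RUNS_PER_DAY,
                                  ASSUMED_MEMORY_GB,
                                  ASSUMED_PRICE_PER_GB_SECOND)

# Hypothetical healthy case: the same functions finish in 3 seconds.
healthy = monthly_compute_cost(5000, 3, RUNS_PER_DAY,
                               ASSUMED_MEMORY_GB,
                               ASSUMED_PRICE_PER_GB_SECOND)

cost_ratio = timing_out / healthy
```

With pay-per-use billing the cost scales linearly with duration, so a throttled dependency that turns seconds into a full timeout multiplies the bill by the same factor.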
Slide 36
Why you should care about external APIs
702ms
Slide 37
Track service health
Slide 38
Business flows
Subscribe
Transfer Payment
Slide 39
What should I optimize first?
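One simple way to answer this from trace data, sketched with assumed field names: total the time spent per operation across all collected spans and sort descending. Operations at the top account for the most time overall, though total time is only one possible ranking criterion.

```python
from collections import defaultdict


def rank_by_total_time(spans):
    """Return (operation, total_ms) pairs, largest total first."""
    totals = defaultdict(float)
    for span in spans:
        totals[span["operation"]] += span["duration_ms"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```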
Slide 40
Remember…
Serverless + Distributed Tracing
=
Perfect marriage
(but only if you automate)