Sampling in Distributed Tracing
Juraci Paixão Kröhling
Software Engineer
@jpkrohling
o11yfest - Sampling in Distributed Tracing
Hi, I’m Juraci!
Software Engineer @ Red Hat, distributed tracing team
Maintainer on the Jaeger project
Member of the OpenTelemetry project
Agenda
Why and what
Heads and tails
Other related ideas
Distributed tracing
Produces a high-fidelity signal.
Beyond a certain scale, it’s not feasible to store metadata for every transaction.
(except if you are an intelligence agency)
Am I able to manage all this data?
Do I need to keep all of it?
Keeping everything allows you to perform better data analysis, potentially at a later time.
Keeping everything costs money, makes maintenance harder, and much of the data might never be used.
Sampling
The decision to capture or discard a specific trace.
Head-based sampling
Constant (always or never)
Probabilistic (chance of 1 in N)
Rate-limiting (N per second)
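
The three strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `should_sample` method, and the token-bucket approach to rate limiting are hypothetical and are not taken from Jaeger or OpenTelemetry.

```python
import random
import time
from typing import Callable, Optional


class ConstantSampler:
    """Always or never samples, regardless of the trace."""

    def __init__(self, decision: bool):
        self.decision = decision

    def should_sample(self) -> bool:
        return self.decision


class ProbabilisticSampler:
    """Samples each new trace with a chance of 1 in N."""

    def __init__(self, n: int, rng: Optional[random.Random] = None):
        self.probability = 1.0 / n
        self.rng = rng or random.Random()

    def should_sample(self) -> bool:
        return self.rng.random() < self.probability


class RateLimitingSampler:
    """Samples at most N traces per second (assumed token-bucket approach)."""

    def __init__(self, per_second: float, clock: Callable[[], float] = time.monotonic):
        self.per_second = per_second
        self.clock = clock
        self.tokens = per_second  # start with a full bucket
        self.last = clock()

    def should_sample(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at one second's worth of tokens.
        refill = (now - self.last) * self.per_second
        self.tokens = min(self.per_second, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In all three cases the decision needs nothing but the sampler's own state, which is why it can be made at the very start of a trace.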
Good when network costs are a concern.
Downsides:
Valuable traces might not be recorded
Historical data analytics is usually not possible
Tail-based sampling
The decision is made when the trace is complete.
(do we know when a trace is complete?)
How long should we wait?
Decide based on which attribute?
How many resources does it need?
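
A minimal sketch may make these three questions concrete. Everything in it is an assumption made for illustration: the `TailSampler` class, the dictionary shape of a span, and the `has_error` policy are hypothetical. The wait time is `decision_wait`, the attribute is whatever the policy inspects, and the resource cost is the in-memory buffer of undecided spans.

```python
from collections import defaultdict


class TailSampler:
    """Buffers spans per trace and decides once a fixed wait has elapsed."""

    def __init__(self, decision_wait: float, policy):
        self.decision_wait = decision_wait  # seconds to wait after the first span
        self.policy = policy                # callable: list of spans -> bool
        self.buffers = defaultdict(list)    # trace_id -> buffered spans
        self.first_seen = {}                # trace_id -> arrival time of first span

    def add_span(self, span: dict, now: float) -> None:
        trace_id = span["trace_id"]
        self.first_seen.setdefault(trace_id, now)
        self.buffers[trace_id].append(span)

    def flush(self, now: float) -> list:
        """Decide on every trace whose wait has elapsed; return the kept traces."""
        expired = [
            trace_id
            for trace_id, seen in self.first_seen.items()
            if now - seen >= self.decision_wait
        ]
        kept = []
        for trace_id in expired:
            spans = self.buffers.pop(trace_id)
            del self.first_seen[trace_id]
            if self.policy(spans):
                kept.append(spans)
        return kept


def has_error(spans: list) -> bool:
    """Example policy: keep the trace if any span is marked as an error."""
    return any(span.get("error") for span in spans)
```

Since there is no reliable signal that a trace is complete, the sketch simply treats a trace as complete once the wait has elapsed. Spans that arrive later would start a new buffer, which is one of the trade-offs of choosing a short wait.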
How to scale?
With OpenTelemetry Collector:
loadbalancingexporter
tailsamplingprocessor
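
Scaling tail-based sampling requires that every span of a trace reaches the same sampling instance. The idea can be sketched as routing by trace ID. The sketch below uses a plain hash modulo the number of backends, which is an assumption chosen for brevity and not a description of how the loadbalancingexporter is implemented. The function name and the backend names are hypothetical.

```python
import hashlib


def pick_backend(trace_id: str, backends: list) -> str:
    """Map a trace ID to one backend so all its spans land in the same place."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical second-layer collectors that run the tail-sampling step.
backends = ["collector-0", "collector-1", "collector-2"]
first = pick_backend("4bf92f3577b34da6a3ce929d0e0e4736", backends)
second = pick_backend("4bf92f3577b34da6a3ce929d0e0e4736", backends)
```

Here `first` and `second` are always equal, whichever first-layer instance received the span. A plain modulo reshuffles most traces when the backend list changes, so a production setup would likely prefer a scheme that moves fewer keys.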
Good when we need to select only a few interesting traces.
Downsides:
Need to define “interesting”
More complex to maintain
Limited data analysis
Stateless Collector Sampling
The collector is responsible for downsampling, typically without having a complete view of the trace.
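
One way to downsample without seeing the whole trace is to derive the decision from the trace ID alone, so that every collector reaches the same verdict for every span of a trace. The sketch below assumes that approach. The `keep_span` function and its `percentage` parameter are hypothetical names.

```python
import hashlib


def keep_span(trace_id: str, percentage: float) -> bool:
    """Keep roughly `percentage` percent of traces, decided per trace ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Spread trace IDs over 10,000 buckets and keep the lowest ones.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100
```

Because the function holds no state, spans of the same trace that arrive at different collectors are either all kept or all dropped.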
Adaptive sampling
The technique of changing the sampling strategy based on the traced application’s current behavior.
Typically used to ensure that all endpoints in a given service are sampled.
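
As a rough illustration, the sketch below adjusts a per-endpoint sampling probability so that busy and quiet endpoints both yield about the same number of sampled traces per second. The `AdaptiveSampler` class, the target rate, and the endpoint names are assumptions made for this example and do not describe a specific implementation.

```python
class AdaptiveSampler:
    """Keeps one sampling probability per endpoint, tuned to a target rate."""

    def __init__(self, target_per_second: float, initial_probability: float = 1.0):
        self.target = target_per_second
        self.initial = initial_probability
        self.probabilities = {}  # endpoint -> current sampling probability

    def probability_for(self, endpoint: str) -> float:
        # Endpoints that have not been observed yet use the initial probability.
        return self.probabilities.get(endpoint, self.initial)

    def update(self, observed_per_second: dict) -> None:
        """Recompute each endpoint's probability from its observed request rate."""
        for endpoint, rate in observed_per_second.items():
            if rate <= 0:
                self.probabilities[endpoint] = self.initial
            else:
                self.probabilities[endpoint] = min(1.0, self.target / rate)


sampler = AdaptiveSampler(target_per_second=1.0)
sampler.update({"/search": 1000.0, "/checkout": 0.5})
```

After the update, the busy `/search` endpoint is sampled at about 1 in 1000, while the rarely called `/checkout` endpoint is sampled every time, so it is not drowned out by the busier one.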
Going back a bit...
We use sampling mostly to reduce network traffic and storage requirements.
Trace aggregation
twitter.com/jpkrohling
Photos from Pixabay and Pexels: Agenda, Distributed Tracing, Sampling, Head-based, Tail-based, Stateless collector sampling, Adaptive sampling, Remote sampling, Aggregation
Thank you