o11yfest - Sampling in Distributed Tracing

Sampling is still one of the biggest challenges in distributed tracing. While the basic concept is easy to grasp, the number of choices and their trade-offs mean that you need to understand both the techniques and your own workload. In this session, we'll give you the knowledge you need to master sampling techniques: we'll talk about head-based and tail-based sampling, as well as adaptive sampling, and we'll wrap up with a bonus discussion on trace aggregation. You'll leave this session ready to implement scenarios ranging from the simple “probabilistic head-sampling” to the complex “scalable tail-based sampling” using open source tools like the OpenTelemetry Collector.

Register and watch this talk now! https://o11yfest.org/attend

Transcript

  1. Sampling in
    Distributed Tracing
    Juraci Paixão Kröhling
    Software Engineer
    @jpkrohling

  2. o11yfest - Sampling in Distributed Tracing
    @jpkrohling
    2
    Hi, I’m Juraci!
    Software Engineer @ Red Hat, distributed
    tracing team
    Maintainer on the Jaeger project
    Member of the OpenTelemetry project

  3. Agenda
    Why and what
    Heads and tails
    Other related ideas

  4. Distributed tracing
    Produces a high-fidelity signal.

  5. Distributed tracing

  6. Distributed tracing

  7. Distributed tracing

  8. Distributed tracing

  9. Distributed tracing

  10. Distributed tracing
    Beyond a point, it’s not feasible to
    store metadata for every transaction.

  11. Distributed tracing
    Beyond a point, it’s not feasible to
    store metadata for every transaction.
    (except if you are an intelligence agency)

  12. Distributed tracing
    Am I able to manage all this data?
    Do I need to keep all of it?

  13. Distributed tracing
    Keeping everything allows you to
    perform better data analysis,
    potentially at a later time.

  14. Distributed tracing
    Keeping everything costs money,
    makes maintenance harder, and might
    be useless.

  15. Sampling
    The decision to capture or discard a
    specific trace.

  16. Sampling
    Head-based
    Tail-based

  17. Head-based sampling

  18. Head-based sampling

  19. Head-based sampling
    Constant (always or never)
    Probabilistic (chance of 1 in N)
    Rate-limiting (N per second)

  20. Head-based sampling
    Good when network costs are a
    concern.

  21. Head-based sampling
    Downsides:
    Valuable traces might not be recorded
    Historical data analytics usually
    not possible

  22. Tail-based sampling
    The decision is made when the trace is
    complete.

  23. Tail-based sampling
    The decision is made when the trace is
    complete.
    (do we know when a trace is complete?)

  24. Tail-based sampling

  25. Tail-based sampling
    How long should we wait?
    Decide based on which attribute?
    How many resources does it need?

  26. Tail-based sampling
    How to scale?

  27. Tail-based sampling

  28. Tail-based sampling

  29. Tail-based sampling

  30. Tail-based sampling
    With OpenTelemetry Collector:
    loadbalancingexporter
    tailsamplingprocessor

  31. Tail-based sampling
    Good when we need to select only a
    few interesting traces.

  32. Tail-based sampling
    Downsides:
    Need to define “interesting”
    More complex to maintain
    Limited data analysis

  33. Stateless Collector Sampling
    The collector is responsible for
    downsampling, typically without
    having a complete view of the trace.
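One common way for a stateless collector to downsample without seeing the whole trace is to derive the decision from the trace ID alone: every replica then computes the same answer for every span of a trace, so traces are kept or dropped as a whole without any shared state. The Go sketch below assumes 16-byte, W3C-style trace IDs whose last 8 bytes are random; `KeepTrace` is a hypothetical helper, not the OpenTelemetry Collector's probabilistic sampler implementation.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// KeepTrace reports whether the trace identified by traceIDHex is kept at the
// given ratio (0..1). The decision depends only on the trace ID, so every
// stateless collector replica reaches the same verdict for every span of a
// trace. Assumes a 16-byte (32 hex digit) trace ID whose last 8 bytes are random.
func KeepTrace(traceIDHex string, ratio float64) (bool, error) {
	raw, err := hex.DecodeString(traceIDHex)
	if err != nil {
		return false, err
	}
	if len(raw) != 16 {
		return false, fmt.Errorf("trace ID must be 16 bytes, got %d", len(raw))
	}
	if ratio >= 1 {
		return true, nil
	}
	if ratio <= 0 {
		return false, nil
	}
	// Read the last 8 bytes as an integer and drop one bit so the value lies
	// in [0, 2^63); keep the trace if it falls below ratio * 2^63.
	v := binary.BigEndian.Uint64(raw[8:]) >> 1
	threshold := uint64(ratio * float64(uint64(1)<<63))
	return v < threshold, nil
}

func main() {
	for _, id := range []string{
		"4bf92f3577b34da6a3ce929d0e0e4736",
		"0af7651916cd43dd8448eb211c80319c",
		"not-a-trace-id",
	} {
		keep, err := KeepTrace(id, 0.25)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%s keep=%v\n", id, keep)
	}
}
```

A useful side effect of the threshold comparison is that a trace kept at a ratio of 0.1 is also kept at 0.5, so collectors configured with different ratios remain consistent with each other.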

  34. Adaptive sampling
    The technique of changing the
    sampling strategy based on the traced
    application’s current behavior.

  35. Adaptive sampling
    Typically used to ensure that all
    endpoints in a given service are
    sampled.
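A simple way to keep every endpoint represented is to recompute per-endpoint probabilities from the traffic observed during the last interval, aiming for a fixed number of sampled traces per second for each endpoint. The Go sketch below is an illustration of that idea under stated assumptions, not Jaeger's adaptive sampling algorithm: `AdaptiveProbabilities`, the target rate and the minimum probability are hypothetical, and the resulting probabilities would still have to be delivered to the SDKs, for example through a remote sampling strategy.

```go
package main

import (
	"fmt"
	"sort"
)

// AdaptiveProbabilities returns, for each endpoint, the sampling probability
// that would yield about targetPerSec sampled traces per second given the
// request rate observed in the last interval. Endpoints at or below the
// target get probability 1; no endpoint drops below minProb.
func AdaptiveProbabilities(observedPerSec map[string]float64, targetPerSec, minProb float64) map[string]float64 {
	probs := make(map[string]float64, len(observedPerSec))
	for endpoint, rate := range observedPerSec {
		p := 1.0
		if rate > targetPerSec {
			p = targetPerSec / rate
		}
		if p < minProb {
			p = minProb
		}
		probs[endpoint] = p
	}
	return probs
}

func main() {
	observed := map[string]float64{
		"GET /health":    500,
		"GET /products":  40,
		"POST /checkout": 0.5,
	}
	probs := AdaptiveProbabilities(observed, 2, 0.001)

	// Print in a stable order; Go map iteration order is random.
	names := make([]string, 0, len(probs))
	for name := range probs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-15s p=%.4f\n", name, probs[name])
	}
}
```

With a target of 2 traces per second, the busy health check ends up with a probability of 0.004 and the product listing with 0.05, while the rarely called checkout endpoint is always sampled.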

  36. Remote sampling strategy

  37. Going back a bit...
    We use sampling mostly to reduce
    network traffic and storage
    requirements.

  38. Trace aggregation

  39. Trace aggregation

  40. Trace aggregation

  41. twitter.com/jpkrohling
    Photos from Pixabay and Pexels:
    Agenda, Distributed Tracing, Sampling, Head-based, Tail-based, Stateless collector sampling,
    Adaptive sampling, Remote sampling, Aggregation
    Thank you