Distributed Tracing FAQ

Microservices, containers and, more generally, distributed systems have opened a different point of view on our systems and applications. We need to understand how a single event or request crosses our application, jumping over networks, containers, virtual machines and sometimes cloud providers. Distributed tracing is the practice that increases the observability of systems like these. After this talk, you will have a solid idea of what tracing means and how to instrument your applications, and you will be ready to trace your applications across many languages using open source technologies like OpenTracing, OpenCensus, Zipkin, Jaeger and InfluxDB. You will ask yourself how you survived until today!

Gianluca Arbezzano

November 06, 2018

Transcript

  1. © 2017 InfluxData. All rights reserved.
    Distributed Tracing
    Frequently Asked Questions
    @gianarb

  2. Gianluca Arbezzano
    Site Reliability Engineer @InfluxData
    ● https://gianarb.it
    ● @gianarb
    What I like:
    ● I make dirty hacks that look awesome
    ● I grow my vegetables
    ● I travel for fun and work

  3. github.com/influxdata

  4. Why do I need distributed tracing?

  6. It is a way to describe the complexity that distribution introduces.

  7. In practice, it is a different way to aggregate the well-known
    logs and stats.

  8. To tell the story of our distributed system.

  9. What does a trace look like?

  10. A span is the smallest unit in a trace.

  11. It describes a single action executed by a program:
    ● A single HTTP request.
    ● A database query.
    ● The processing of a message from a queue system.
    ● A lookup in a key/value store.

  12. A span is described via:
    ● span_id: its unique identifier within the trace
    ● trace_id: the trace it belongs to
    ● parent_id: its parent span, which describes the hierarchy
    ● labels: a set of key/value pairs
    ● Span Context: a set of values propagated along the trace
    ● Logs
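
The fields above can be modeled as a plain Go struct. This is a minimal sketch, not the data model of any particular tracer: `Span`, `newID`, `NewRootSpan` and `NewChildSpan` are hypothetical names. It shows the two rules that keep spans inside one trace: a child copies the parent's trace_id and span context (here, baggage), and its parent_id points at the parent's span_id. The 16-byte trace ID and 8-byte span ID sizes are an assumption chosen to match the B3 headers on slide 17.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Span is a hypothetical, minimal model of the fields listed on the slide.
type Span struct {
	TraceID  string            // shared by every span in the same trace
	SpanID   string            // unique identifier of this span within the trace
	ParentID string            // empty for the root span
	Name     string            // operation name, e.g. "user_exists"
	Labels   map[string]string // key/value pairs attached to this span only
	Baggage  map[string]string // span context values propagated to children
	Logs     []string          // events recorded while the span was active
}

// newID returns n random bytes encoded as lowercase hex.
func newID(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// NewRootSpan starts a new trace: fresh trace_id, no parent.
func NewRootSpan(name string) *Span {
	return &Span{
		TraceID: newID(16),
		SpanID:  newID(8),
		Name:    name,
		Labels:  map[string]string{},
		Baggage: map[string]string{},
	}
}

// NewChildSpan inherits trace_id and a copy of the baggage from parent,
// and points parent_id at the parent's span_id.
func NewChildSpan(parent *Span, name string) *Span {
	baggage := make(map[string]string, len(parent.Baggage))
	for k, v := range parent.Baggage {
		baggage[k] = v
	}
	return &Span{
		TraceID:  parent.TraceID,
		SpanID:   newID(8),
		ParentID: parent.SpanID,
		Name:     name,
		Labels:   map[string]string{},
		Baggage:  baggage,
	}
}

func main() {
	root := NewRootSpan("post: /users")
	root.Baggage["request_id"] = "42"
	child := NewChildSpan(root, "handle.create_user")
	child.Logs = append(child.Logs, "user created")
	fmt.Println(child.TraceID == root.TraceID, child.ParentID == root.SpanID, child.Baggage["request_id"])
}
```

Running it prints `true true 42`. Copying the baggage map, instead of sharing it, keeps a child from silently changing its parent's context.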

  13. A single trace (diagram). Spans and the services that run them:
    ● post: /users (nginx)
    ● handle.create_user (sA)
    ● user_exists (mysql)
    ● insert_user (mysql)
    ● send_email (worker)
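
A tracing backend receives spans as a flat list; only parent_id turns them back into a tree. A minimal sketch of that reconstruction, using the spans of the trace above. The nesting is an assumption, since the diagram's layout did not survive: post: /users as the root, handle.create_user as its child, and the other three spans as children of handle.create_user. `FlatSpan`, `RenderTree` and the IDs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// FlatSpan is a hypothetical span record as a tracing backend might store it:
// spans arrive as a flat list and only ParentID encodes the hierarchy.
type FlatSpan struct {
	ID, ParentID, Name, Service string
}

// RenderTree prints each span indented under its parent by following the
// ParentID links. Root spans have an empty ParentID; IDs must be non-empty
// and acyclic. Siblings keep their order from the input slice.
func RenderTree(spans []FlatSpan) string {
	children := map[string][]FlatSpan{}
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s (%s)\n", strings.Repeat("  ", depth), s.Name, s.Service)
			walk(s.ID, depth+1)
		}
	}
	walk("", 0)
	return b.String()
}

func main() {
	// Hypothetical IDs; the nesting is an assumption about the diagram.
	spans := []FlatSpan{
		{"1", "", "post: /users", "nginx"},
		{"2", "1", "handle.create_user", "sA"},
		{"3", "2", "user_exists", "mysql"},
		{"4", "2", "insert_user", "mysql"},
		{"5", "2", "send_email", "worker"},
	}
	fmt.Print(RenderTree(spans))
}
```

Running it prints:

```
post: /users (nginx)
  handle.create_user (sA)
    user_exists (mysql)
    insert_user (mysql)
    send_email (worker)
```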

  14. Detail of the user_exists span, selected in the trace diagram:
    Service Name: mysql
    Trace ID: 34ytsy5hs45gs46hs5g
    Span ID: se5hs5s5hs45gs45gs
    Span Name: user_exists
    Duration: 1.2s
    Start: 56467657457234
    Logs:
      query: "select * from tb_user where id = 345"
      user: sa_service

  15. How do I follow a request?

  16. The implementation changes based on what you are instrumenting:
    ● For HTTP services, the context is propagated via HTTP headers.
    ● The same goes for gRPC (via metadata).
    ● For queue systems, you can pass it as part of the message payload.
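
For a queue there are no request headers, so the context travels inside the message itself. A minimal sketch, assuming a JSON envelope: `Envelope`, its `trace_context` field, `Wrap`, `Unwrap` and the `Email` payload are hypothetical, not part of any queue client or tracing library. The producer wraps the body together with its trace headers; the worker unwraps them and continues the same trace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is a hypothetical message format: the business payload plus a
// map of trace headers, so the consumer can continue the same trace.
type Envelope struct {
	TraceContext map[string]string `json:"trace_context"`
	Body         json.RawMessage   `json:"body"`
}

// Wrap serializes body together with the producer's trace context.
func Wrap(traceContext map[string]string, body any) ([]byte, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Envelope{TraceContext: traceContext, Body: raw})
}

// Unwrap returns the trace context and decodes the body into out.
func Unwrap(msg []byte, out any) (map[string]string, error) {
	var env Envelope
	if err := json.Unmarshal(msg, &env); err != nil {
		return nil, err
	}
	return env.TraceContext, json.Unmarshal(env.Body, out)
}

// Email is a hypothetical payload for a send_email worker.
type Email struct {
	To string `json:"to"`
}

func main() {
	msg, err := Wrap(map[string]string{"X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7"}, Email{To: "user@example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))

	// Consumer side: recover the trace context and the payload.
	var e Email
	traceContext, err := Unwrap(msg, &e)
	if err != nil {
		panic(err)
	}
	fmt.Println(traceContext["X-B3-TraceId"], e.To)
}
```

Keeping the trace context next to the body, rather than inside it, lets the worker extract it without knowing the payload's schema.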

  17. B3-Propagation
    https://github.com/openzipkin/b3-propagation
    X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
    X-B3-ParentSpanId: 05e3ac9a4f6e3b90
    X-B3-SpanId: e457b5a2e4d86bd1
    X-B3-Sampled: 1

  18. Do I need a standard for tracing?

  19. YES

  20.
    1. Applications can be written in different languages, but in the end they
    must build one single trace. This means they need to agree on a common
    standard/protocol.
    2. If you use a widely supported standard, you can avoid vendor lock-in.

  22. Diagram: a parent span and its child spans, each carrying logs, linked
    through the Span Context / Baggage.
    ● Spans: the basic unit of timing and causality. They can be tagged with
    key/value pairs.
    ● Logs: structured data recorded on a span.
    ● Span Context: a serializable format for linking spans across network
    boundaries. It carries baggage, such as request and client IDs.
    ● Tracers: anything that plugs into the OpenTracing API to record
    information.
      ● Zipkin, Jaeger, LightStep, others
      ● Also metrics (Prometheus) and logging
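
The Span Context is the part that crosses process boundaries, baggage included. A minimal sketch, assuming a flat string map as the carrier (the same shape fits HTTP headers, gRPC metadata or a message payload). The `trace-id`, `span-id` and `baggage-` key names are hypothetical, since each tracer defines its own wire format.

```go
package main

import (
	"fmt"
	"strings"
)

// SpanContext is a hypothetical span context: the identifiers needed to link
// spans, plus baggage that travels with the whole trace.
type SpanContext struct {
	TraceID string
	SpanID  string
	Baggage map[string]string
}

// baggagePrefix is an illustrative key prefix; real tracers define their own.
const baggagePrefix = "baggage-"

// Serialize flattens the context into string pairs that can cross the network.
func Serialize(sc SpanContext) map[string]string {
	out := map[string]string{"trace-id": sc.TraceID, "span-id": sc.SpanID}
	for k, v := range sc.Baggage {
		out[baggagePrefix+k] = v
	}
	return out
}

// Deserialize rebuilds the context on the receiving side.
func Deserialize(carrier map[string]string) SpanContext {
	sc := SpanContext{
		TraceID: carrier["trace-id"],
		SpanID:  carrier["span-id"],
		Baggage: map[string]string{},
	}
	for k, v := range carrier {
		if strings.HasPrefix(k, baggagePrefix) {
			sc.Baggage[strings.TrimPrefix(k, baggagePrefix)] = v
		}
	}
	return sc
}

func main() {
	sc := SpanContext{
		TraceID: "80f198ee56343ba864fe8b2a57d3eff7",
		SpanID:  "e457b5a2e4d86bd1",
		Baggage: map[string]string{"request_id": "r-1", "client_id": "c-7"},
	}
	carrier := Serialize(sc)
	fmt.Println(carrier)
	fmt.Println(Deserialize(carrier).Baggage["client_id"])
}
```

Because baggage rides along with every downstream call, it is best kept to a few small values such as request and client IDs.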

  23. OpenTracing is 1.5 years old!
    Tracer implementations: Zipkin, Jaeger, LightStep, SkyWalking, others
    All sorts of companies use OpenTracing.

  24. Rapidly growing OSS and vendor adoption, e.g. JDBI, Java Web Servlet,
    JAX-RS

  25. OpenTracing: Configure the GlobalTracer
    https://github.com/opentracing/opentracing-go

    import (
        "github.com/opentracing/opentracing-go"
        ".../some_tracing_impl"
    )

    func main() {
        // Register the tracer once, at startup. The constructor is
        // specific to the tracing implementation you pick.
        opentracing.SetGlobalTracer(
            some_tracing_impl.New(...),
        )
        ...
    }
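
A global tracer lets instrumented code depend only on the API while `main()` decides which implementation records the data. A stdlib-only sketch of that pattern, with hypothetical `Tracer`, `noopTracer` and `printTracer` types; it mirrors the idea, not opentracing-go's code. Like opentracing-go, it starts with a no-op tracer, so instrumented code is safe to run before any tracer is configured.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracer is a hypothetical, minimal tracer interface.
type Tracer interface {
	StartSpan(name string) (finish func())
}

// noopTracer records nothing; it is the safe default.
type noopTracer struct{}

func (noopTracer) StartSpan(string) func() { return func() {} }

// printTracer is a toy implementation that prints span start and finish.
type printTracer struct{}

func (printTracer) StartSpan(name string) func() {
	fmt.Println("start", name)
	return func() { fmt.Println("finish", name) }
}

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{}
)

// SetGlobalTracer swaps the process-wide tracer.
func SetGlobalTracer(t Tracer) {
	mu.Lock()
	defer mu.Unlock()
	global = t
}

// GlobalTracer returns the current process-wide tracer.
func GlobalTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// handler is instrumented code: it never names a concrete tracer.
func handler() {
	finish := GlobalTracer().StartSpan("handle.create_user")
	defer finish()
	// ... business logic ...
}

func main() {
	handler() // no output: the no-op tracer is still installed
	SetGlobalTracer(printTracer{})
	handler() // prints "start handle.create_user", then "finish handle.create_user"
}
```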

  26. OpenTracing: Create a Span from the Context
    https://github.com/opentracing/opentracing-go

    import (
        "context"
        "github.com/opentracing/opentracing-go"
        "github.com/opentracing/opentracing-go/log"
    )

    func xyz(ctx context.Context, ...) {
        ...
        // Child of the span stored in ctx (a new root if there is none);
        // the returned ctx carries the new span.
        span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
        defer span.Finish()
        // Structured logs attached to the span.
        span.LogFields(
            log.String("event", "soft error"),
            log.String("type", "cache timeout"),
            log.Int("waited.millis", 1500))
        ...
    }

  27. OpenTracing: Create a Child Span
    https://github.com/opentracing/opentracing-go

    import "github.com/opentracing/opentracing-go"

    func xyz(parentSpan opentracing.Span, ...) {
        ...
        // Reference the parent explicitly through its SpanContext.
        sp := opentracing.StartSpan(
            "operation_name",
            opentracing.ChildOf(parentSpan.Context()))
        defer sp.Finish()
        ...
    }

  28. OpenCensus: instrumentation spec and libraries by Google
    ● A common interface to get stats and traces from your apps
    ● Different exporters to persist your data

  29. What does a tracing infrastructure look like?

  30. Diagram: a microservice process in which application logic, µ-service
    frameworks, Lambda functions, RPC & control-flow frameworks and existing
    instrumentation sit on top of the OpenTracing API; main() wires the API to
    a tracing infrastructure such as Instana or Jaeger.

  31. Can I have a tracing infrastructure on-prem?

  32. There are different open source alternatives:
    ● Zipkin
      ● Written in Java
      ● Sponsored by Twitter
      ● Supported backends: Elasticsearch, MySQL, Cassandra
    ● Jaeger
      ● Written in Go
      ● Sponsored by Uber and part of the CNCF
      ● Supported backends: Elasticsearch, Cassandra

  33. Are there tracing infrastructures available as a service?

  34. Some of them:
    ● New Relic
    ● Honeycomb
    ● LightStep
    ● AWS X-Ray
    ● Google Stackdriver

  35. Can I store traces anywhere?

  36. Short answer: YES.
    At your own risk, because traces come with:
    ● really high cardinality
    ● high write throughput
    Databases like InfluxDB, Cassandra or MongoDB are probably a better option
    than MySQL or Postgres, but it always depends on your traffic and the
    amount of data.

  37. Reach out:
    @gianarb
    Any questions?