Distributed Tracing FAQ

Microservices, containers, and distributed systems in general have changed how we look at our systems and applications. We need to understand how a single event or request crosses our application, jumping over networks, containers, virtual machines, and sometimes cloud providers. A specific practice, distributed tracing, increases the observability of systems like these. After this talk, you will have a solid idea of what tracing means and how to instrument your applications, and you will be ready to trace your application across many languages using open source technologies such as OpenTracing, OpenCensus, Zipkin, Jaeger, and InfluxDB. You will ask yourself how you survived until today!

Gianluca Arbezzano

November 06, 2018

Transcript

  1. © 2017 InfluxData. All rights reserved. Distributed Tracing Frequently Asked Questions @gianarb
  2. Gianluca Arbezzano, Site Reliability Engineer @InfluxData

    • https://gianarb.it
    • @gianarb

    What I like:
    • I make dirty hacks that look awesome
    • I grow my vegetables
    • Travel for fun and work
  3. github.com/influxdata

  4. Why do I need distributed tracing?

  6. It is a way to describe the complexity of a distributed system.
  7. In practice, it is a different way of aggregating the familiar logs and stats.
  8. To tell the story of our distributed system
  9. What does a trace look like?

  10. A span is the smallest unit in a trace.
  11. It describes a single action executed by a program:

    • A single HTTP request.
    • A database query.
    • A message execution in a queue system.
    • A lookup from a key/value store.
  12. A span is described via:

    • span_id: the unique identifier of the span within a trace
    • trace_id: identifies the trace the span belongs to
    • parent_id: describes the hierarchy
    • labels: a set of key/value pairs
    • Span Context: a set of values that is propagated through the trace
    • Logs
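The fields above map naturally onto a small data structure. The following is a minimal sketch, not the data model of any particular tracer: the Go type names, the `service` label, the IDs, and the parent/child layout of the example trace are all assumptions made for illustration (the span names are borrowed from the trace shown on the next slides).

```go
package main

import (
	"fmt"
	"time"
)

// LogEntry is a timestamped, structured event recorded on a span.
type LogEntry struct {
	Time   time.Time
	Fields map[string]string
}

// Span mirrors the fields listed on the slide. Names and types are
// assumptions for this sketch, not a real tracer's data model.
type Span struct {
	TraceID  string // trace_id: the trace this span belongs to
	SpanID   string // span_id: unique within the trace
	ParentID string // parent_id: empty for the root span
	Name     string
	Start    time.Time
	Duration time.Duration
	Labels   map[string]string // labels: key/value pairs
	Logs     []LogEntry
}

// Children returns the spans whose ParentID equals id, in input order.
func Children(trace []Span, id string) []Span {
	var out []Span
	for _, s := range trace {
		if s.ParentID == id {
			out = append(out, s)
		}
	}
	return out
}

// PrintTree prints the hierarchy below parentID, indented by depth.
func PrintTree(trace []Span, parentID string, depth int) {
	for _, s := range Children(trace, parentID) {
		fmt.Printf("%*s%s (%s)\n", depth*2, "", s.Name, s.Labels["service"])
		PrintTree(trace, s.SpanID, depth+1)
	}
}

// exampleTrace builds a hypothetical trace; IDs and hierarchy are invented.
func exampleTrace() []Span {
	svc := func(name string) map[string]string {
		return map[string]string{"service": name}
	}
	return []Span{
		{TraceID: "t1", SpanID: "1", Name: "post: /users", Labels: svc("nginx")},
		{TraceID: "t1", SpanID: "2", ParentID: "1", Name: "handle.create_user", Labels: svc("sA")},
		{TraceID: "t1", SpanID: "3", ParentID: "2", Name: "user_exists", Labels: svc("mysql")},
		{TraceID: "t1", SpanID: "4", ParentID: "2", Name: "insert_user", Labels: svc("mysql")},
		{TraceID: "t1", SpanID: "5", ParentID: "2", Name: "send_email", Labels: svc("worker")},
	}
}

func main() {
	PrintTree(exampleTrace(), "", 0)
}
```

Because every span carries its own `trace_id` and `parent_id`, a backend can rebuild the tree even when spans arrive out of order from different services.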
  13. A single trace (diagram), spans and the services that emit them:

    • post: /users (nginx)
    • handle.create_user (sA)
    • user_exists (mysql)
    • insert_user (mysql)
    • send_email (worker)
  14. Detail of the user_exists span (diagram: post: /users, handle.create_user, user_exists):

    Service Name: mysql
    Trace ID: 34ytsy5hs45gs46hs5g
    Span ID: se5hs5s5hs45gs45gs
    Span Name: user_exists
    Duration: 1.2s
    Start: 56467657457234
    Logs:
      query: “select * from tb_user where id = 345”
      user: sa_service
  15. How do I follow a request?

  16. The implementation changes based on what you are instrumenting:

    • For HTTP services, the trace information travels in request headers
    • The same applies to gRPC
    • For queue systems, you can pass it as part of the message payload
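For the queue case, one possible approach is to wrap each message in an envelope that carries the trace identifiers next to the real body. This is a minimal sketch using only the standard library; the `Envelope` layout and the JSON field names are assumptions, not a format defined by any tracer or queue system.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceContext is the minimal identity that has to travel with a message.
type TraceContext struct {
	TraceID string `json:"trace_id"`
	SpanID  string `json:"span_id"`
}

// Envelope wraps a message body together with its trace context.
// The layout is hypothetical and chosen only for this sketch.
type Envelope struct {
	Trace TraceContext    `json:"trace"`
	Body  json.RawMessage `json:"body"`
}

// Wrap serializes body and attaches the producer's trace context.
func Wrap(tc TraceContext, body interface{}) ([]byte, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Envelope{Trace: tc, Body: raw})
}

// Unwrap is what a consumer calls before starting its own child span.
func Unwrap(msg []byte) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal(msg, &env)
	return env, err
}

func main() {
	msg, err := Wrap(
		TraceContext{TraceID: "t1", SpanID: "s1"},
		map[string]int{"user_id": 345},
	)
	if err != nil {
		panic(err)
	}
	env, err := Unwrap(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("trace:", env.Trace.TraceID, "parent span:", env.Trace.SpanID)
	fmt.Println("body:", string(env.Body))
}
```

The consumer treats the extracted span ID as the parent of the span it creates for processing the message, so the asynchronous work still shows up in the same trace.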
  17. B3-Propagation https://github.com/openzipkin/b3-propagation

    X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
    X-B3-ParentSpanId: 05e3ac9a4f6e3b90
    X-B3-SpanId: e457b5a2e4d86bd1
    X-B3-Sampled: 1
  18. Do I need a standard for tracing?

  19. YES

  20. 1. Applications can be written in different languages, but in the end you need to build one single trace. This means they need to agree on a common standard/protocol.

    2. If you use a widely supported standard, you can avoid vendor lock-in.
  22. Diagram: Parent Span, Child Spans, logs attached to spans, Span Context / Baggage

    • Spans - Basic unit of timing and causality. Can be tagged with key/value pairs.
    • Logs - Structured data recorded on a span.
    • Span Context - Serializable format for linking spans across network boundaries. Carries baggage, such as request and client IDs.
    • Tracers - Anything that plugs into the OpenTracing API to record information.
      • Zipkin, Jaeger, LightStep, others
      • Also metrics (Prometheus) and logging
  23. 1.5 years old!

    Tracer implementations: Zipkin, Jaeger, LightStep, SkyWalking, others. All sorts of companies use OpenTracing:
  24. Rapidly growing OSS and vendor adoption: JDBI, Java Webservlet, Jaxr
  25. OpenTracing: Configure the GlobalTracer

    import "github.com/opentracing/opentracing-go"
    import ".../some_tracing_impl"

    func main() {
        opentracing.SetGlobalTracer(
            // tracing impl specific:
            some_tracing_impl.New(...),
        )
        ...
    }

    https://github.com/opentracing/opentracing-go
  26. OpenTracing: Create a Span from the Context

    func xyz(ctx context.Context, ...) {
        ...
        span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
        defer span.Finish()
        span.LogFields(
            log.String("event", "soft error"),
            log.String("type", "cache timeout"),
            log.Int("waited.millis", 1500))
        ...
    }

    https://github.com/opentracing/opentracing-go
  27. OpenTracing: Create a Child Span

    func xyz(parentSpan opentracing.Span, ...) {
        ...
        sp := opentracing.StartSpan(
            "operation_name",
            opentracing.ChildOf(parentSpan.Context()))
        defer sp.Finish()
        ...
    }

    https://github.com/opentracing/opentracing-go
  28. OpenCensus: instrumentation spec and libraries by Google

    • Common interface to get stats and traces from your apps
    • Different exporters to persist your data
  29. What does a tracing infrastructure look like?

  30. Diagram of a microservice process: application logic, µ-service frameworks, Lambda functions, RPC & control-flow frameworks, and existing instrumentation sit on top of the OpenTracing API; main() connects it to the tracing infrastructure (Instana, Jaeger).
  31. Can I have a tracing infrastructure on-prem?

  32. There are different open source alternatives:

    • Zipkin
      • Java
      • Sponsored by Twitter
      • Supported backends: ElasticSearch, MySQL, Cassandra
    • Jaeger
      • Go
      • Sponsored by Uber and part of the CNCF
      • Supported backends: ElasticSearch, Cassandra
  33. Are there tracing infrastructures available as a service?

  34. • New Relic

    • Honeycomb
    • LightStep
    • AWS X-Ray
    • Google Stackdriver
  35. Can I store traces anywhere?

  36. Short answer: YES. At your own risk…

    • Really high cardinality
    • High write throughput

    Databases like InfluxDB, Cassandra, or MongoDB are probably a better option than MySQL or Postgres, but it always depends on traffic and the amount of data.
  37. Reach out: @gianarb, gianluca@influxdb.com

    Any questions?