Distributed Tracing FAQ

Microservices, containers, and distributed systems in general have changed how we look at our systems and applications. We need to understand how a single event or request crosses our application, jumping across networks, containers, virtual machines, and sometimes cloud providers. A specific practice, called distributed tracing, increases the observability of systems like these. After this talk, you will have a solid idea of what tracing means and how to instrument your applications, and you will be ready to trace your application across many languages using open source technologies like OpenTracing, OpenCensus, Zipkin, Jaeger, and InfluxDB. You will ask yourself how you survived until today!

Gianluca Arbezzano

November 06, 2018

Transcript

  1. 2.

    © 2017 InfluxData. All rights reserved.

    Gianluca Arbezzano, Site Reliability Engineer @InfluxData • https://gianarb.it • @gianarb
    What I like:
    • I make dirty hacks that look awesome
    • I grow my vegetables
    • Travel for fun and work
  2. 6.

    It is a way to describe the distribution’s complexity.
  3. 7.

    In practice it is a different aggregation of the well-known logs and stats.
  4. 8.
  5. 10.
  6. 11.

    It describes a single action executed by a program:
    • A single HTTP request.
    • A database query.
    • A message execution in a queue system.
    • A lookup from a key/value store.
  7. 12.

    A span is described via:
    • span_id: the unique identifier of the span within a trace
    • trace_id: identifies the trace the span belongs to
    • parent_id: describes the hierarchy
    • labels: a set of key/value pairs
    • Span Context: a set of values that will be propagated in the trace
    • Logs
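The fields listed on this slide can be sketched as a plain Go struct. This is a minimal illustration, not the data model of any real tracer: the `Span` type, its field names, and the `NewChild` helper are all hypothetical, and the Span Context is left out for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified model of the fields listed on the slide.
// Real tracers (Zipkin, Jaeger, ...) define their own, richer types.
type Span struct {
	TraceID  string            // identifies the trace this span belongs to
	SpanID   string            // unique identifier of the span within the trace
	ParentID string            // empty for the root span
	Name     string            // name of the operation
	Start    time.Time         // when the operation started
	Duration time.Duration     // how long the operation took
	Labels   map[string]string // key/value pairs
	Logs     []string          // events recorded during the span
}

// NewChild returns a span in the same trace whose parent is the receiver.
func (s Span) NewChild(spanID, name string) Span {
	return Span{
		TraceID:  s.TraceID,
		SpanID:   spanID,
		ParentID: s.SpanID,
		Name:     name,
		Start:    time.Now(),
		Labels:   map[string]string{},
	}
}

func main() {
	root := Span{TraceID: "t1", SpanID: "s1", Name: "post: /users", Start: time.Now()}
	child := root.NewChild("s2", "user_exists")
	fmt.Println(child.TraceID, child.ParentID, child.Name)
}
```

The hierarchy comes only from `ParentID`: every span in a trace shares the same `TraceID`, and a backend can rebuild the tree by following parent links.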
  8. 13.

    [Diagram: a single trace]
    Spans, in order: post: /users, handle.create_user, user_exists, insert_user, send_email
    Services, in order: nginx, sA, mysql, mysql, worker
  9. 14.

    [Diagram: span detail. Spans: post: /users, handle.create_user, user_exists. Services: nginx, sA, mysql, mysql, sA]
    Service Name: mysql
    Trace ID: 34ytsy5hs45gs46hs5g
    Span ID: se5hs5s5hs45gs45gs
    Span Name: user_exists
    Duration: 1.2s
    Start: 56467657457234
    Logs:
      query: “select * from tb_user where id = 345”
      user: sa_service
  10. 16.

    The implementation changes based on what you are instrumenting:
    • To instrument HTTP services, the solution is to use HTTP headers
    • Same for gRPC
    • For queue systems, you can pass it as part of the message payload
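For the queue case, one possible approach is to wrap the business payload in an envelope that also carries the trace identifiers. The sketch below assumes JSON messages; `TraceContext`, `Envelope`, `Wrap`, `Unwrap`, and the JSON field names are hypothetical and do not come from any particular queue system or tracing library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceContext holds the identifiers to propagate (hypothetical field names).
type TraceContext struct {
	TraceID string `json:"trace_id"`
	SpanID  string `json:"span_id"`
}

// Envelope wraps the business payload together with the trace context.
type Envelope struct {
	Trace TraceContext    `json:"trace"`
	Body  json.RawMessage `json:"body"`
}

// Wrap serializes body and attaches the trace context; the producer calls it.
func Wrap(tc TraceContext, body interface{}) ([]byte, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Envelope{Trace: tc, Body: raw})
}

// Unwrap decodes a message; the consumer calls it before starting its own span.
func Unwrap(msg []byte) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal(msg, &env)
	return env, err
}

func main() {
	tc := TraceContext{TraceID: "80f198ee56343ba864fe8b2a57d3eff7", SpanID: "e457b5a2e4d86bd1"}
	msg, err := Wrap(tc, map[string]string{"email": "a@example.com"})
	if err != nil {
		panic(err)
	}
	env, err := Unwrap(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(env.Trace.TraceID, string(env.Body))
}
```

The consumer would use the extracted `TraceID` and `SpanID` as the parent of the span it creates while processing the message, so the work done by the worker ends up in the same trace as the request that queued it.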
  11. 17.

    B3-Propagation https://github.com/openzipkin/b3-propagation
    X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
    X-B3-ParentSpanId: 05e3ac9a4f6e3b90
    X-B3-SpanId: e457b5a2e4d86bd1
    X-B3-Sampled: 1
  12. 20.

    1. Applications can be written in different languages, but in the end you need to build one single trace. This means they need to agree on a common standard/protocol.
    2. If you use a widely supported standard, you can avoid vendor lock-in.
  13. 22.

    [Diagram labels: Parent Span, Child Spans, logs, Span Context / Baggage]
    • Spans - Basic unit of timing and causality. Can be tagged with key/value pairs.
    • Logs - Structured data recorded on a span.
    • Span Context - Serializable format for linking spans across network boundaries. Carries baggage, such as request and client IDs.
    • Tracers - Anything that plugs into the OpenTracing API to record information.
      • Zipkin, Jaeger, LightStep, others
      • Also metrics (Prometheus) and logging
  14. 23.

    1.5 years old!
    Tracer implementations: Zipkin, Jaeger, LightStep, SkyWalking, others.
    All sorts of companies use OpenTracing.
  15. 24.

    Rapidly growing OSS and vendor adoption: JDBI, Java Webservlet, Jaxr
  16. 25.

    OpenTracing: Configure the GlobalTracer

        import "github.com/opentracing/opentracing-go"
        import ".../some_tracing_impl"

        func main() {
            opentracing.SetGlobalTracer(
                // tracing impl specific:
                some_tracing_impl.New(...),
            )
            ...
        }

    https://github.com/opentracing/opentracing-go
  17. 26.

    OpenTracing: Create a Span from the Context

        import (
            "context"

            "github.com/opentracing/opentracing-go"
            "github.com/opentracing/opentracing-go/log"
        )

        func xyz(ctx context.Context, ...) {
            ...
            // Start a span as a child of the span stored in ctx, if any.
            span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
            defer span.Finish()
            // Attach structured log fields to the span.
            span.LogFields(
                log.String("event", "soft error"),
                log.String("type", "cache timeout"),
                log.Int("waited.millis", 1500))
            ...
        }

    https://github.com/opentracing/opentracing-go
  18. 27.

    OpenTracing: Create a Child Span

        func xyz(parentSpan opentracing.Span, ...) {
            ...
            sp := opentracing.StartSpan(
                "operation_name",
                opentracing.ChildOf(parentSpan.Context()))
            defer sp.Finish()
            ...
        }

    https://github.com/opentracing/opentracing-go
  19. 28.

    OpenCensus: instrumentation spec and libraries by Google
    • Common interface to get stats and traces from your apps
    • Different exporters to persist your data
  20. 30.

    [Diagram: microservice process. Labels: main(), application logic, µ-service frameworks, Lambda functions, RPC & control-flow frameworks, existing instrumentation, OpenTracing API, tracing infrastructure (Instana, Jaeger)]
  21. 32.

    There are different open source alternatives:
    • Zipkin
      • Java
      • Sponsored by Twitter
      • Supported backends: Elasticsearch, MySQL, Cassandra
    • Jaeger
      • Go
      • Sponsored by Uber and part of the CNCF
      • Supported backends: Elasticsearch, Cassandra
  22. 34.

    • New Relic
    • Honeycomb
    • LightStep
    • AWS X-Ray
    • Google Stackdriver
  23. 36.

    Short answer: YES. At your own risk…
    • Really high cardinality
    • High write throughput
    Databases like InfluxDB, Cassandra, and MongoDB are probably a better option than MySQL or Postgres, but it always depends on traffic and the amount of data.
  24. 37.