Distributed Tracing FAQ

Microservices, containers and, more generally, distributed systems have changed how we look at our systems and applications. We need to understand how a single event or request crosses our application, jumping over networks, containers, virtual machines and sometimes cloud providers. Distributed tracing is a practice that increases the observability of systems like these. After this talk, you will have a solid idea of what tracing means and how to instrument your applications, and you will be ready to trace them across many languages using open source technologies like OpenTracing, OpenCensus, Zipkin, Jaeger and InfluxDB. You will ask yourself how you survived until today!

Gianluca Arbezzano

November 06, 2018

  Gianluca Arbezzano

    Site Reliability Engineer @InfluxData
    • https://gianarb.it
    • @gianarb

    What I like:
    • I make dirty hacks that look awesome
    • I grow my vegetables
    • Travel for fun and work
  Distributed tracing is a way to describe the complexity of a distributed system.
  In practice it is a different aggregation of the well-known logs and stats.
  A span describes a single action executed by a program:

    • A single HTTP request.
    • A database query.
    • A message execution in a queue system.
    • A lookup from a key/value store.
  A span is described via:

    • span_id: the unique identifier of the span within a trace
    • trace_id: identifies the trace the span belongs to
    • parent_id: describes the hierarchy between spans
    • labels: a set of key/value pairs
    • Span Context: a set of values that will be propagated through the trace
    • Logs
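The fields listed above map naturally onto a small data structure. The following is a minimal sketch in Go, not the data model of any specific tracer: the names `Span`, `LogEntry`, `NewChild` and `Finish` are hypothetical, and the IDs are hard-coded for illustration instead of being generated.

```go
package main

import (
	"fmt"
	"time"
)

// LogEntry is a timestamped message attached to a span (hypothetical type).
type LogEntry struct {
	Time    time.Time
	Message string
}

// Span is a hypothetical, simplified span record.
type Span struct {
	TraceID  string // shared by every span in the same trace
	SpanID   string // unique within the trace
	ParentID string // empty for the root span
	Name     string
	Labels   map[string]string // key/value pairs
	Logs     []LogEntry
	Start    time.Time
	Duration time.Duration
}

// NewChild returns a span in the same trace whose parent is s.
func (s *Span) NewChild(spanID, name string) *Span {
	return &Span{
		TraceID:  s.TraceID,
		SpanID:   spanID,
		ParentID: s.SpanID,
		Name:     name,
		Labels:   map[string]string{},
		Start:    time.Now(),
	}
}

// Finish records how long the span took.
func (s *Span) Finish() {
	s.Duration = time.Since(s.Start)
}

func main() {
	root := &Span{TraceID: "t1", SpanID: "s1", Name: "post: /users", Start: time.Now()}
	child := root.NewChild("s2", "user_exists")
	child.Labels["service"] = "mysql"
	child.Finish()
	fmt.Println(child.TraceID, child.ParentID, child.Name)
}
```

The parent/child link is only the pair (`TraceID`, `ParentID`); a tracing backend can rebuild the whole tree from those two fields.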
  [Figure: a single trace. Spans: post: /users, handle.create_user, user_exists, insert_user, send_email. Services: nginx, sA, mysql, worker.]
  [Figure: a trace with the spans post: /users, handle.create_user and user_exists (services: nginx, sA, mysql); the user_exists span is detailed below.]

    Service Name: mysql
    Trace ID: 34ytsy5hs45gs46hs5g
    Span ID: se5hs5s5hs45gs45gs
    Span Name: user_exists
    Duration: 1.2s
    Start: 56467657457234
    Logs:
      query: "select * from tb_user where id = 345"
      user: sa_service
  The implementation changes based on what you are instrumenting:

    • For HTTP services, the trace context travels in the request headers.
    • The same applies to gRPC.
    • For queue systems, you can pass it as part of the message payload.
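For the queue case, one possible approach is to wrap the business message in an envelope that also carries the trace identifiers. This is a minimal sketch using only the Go standard library; the `Envelope` type, its JSON field names and the `Wrap`/`Unwrap` helpers are hypothetical and do not come from any tracing library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is a hypothetical message format: tracing identifiers
// travel next to the business payload.
type Envelope struct {
	TraceID string          `json:"trace_id"`
	SpanID  string          `json:"span_id"`
	Body    json.RawMessage `json:"body"`
}

// Wrap serializes body and attaches the identifiers of the current span.
func Wrap(traceID, spanID string, body interface{}) ([]byte, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Envelope{TraceID: traceID, SpanID: spanID, Body: raw})
}

// Unwrap is what the consumer calls before starting its own child span.
func Unwrap(msg []byte) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal(msg, &env)
	return env, err
}

func main() {
	msg, err := Wrap("80f198ee56343ba864fe8b2a57d3eff7", "e457b5a2e4d86bd1",
		map[string]string{"email": "a@example.com"})
	if err != nil {
		panic(err)
	}
	env, err := Unwrap(msg)
	if err != nil {
		panic(err)
	}
	// The consumer would use env.SpanID as the parent of its own span.
	fmt.Println(env.TraceID, env.SpanID, string(env.Body))
}
```

Many brokers also support per-message headers or attributes; where they exist, they play the same role as HTTP headers and keep the payload untouched.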
  B3-Propagation: https://github.com/openzipkin/b3-propagation

    X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
    X-B3-ParentSpanId: 05e3ac9a4f6e3b90
    X-B3-SpanId: e457b5a2e4d86bd1
    X-B3-Sampled: 1
  1. Applications can be written in different languages, but in the end you need to build one single trace. This means they need to agree on a common standard/protocol.
  2. If you use a widely supported standard, you can avoid vendor lock-in.
  [Figure: a parent span and its child spans, each with logs, linked by Span Context / Baggage.]

    • Spans - Basic unit of timing and causality. Can be tagged with key/value pairs.
    • Logs - Structured data recorded on a span.
    • Span Context - Serializable format for linking spans across network boundaries. Carries baggage, such as request and client IDs.
    • Tracers - Anything that plugs into the OpenTracing API to record information.
      • Zipkin, Jaeger, LightStep, others
      • Also metrics (Prometheus) and logging
  1.5 years old!

    Tracer implementations: Zipkin, Jaeger, LightStep, SkyWalking, others
    All sorts of companies use OpenTracing.
  Rapidly growing OSS and vendor adoption

    [Figure: logos of instrumented projects; names shown include JDBI, Java Webservlet, Jaxr.]
  OpenTracing: Configure the GlobalTracer
  https://github.com/opentracing/opentracing-go

    import "github.com/opentracing/opentracing-go"
    import ".../some_tracing_impl"

    func main() {
        opentracing.SetGlobalTracer(
            // tracing impl specific:
            some_tracing_impl.New(...),
        )
        ...
    }
  OpenTracing: Create a Span from the Context
  https://github.com/opentracing/opentracing-go

    // log is "github.com/opentracing/opentracing-go/log"
    func xyz(ctx context.Context, ...) {
        ...
        // Starts a span as a child of the span found in ctx, if any.
        span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
        defer span.Finish()
        span.LogFields(
            log.String("event", "soft error"),
            log.String("type", "cache timeout"),
            log.Int("waited.millis", 1500))
        ...
    }
  OpenTracing: Create a Child Span
  https://github.com/opentracing/opentracing-go

    func xyz(parentSpan opentracing.Span, ...) {
        ...
        sp := opentracing.StartSpan(
            "operation_name",
            opentracing.ChildOf(parentSpan.Context()))
        defer sp.Finish()
        ...
    }
  OpenCensus: instrumentation spec and libraries by Google

    • Common interface to get stats and traces from your apps
    • Different exporters to persist your data
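The "common interface plus exporters" idea can be illustrated with a few lines of plain Go. This is a sketch of the pattern only, not the OpenCensus API: `SpanData`, `Exporter`, `Pipeline`, `MemoryExporter` and `StdoutExporter` are hypothetical names, and the real libraries define their own types.

```go
package main

import "fmt"

// SpanData is a hypothetical, simplified record of a finished span.
type SpanData struct {
	TraceID string
	SpanID  string
	Name    string
}

// Exporter is the single interface the instrumentation talks to.
// Each backend (a database, a tracing server, stdout) implements it.
type Exporter interface {
	ExportSpan(s SpanData)
}

// MemoryExporter keeps spans in memory, which is handy in tests.
type MemoryExporter struct {
	Spans []SpanData
}

func (m *MemoryExporter) ExportSpan(s SpanData) {
	m.Spans = append(m.Spans, s)
}

// StdoutExporter prints every span it receives.
type StdoutExporter struct{}

func (StdoutExporter) ExportSpan(s SpanData) {
	fmt.Printf("trace=%s span=%s name=%s\n", s.TraceID, s.SpanID, s.Name)
}

// Pipeline fans every finished span out to all registered exporters.
// It is not safe for concurrent use; a real implementation would add locking.
type Pipeline struct {
	exporters []Exporter
}

func (p *Pipeline) Register(e Exporter) {
	p.exporters = append(p.exporters, e)
}

func (p *Pipeline) Publish(s SpanData) {
	for _, e := range p.exporters {
		e.ExportSpan(s)
	}
}

func main() {
	mem := &MemoryExporter{}
	p := &Pipeline{}
	p.Register(mem)
	p.Register(StdoutExporter{})
	p.Publish(SpanData{TraceID: "t1", SpanID: "s1", Name: "user_exists"})
	fmt.Println("spans kept in memory:", len(mem.Spans))
}
```

The application code only ever calls `Publish`; switching or adding a storage backend means registering another exporter, with no change to the instrumentation itself.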
  [Figure: a microservice process in which application logic, µ-service frameworks, Lambda functions, RPC & control-flow frameworks and existing instrumentation sit on top of the OpenTracing API; main() connects it to the tracing infrastructure (Instana, Jaeger).]
  There are different Open Source alternatives:

    • Zipkin
      • Java
      • Sponsored by Twitter
      • Supported backends: Elasticsearch, MySQL, Cassandra
    • Jaeger
      • Go
      • Sponsored by Uber and part of the CNCF
      • Supported backends: Elasticsearch, Cassandra
    • New Relic
    • Honeycomb
    • LightStep
    • AWS X-Ray
    • Google Stackdriver
  Short answer: YES. At your own risk…

    • Really high cardinality
    • High write throughput

    Databases like InfluxDB, Cassandra or MongoDB are probably a better option than MySQL or Postgres, but it always depends on traffic and the amount of data.