Slide 1

Slide 1 text

Distributed Tracing: Frequently Asked Questions
@gianarb
© 2017 InfluxData. All rights reserved.

Slide 2

Slide 2 text

Gianluca Arbezzano
Site Reliability Engineer @InfluxData
● https://gianarb.it
● @gianarb
What I like:
● I make dirty hacks that look awesome
● I grow my vegetables
● Travel for fun and work

Slide 3

Slide 3 text

github.com/influxdata

Slide 4

Slide 4 text

Why do I need distributed tracing?

Slide 5

Slide 5 text


Slide 6

Slide 6 text

It is a way to describe the complexity of a distributed system.

Slide 7

Slide 7 text

In practice, it is a different way of aggregating the well-known logs and stats.

Slide 8

Slide 8 text

To tell the story of our distributed system

Slide 9

Slide 9 text

What does a trace look like?

Slide 10

Slide 10 text

A span is the smallest unit in a trace.

Slide 11

Slide 11 text

It describes a single action executed by a program:
● A single HTTP request.
● A database query.
● A message execution in a queue system.
● A lookup from a key/value store.

Slide 12

Slide 12 text

A span is described via:
● span_id: the unique identifier of the span within a trace
● trace_id: identifies the trace the span belongs to
● parent_id: describes the hierarchy
● labels: a set of key/value pairs
● Span Context: a set of values that will be propagated in the trace
● Logs
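The fields above map naturally onto a small data structure. The following is a minimal sketch in Go, not the data model of any real tracer: the type names, field types, and the NewChild helper are assumptions made for illustration. Name, Start, and Duration are not in the list above but appear in the span details later in the deck.

```go
package main

import (
	"fmt"
	"time"
)

// LogEntry is a timestamped event recorded on a span (hypothetical type).
type LogEntry struct {
	Timestamp time.Time
	Fields    map[string]string
}

// Span mirrors the fields listed on the slide. Names and types are
// assumptions for this sketch.
type Span struct {
	SpanID   string // unique identifier within the trace
	TraceID  string // the trace this span belongs to
	ParentID string // empty for the root span
	Name     string
	Start    time.Time
	Duration time.Duration
	Labels   map[string]string // key/value pairs
	Baggage  map[string]string // span context values propagated in the trace
	Logs     []LogEntry
}

// NewChild creates a span in the same trace with s as its parent.
// Baggage is copied so that it travels with every descendant.
func (s *Span) NewChild(spanID, name string) *Span {
	baggage := make(map[string]string, len(s.Baggage))
	for k, v := range s.Baggage {
		baggage[k] = v
	}
	return &Span{
		SpanID:   spanID,
		TraceID:  s.TraceID,
		ParentID: s.SpanID,
		Name:     name,
		Start:    time.Now(),
		Labels:   map[string]string{},
		Baggage:  baggage,
	}
}

func main() {
	root := &Span{
		SpanID:  "a1",
		TraceID: "t1",
		Name:    "post: /users",
		Baggage: map[string]string{"client_id": "42"},
	}
	child := root.NewChild("b2", "user_exists")
	fmt.Println(child.TraceID, child.ParentID, child.Baggage["client_id"])
}
```

The child shares the parent's trace_id and points back to it through parent_id, which is all a backend needs to rebuild the hierarchy.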

Slide 13

Slide 13 text

A single trace
[Figure: spans post: /users, handle.create_user, user_exists, insert_user, send_email; services nginx, sA, mysql, mysql, worker]
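A tracing backend typically receives spans as a flat list and rebuilds the hierarchy from the parent IDs. The sketch below shows one way to do that in Go. The span names and services come from the slide, but the IDs, the parent/child relationships, and the span-to-service mapping are assumptions, since the original diagram is not available. The code also assumes non-empty span IDs and no cycles.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a minimal record for this sketch (hypothetical type).
type span struct {
	ID, ParentID, Name, Service string
}

// render returns the trace as an indented tree, one line per span.
// Children are listed in the order they appear in the input.
func render(spans []span) string {
	children := map[string][]span{}
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s [%s]\n", strings.Repeat("  ", depth), s.Name, s.Service)
			walk(s.ID, depth+1)
		}
	}
	walk("", 0) // root spans have an empty ParentID
	return b.String()
}

func main() {
	// Assumed hierarchy: the HTTP request calls the handler, which runs
	// two queries and enqueues an email.
	trace := []span{
		{ID: "1", Name: "post: /users", Service: "nginx"},
		{ID: "2", ParentID: "1", Name: "handle.create_user", Service: "sA"},
		{ID: "3", ParentID: "2", Name: "user_exists", Service: "mysql"},
		{ID: "4", ParentID: "2", Name: "insert_user", Service: "mysql"},
		{ID: "5", ParentID: "2", Name: "send_email", Service: "worker"},
	}
	fmt.Print(render(trace))
}
```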

Slide 14

Slide 14 text

[Figure: trace diagram. Span labels: post: /users, handle.create_user, user_exists. Service labels: nginx, sA, mysql, mysql, sA]

Service Name: mysql
Trace ID: 34ytsy5hs45gs46hs5g
Span ID: se5hs5s5hs45gs45gs
Span Name: user_exists
Duration: 1.2s
Start: 56467657457234
Logs:
  query: "select * from tb_user where id = 345"
  user: sa_service

Slide 15

Slide 15 text

How do I follow a request?

Slide 16

Slide 16 text

The implementation changes based on what you are instrumenting:
● For HTTP services, propagate the trace context via headers
● The same applies to gRPC
● For queue systems, you can pass it as part of the message payload
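For the queue case, one possible approach is to wrap each message in an envelope that carries the trace identifiers next to the original body. This is a minimal sketch, assuming JSON messages. The TraceContext and Message types and the wrap/unwrap helpers are hypothetical names, not part of any queue client or tracing library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceContext holds the identifiers the consumer needs to continue the trace.
type TraceContext struct {
	TraceID string `json:"trace_id"`
	SpanID  string `json:"span_id"`
}

// Message is the envelope published to the queue (hypothetical format).
type Message struct {
	Trace TraceContext    `json:"trace"`
	Body  json.RawMessage `json:"body"`
}

// wrap serializes body and attaches the trace context to it.
func wrap(tc TraceContext, body interface{}) ([]byte, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Message{Trace: tc, Body: raw})
}

// unwrap decodes the envelope on the consumer side.
func unwrap(data []byte) (Message, error) {
	var m Message
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	data, err := wrap(
		TraceContext{TraceID: "80f198ee56343ba864fe8b2a57d3eff7", SpanID: "e457b5a2e4d86bd1"},
		map[string]string{"task": "send_email"},
	)
	if err != nil {
		panic(err)
	}
	msg, err := unwrap(data)
	if err != nil {
		panic(err)
	}
	// The consumer would start its span as a child of msg.Trace.SpanID.
	fmt.Println(msg.Trace.TraceID, string(msg.Body))
}
```

If the queue system supports per-message headers or attributes, those are an alternative place for the same identifiers and leave the payload untouched.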

Slide 17

Slide 17 text

B3-Propagation
https://github.com/openzipkin/b3-propagation

X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-ParentSpanId: 05e3ac9a4f6e3b90
X-B3-SpanId: e457b5a2e4d86bd1
X-B3-Sampled: 1
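To make the header mechanics concrete, here is a minimal sketch in Go that writes and reads the four headers shown above using only net/http. The B3 struct and the Inject and Extract functions are hypothetical helpers written for this example, not the Zipkin or OpenTracing API. The sketch also simplifies the spec by treating a missing X-B3-Sampled header as "not sampled".

```go
package main

import (
	"fmt"
	"net/http"
)

// B3 holds the values carried by the B3 headers (hypothetical type).
type B3 struct {
	TraceID, SpanID, ParentSpanID string
	Sampled                       bool
}

// Inject writes the trace context into outgoing request headers.
func Inject(c B3, h http.Header) {
	h.Set("X-B3-TraceId", c.TraceID)
	h.Set("X-B3-SpanId", c.SpanID)
	if c.ParentSpanID != "" {
		h.Set("X-B3-ParentSpanId", c.ParentSpanID)
	}
	if c.Sampled {
		h.Set("X-B3-Sampled", "1")
	} else {
		h.Set("X-B3-Sampled", "0")
	}
}

// Extract reads the trace context from incoming request headers.
// The boolean is false when the trace or span ID is missing.
func Extract(h http.Header) (B3, bool) {
	c := B3{
		TraceID:      h.Get("X-B3-TraceId"),
		SpanID:       h.Get("X-B3-SpanId"),
		ParentSpanID: h.Get("X-B3-ParentSpanId"),
		Sampled:      h.Get("X-B3-Sampled") == "1",
	}
	return c, c.TraceID != "" && c.SpanID != ""
}

func main() {
	// The request is only built, never sent; the URL is a placeholder.
	req, err := http.NewRequest("GET", "http://example.invalid/users", nil)
	if err != nil {
		panic(err)
	}
	Inject(B3{
		TraceID:      "80f198ee56343ba864fe8b2a57d3eff7",
		ParentSpanID: "05e3ac9a4f6e3b90",
		SpanID:       "e457b5a2e4d86bd1",
		Sampled:      true,
	}, req.Header)

	ctx, ok := Extract(req.Header)
	fmt.Println(ctx.TraceID, ctx.Sampled, ok)
}
```

A receiving service would call Extract on the incoming request and create its own span with the extracted span ID as parent, keeping the same trace ID.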

Slide 18

Slide 18 text

Do I need a standard for tracing?

Slide 19

Slide 19 text

YES

Slide 20

Slide 20 text

1. Applications can be written in different languages, but in the end you need to build one single trace. This means they need to agree on a common standard/protocol.
2. If you use a widely supported standard, you can avoid vendor lock-in.

Slide 21

Slide 21 text


Slide 22

Slide 22 text

[Figure: a parent span and three child spans, each with logs attached; the Span Context / Baggage links the parent to its children]

● Spans - Basic unit of timing and causality. Can be tagged with key/value pairs.
● Logs - Structured data recorded on a span.
● Span Context - Serializable format for linking spans across network boundaries. Carries baggage, such as request and client IDs.
● Tracers - Anything that plugs into the OpenTracing API to record information.
  ● Zipkin, Jaeger, LightStep, others
  ● Also metrics (Prometheus) and logging

Slide 23

Slide 23 text

1.5 years old!
Tracer implementations: Zipkin, Jaeger, LightStep, SkyWalking, others
All sorts of companies use OpenTracing:

Slide 24

Slide 24 text

Rapidly growing OSS and vendor adoption
● JDBI
● Java Webservlet
● Jaxr

Slide 25

Slide 25 text

OpenTracing: Configure the GlobalTracer

    import "github.com/opentracing/opentracing-go"
    import ".../some_tracing_impl"

    func main() {
        opentracing.SetGlobalTracer(
            // tracing impl specific:
            some_tracing_impl.New(...),
        )
        ...
    }

https://github.com/opentracing/opentracing-go

Slide 26

Slide 26 text

OpenTracing: Create a Span from the Context

    func xyz(ctx context.Context, ...) {
        ...
        span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
        defer span.Finish()
        span.LogFields(
            log.String("event", "soft error"),
            log.String("type", "cache timeout"),
            log.Int("waited.millis", 1500))
        ...
    }

https://github.com/opentracing/opentracing-go

Slide 27

Slide 27 text

OpenTracing: Create a Child Span

    func xyz(parentSpan opentracing.Span, ...) {
        ...
        sp := opentracing.StartSpan(
            "operation_name",
            opentracing.ChildOf(parentSpan.Context()))
        defer sp.Finish()
        ...
    }

https://github.com/opentracing/opentracing-go

Slide 28

Slide 28 text

OpenCensus: instrumentation spec and libraries by Google
● Common interface to get stats and traces from your apps
● Different exporters to persist your data

Slide 29

Slide 29 text

What does a tracing infrastructure look like?

Slide 30

Slide 30 text

[Diagram: a microservice process. Labels: OpenTracing API, application logic, µ-service frameworks, Lambda functions, RPC & control-flow frameworks, existing instrumentation, main(), tracing infrastructure (Instana, Jaeger)]

Slide 31

Slide 31 text

Can I have a tracing infrastructure on-prem?

Slide 32

Slide 32 text

There are different open source alternatives:
● Zipkin
  ● Java
  ● Sponsored by Twitter
  ● Supported backends: Elasticsearch, MySQL, Cassandra
● Jaeger
  ● Go
  ● Sponsored by Uber and part of the CNCF
  ● Supported backends: Elasticsearch, Cassandra

Slide 33

Slide 33 text

Are there tracing infrastructures offered as a service?

Slide 34

Slide 34 text

● New Relic
● Honeycomb
● LightStep
● AWS X-Ray
● Google Stackdriver

Slide 35

Slide 35 text

Can I store traces anywhere?

Slide 36

Slide 36 text

Short answer: YES. At your own risk…
● Really high cardinality
● High write throughput
Databases like InfluxDB, Cassandra, or MongoDB are probably a better option than MySQL or Postgres, but it always depends on the traffic and the amount of data.

Slide 37

Slide 37 text

Reach out: @gianarb
Any questions?