Slide 1

Slide 1 text

GoDays Berlin | 22.01.20

Slide 2

Slide 2 text

Distributed tracing is great! (...then why don’t we see it everywhere?)

Slide 3

Slide 3 text

https://xkcd.com/927/

Slide 4

Slide 4 text

Johannes Liebermann
Software Developer, Kinvolk
GitHub: johananl
Twitter: @j_lieb
Email: johannes@kinvolk.io

Slide 5

Slide 5 text

The Kubernetes Linux Experts
Engineering services and products for Kubernetes, containers, process management and Linux user-space + kernel
Blog: kinvolk.io/blog
GitHub: kinvolk
Twitter: kinvolkio
Email: hello@kinvolk.io

Slide 6

Slide 6 text

● Distributed tracing in 30 seconds
● Main challenges with distributed tracing
● Introduction to OpenTelemetry
● Demo: instrumenting a distributed Go application
● A peek into the OpenTelemetry Go library

Slide 7

Slide 7 text

This talk assumes familiarity with distributed tracing.

Slide 8

Slide 8 text

helloHandler := func(w http.ResponseWriter, req *http.Request) {
	...
	ctx, span := tr.Start(
		req.Context(),
		"handle-hello-request",
		...
	)
	defer span.End()
	_, _ = io.WriteString(w, "Hello, world!\n")
}

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

● A span measures a unit of work in a service
● A trace combines multiple spans together
[Diagram: a trace made up of the spans handle-http-request (5ms), query-database (3ms) and render-response (2ms)]
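To make the span/trace relationship concrete, here is a minimal sketch of how spans could be modeled in Go. The `Span` and `Trace` types, their fields and the IDs are hypothetical and much simpler than OpenTelemetry's data model. As an assumption for the example, query-database and render-response are nested under handle-http-request: every span records the trace ID it belongs to and its parent's span ID, and a trace is simply the set of spans sharing a trace ID.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, minimal record of one unit of work.
type Span struct {
	TraceID  string // shared by every span in the same trace
	SpanID   string // unique within the trace
	ParentID string // empty for the root span
	Name     string
	Duration time.Duration
}

// Trace groups spans that share a trace ID.
type Trace []Span

// Root returns the span without a parent, if there is one.
func (t Trace) Root() (Span, bool) {
	for _, s := range t {
		if s.ParentID == "" {
			return s, true
		}
	}
	return Span{}, false
}

// Children returns the direct children of the span with the given ID.
func (t Trace) Children(spanID string) []Span {
	var out []Span
	for _, s := range t {
		if s.ParentID == spanID {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	trace := Trace{
		{TraceID: "t1", SpanID: "a", Name: "handle-http-request", Duration: 5 * time.Millisecond},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "query-database", Duration: 3 * time.Millisecond},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "render-response", Duration: 2 * time.Millisecond},
	}
	root, _ := trace.Root()
	fmt.Println(root.Name, root.Duration)
	for _, child := range trace.Children(root.SpanID) {
		fmt.Println("  ", child.Name, child.Duration)
	}
}
```

A tracing backend performs essentially this grouping, by trace ID and then by parent ID, to draw the familiar waterfall view.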

Slide 11

Slide 11 text

Cool!

Slide 12

Slide 12 text

So why doesn’t everybody use it?

Slide 13

Slide 13 text

● Instrumentation == code changes
● Hard to justify reducing team velocity for tracing
● You can’t have “instrumentation holes”
  ○ At the very least you must propagate context

Slide 14

Slide 14 text

● Vendor lock-in is especially problematic for tracing
● Importing a vendor-specific library is scary
  ○ What if my monitoring vendor raises prices?
● Open-source libraries must remain vendor-neutral
  ○ You can’t require users to use a specific vendor
  ○ Maintaining support for multiple vendors is a lot of work

Slide 15

Slide 15 text

● Does distributed tracing conflict with microservices?

Slide 16

Slide 16 text

● Multiple microservices
● Multiple programming languages and frameworks
● Multiple protocols (HTTP, gRPC, messaging, ...)
● Multiple tracing backends (Jaeger, Zipkin, Datadog, LightStep, New Relic, Dynatrace, …)

Slide 17

Slide 17 text

Is there a solution?

Slide 18

Slide 18 text

https://xkcd.com/927/

Slide 19

Slide 19 text

Standards!

Slide 20

Slide 20 text

Lack of standards is especially costly for distributed tracing.

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

OpenCensus

Slide 23

Slide 23 text

OpenTracing
● A CNCF project which standardizes tracing APIs
● API only: no implementation!
● Released in December 2016
● Notable contributors:
  ○ LightStep, Uber, Instana, SolarWinds, New Relic, Datadog, Red Hat, ...
● Supported by:
  ○ LightStep, Datadog, Instana, Dynatrace, Jaeger, ...

Slide 24

Slide 24 text

OpenCensus
● A Google project which has been open-sourced
● API and implementation
● Released in January 2018
● Notable contributors:
  ○ Google, Microsoft, Splunk, Honeycomb, SignalFx, ...
● Supported by:
  ○ Stackdriver, Datadog, Honeycomb, AWS X-Ray, Zipkin, ...

Slide 25

Slide 25 text

OpenTracing ⊻ OpenCensus

Slide 26

Slide 26 text

OpenTracing + OpenCensus = OpenTelemetry (May 2019)

Slide 27

Slide 27 text

● Announced May 2019
● The next major version of both OpenTracing and OpenCensus
● A real community effort
● A spec and a set of libraries
● API and implementation
● Tracing and metrics

Slide 28

Slide 28 text

● API
  ○ Follows the OpenTelemetry specification
  ○ Can be used without an implementation
● SDK
  ○ A ready-to-use implementation
  ○ Alternative implementations are supported
● Exporters
● Bridges
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md
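Go interfaces make the API/SDK split easy to sketch. Everything below is a hypothetical simplification: `Tracer`, `Span`, `SetTracer`, `GlobalTracer` and `recordingTracer` are illustrative names, not the actual OpenTelemetry Go API. The "API" part ships a noop implementation as its default, so calling it without an SDK is safe. The application can then plug in a real implementation at startup.

```go
package main

import (
	"fmt"
	"sync"
)

// --- "API" side: interfaces plus a built-in noop default ---

// Span is the minimal span interface the API exposes.
type Span interface{ End() }

// Tracer starts spans.
type Tracer interface{ Start(name string) Span }

type noopSpan struct{}

func (noopSpan) End() {}

type noopTracer struct{}

func (noopTracer) Start(string) Span { return noopSpan{} }

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{} // safe default when no SDK is installed
)

// SetTracer installs an implementation (called once by the application).
func SetTracer(t Tracer) {
	mu.Lock()
	defer mu.Unlock()
	global = t
}

// GlobalTracer returns the installed implementation (used by libraries).
func GlobalTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// --- "SDK" side: a toy implementation that remembers ended spans ---

type recordingTracer struct {
	mu    sync.Mutex
	ended []string
}

type recordingSpan struct {
	t    *recordingTracer
	name string
}

func (s recordingSpan) End() {
	s.t.mu.Lock()
	defer s.t.mu.Unlock()
	s.t.ended = append(s.t.ended, s.name)
}

func (t *recordingTracer) Start(name string) Span {
	return recordingSpan{t: t, name: name}
}

func main() {
	GlobalTracer().Start("before-sdk").End() // noop: nothing is recorded

	sdk := &recordingTracer{}
	SetTracer(sdk)
	GlobalTracer().Start("after-sdk").End()

	fmt.Println(sdk.ended)
}
```

Library code only ever calls `GlobalTracer()`, so it compiles and runs the same whether or not an application installs an implementation.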

Slide 29

Slide 29 text

● Library developers depend only on the API
● Application developers depend on the API and on an implementation
● Monitoring vendors maintain their own exporters
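Here is one way the division of labor could look. The SDK hands finished spans to whichever exporters the application configured, so a vendor only has to implement a small interface. `SpanData`, `Exporter`, `stdoutExporter`, `memoryExporter` and `exportAll` are hypothetical stand-ins, not OpenTelemetry's exporter API.

```go
package main

import (
	"fmt"
	"time"
)

// SpanData is a hypothetical snapshot of a finished span.
type SpanData struct {
	Name     string
	Duration time.Duration
}

// Exporter is the only piece a monitoring vendor would need to implement.
type Exporter interface {
	Export(spans []SpanData) error
}

// stdoutExporter stands in for a vendor backend by printing spans.
type stdoutExporter struct{}

func (stdoutExporter) Export(spans []SpanData) error {
	for _, s := range spans {
		fmt.Printf("%s took %v\n", s.Name, s.Duration)
	}
	return nil
}

// memoryExporter keeps spans in memory, e.g. for tests.
type memoryExporter struct{ got []SpanData }

func (m *memoryExporter) Export(spans []SpanData) error {
	m.got = append(m.got, spans...)
	return nil
}

// exportAll fans finished spans out to every configured exporter.
func exportAll(spans []SpanData, exporters ...Exporter) error {
	for _, e := range exporters {
		if err := e.Export(spans); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	spans := []SpanData{{Name: "query-database", Duration: 3 * time.Millisecond}}
	mem := &memoryExporter{}
	if err := exportAll(spans, stdoutExporter{}, mem); err != nil {
		fmt.Println("export failed:", err)
	}
	fmt.Println(len(mem.got))
}
```

Switching vendors then means changing the exporters passed in `main`, while library and application instrumentation stays untouched.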

Slide 30

Slide 30 text

● I may want to use an instrumented 3rd-party library without using OpenTelemetry
  ○ If no implementation is plugged in, telemetry data is not produced
● My code should not be broken by instrumentation
  ○ The API package is self-sufficient thanks to a built-in noop implementation
● Performance impact should be minimal
  ○ No blocking of the end-user application by default
  ○ Noop implementation produces negligible overhead
  ○ Asynchronous exporting of telemetry data

Slide 31

Slide 31 text

● Current status: pre-release
● Expected production readiness: 2nd half of 2020
● Libraries for: Go, Python, Java, C++, Rust, PHP, Ruby, .NET, JavaScript, Erlang

Slide 32

Slide 32 text

Latest release: v0.2.1 (alpha)
✓ API (tracing + metrics)
✓ SDK (tracing + metrics)
✓ Context propagation
✓ Exporters: Jaeger, Stackdriver, Prometheus (metrics)
✓ OpenTracing bridge

Slide 33

Slide 33 text

How do I instrument my Go services?

Slide 34

Slide 34 text

// Explicit span creation.
handler := func(w http.ResponseWriter, r *http.Request) {
	ctx, span := tr.Start(r.Context(), "handle-request")
	defer span.End()
	// Handle HTTP request, passing ctx to downstream calls.
}

// Implicit span creation.
err := tr.WithSpan(ctx, "do-stuff",
	func(ctx context.Context) error {
		return doStuff(ctx)
	},
)

Slide 35

Slide 35 text

// Log an event on the span.
span.AddEvent(ctx, "Generating response",
	key.New("response").String("stuff"),
)

// Set key-value pairs on the span.
span.SetAttributes(
	key.New("cool").Bool(true),
	key.New("usefulness").String("very"),
)
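Conceptually, an event is a timestamped message recorded at a point during the span, while attributes are key-value pairs describing the span as a whole. Here is a hypothetical, stdlib-only model of that idea. `KeyValue`, `Event` and `recordedSpan` are illustrative and deliberately simpler than the library's `key.New(...)` helpers.

```go
package main

import (
	"fmt"
	"time"
)

// KeyValue is a hypothetical attribute; Value may hold a string, bool, int, etc.
type KeyValue struct {
	Key   string
	Value interface{}
}

// Event is a timestamped message recorded on a span.
type Event struct {
	Time       time.Time
	Message    string
	Attributes []KeyValue
}

type recordedSpan struct {
	Name       string
	Events     []Event
	Attributes map[string]interface{}
}

// AddEvent appends an event stamped with the current time.
func (s *recordedSpan) AddEvent(msg string, attrs ...KeyValue) {
	s.Events = append(s.Events, Event{Time: time.Now(), Message: msg, Attributes: attrs})
}

// SetAttributes adds or overwrites span-level attributes.
func (s *recordedSpan) SetAttributes(attrs ...KeyValue) {
	if s.Attributes == nil {
		s.Attributes = make(map[string]interface{})
	}
	for _, kv := range attrs {
		s.Attributes[kv.Key] = kv.Value
	}
}

func main() {
	span := &recordedSpan{Name: "handle-request"}
	span.AddEvent("Generating response", KeyValue{"response", "stuff"})
	span.SetAttributes(KeyValue{"cool", true}, KeyValue{"usefulness", "very"})
	fmt.Println(len(span.Events), span.Events[0].Message)
	fmt.Println(span.Attributes["cool"], span.Attributes["usefulness"])
}
```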

Slide 36

Slide 36 text

● “Context” refers to request-scoped data
  ○ Example: request/transaction ID
● Context is propagated across a request’s path
● Distributed tracing relies on context for span correlation
  ○ Trace ID and span ID must be propagated
● Two types of context propagation: in-process and distributed

Slide 37

Slide 37 text

● Distributed context propagation is protocol-dependent!

// Inject tracing metadata on outgoing requests.
grpctrace.Inject(ctx, &metadata)

// Extract tracing metadata on incoming requests.
metadata, spanCtx := grpctrace.Extract(ctx, &metadataCopy)

Slide 38

Slide 38 text

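[Diagram: demo application architecture with a Frontend service and the Seniority, Field and Role services, communicating over HTTP and gRPC]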

Slide 39

Slide 39 text

Demo

Slide 40

Slide 40 text

Let’s peek into the OpenTelemetry Go library.

Slide 41

Slide 41 text

User application:
handleIncomingRequest() {
	span := tr.Start()
	defer span.End()
	library1.doSomething()
}

3rd-party library:
library1.doSomething() {
	// Current span?
	library2.doSomething()
}

3rd-party library:
library2.doSomething() {
	// Current span?
}

Slide 42

Slide 42 text

● Used among functions or goroutines within a service
● Must be thread-safe
● Two main approaches:
  ○ Implicit: thread-local storage, global variables, …
  ○ Explicit: as an argument in function calls

Slide 43

Slide 43 text

● https://golang.org/pkg/context/
● Commonly employed for cascading request cancellation
● Can be used for propagating request-scoped data
● Thread-safe
● You may already be using context anyway

Slide 44

Slide 44 text

// Set current span.
func ContextWithSpan(ctx context.Context, span Span) context.Context {
	return context.WithValue(ctx, currentSpanKey, span)
}

// Get current span.
func SpanFromContext(ctx context.Context) Span {
	if span, has := ctx.Value(currentSpanKey).(Span); has {
		return span
	}
	return NoopSpan{}
}

api/trace/current.go

Slide 45

Slide 45 text

Conclusion

Slide 46

Slide 46 text

● It’s a lot of work
  ○ Hopefully you’ll never have to re-instrument your entire codebase
  ○ Auto-instrumentation is in the works
● We can’t afford vendor lock-in
  ○ Architecture encourages separation of concerns
● Distributed tracing vs. microservices
  ○ A good balance between freedom and uniformity

Slide 47

Slide 47 text

Johannes Liebermann
GitHub: johananl
Twitter: @j_lieb
Email: johannes@kinvolk.io

Kinvolk
Blog: kinvolk.io/blog
GitHub: kinvolk
Twitter: kinvolkio
Email: hello@kinvolk.io