Monitoring Go Applications with OpenTelemetry

Johannes Liebermann
January 22, 2020

OpenTelemetry is a CNCF sandbox project for standardizing application tracing and monitoring across multiple programming languages, platforms and monitoring vendors. This talk provides a brief introduction to OpenTelemetry, explores the OpenTelemetry Go library and demonstrates how it can be used to make Go applications observable.

Transcript

  1. GoDays Berlin | 22.01.20

  2. Distributed tracing is great! (...then why don’t we see it everywhere?)
  3. https://xkcd.com/927/

  4. Johannes Liebermann
    Software Developer, Kinvolk
    Github: johananl
    Twitter: @j_lieb
    Email: johannes@kinvolk.io
  5. The Kubernetes Linux Experts
    Engineering services and products for Kubernetes, containers, process management and Linux user-space + kernel
    Blog: kinvolk.io/blog
    Github: kinvolk
    Twitter: kinvolkio
    Email: hello@kinvolk.io
  6. • Distributed tracing in 30 seconds
    • Main challenges with distributed tracing
    • Introduction to OpenTelemetry
    • Demo: instrumenting a distributed Go application
    • A peek into the OpenTelemetry Go library
  7. This talk assumes familiarity with distributed tracing.

  8. helloHandler := func(w http.ResponseWriter, req *http.Request) {
        ...
        ctx, span := tr.Start(
            req.Context(),
            "handle-hello-request",
            ...
        )
        defer span.End()

        _, _ = io.WriteString(w, "Hello, world!\n")
    }
  9. None
  10. • A span measures a unit of work in a service
    • A trace combines multiple spans together
    [Diagram: one trace made up of the spans handle-http-request (5ms), query-database (3ms) and render-response (2ms)]
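The relationship between spans and a trace can be sketched with the standard library alone. The `Span` and `Trace` types below are hypothetical illustrations, not the OpenTelemetry Go API, and the sketch is not safe for concurrent use. It only shows that a span records the duration of one unit of work and that a trace groups the spans that share a trace ID.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical type that measures one unit of work.
type Span struct {
	Name     string
	TraceID  string
	Start    time.Time
	Duration time.Duration
}

// Trace is a hypothetical type that groups the spans of one request.
type Trace struct {
	ID    string
	Spans []*Span
}

// StartSpan records the start time of a unit of work and attaches it to the trace.
func (t *Trace) StartSpan(name string) *Span {
	s := &Span{Name: name, TraceID: t.ID, Start: time.Now()}
	t.Spans = append(t.Spans, s)
	return s
}

// End records how long the unit of work took.
func (s *Span) End() {
	s.Duration = time.Since(s.Start)
}

func main() {
	tr := &Trace{ID: "trace-1"}

	root := tr.StartSpan("handle-http-request")

	db := tr.StartSpan("query-database")
	time.Sleep(3 * time.Millisecond) // Stand-in for real work.
	db.End()

	render := tr.StartSpan("render-response")
	time.Sleep(2 * time.Millisecond) // Stand-in for real work.
	render.End()

	root.End()

	// Every span carries the same trace ID, which is what ties them together.
	for _, s := range tr.Spans {
		fmt.Printf("%s %s %v\n", s.TraceID, s.Name, s.Duration)
	}
}
```

The measured durations vary from run to run, so no expected output is shown.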
  11. Cool!

  12. So why doesn’t everybody use it?

  13. • Instrumentation == code changes
    • Hard to justify reducing team velocity for tracing
    • You can’t have “instrumentation holes”
      ◦ At the very least you must propagate context
  14. • Vendor lock-in for tracing is especially problematic
    • Importing a vendor-specific library is scary
      ◦ What if my monitoring vendor raises prices?
    • Open-source libraries must remain neutral
      ◦ You can’t require users to use a specific vendor
      ◦ Maintaining support for multiple vendors is a lot of work
  15. • Does distributed tracing conflict with microservices?

  16. • Multiple microservices
    • Multiple programming languages and frameworks
    • Multiple protocols (HTTP, gRPC, messaging, ...)
    • Multiple tracing backends (Jaeger, Zipkin, Datadog, LightStep, NewRelic, Dynatrace, …)
  17. Is there a solution?

  18. https://xkcd.com/927/

  19. Standards!

  20. Lack of standards is especially costly for distributed tracing.

  21. None
  22. OpenCensus

  23. OpenTracing
    • A CNCF project which standardizes tracing APIs
    • API only, no implementation!
    • Released in December 2016
    • Notable contributors:
      ◦ LightStep, Uber, Instana, SolarWinds, NewRelic, Datadog, Red Hat, ...
    • Supported by:
      ◦ LightStep, Datadog, Instana, Dynatrace, Jaeger, ...
  24. OpenCensus
    • A Google project which has been open-sourced
    • API and implementation
    • Released in January 2018
    • Notable contributors:
      ◦ Google, Microsoft, Splunk, Honeycomb, SignalFx, ...
    • Supported by:
      ◦ Stackdriver, Datadog, Honeycomb, AWS X-Ray, Zipkin, ...
  25. OpenCensus ⊻

  26. + = May 2019

  27. • Announced May 2019
    • The next major version of both OpenTracing and OpenCensus
    • A real community effort
    • A spec and a set of libraries
    • API and implementation
    • Tracing and metrics
  28. • API
      ◦ Follows the OpenTelemetry specification
      ◦ Can be used without an implementation
    • SDK
      ◦ A ready-to-use implementation
      ◦ Alternative implementations are supported
    • Exporters
    • Bridges
    https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md
  29. • Library developers depend only on the API
    • Application developers depend on the API and on an implementation
    • Monitoring vendors maintain their own exporters
  30. • I may want to use an instrumented 3rd-party library without using OpenTelemetry
      ◦ If no implementation is plugged in, telemetry data is not produced
    • My code should not be broken by instrumentation
      ◦ The API package is self-sufficient thanks to a built-in noop implementation
    • Performance impact should be minimal
      ◦ No blocking of end-user application by default
      ◦ Noop implementation produces negligible overhead
      ◦ Asynchronous exporting of telemetry data
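The split between an API with a built-in noop default and a pluggable implementation can be illustrated with a small sketch. Everything here (`Tracer`, `Span`, `SetTracer`, `GetTracer`, `recordingTracer`, `LibraryCall`) is a hypothetical name chosen for illustration and does not mirror the OpenTelemetry Go packages. A production version would also need to guard the global against concurrent access.

```go
package main

import "fmt"

// Tracer is a hypothetical, minimal API that instrumented libraries depend on.
type Tracer interface {
	Start(name string) Span
}

// Span is the hypothetical handle returned by a Tracer.
type Span interface {
	End()
}

// noopTracer and noopSpan do nothing. They are the default until an
// application registers a real implementation.
type noopTracer struct{}
type noopSpan struct{}

func (noopTracer) Start(string) Span { return noopSpan{} }
func (noopSpan) End()                {}

// global holds the active implementation (not concurrency-safe in this sketch).
var global Tracer = noopTracer{}

// SetTracer lets an application plug in an implementation.
func SetTracer(t Tracer) { global = t }

// GetTracer is what instrumented code calls to obtain a tracer.
func GetTracer() Tracer { return global }

// recordingTracer is a toy implementation that remembers finished span names.
type recordingTracer struct{ finished []string }

type recordingSpan struct {
	name string
	t    *recordingTracer
}

func (t *recordingTracer) Start(name string) Span { return &recordingSpan{name: name, t: t} }
func (s *recordingSpan) End()                     { s.t.finished = append(s.t.finished, s.name) }

// LibraryCall stands in for an instrumented third-party library.
// It only knows about the API, never about a concrete implementation.
func LibraryCall() {
	span := GetTracer().Start("library-call")
	defer span.End()
}

func main() {
	LibraryCall() // No implementation registered: no telemetry is produced.

	rec := &recordingTracer{}
	SetTracer(rec)
	LibraryCall() // Same library code, now recorded.

	fmt.Println(rec.finished) // prints [library-call]
}
```

The library code is identical in both calls; only the application decides whether telemetry is produced.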
  31. • Current status: pre-release
    • Production readiness: 2nd half of 2020
    • Libraries for: Go, Python, Java, C++, Rust, PHP, Ruby, .NET, JavaScript, Erlang
  32. Latest release: v0.2.1 (alpha)
    ✓ API (tracing + metrics)
    ✓ SDK (tracing + metrics)
    ✓ Context propagation
    ✓ Exporters: Jaeger, Stackdriver, Prometheus (metrics)
    ✓ OpenTracing bridge
  33. How do I instrument my Go services?

  34. // Explicit span creation.
    handler := func(w http.ResponseWriter, r *http.Request) {
        ctx, span := tr.Start(r.Context(), "handle-request")
        defer span.End()

        // Handle HTTP request.
    }

    // Implicit span creation.
    err := tr.WithSpan(ctx, "do-stuff",
        func(context.Context) error { return do_stuff() })
  35. // Log an event on the span.
    span.AddEvent(ctx, "Generating response",
        key.New("response").String("stuff"))

    // Set key-value pairs on the span.
    span.SetAttributes(
        key.New("cool").Bool(true),
        key.New("usefulness").String("very"),
    )
  36. • “Context” refers to request-scoped data
      ◦ Example: request/transaction ID
    • Context is propagated across a request’s path
    • Distributed tracing relies on context for span correlation
      ◦ Trace ID and span ID must be propagated
    • Two types of context propagation: in-process and distributed
  37. • Distributed context propagation is protocol-dependent!

    // Inject tracing metadata on outgoing requests.
    grpctrace.Inject(ctx, &metadata)

    // Extract tracing metadata on incoming requests.
    metadata, spanCtx := grpctrace.Extract(ctx, &metadataCopy)
  38. [Diagram: the demo services Frontend, Seniority, Field and Role, communicating over HTTP and gRPC]

  39. Demo

  40. Let’s peek into the OpenTelemetry Go library.

  41. // User application
    handleIncomingRequest() {
        span := tr.Start()
        defer span.End()
        library1.doSomething()
    }

    // 3rd-party library
    library1.doSomething() {
        // Current span?
        library2.doSomething()
    }

    // 3rd-party library
    library2.doSomething() {
        // Current span?
    }
  42. • Used among functions or goroutines within a service
    • Must be thread-safe
    • Two main approaches:
      ◦ Implicit: thread-local storage, global variables, …
      ◦ Explicit: as an argument in function calls
  43. • https://golang.org/pkg/context/
    • Commonly employed for cascading request cancellation
    • Can be used for propagating request-scoped data
    • Thread-safe
    • You may already be using context anyway
  44. // Set current span.
    func ContextWithSpan(ctx context.Context, span Span) context.Context {
        return context.WithValue(ctx, currentSpanKey, span)
    }

    // Get current span.
    func SpanFromContext(ctx context.Context) Span {
        if span, has := ctx.Value(currentSpanKey).(Span); has {
            return span
        }
        return NoopSpan{}
    }

    api/trace/current.go
  45. Conclusion

  46. • It’s a lot of work
      ◦ Hopefully you’ll never have to re-instrument your entire codebase
      ◦ Auto-instrumentation is in the works
    • We can’t vendor-lock
      ◦ Architecture encourages separation of concerns
    • Distributed tracing vs. microservices
      ◦ A good balance between freedom and uniformity
  47. Johannes Liebermann
    Github: johananl
    Twitter: @j_lieb
    Email: johannes@kinvolk.io
    Kinvolk
    Blog: kinvolk.io/blog
    Github: kinvolk
    Twitter: kinvolkio
    Email: hello@kinvolk.io