
Monitoring Go Applications with OpenTelemetry

Johannes Liebermann
January 22, 2020

OpenTelemetry is a CNCF sandbox project for standardizing application tracing and monitoring across multiple programming languages, platforms and monitoring vendors. This talk provides a brief introduction to OpenTelemetry, explores the OpenTelemetry Go library and demonstrates how it can be used to make Go applications observable.


Transcript

  1. GoDays Berlin | 22.01.20

  2. Distributed tracing is great! (...then why don’t we see it everywhere?)
  3. https://xkcd.com/927/

  4. Johannes Liebermann, Software Developer, Kinvolk
     GitHub: johananl
     Twitter: @j_lieb
     Email: johannes@kinvolk.io
  5. The Kubernetes Linux Experts
     Engineering services and products for Kubernetes, containers, process management and Linux user-space + kernel
     Blog: kinvolk.io/blog
     GitHub: kinvolk
     Twitter: kinvolkio
     Email: hello@kinvolk.io
  6. • Distributed tracing in 30 seconds
     • Main challenges with distributed tracing
     • Introduction to OpenTelemetry
     • Demo: instrumenting a distributed Go application
     • A peek into the OpenTelemetry Go library
  7. This talk assumes familiarity with distributed tracing.

  8. helloHandler := func(w http.ResponseWriter, req *http.Request) {
         ...
         ctx, span := tr.Start(
             req.Context(),
             "handle-hello-request",
             ...
         )
         defer span.End()
         _, _ = io.WriteString(w, "Hello, world!\n")
     }
  10. • A span measures a unit of work in a service
      • A trace combines multiple spans together
      [Diagram: a trace containing the spans handle-http-request, query-database and render-response, labeled 5ms, 3ms and 2ms]
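
The span/trace relationship on this slide can be sketched with plain Go types. This is a minimal illustration using only the standard library, not the OpenTelemetry data model: the `Span` and `Trace` types, the IDs, the parent/child links and the fixed timestamps are all assumptions, as is the mapping of the 5ms/3ms/2ms labels onto the three span names in the order they appear.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified record of one unit of work in a service.
type Span struct {
	TraceID  string // shared by every span that belongs to the same trace
	ID       string
	ParentID string // empty for the root span
	Name     string
	Start    time.Time
	End      time.Time
}

// Duration reports how long the unit of work took.
func (s Span) Duration() time.Duration { return s.End.Sub(s.Start) }

// Trace groups the spans that share one trace ID.
type Trace struct {
	ID    string
	Spans []Span
}

// Root returns the span that has no parent, if there is one.
func (t Trace) Root() (Span, bool) {
	for _, s := range t.Spans {
		if s.ParentID == "" {
			return s, true
		}
	}
	return Span{}, false
}

// exampleTrace builds a three-span trace with fixed, made-up timestamps.
func exampleTrace() Trace {
	t0 := time.Date(2020, time.January, 22, 10, 0, 0, 0, time.UTC)
	ms := func(n int) time.Time { return t0.Add(time.Duration(n) * time.Millisecond) }
	return Trace{
		ID: "trace-1",
		Spans: []Span{
			{TraceID: "trace-1", ID: "a", Name: "handle-http-request", Start: ms(0), End: ms(5)},
			{TraceID: "trace-1", ID: "b", ParentID: "a", Name: "query-database", Start: ms(0), End: ms(3)},
			{TraceID: "trace-1", ID: "c", ParentID: "a", Name: "render-response", Start: ms(3), End: ms(5)},
		},
	}
}

func main() {
	for _, s := range exampleTrace().Spans {
		fmt.Printf("%-20s %v\n", s.Name, s.Duration())
	}
}
```

In this sketch the two child spans point at the root through `ParentID`, which is what lets a tracing backend reassemble spans into one trace.
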
  11. Cool!

  12. So why doesn’t everybody use it?

  13. • Instrumentation == code changes
      • Hard to justify reducing team velocity for tracing
      • You can’t have “instrumentation holes”
        ◦ At the very least you must propagate context
  14. • Vendor lock-in is especially problematic for tracing
      • Importing a vendor-specific library is scary
        ◦ What if my monitoring vendor raises prices?
      • Open-source libraries must remain neutral
        ◦ You can’t require users to use a specific vendor
        ◦ Maintaining support for multiple vendors is a lot of work
  15. • Does distributed tracing conflict with microservices?

  16. • Multiple microservices
      • Multiple programming languages and frameworks
      • Multiple protocols (HTTP, gRPC, messaging, ...)
      • Multiple tracing backends (Jaeger, Zipkin, Datadog, LightStep, NewRelic, Dynatrace, ...)
  17. Is there a solution?

  18. https://xkcd.com/927/

  19. Standards!

  20. Lack of standards is especially costly for distributed tracing.

  22. OpenTracing

  23. • A CNCF project which standardizes tracing APIs
      • API only, no implementation!
      • Released in December 2016
      • Notable contributors:
        ◦ LightStep, Uber, Instana, SolarWinds, NewRelic, Datadog, Red Hat, ...
      • Supported by:
        ◦ LightStep, Datadog, Instana, Dynatrace, Jaeger, ...
  24. OpenCensus
      • A Google project which has been open-sourced
      • API and implementation
      • Released in January 2018
      • Notable contributors:
        ◦ Google, Microsoft, Splunk, Honeycomb, SignalFx, ...
      • Supported by:
        ◦ Stackdriver, Datadog, Honeycomb, AWS X-Ray, Zipkin, ...
  25. OpenTracing ⊻ OpenCensus

  26. OpenTracing + OpenCensus = OpenTelemetry (May 2019)

  27. • Announced May 2019
      • The next major version of both OpenTracing and OpenCensus
      • A real community effort
      • A spec and a set of libraries
      • API and implementation
      • Tracing and metrics
  28. • API
        ◦ Follows the OpenTelemetry specification
        ◦ Can be used without an implementation
      • SDK
        ◦ A ready-to-use implementation
        ◦ Alternative implementations are supported
      • Exporters
      • Bridges
      https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md
  29. • Library developers depend only on the API
      • Application developers depend on the API and on an implementation
      • Monitoring vendors maintain their own exporters
  30. • I may want to use an instrumented 3rd-party library without using OpenTelemetry
        ◦ If no implementation is plugged in, telemetry data is not produced
      • My code should not be broken by instrumentation
        ◦ The API package is self-sufficient thanks to a built-in noop implementation
      • Performance impact should be minimal
        ◦ No blocking of end-user application by default
        ◦ Noop implementation produces negligible overhead
        ◦ Asynchronous exporting of telemetry data
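
The "built-in noop implementation" idea can be illustrated with a small standard-library sketch. It is not the OpenTelemetry Go API: the `Tracer` and `Span` interfaces, `SetTracer`, `GetTracer`, `LibraryCall` and the toy `recordingTracer` are hypothetical names chosen for this example. The point is only that instrumented library code keeps working, and produces no telemetry, until an application plugs in a real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Span and Tracer form a hypothetical, minimal tracing API.
type Span interface {
	End()
}

type Tracer interface {
	Start(name string) Span
}

// noopSpan and noopTracer are the built-in defaults: they record nothing.
type noopSpan struct{}

func (noopSpan) End() {}

type noopTracer struct{}

func (noopTracer) Start(string) Span { return noopSpan{} }

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{} // safe default when nothing is plugged in
)

// SetTracer plugs in an implementation. Applications call it; libraries do not.
func SetTracer(t Tracer) {
	mu.Lock()
	defer mu.Unlock()
	global = t
}

// GetTracer returns the currently registered tracer.
func GetTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// LibraryCall stands in for instrumented third-party code that depends only on the API.
func LibraryCall() string {
	span := GetTracer().Start("library-call")
	defer span.End()
	return "result"
}

// recordingTracer is a toy implementation that remembers finished span names.
type recordingTracer struct {
	mu       sync.Mutex
	finished []string
}

type recordingSpan struct {
	name   string
	tracer *recordingTracer
}

func (t *recordingTracer) Start(name string) Span {
	return &recordingSpan{name: name, tracer: t}
}

func (s *recordingSpan) End() {
	s.tracer.mu.Lock()
	defer s.tracer.mu.Unlock()
	s.tracer.finished = append(s.tracer.finished, s.name)
}

func main() {
	fmt.Println(LibraryCall()) // works with no implementation plugged in

	rec := &recordingTracer{}
	SetTracer(rec)
	fmt.Println(LibraryCall(), rec.finished)
}
```

A real SDK would additionally hand finished spans to an exporter in the background rather than appending to a slice; that part is left out here to keep the sketch short.
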
  31. • Current status: pre-release
      • Production readiness: 2nd half of 2020
      • Libraries for: Go, Python, Java, C++, Rust, PHP, Ruby, .NET, JavaScript, Erlang
  32. Latest release: v0.2.1 (alpha)
      ✓ API (tracing + metrics)
      ✓ SDK (tracing + metrics)
      ✓ Context propagation
      ✓ Exporters: Jaeger, Stackdriver, Prometheus (metrics)
      ✓ OpenTracing bridge
  33. How do I instrument my Go services?

  34. // Explicit span creation.
      handler := func(w http.ResponseWriter, r *http.Request) {
          ctx, span := tr.Start(r.Context(), "handle-request")
          defer span.End()
          // Handle HTTP request.
      }

      // Implicit span creation.
      err := tr.WithSpan(ctx, "do-stuff",
          func(context.Context) error {
              return do_stuff()
          },
      )
  35. // Log an event on the span.
      span.AddEvent(ctx, "Generating response",
          key.New("response").String("stuff"),
      )

      // Set key-value pairs on the span.
      span.SetAttributes(
          key.New("cool").Bool(true),
          key.New("usefulness").String("very"),
      )
  36. • “Context” refers to request-scoped data
        ◦ Example: request/transaction ID
      • Context is propagated across a request’s path
      • Distributed tracing relies on context for span correlation
        ◦ Trace ID and span ID must be propagated
      • Two types of context propagation: in-process and distributed
  37. • Distributed context propagation is protocol-dependent!

      // Inject tracing metadata on outgoing requests.
      grpctrace.Inject(ctx, &metadata)

      // Extract tracing metadata on incoming requests.
      metadata, spanCtx := grpctrace.Extract(ctx, &metadataCopy)
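
The slide shows the gRPC flavor of inject/extract. For comparison, here is a rough sketch of the same idea over HTTP headers, using only the standard library. It is an assumption-laden illustration rather than the library's propagator: the `SpanContext` type, the `Inject` and `Extract` functions and the two `X-Example-*` header names are made up for this example and do not follow any standard wire format.

```go
package main

import (
	"fmt"
	"net/http"
)

// SpanContext is a hypothetical carrier for the IDs that must cross process boundaries.
type SpanContext struct {
	TraceID string
	SpanID  string
}

// Header names are invented for this sketch; real systems agree on a shared format.
const (
	traceIDHeader = "X-Example-Trace-Id"
	spanIDHeader  = "X-Example-Span-Id"
)

// Inject writes the tracing metadata onto the headers of an outgoing request.
func Inject(sc SpanContext, h http.Header) {
	h.Set(traceIDHeader, sc.TraceID)
	h.Set(spanIDHeader, sc.SpanID)
}

// Extract reads the tracing metadata from the headers of an incoming request.
// The boolean is false when either ID is missing.
func Extract(h http.Header) (SpanContext, bool) {
	sc := SpanContext{
		TraceID: h.Get(traceIDHeader),
		SpanID:  h.Get(spanIDHeader),
	}
	return sc, sc.TraceID != "" && sc.SpanID != ""
}

func main() {
	// Client side: attach the current span's IDs to the outgoing request.
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/hello", nil)
	if err != nil {
		panic(err)
	}
	Inject(SpanContext{TraceID: "trace-1", SpanID: "span-1"}, req.Header)

	// Server side: recover the IDs so the new span can join the same trace.
	sc, ok := Extract(req.Header)
	fmt.Println(sc, ok)
}
```

Because the carrier differs per protocol (HTTP headers here, gRPC metadata on the slide), each protocol needs its own inject/extract pair even though the propagated IDs are the same.
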
  38. [Diagram: demo application with the services Frontend, Seniority, Field and Role, connected over HTTP and gRPC]

  39. Demo

  40. Let’s peek into the OpenTelemetry Go library.

  41. // User application
      handleIncomingRequest() {
          span := tr.Start()
          defer span.End()
          library1.doSomething()
      }

      // 3rd-party library
      library1.doSomething() {
          // Current span?
          library2.doSomething()
      }

      // 3rd-party library
      library2.doSomething() {
          // Current span?
      }
  42. • Used among functions or goroutines within a service
      • Must be thread-safe
      • Two main approaches:
        ◦ Implicit: thread-local storage, global variables, ...
        ◦ Explicit: as an argument in function calls
  43. • https://golang.org/pkg/context/
      • Commonly employed for cascading request cancellation
      • Can be used for propagating request-scoped data
      • Thread-safe
      • You may already be using context anyway
  44. // api/trace/current.go

      // Set current span.
      func ContextWithSpan(ctx context.Context, span Span) context.Context {
          return context.WithValue(ctx, currentSpanKey, span)
      }

      // Get current span.
      func SpanFromContext(ctx context.Context) Span {
          if span, has := ctx.Value(currentSpanKey).(Span); has {
              return span
          }
          return NoopSpan{}
      }
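
To see how these two helpers answer the "Current span?" question from the earlier slide, the following self-contained sketch threads a `context.Context` through two stand-in library functions. `ContextWithSpan` and `SpanFromContext` mirror the slide; everything else (the one-method `Span` interface, `namedSpan`, the `contextKey` type and the `library1DoSomething`/`library2DoSomething` functions) is a simplified assumption made so the example runs with the standard library alone.

```go
package main

import (
	"context"
	"fmt"
)

// Span is reduced to a single method for this sketch.
type Span interface {
	Name() string
}

// NoopSpan is returned when the context carries no span.
type NoopSpan struct{}

func (NoopSpan) Name() string { return "" }

// namedSpan is a hypothetical span that only knows its name.
type namedSpan struct{ name string }

func (s namedSpan) Name() string { return s.name }

// An unexported key type avoids collisions with context keys from other packages.
type contextKey struct{}

var currentSpanKey = contextKey{}

// ContextWithSpan returns a copy of ctx that carries span as the current span.
func ContextWithSpan(ctx context.Context, span Span) context.Context {
	return context.WithValue(ctx, currentSpanKey, span)
}

// SpanFromContext returns the current span, or a NoopSpan if none is set.
func SpanFromContext(ctx context.Context) Span {
	if span, has := ctx.Value(currentSpanKey).(Span); has {
		return span
	}
	return NoopSpan{}
}

// library1DoSomething stands in for a third-party library: it only passes ctx along.
func library1DoSomething(ctx context.Context) string {
	return library2DoSomething(ctx)
}

// library2DoSomething finds the current span without knowing who created it.
func library2DoSomething(ctx context.Context) string {
	return SpanFromContext(ctx).Name()
}

// handleIncomingRequest plays the user application that starts the span.
func handleIncomingRequest() string {
	ctx := ContextWithSpan(context.Background(), namedSpan{name: "handle-request"})
	return library1DoSomething(ctx)
}

// uninstrumentedCaller calls the same library code without setting a span.
func uninstrumentedCaller() string {
	return library1DoSomething(context.Background())
}

func main() {
	fmt.Printf("%q\n", handleIncomingRequest())
	fmt.Printf("%q\n", uninstrumentedCaller())
}
```

The noop fallback means library code never has to check for a missing span, which is the explicit-propagation counterpart of the noop tracer idea.
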
  45. Conclusion

  46. • It’s a lot of work
        ◦ Hopefully you’ll never have to re-instrument your entire codebase
        ◦ Auto-instrumentation is in the works
      • We can’t vendor-lock
        ◦ Architecture encourages separation of concerns
      • Distributed tracing vs. microservices
        ◦ A good balance between freedom and uniformity
  47. Johannes Liebermann
      GitHub: johananl
      Twitter: @j_lieb
      Email: johannes@kinvolk.io
      Kinvolk
      Blog: kinvolk.io/blog
      GitHub: kinvolk
      Twitter: kinvolkio
      Email: hello@kinvolk.io