Observable Applications With OpenTelemetry

Johannes Liebermann
April 01, 2020

OpenTelemetry is a CNCF sandbox project that standardizes application tracing and monitoring across multiple programming languages, protocols, platforms and vendors. In this talk I give a brief introduction to the OpenTelemetry project, explore some of its language libraries, demonstrate how they can be used to make distributed applications observable, and look into some of the tricky parts of implementing distributed tracing and how OpenTelemetry handles them.

Transcript

  1. Observable Applications With OpenTelemetry Virtual Rejekts | 01.04.20

  2. Do we need distributed tracing?

  3. That depends on the questions we want to ask...

  4. Hi, I'm Johannes
    Johannes Liebermann, Software Developer, Kinvolk
    GitHub: johananl, Twitter: @j_lieb, Email: johannes@kinvolk.io
  5. Kinvolk: The Kubernetes Linux Experts
    Engineering services and products for Kubernetes, containers, process management and Linux user-space + kernel
    Blog: kinvolk.io/blog, GitHub: kinvolk, Twitter: kinvolkio, Email: hello@kinvolk.io
  6. https://xkcd.com/927/

  7. Agenda
    • Logs, metrics and traces
    • Distributed tracing: why it’s great but also hard
    • Introduction to OpenTelemetry
    • A look into the tracing libraries (Go, Python)
    • Demo: instrumenting a distributed application
    • Context propagation
  8. This talk assumes familiarity with distributed tracing.

  9. Distributed Tracing in 30 Seconds

    helloHandler := func(w http.ResponseWriter, req *http.Request) {
        ...
        ctx, span := tr.Start(
            req.Context(),
            "handle-hello-request",
            ...
        )
        defer span.End()
        _, _ = io.WriteString(w, "Hello, world!\n")
    }
  10. Distributed Tracing in 30 Seconds

  11. Distributed Tracing in 30 Seconds
    • A span measures a unit of work in a service
    • A trace combines multiple spans together
    [Diagram: one trace made up of the spans handle-http-request (5ms), query-database (3ms) and render-response (2ms)]
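To make the span/trace relationship concrete, here is a minimal Go sketch that models the trace from the slide as plain data. The `Span` type, its fields and the `Children`/`PrintTree` helpers are hypothetical (this is not the OpenTelemetry API), and treating query-database and render-response as children of handle-http-request is an assumption based on their durations adding up to 5ms.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, minimal model of a span: one unit of work in a service.
// Spans sharing a TraceID belong to the same trace; ParentID links a child span
// to the span that caused it ("" marks the root span).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string
	Name     string
	Duration time.Duration
}

// Children returns the spans in a trace whose parent is the given span ID.
func Children(trace []Span, parentID string) []Span {
	var out []Span
	for _, s := range trace {
		if s.ParentID == parentID {
			out = append(out, s)
		}
	}
	return out
}

// PrintTree prints a trace as an indented tree, starting from the root span(s).
func PrintTree(trace []Span, parentID string, depth int) {
	for _, s := range Children(trace, parentID) {
		fmt.Printf("%*s%s (%v)\n", depth*2, "", s.Name, s.Duration)
		PrintTree(trace, s.SpanID, depth+1)
	}
}

func main() {
	// One request, three spans, one shared trace ID (nesting is assumed).
	trace := []Span{
		{TraceID: "t1", SpanID: "a", ParentID: "", Name: "handle-http-request", Duration: 5 * time.Millisecond},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "query-database", Duration: 3 * time.Millisecond},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "render-response", Duration: 2 * time.Millisecond},
	}
	PrintTree(trace, "", 0)
}
```

The only thing that ties the three spans into one trace is the shared trace ID plus the parent links, which is why those IDs must travel with the request.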
  12. Logs, Metrics, Traces

  13. Logs, Metrics, Traces

                 Generating   Processing & Storing   Querying   Scope
    Logs         Easy         Hard                   Hard       Node
    Metrics      So-so        Easy                   Easy       Node / Service
    Traces       Hard         So-so                  Easy       Request
  15. What Is the Question?
    • Why did this node crash?
    • Was function X on the node called?
    • Is my service healthy?
    • How much traffic do we have?
    • Why was this request slow?
    • Where should I optimize performance?
    • Which services are involved?
  16. What Is the Question?
    Logs:
    • Why did this node crash?
    • Was function X on the node called?
    Metrics:
    • Is my service healthy?
    • How much traffic do we have?
    Traces:
    • Why was this request slow?
    • Where should I optimize performance?
    • Which services are involved?
  18. Distributed tracing allows us to get low-level, end-to-end information about individual requests.
  19. ...so why not trace everything all the time?

  20. Logs, Metrics, Traces

                 Generating   Processing & Storing   Querying   Scope
    Logs         Easy         Hard                   Hard       Service
    Metrics      So-so        Easy                   Easy       Service / system
    Traces       Hard         So-so                  Easy       Request
  21. 1. It’s a Lot of Work
    • Instrumentation == code changes
    • Hard to justify reducing team velocity for tracing
    • You can’t have “instrumentation holes”
      ◦ At the very least you must propagate context
  22. 2. We Can’t Vendor-Lock
    • Vendor lock-in is especially problematic for tracing
    • Importing a vendor-specific library is scary
      ◦ What if my monitoring vendor raises prices?
    • Open-source libraries must remain neutral
      ◦ You can’t require users to use a specific vendor
      ◦ Maintaining support for multiple vendors is a lot of work
  23. 3. Distributed Tracing vs. Microservices
    • Does distributed tracing conflict with microservices?
  24. Multiple Everything
    • Multiple microservices
    • Multiple programming languages and frameworks
    • Multiple protocols (HTTP, gRPC, messaging, ...)
    • Multiple tracing backends (Jaeger, Zipkin, Datadog, LightStep, New Relic, Dynatrace, …)
  25. Is there a solution?

  26. Standards!

  27. Lack of standards is especially costly for distributed tracing.

  28. https://xkcd.com/927/

  30. OpenCensus

  31. OpenCensus ⊻

  32. OpenTracing + OpenCensus = OpenTelemetry (May 2019)

  33. Introducing OpenTelemetry
    • opentelemetry.io
    • Announced May 2019
    • The next major version of both OpenTracing and OpenCensus
    • A real community effort
    • A spec and a set of libraries
    • API and implementation
    • Tracing and metrics
  34. OpenTelemetry: Architecture
    • API
      ◦ Follows the OpenTelemetry specification
      ◦ Can be used without an implementation
    • SDK
      ◦ A ready-to-use implementation
      ◦ Alternative implementations are supported
    • Exporters
    • Bridges
    https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/library-guidelines.md
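A rough Go sketch of how the API, SDK and exporter layers can fit together, assuming heavily simplified interfaces. `Tracer`, `Span`, `SpanData`, `Exporter`, `sdkTracer` and `memoryExporter` are hypothetical names for illustration, not the real OpenTelemetry Go types. Instrumented code sees only the API interfaces; the SDK decides what happens when a span ends; the exporter decides where the data goes.

```go
package main

import "fmt"

// --- API: what instrumented code depends on (hypothetical, simplified). ---

// Span is the API-level view of a span.
type Span interface {
	End()
}

// Tracer creates spans.
type Tracer interface {
	Start(name string) Span
}

// --- Exporter: vendor-specific sink for finished spans. ---

// SpanData is what the SDK hands to an exporter when a span ends.
type SpanData struct {
	Name string
}

type Exporter interface {
	Export(SpanData)
}

// --- SDK: one concrete implementation of the API. ---

type sdkTracer struct {
	exporter Exporter
}

type sdkSpan struct {
	data     SpanData
	exporter Exporter
}

func (t *sdkTracer) Start(name string) Span {
	return &sdkSpan{data: SpanData{Name: name}, exporter: t.exporter}
}

// End hands the finished span to the configured exporter.
func (s *sdkSpan) End() {
	s.exporter.Export(s.data)
}

// memoryExporter stands in for a real backend exporter (Jaeger, Zipkin, ...).
type memoryExporter struct {
	names []string
}

func (e *memoryExporter) Export(d SpanData) {
	e.names = append(e.names, d.Name)
}

// handle is "instrumented code": it only knows about the Tracer interface.
func handle(tr Tracer) {
	span := tr.Start("handle-request")
	defer span.End()
}

func main() {
	exp := &memoryExporter{}
	var tr Tracer = &sdkTracer{exporter: exp}
	handle(tr)
	fmt.Println(exp.names) // [handle-request]
}
```

Swapping `memoryExporter` for another `Exporter` changes the backend without touching `handle`, which is the separation of concerns this architecture aims for.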
  35. Separation of Concerns
    • Library developers depend only on the API
    • Application developers depend on the API and on an implementation
    • Monitoring vendors maintain their own exporters
  36. Protecting User Applications
    • I may want to use an instrumented 3rd-party library without using OpenTelemetry
      ◦ If no implementation is plugged in, no telemetry data is produced
    • My code should not be broken by instrumentation
      ◦ The API package is self-sufficient thanks to a built-in noop implementation
    • Performance impact should be minimal
      ◦ No blocking of the end-user application by default
      ◦ The noop implementation has negligible overhead
      ◦ Telemetry data is exported asynchronously
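Two of the safety properties above, a built-in noop default and non-blocking export, can be sketched in Go as follows. This is an illustration under assumptions: `Tracer`, `SetTracer`, `GetTracer`, `libraryWork` and `queueTracer` are hypothetical, and dropping spans when a bounded queue is full is one possible non-blocking policy, not necessarily the one any OpenTelemetry SDK uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Tracer is a hypothetical, minimal tracing API: Start returns a function that ends the span.
type Tracer interface {
	Start(name string) (end func())
}

// noopTracer is the built-in default: it records nothing and costs almost nothing.
type noopTracer struct{}

func (noopTracer) Start(string) func() { return func() {} }

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{}
)

// SetTracer is called by the application when it wants telemetry.
func SetTracer(t Tracer) {
	mu.Lock()
	defer mu.Unlock()
	global = t
}

// GetTracer is what a third-party library calls; it never returns nil.
func GetTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// libraryWork stands in for an instrumented third-party library function.
func libraryWork() {
	end := GetTracer().Start("library-work")
	defer end()
}

// queueTracer hands finished spans to a bounded queue. A real SDK would have a
// background goroutine drain the queue and export batches; when the queue is
// full, the span is dropped rather than blocking the application.
type queueTracer struct {
	queue   chan string
	dropped int64
}

func (q *queueTracer) Start(name string) func() {
	return func() {
		select {
		case q.queue <- name:
		default:
			atomic.AddInt64(&q.dropped, 1)
		}
	}
}

// Dropped reports how many spans were discarded because the queue was full.
func (q *queueTracer) Dropped() int64 { return atomic.LoadInt64(&q.dropped) }

func main() {
	libraryWork() // No implementation registered: runs fine, produces nothing.

	q := &queueTracer{queue: make(chan string, 1)}
	SetTracer(q)
	libraryWork() // Enqueued.
	libraryWork() // Queue full (nothing drains it here): dropped, not blocked.
	fmt.Println(len(q.queue), q.Dropped()) // 1 1
}
```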
  37. Project Status
    • Current status: beta
    • Production readiness: 2nd half of 2020
    • Libraries for: Go, Python, Java, C++, Rust, PHP, Ruby, .NET, JavaScript, Erlang
  38. Go Library Status
    Latest release: v0.4.2 (beta)
    ✓ API (tracing + metrics)
    ✓ SDK (tracing + metrics)
    ✓ Context propagation
    ✓ Exporters: Jaeger, Zipkin, Prometheus (metrics)
    ✓ OpenTracing bridge
  39. Python Library Status
    Latest release: v0.6.0 (beta)
    ✓ API (tracing + metrics)
    ✓ SDK (tracing + metrics)
    ✓ Context propagation
    ✓ Exporters: Jaeger, Zipkin, Prometheus (metrics)
    ✓ OpenTracing bridge
  40. How do I instrument my services?

  41. Instrumenting Go Code

    // Explicit span creation.
    handler := func(w http.ResponseWriter, r *http.Request) {
        ctx, span := tr.Start(r.Context(), "handle-request")
        defer span.End()
        // Handle the HTTP request, passing ctx to downstream calls.
    }

    // Implicit span creation.
    err := tr.WithSpan(ctx, "do-stuff",
        func(context.Context) error {
            return doStuff()
        },
    )
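The difference between explicit and implicit span creation can be sketched without the real SDK. In this hypothetical Go version, `start` plays the role of `tr.Start` and `withSpan` the role of `tr.WithSpan`; the names, the `span` type, the `finished` slice and the error recording are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// span is a hypothetical, minimal span used only for this sketch.
type span struct {
	name   string
	parent string
	err    error
}

type spanKey struct{}

// finished collects ended spans, standing in for an exporter.
var finished []*span

// start creates a span as a child of whatever span ctx carries (explicit style).
func start(ctx context.Context, name string) (context.Context, *span) {
	s := &span{name: name}
	if p, ok := ctx.Value(spanKey{}).(*span); ok {
		s.parent = p.name
	}
	return context.WithValue(ctx, spanKey{}, s), s
}

func (s *span) end() { finished = append(finished, s) }

// withSpan is the implicit style: it starts a span, runs fn with a context that
// carries the span, records fn's error on the span, and ends the span.
func withSpan(ctx context.Context, name string, fn func(context.Context) error) error {
	ctx, s := start(ctx, name)
	defer s.end()
	s.err = fn(ctx)
	return s.err
}

// demo wraps some work in a span and creates one explicit child span inside it.
func demo() error {
	return withSpan(context.Background(), "do-stuff", func(ctx context.Context) error {
		_, child := start(ctx, "child")
		child.end()
		return errors.New("boom")
	})
}

func main() {
	fmt.Println(demo())
	for _, s := range finished {
		fmt.Printf("%s parent=%q err=%v\n", s.name, s.parent, s.err)
	}
}
```

The implicit form guarantees the span is ended even on early returns, while the explicit form gives full control over when the span starts and ends.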
  42. Instrumenting Go Code

    // Log an event on the span.
    span.AddEvent(ctx, "Generating response",
        key.New("response").String("stuff"),
    )

    // Set key-value pairs on the span.
    span.SetAttributes(
        key.New("cool").Bool(true),
        key.New("usefulness").String("very"),
    )
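The `key.New(...).Bool(...)` style builds typed key-value pairs that get stored on the span. A minimal sketch of that pattern, assuming hypothetical `Key`, `KeyValue`, `Event` and `Span` types and a hypothetical `NewKey` constructor; the real API also takes a `ctx` argument in `AddEvent`, which is omitted here.

```go
package main

import (
	"fmt"
	"time"
)

// Key and KeyValue are hypothetical stand-ins modeled on the key.New(...) style.
type Key string

type KeyValue struct {
	Key   Key
	Value interface{}
}

func NewKey(name string) Key { return Key(name) }

// String and Bool build a typed key-value pair for this key.
func (k Key) String(v string) KeyValue { return KeyValue{k, v} }
func (k Key) Bool(v bool) KeyValue     { return KeyValue{k, v} }

// Event is a timestamped message with its own attributes.
type Event struct {
	Time  time.Time
	Name  string
	Attrs []KeyValue
}

type Span struct {
	Attributes map[Key]interface{}
	Events     []Event
}

// SetAttributes stores key-value pairs on the span; a later value for a key wins.
func (s *Span) SetAttributes(kvs ...KeyValue) {
	if s.Attributes == nil {
		s.Attributes = map[Key]interface{}{}
	}
	for _, kv := range kvs {
		s.Attributes[kv.Key] = kv.Value
	}
}

// AddEvent records a point-in-time event on the span.
func (s *Span) AddEvent(name string, attrs ...KeyValue) {
	s.Events = append(s.Events, Event{Time: time.Now(), Name: name, Attrs: attrs})
}

func main() {
	var span Span
	span.AddEvent("Generating response", NewKey("response").String("stuff"))
	span.SetAttributes(NewKey("cool").Bool(true), NewKey("usefulness").String("very"))
	fmt.Println(len(span.Events), span.Attributes["cool"], span.Attributes["usefulness"]) // 1 true very
}
```

Attributes describe the span as a whole, while events mark moments inside it, each with its own timestamp.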
  43. Propagating Context Between Processes

    // Inject tracing metadata on outgoing requests.
    grpctrace.Inject(ctx, &metadata)

    // Extract tracing metadata on incoming requests.
    metadata, spanCtx := grpctrace.Extract(ctx, &metadataCopy)

    Protocol dependent!
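Since the carrier is protocol dependent, one way to picture it is an interface with one implementation per protocol. This Go sketch assumes the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`) as the wire format. `Carrier`, `HTTPCarrier`, `MetadataCarrier`, `SpanContext`, `Inject` and `Extract` are hypothetical names; `MetadataCarrier` only mimics gRPC metadata so no gRPC dependency is needed, and parsing is deliberately minimal (no hex or flag validation).

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Carrier abstracts where the metadata goes; this is the protocol-dependent part.
type Carrier interface {
	Set(key, value string)
	Get(key string) string
}

// HTTPCarrier stores values in HTTP headers.
type HTTPCarrier http.Header

func (c HTTPCarrier) Set(k, v string)     { http.Header(c).Set(k, v) }
func (c HTTPCarrier) Get(k string) string { return http.Header(c).Get(k) }

// MetadataCarrier mimics gRPC metadata (lower-case keys, multiple values per key).
type MetadataCarrier map[string][]string

func (c MetadataCarrier) Set(k, v string) { c[strings.ToLower(k)] = []string{v} }

func (c MetadataCarrier) Get(k string) string {
	if vs := c[strings.ToLower(k)]; len(vs) > 0 {
		return vs[0]
	}
	return ""
}

// SpanContext is the minimum that must cross process boundaries.
type SpanContext struct {
	TraceID string // 32 hex characters
	SpanID  string // 16 hex characters
}

// Inject writes a traceparent value with version 00 and the "sampled" flag 01.
func Inject(sc SpanContext, c Carrier) {
	c.Set("traceparent", "00-"+sc.TraceID+"-"+sc.SpanID+"-01")
}

// Extract parses the traceparent value; ok is false if it is missing or malformed.
func Extract(c Carrier) (sc SpanContext, ok bool) {
	parts := strings.Split(c.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return SpanContext{}, false
	}
	return SpanContext{TraceID: parts[1], SpanID: parts[2]}, true
}

func main() {
	sc := SpanContext{TraceID: "4bf92f3577b34da6a3ce929d0e0e4736", SpanID: "00f067aa0ba902b7"}
	for _, c := range []Carrier{HTTPCarrier(http.Header{}), MetadataCarrier{}} {
		Inject(sc, c)
		got, ok := Extract(c)
		fmt.Println(ok, got == sc) // true true
	}
}
```

`Inject` and `Extract` stay the same for every protocol; only the carrier changes.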
  44. Instrumenting Python Code

    # Implicit span creation.
    with tracer.start_as_current_span("do-stuff") as span:
        do_stuff()
        # Log an event on the span.
        span.add_event("something happened", {"foo": "bar"})
        # Set key-value pairs on the span.
        span.set_attribute("cool", True)
  45. Demo: Fake Job Title Generator
    [Diagram: a Frontend service and Seniority, Field and Role services; the connections are labeled HTTP and gRPC]
  46. Demo

  47. The tricky part: context propagation.

  48. Context: Request-Scoped Data
    • “Context” refers to request-scoped data
      ◦ Example: request/transaction ID
    • Context is propagated across a request’s path
    • Needed for span correlation
      ◦ Trace ID and span ID must be propagated
    • Two types of context propagation: in-process and distributed
  49. In-Process Context Propagation

    // User application
    handleIncomingRequest() {
        span := tr.Start()
        defer span.End()
        library1.doSomething()
    }

    // 3rd-party library
    library1.doSomething() {
        // Current span?
        library2.doSomething()
    }

    // 3rd-party library
    library2.doSomething() {
        // Current span?
    }
  50. In-Process Context Propagation
    • Used among functions or goroutines within a service
    • Must be thread-safe
    • Two main approaches:
      ◦ Implicit: thread-local storage, global variables, …
      ◦ Explicit: as an argument in function calls
    • Go uses the context standard library package
    • Python uses context variables (the contextvars module)
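The “Current span?” question from the in-process slide has a simple answer with explicit propagation: every layer accepts a `context.Context` and passes it on. A minimal Go sketch, assuming a hypothetical `contextWithSpan`/`currentSpan` pair that stores only a span name; it also shows goroutines sharing one context safely, since context values are immutable.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type spanKey struct{}

// contextWithSpan and currentSpan are hypothetical helpers; a span is just its name here.
func contextWithSpan(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, spanKey{}, name)
}

func currentSpan(ctx context.Context) string {
	if name, ok := ctx.Value(spanKey{}).(string); ok {
		return name
	}
	return "<none>"
}

// library2 is uninstrumented: it only reads the span from the ctx it was given.
func library2(ctx context.Context) string { return currentSpan(ctx) }

// library1 does no tracing of its own but forwards ctx, so there is no "hole".
func library1(ctx context.Context) string { return library2(ctx) }

// handleIncomingRequest starts a span and fans work out to goroutines.
func handleIncomingRequest() []string {
	// In a real server this would start from r.Context().
	ctx := contextWithSpan(context.Background(), "handle-request")
	results := make([]string, 3)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Every goroutine reads the same immutable ctx; each writes its own slot.
			results[i] = library1(ctx)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(handleIncomingRequest())        // [handle-request handle-request handle-request]
	fmt.Println(library1(context.Background())) // <none>
}
```

The cost of the explicit approach is that every function on the path must accept and forward `ctx`; a single function that drops it breaks the chain.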
  51. In-Process Context Propagation: Go

    // Set current span.
    func ContextWithSpan(ctx context.Context, span Span) context.Context {
        return context.WithValue(ctx, currentSpanKey, span)
    }

    // Get current span.
    func SpanFromContext(ctx context.Context) Span {
        if span, has := ctx.Value(currentSpanKey).(Span); has {
            return span
        }
        return NoopSpan{}
    }

    api/trace/context.go
  52. Distributed Context Propagation
    [Diagram: Service A calls Service B over HTTP (httptrace.Inject() / httptrace.Extract()); Service B calls Service C over gRPC (grpctrace.Inject() / grpctrace.Extract())]
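Distributed propagation ties the pieces together: service A injects its trace context into the outgoing request, and service B extracts it to continue the same trace. This Go sketch runs service B on a local `httptest` server and assumes the W3C `traceparent` header format; `serviceB`, `newID` and `demo` are hypothetical, and the gRPC hop to service C is not shown (it would work the same way with gRPC metadata as the carrier).

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newID returns n random bytes, hex-encoded (16 bytes for a trace ID, 8 for a span ID).
func newID(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// serviceB extracts the caller's trace context and continues the trace: same
// trace ID, a new span ID, and the caller's span ID as parent. It replies with
// "traceID parentID spanID" so the caller can check what was recorded.
func serviceB(w http.ResponseWriter, r *http.Request) {
	parts := strings.Split(r.Header.Get("traceparent"), "-")
	if len(parts) != 4 {
		http.Error(w, "missing trace context", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "%s %s %s", parts[1], parts[2], newID(8))
}

// demo plays service A: it starts a root span, injects its context into the
// outgoing request, and checks that service B joined the same trace.
func demo() (sameTrace, parentOK bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(serviceB))
	defer srv.Close()

	traceID, spanID := newID(16), newID(8)
	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		return false, false, err
	}
	req.Header.Set("traceparent", "00-"+traceID+"-"+spanID+"-01")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, false, err
	}
	fields := strings.Fields(string(body))
	if len(fields) != 3 {
		return false, false, fmt.Errorf("unexpected reply %q", body)
	}
	return fields[0] == traceID, fields[1] == spanID, nil
}

func main() {
	sameTrace, parentOK, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("same trace:", sameTrace, "parent linked:", parentOK)
}
```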
  53. Conclusion

  54. Conclusion
    • Tracing is tricky, but may well be worth it
    • It’s much easier than before
      ◦ Hopefully you’ll never have to re-instrument
      ◦ Auto-instrumentation is in the works
    • No vendor lock-in
      ◦ Architecture encourages separation of concerns
    • A good balance between freedom and uniformity
      ◦ Simple APIs
      ◦ Support for arbitrary implementations
      ◦ A real community effort
  55. Thank you!
    Johannes Liebermann, GitHub: johananl, Twitter: @j_lieb, Email: johannes@kinvolk.io
    Kinvolk, Blog: kinvolk.io/blog, GitHub: kinvolk, Twitter: kinvolkio, Email: hello@kinvolk.io