Distributed Tracing and Monitoring with OpenTelemetry

OpenTelemetry is an emerging standard for tracing and metrics in cloud services. You can use it to gain observability into applications that span multiple clouds and technology stacks.

I explain how to use the open source, vendor-agnostic OpenTelemetry client libraries and how to export telemetry to common APM systems such as Zipkin. Along the way, I cover core concepts such as tags, metrics, exporters, zPages, and trace context propagation.


Simon Zeltser

June 13, 2019


  1. Distributed Tracing and Monitoring with OpenTelemetry Simon Zeltser @simon_zeltser

  2. Software Evolution
    On Prem => Cloud
    Virtual Machines => Containers
    Monolith => Microservices
    Single Language / Stack => Polyglot
    Single Cloud => Multiple Cloud Providers / Hybrid
    Containers => Cloud Functions / Serverless
  3. New architectures => new challenges
    • Debugging
    • Observability
    • Standardizing development practices
    • Deployment / Packaging
    • Configuration Management
    • Secrets management
  4. Who Am I
    Software Engineer at Cloud Developer Experience - Infrastructure and Operations Tools
    Over a decade in distributed systems observability
    github.com/simonz130 @simon_zeltser
  5. What is Observability?
    Being able to debug the system and gain insights into the system’s behavior
  6. Observability > Monitoring (Venn diagram by Cindy Sridharan)

  8. Signals
    Holistic Approach:
    - Distributed Tracing
    - Metrics Collection
    - Continuous system profiling
    - Capturing Logs
  9. Ultimate Recipe for a reliable cloud service: the Observability Lifecycle
    Track System Health: capture traces, metrics and logs; create alerts on data
    Detect Problems: locality, networking, scheduling, dependencies
    Fix & Refine: optimize performance and cost of services
  10. Observability in Distributed Systems
    Hard and Expensive:
    • Context propagation between components
    • Multiple environments
    • External dependencies
    • Vendor Lock-in
    • Cost
    Figure: architecture from the Hipstershop Demo App
  11. Meet OpenTelemetry!
    OpenTelemetry is an integrated set of APIs and libraries to generate, collect and describe telemetry in distributed systems
    Problems OpenTelemetry solves:
    - Vendor neutrality for tracing, monitoring and logging APIs
    - Context propagation
  12. OpenTelemetry is:
    • Single set of APIs for tracing and metrics collection
    • Integrations with popular web, RPC and storage frameworks
    • Standardized Context Propagation
    • Exporters for sending data to backend of choice
    • Collector for smart traces & metrics aggregation
  13. Next major version of the OpenTracing and OpenCensus projects

  14. Roadmap
    Announcement: https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
    Roadmap: https://medium.com/opentracing/a-roadmap-to-convergence-b074e5815289

  15. Who is behind OpenTelemetry? Core Contributors

  16. Metrics with OpenTelemetry

  17. Instrumentation with metrics
    Diagram labels: View, Measurement, Tag, Measure, Aggregation, View Data
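
How the pieces named on this slide fit together can be sketched in a few lines. The sketch below is plain Python using only the standard library, not the OpenCensus or OpenTelemetry API; the class names `Measure` and `View`, the measure `request_latency`, and the tag key `method` are all assumptions made for illustration. A measure names what is recorded, a measurement is one recorded value with its tags, and a view applies an aggregation to the measurements, grouped by tag, to produce view data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measure:
    """What is being recorded, e.g. request latency in milliseconds."""
    name: str
    unit: str


@dataclass
class View:
    """Couples a measure with an aggregation and the tag keys to group by."""
    name: str
    measure: Measure
    tag_keys: tuple
    aggregation: str  # "count" or "sum" in this sketch
    data: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, value, tags):
        # A measurement is one recorded value of the measure plus its tags.
        key = tuple(tags.get(k, "") for k in self.tag_keys)
        self.data[key] += 1 if self.aggregation == "count" else value


latency_ms = Measure("request_latency", "ms")
latency_sum = View("latency_sum_by_method", latency_ms, ("method",), "sum")
call_count = View("calls_by_method", latency_ms, ("method",), "count")

measurements = [
    (12.0, {"method": "GET"}),
    (30.0, {"method": "POST"}),
    (8.0, {"method": "GET"}),
]
for value, tags in measurements:
    for view in (latency_sum, call_count):
        view.record(value, tags)

# View data: the aggregated result per view, keyed by tag values.
view_data = {v.name: dict(v.data) for v in (latency_sum, call_count)}
print(view_data)
```

Real client libraries offer more aggregations (for example distributions and last-value) and export the view data periodically; this sketch only shows the grouping step.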

  18. Demo: Getting Started - Metrics collection with OpenTelemetry
    Demo code: https://github.com/simonz130/opencensus-csharp-samples
  19. Tracing with OpenTelemetry
    Demo code: https://github.com/simonz130/opencensus-csharp-samples

  20. Tracing with OpenTelemetry - the options
    Agentless: initialize exporter in app code
      HTTP IN => Application with Exporter (Container/VM) => Trace Backend
    Using Agent: install the agent alongside the app
      HTTP IN => Application (Container/VM) => Traces => Agent => Trace Backend
  21. Tracing with OpenTelemetry - Terminology
    Trace - a collection of spans
    Span - a single operation in a trace
    Sampler - decides whether to export a span
    Exporter - sends traces to observability systems
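
The four terms above can be tied together in a small model. This is a sketch in plain Python with the standard library only, not the OpenCensus or OpenTelemetry API; `Span`, `Tracer`, `always_sample`, `InMemoryExporter` and `scoped_span` are hypothetical names chosen for illustration, and the tracer is single-threaded for brevity.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A single operation in a trace."""
    trace_id: str             # shared by every span in the same trace
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start: float
    end: float = 0.0


def always_sample(span):
    """Sampler: decides whether a finished span is exported."""
    return True


class InMemoryExporter:
    """Exporter stand-in: a real one would send spans to a tracing backend."""
    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)


class Tracer:
    def __init__(self, sampler, exporter):
        self.sampler = sampler
        self.exporter = exporter
        self.current = None  # innermost open span, if any

    @contextmanager
    def scoped_span(self, name):
        parent = self.current
        span = Span(
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.time(),
        )
        self.current = span
        try:
            yield span
        finally:
            span.end = time.time()
            self.current = parent
            if self.sampler(span):
                self.exporter.export(span)


exporter = InMemoryExporter()
tracer = Tracer(always_sample, exporter)
with tracer.scoped_span("Main") as main:
    for i in range(3):
        with tracer.scoped_span(f"DoWork-{i}"):
            pass

# The trace is simply every exported span that shares the root's trace id.
trace = [s for s in exporter.spans if s.trace_id == main.trace_id]
print(len(trace), "spans in trace", main.trace_id)
```

Child spans finish and are exported before their parent, so "Main" is the last span the exporter sees.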
  22. Configure Exporter
    // Configure exporter to export traces to Zipkin
    var exporter = new ZipkinTraceExporter(
        new ZipkinTraceExporterOptions()
        {
            Endpoint = new Uri(zipkinUri),
            ServiceName = "tracing-to-zipkin-service",
        },
        Tracing.ExportComponent);
    exporter.Start();
  23. Configure Sampler
    // 100% sample rate, otherwise, few traces will be sampled.
    ITraceConfig traceConfig = Tracing.TraceConfig;
    ITraceParams currentConfig = traceConfig.ActiveTraceParams;
    var newConfig = currentConfig.ToBuilder()
        .SetSampler(Samplers.AlwaysSample)
        .Build();
    traceConfig.UpdateActiveTraceParams(newConfig);
  24. Using the Tracer
    // 3. Tracer is global singleton.
    // You can register it via dependency injection if it exists
    // but if not - you can use it as follows:
    var tracer = Tracing.Tracer;
  25. Create a Span
    // Create a scoped span - only covers "using" block
    using (var scope = tracer.SpanBuilder("Main").StartScopedSpan())
    {
        for (int i = 0; i < 10; i++)
        {
            DoWork(i);
        }
    }
  26. Trace Context Propagation
    Diagram: {context} is passed along each call between Frontend, AdService, Checkout, Payment Service and EmailService
    TraceId => {context}
    Common scenarios
    • A/B testing
    Timeline: one trace over time, with a span each for Frontend, AdService, Checkout, Payment and Email
  27. Trace Context Propagation in OpenTelemetry
    W3C Standard for Context Propagation
    FORMAT: trace-id "-" parent-id "-" trace-flags
    trace-id    = 32HEXDIG ; 16 bytes array identifier. All zeroes forbidden
    parent-id   = 16HEXDIG ; 8 bytes array identifier. All zeroes forbidden
    trace-flags = 2HEXDIG  ; 8 bit flags. Currently only one bit is used ("recorded")
    Supported in all OpenCensus client libraries, will be supported in OpenTelemetry
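
The format above can be checked and propagated with a short sketch, written here in standard-library Python. It assumes the header as carried on the wire, where the W3C `traceparent` value prefixes the three fields on the slide with a two-digit version (`00`), and it treats the low bit of `trace-flags` as the flag the slide calls "recorded" (the published W3C text names it "sampled"). The function names `parse_traceparent` and `child_traceparent` are hypothetical.

```python
import re
import secrets

# version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex, version 00 only
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return the trace context as a dict, or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero identifiers are forbidden
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(context):
    """Build the header for an outgoing call: same trace id, new span id."""
    span_id = secrets.token_hex(8)
    while span_id == "0" * 16:  # regenerate in the (very unlikely) all-zero case
        span_id = secrets.token_hex(8)
    flags = "01" if context["sampled"] else "00"
    return f"00-{context['trace_id']}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
context = parse_traceparent(incoming)
outgoing = child_traceparent(context)
print(context)
print(outgoing)
```

Each service on the call path repeats these two steps: read the incoming header, then send a header with the same trace id and its own span id as the new parent id, which is what lets the backend stitch spans from different services into one trace.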
  28. OpenCensus Service
    WHAT?
    • Agent & Collector for metrics and traces
    WHY?
    • Smart Sampling
    • Export to one or more monitoring/tracing backends
    Links: OC Service Repo, OC Agent Repo
    Diagram: .NET Core app (OpenCensus Tracing Library, OpenCensus Monitoring Library) => OpenCensus Agent => OpenCensus Collector => BE Destinations (Jaeger, Zipkin, Stackdriver, Prometheus, AppInsights); the Agent and Collector together form the OpenCensus Service
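
The "export to one or more backends" idea can be sketched as a simple fan-out. This is standard-library Python, not the OpenCensus Service itself; `Collector`, `ListExporter`, the `keep` policy and the 100 ms threshold are assumptions made for illustration, and the exporters only store what they receive instead of talking to Zipkin or Jaeger.

```python
class ListExporter:
    """Stand-in for a backend exporter; it only remembers what it received."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def export(self, spans):
        self.received.extend(spans)


class Collector:
    """Receives spans from applications or agents and forwards them to every backend."""
    def __init__(self, exporters, keep=lambda span: True):
        self.exporters = list(exporters)
        self.keep = keep  # sampling policy applied once, centrally

    def receive(self, spans):
        kept = [span for span in spans if self.keep(span)]
        for exporter in self.exporters:
            exporter.export(kept)
        return len(kept)


zipkin_like = ListExporter("zipkin-like")
jaeger_like = ListExporter("jaeger-like")
collector = Collector(
    [zipkin_like, jaeger_like],
    keep=lambda span: span["duration_ms"] >= 100,  # hypothetical policy
)
kept = collector.receive([
    {"name": "checkout", "duration_ms": 250},
    {"name": "healthz", "duration_ms": 2},
])
print(kept, [s["name"] for s in zipkin_like.received])
```

Because the application only talks to the collector, backends can be added or swapped in one place without touching application code.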
  29. Traces & Sampling
    Problem: Traced apps can generate many traces with many spans
    • Higher operational costs
    • Noise
    Solution: Smart Sampling
    • Head-based (sampled at the beginning of the trace)
    • Tail-based (sampled at the end of the trace)
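
The difference between the two strategies comes down to what is known when the decision is made. The sketch below, in standard-library Python, assumes one possible policy for each: the head-based sampler decides from the trace id alone, and the tail-based sampler waits for all spans of a trace and keeps it if any span failed or was slow. The names `head_sample` and `tail_sample` and the 500 ms threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


def head_sample(trace_id, rate):
    """Head-based: decide when the trace starts, before any outcome is known.

    The first 8 hex digits of the trace id are read as a fraction in [0, 1),
    so every service reaches the same decision for the same trace.
    """
    return int(trace_id[:8], 16) / 0x100000000 < rate


def tail_sample(spans, slow_ms=500.0):
    """Tail-based: decide once the trace is complete, using what happened."""
    return any(s.error for s in spans) or max(s.duration_ms for s in spans) >= slow_ms


traces = {
    "00000000aaaaaaaaaaaaaaaaaaaaaaaa": [Span("00000000aaaaaaaaaaaaaaaaaaaaaaaa", "browse", 20.0)],
    "ffffffffbbbbbbbbbbbbbbbbbbbbbbbb": [
        Span("ffffffffbbbbbbbbbbbbbbbbbbbbbbbb", "checkout", 40.0),
        Span("ffffffffbbbbbbbbbbbbbbbbbbbbbbbb", "payment", 35.0, error=True),
    ],
}

head_kept = [tid for tid in traces if head_sample(tid, rate=0.5)]
tail_kept = [tid for tid, spans in traces.items() if tail_sample(spans)]
print("head-based keeps:", head_kept)
print("tail-based keeps:", tail_kept)
```

In this example the head-based sampler drops the trace that contains the failed payment, because it decided before the failure happened, while the tail-based sampler keeps exactly that trace at the cost of buffering every span until the trace ends.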
  30. DEMO: Tail-based sampling (demo code)

  31. Architecture
    Synthetic Load Generator simulates traffic to Hipster Shop Demo App
    Diagram: Synthetic Load Generator => Hipster Shop App (Browse, Checkout) => OpenCensus Service (Collector) => Jaeger 1 and Jaeger 2, each backed by Cassandra
    Jaeger 1 - head based sampling
    Jaeger 2 - tail based sampling
    Demo repo
  32. Integrations
    OpenCensus and OpenTracing are integrated with a wide variety of frameworks, products and libraries. They provide observability for the following:
    • Redis
    • Memcached
    • Google Cloud
    • Dropwizard
    • SQL
    • Caddy
    • Go kit
    • GroupCache
    • MongoDB
  33. Community
    OpenTelemetry Website: https://opentelemetry.io
    Specifications: https://github.com/open-telemetry/opentelemetry-specification
    Meetings: https://github.com/open-telemetry/community

  34. Thank you!
    CONTACT ME: zeltser@google.com, @simon_zeltser
    Slides: https://speakerdeck.com/simonz130
    Sample Code:
