Distributed Tracing and Monitoring with OpenTelemetry

OpenTelemetry is an emerging standard for tracing and metrics in cloud services. You can use it to gain observability into applications that span multiple clouds and technology stacks.

I explain how to use open source, vendor-agnostic OpenTelemetry client libraries and export telemetry to common APM systems such as Zipkin. Along the way, we discuss core concepts such as tags, metrics, exporters, zPages, and trace context propagation.

Simon Zeltser

June 13, 2019

Transcript

  1. Software Evolution @simon_zeltser
    - On Prem => Cloud
    - Virtual Machines => Containers
    - Monolith => Microservices
    - Single Language / Stack => Polyglot
    - Single Cloud => Multiple Cloud Providers / Hybrid
    - Containers => Cloud Functions / Serverless
  2. New architectures => new challenges
    - Debugging
    - Observability
    - Standardizing development practices
    - Deployment / Packaging
    - Configuration Management
    - Secrets management
  3. Who Am I
    - Software Engineer at Cloud Developer Experience - Infrastructure and Operations Tools
    - Over a decade in distributed systems observability
    - github.com/simonz130, @simon_zeltser
  4. What is Observability? Being able to debug the system and gain insights into the system’s behavior
  5. Signals. Holistic Approach:
    - Distributed Tracing
    - Metrics Collection
    - Continuous system profiling
    - Capturing Logs
  6. Ultimate Recipe for reliable cloud service (Observability Lifecycle)
    - Track System Health: Capture traces, metrics and logs, create alerts on data
    - Detect Problems: Locality, networking, scheduling, dependencies
    - Fix & Refine: Optimize performance and cost of services
  7. Observability in Distributed Systems. Hard and Expensive:
    • Context propagation between components
    • Multiple environments
    • External dependencies
    • Vendor Lock-in
    • Cost
    [Figure: Architecture from Hipstershop Demo App]
  8. Meet OpenTelemetry! OpenTelemetry is an integrated set of APIs and libraries to generate, collect and describe telemetry in distributed systems. Problems OpenTelemetry solves:
    - Vendor neutrality for tracing, monitoring and logging APIs
    - Context propagation
  9. OpenTelemetry is:
    • Single set of APIs for tracing and metrics collection
    • Integrations with popular web, RPC and storage frameworks
    • Standardized Context Propagation
    • Exporters for sending data to backend of choice
    • Collector for smart traces & metrics aggregation
  10. Demo: Getting Started - Metrics collection with OpenTelemetry. Demo code: https://github.com/simonz130/opencensus-csharp-samples
  11. Tracing with OpenTelemetry - the options
    - Agentless: Initialize exporter in app code
    - Using Agent: Install the agent alongside the app
    [Diagram labels: HTTP IN, Application, Exporter, Agent, Traces, Container/VM, Trace Backend]
  12. Tracing with OpenTelemetry - Terminology
    - Trace - a collection of spans
    - Span - a single operation in a trace
    - Sampler - decides whether to export a span
    - Exporter - sends traces to observability systems
  13. Configure Exporter

        // Configure exporter to export traces to Zipkin
        var exporter = new ZipkinTraceExporter(
            new ZipkinTraceExporterOptions()
            {
                Endpoint = new Uri(zipkinUri),
                ServiceName = "tracing-to-zipkin-service",
            },
            Tracing.ExportComponent);
        exporter.Start();
  14. Configure Sampler

        // 100% sample rate, otherwise, few traces will be sampled.
        ITraceConfig traceConfig = Tracing.TraceConfig;
        ITraceParams currentConfig = traceConfig.ActiveTraceParams;
        var newConfig = currentConfig.ToBuilder()
            .SetSampler(Samplers.AlwaysSample)
            .Build();
        traceConfig.UpdateActiveTraceParams(newConfig);
  15. Using the Tracer

        // 3. Tracer is global singleton.
        // You can register it via dependency injection if it exists
        // but if not - you can use it as follows:
        var tracer = Tracing.Tracer;
  16. Create a Span

        // Create a scoped span - only covers “using” block
        using (var scope = tracer.SpanBuilder("Main").StartScopedSpan())
        {
            for (int i = 0; i < 10; i++)
            {
                DoWork(i);
            }
        }
  17. Trace Context Propagation
    [Diagram: {context} is passed between Frontend, AdService, Checkout, Payment Service and EmailService; TraceId=>{context}]
    [Timeline: a Trace made up of Spans for Frontend, AdService, Checkout, Payment and Email, plotted against time]
    Common scenarios
    • A/B testing
  18. Trace Context Propagation in OpenTelemetry: W3C Standard for Context Propagation
    FORMAT: trace-id "-" parent-id "-" trace-flags
        trace-id    = 32HEXDIG ; 16 bytes array identifier. All zeroes forbidden
        parent-id   = 16HEXDIG ; 8 bytes array identifier. All zeroes forbidden
        trace-flags = 2HEXDIG  ; 8 bit flags. Currently only one bit is used (“recorded”)
    Supported in all OpenCensus client libraries, will be supported in OpenTelemetry
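
    The format on this slide can be checked with a few lines of plain C#. The sketch below is illustrative only: `TraceParent` and `TryParse` are hypothetical names, not part of the OpenCensus or OpenTelemetry libraries. It assumes the full W3C `traceparent` header, which prefixes the three fields above with a two-digit version (currently `00`), and it accepts only that version.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not an OpenCensus/OpenTelemetry type.
public sealed class TraceParent
{
    public string TraceId { get; }
    public string ParentId { get; }
    public byte TraceFlags { get; }

    // Bit 0 of trace-flags: called "recorded" on the slide, "sampled" in the final W3C spec.
    public bool IsSampled => (TraceFlags & 0x01) != 0;

    private TraceParent(string traceId, string parentId, byte traceFlags)
    {
        TraceId = traceId;
        ParentId = parentId;
        TraceFlags = traceFlags;
    }

    // Expects: version "-" trace-id "-" parent-id "-" trace-flags
    public static bool TryParse(string header, out TraceParent result)
    {
        result = null;
        if (header == null) return false;

        string[] parts = header.Split('-');
        if (parts.Length != 4) return false;
        if (parts[0] != "00") return false;                  // only version 00 is handled here
        if (!IsLowerHex(parts[1], 32) || IsAllZero(parts[1])) return false;
        if (!IsLowerHex(parts[2], 16) || IsAllZero(parts[2])) return false;
        if (!IsLowerHex(parts[3], 2)) return false;

        result = new TraceParent(parts[1], parts[2], Convert.ToByte(parts[3], 16));
        return true;
    }

    // True when the string has exactly the given length and only 0-9 / a-f characters.
    private static bool IsLowerHex(string s, int length) =>
        s.Length == length && s.All(c => (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'));

    // All-zero identifiers are forbidden by the format.
    private static bool IsAllZero(string s) => s.All(c => c == '0');
}
```

    A receiving service would call `TraceParent.TryParse` on the incoming header value and, on success, start its own span with `TraceId` as the trace and `ParentId` as the parent span.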
  19. OpenCensus Service
    WHAT?
    • Agent & Collector for metrics and traces
    WHY?
    • Smart Sampling
    • Export to one or more monitoring/tracing backends
    Links: OC Service Repo, .NET Core OC Agent Repo
    [Diagram labels: OpenCensus Tracing Library, OpenCensus Monitoring Library, OpenCensus Agent, OpenCensus Collector, OpenCensus Service, BE Destinations: Jaeger, Zipkin, Stackdriver, Prometheus, AppInsights]
  20. Traces & Sampling
    Problem: Traced apps can generate many traces with many spans
    • Higher operational costs
    • Noise
    Solution: Smart Sampling
    • Head-based (sampled at the beginning of the trace)
    • Tail-based (sampled at the end of the trace)
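
    The difference between the two strategies can be sketched in plain C#. This is a minimal illustration under stated assumptions, not how the OpenCensus Service implements sampling: `SpanRecord`, `HeadSample` and `TailSample` are hypothetical names, the head-based decision is derived from the first 8 hex digits of the trace id, and the tail-based policy (keep traces with an error or a slow span) is just one possible rule.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical span record for illustration; real libraries carry far more data.
public sealed class SpanRecord
{
    public string Name { get; set; }
    public TimeSpan Duration { get; set; }
    public bool HasError { get; set; }
}

public static class SamplingSketch
{
    // Head-based: decide when the trace starts, using only the trace id.
    // Deriving the decision from the id keeps it consistent across services.
    public static bool HeadSample(string traceId, double probability)
    {
        // Read the first 8 hex digits of the trace id as a number in [0, 2^32).
        uint bucket = uint.Parse(
            traceId.Substring(0, 8), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return bucket < probability * 4294967296.0;
    }

    // Tail-based: decide after all spans of the trace have been collected,
    // so the rule can look at errors and latency (assumed policy).
    public static bool TailSample(IReadOnlyCollection<SpanRecord> spans, TimeSpan slowThreshold)
    {
        return spans.Any(s => s.HasError || s.Duration >= slowThreshold);
    }
}
```

    Head-based sampling is cheap because unsampled spans are never recorded, but it cannot know whether a trace will turn out to be interesting. Tail-based sampling can keep exactly the slow or failing traces, at the cost of buffering every span until the trace completes.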
  21. Architecture
    - Synthetic Load Generator simulates traffic to Hipster Shop Demo App
    - Jaeger 1 - head based sampling
    - Jaeger 2 - tail based sampling
    - Demo repo
    [Diagram labels: Synthetic Load Generator, Browse, Checkout, Hipster Shop App, OpenCensus Service (Collector), Jaeger 1, Jaeger 2, Cassandra, Cassandra]
  22. Integrations. OpenCensus and OpenTracing are integrated with a wide variety of frameworks, products and libraries. They provide observability for the following:
    • Redis
    • Memcached
    • Google Cloud
    • Dropwizard
    • SQL
    • Caddy
    • Go kit
    • GroupCache
    • MongoDB