
Getting Started with OpenTelemetry on Kubernetes


- What is Telemetry and why do we need to have an open standard for it
- The History and lineage of OpenTelemetry
- Architecture
- Component Layout
- Basic sampling: constant, probabilistic and tail-sampling
- Caveat of using tail-sampling w.r.t. scalability
- Kubernetes Deployment of OpenTelemetry stack
- Quick Demo of open-source code
- Q&A

Joy Bhattacherjee

July 25, 2020



  1. Observability (control theory): the mathematical dual of controllability of a complex system.

    System Visibility, Understanding, Architecture, Instrumentation
  2. Telemetry: the process of recording and transmitting the readings of an instrument.

    Instrumentation: a collective term for measuring instruments that are used for indicating, measuring and recording physical quantities (of systems).
  3. The Three Pillars, a Taxonomy: Logs, Metrics, Traces

    Plaintext, Structured, Binary, RED, USE, SLI, SLO, Violation, Alerting, Playbooks, Recovery, Tracing, Exception Handling, Debugging, Profiling, RCA, Audit, Anomaly, Capacity
  4. Distributed Tracing: Logging + Context

    • Assign a UUID to each request
    • Context = UUID + metadata
    • Next request = payload + context
    • Baggage = Set(K1:V1, K2:V2, ...)
    • Async capture:
      ◦ Timing
      ◦ Events
      ◦ Tags
    • Re-create the call tree from the store

    Call tree of services A to E, with the context carried at each node:
    A: service = A
    B: service = A, service = B
    C: service = A, service = C
    D: service = A, service = C, service = D
    E: service = A, service = C, service = E
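The propagation steps on slide 4 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenTelemetry's API: the `Context` class, the `call` and `call_tree` helpers, the in-memory `STORE`, and the `tenant` baggage key are all hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Context:
    trace_id: str                 # UUID assigned to the original request
    path: Tuple[str, ...]         # services visited so far, e.g. ("A", "C")
    baggage: Dict[str, str] = field(default_factory=dict)


# Stand-in for the trace store that every service reports to.
STORE: List[Context] = []


def call(service: str, parent: Optional[Context] = None,
         baggage: Optional[Dict[str, str]] = None) -> Context:
    """Handle a request in `service`, extending the caller's context."""
    if parent is None:
        # First hop: assign a fresh UUID to the request.
        ctx = Context(str(uuid.uuid4()), (service,), dict(baggage or {}))
    else:
        # Later hops: same UUID, longer path, baggage carried along.
        ctx = Context(parent.trace_id, parent.path + (service,), dict(parent.baggage))
    STORE.append(ctx)
    return ctx


def call_tree(trace_id: str) -> Dict[str, List[str]]:
    """Re-create parent -> children edges from the stored contexts."""
    tree: Dict[str, List[str]] = {}
    for ctx in STORE:
        if ctx.trace_id != trace_id:
            continue
        tree.setdefault(ctx.path[-1], [])
        if len(ctx.path) > 1:
            tree.setdefault(ctx.path[-2], []).append(ctx.path[-1])
    return tree


# Reproduce the slide's call tree: A calls B and C, C calls D and E.
a = call("A", baggage={"tenant": "demo"})
b = call("B", a)
c = call("C", a)
d = call("D", c)
e = call("E", c)
tree = call_tree(a.trace_id)
```

Each service only appends its own record; the tree is rebuilt later from the store, which is why the capture can happen asynchronously.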
  5. Green: core components. Red: contrib components.

    The codebase needs to be re-compiled if you want to include assorted contrib components. Only pre-compiled components can later be referenced in a pipeline.
  6. Pipeline definition

    receivers:
      opencensus:
      zipkin:
        endpoint:
      jaeger:
        protocols:
          thrift_http:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'load_generator_app'
              scrape_interval: 3s
              static_configs:
                - targets: ['load-generator:9001']
    exporters:
      opencensus:
        endpoint: "otel-collector:55678"
        insecure: true
      logging:
        loglevel: debug
    processors:
      batch:
      queued_retry:
    service:
      pipelines:
        traces:
          receivers: [opencensus, jaeger, zipkin]
          processors: [batch, queued_retry]
          exporters: [opencensus, logging]
        metrics:
          receivers: [opencensus, prometheus]
          exporters: [logging, opencensus]
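The relationship between the component sections and the `service.pipelines` section on slide 6 can be illustrated with a small consistency check. This is a sketch only, assuming the YAML has already been loaded into a plain dict; `undefined_references` is a hypothetical helper and not the collector's own validation logic.

```python
from typing import Any, Dict, List

# The slide's configuration as a plain dict (component settings omitted).
config: Dict[str, Any] = {
    "receivers": {"opencensus": None, "zipkin": None, "jaeger": None, "prometheus": None},
    "exporters": {"opencensus": None, "logging": None},
    "processors": {"batch": None, "queued_retry": None},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["opencensus", "jaeger", "zipkin"],
                "processors": ["batch", "queued_retry"],
                "exporters": ["opencensus", "logging"],
            },
            "metrics": {
                "receivers": ["opencensus", "prometheus"],
                "exporters": ["logging", "opencensus"],
            },
        }
    },
}


def undefined_references(cfg: Dict[str, Any]) -> List[str]:
    """List pipeline entries that name a component missing from its section."""
    problems: List[str] = []
    for pipeline, spec in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for name in spec.get(kind, []):
                if name not in cfg.get(kind, {}):
                    problems.append(f"{pipeline}: {kind[:-1]} '{name}' is not defined")
    return problems


problems = undefined_references(config)  # empty for the slide's configuration
```

A pipeline only wires together components declared in the top-level `receivers`, `processors` and `exporters` sections, so a typo in a pipeline list shows up here as an undefined reference.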
  7. Tail Sampling

    processors:
      tail_sampling:
        decision_wait: 10s
        num_traces: 100
        expected_new_traces_per_sec: 10
        policies:
          [
            { name: sampleNoErrors, type: numeric_attribute, numeric_attribute: {key: status.code, min_value: 0, max_value: 0} },
            { name: sample200, type: string_attribute, string_attribute: {key: http.status_code, values: ["200"]} },
            { name: ratelimit35, type: rate_limiting, rate_limiting: {spans_per_second: 35} }
          ]
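To make the three policy types on slide 7 concrete, here is a rough Python model of a tail-sampling decision. It is a sketch under assumptions: spans are modelled as plain dicts, a trace is kept if any policy matches, and the rate limiter counts spans within a single one-second window. The real processor's semantics may differ, and none of these function or class names come from the collector.

```python
from typing import Any, Callable, Dict, List

Span = Dict[str, Any]                  # a span modelled as a dict of attributes
Policy = Callable[[List[Span]], bool]  # takes a complete trace, returns keep/drop


def numeric_attribute(key: str, min_value: int, max_value: int) -> Policy:
    """Match if any span has `key` within [min_value, max_value]."""
    def matches(trace: List[Span]) -> bool:
        return any(
            isinstance(s.get(key), (int, float)) and min_value <= s[key] <= max_value
            for s in trace
        )
    return matches


def string_attribute(key: str, values: List[str]) -> Policy:
    """Match if any span has `key` set to one of `values`."""
    def matches(trace: List[Span]) -> bool:
        return any(s.get(key) in values for s in trace)
    return matches


class RateLimiting:
    """Admit traces until the span budget of the current window is used up."""

    def __init__(self, spans_per_second: int) -> None:
        self.spans_per_second = spans_per_second
        self.used = 0

    def __call__(self, trace: List[Span]) -> bool:
        if self.used + len(trace) > self.spans_per_second:
            return False
        self.used += len(trace)
        return True


def sample(trace: List[Span], policies: List[Policy]) -> bool:
    """Decide on a fully buffered trace (assumed: keep if any policy matches)."""
    return any(policy(trace) for policy in policies)


# The same three policies as in the slide's configuration.
policies: List[Policy] = [
    numeric_attribute("status.code", 0, 0),
    string_attribute("http.status_code", ["200"]),
    RateLimiting(spans_per_second=35),
]

ok_trace = [{"status.code": 0, "http.status_code": "200"}]
big_failed_trace = [{"status.code": 2, "http.status_code": "500"}] * 40
small_failed_trace = [{"status.code": 2, "http.status_code": "500"}] * 10

keep_ok = sample(ok_trace, policies)
keep_big = sample(big_failed_trace, policies)
keep_small = sample(small_failed_trace, policies)
```

The decision needs every span of a trace in one place, buffered until `decision_wait` has elapsed; that buffering is where the scalability caveat of tail sampling comes from.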
  8. References

    • https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36356.pdf
    • https://opensource.googleblog.com/2018/01/opencensus.html
    • https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html
    • https://eng.uber.com/distributed-tracing/
    • https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736
    • https://medium.com/@AloisReitbauer/trace-context-and-the-road-toward-trace-tool-interoperability-d4d56932369c
    • https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
    • https://github.com/open-telemetry/opentelemetry-collector
    • https://github.com/open-telemetry/opentelemetry-specification
    • https://pastebin.com/8dYNk0sR