Slide 1

Slide 1 text

Getting Started with OpenTelemetry on Kubernetes
@hashfyre, Consultant @ signoz.io

Slide 2

Slide 2 text

What is telemetry? Why does it need to be “Open”?

Slide 3

Slide 3 text

Observability (control theory): the mathematical dual of controllability of a complex system.
System Visibility Understanding Architecture Instrumentation

Slide 4

Slide 4 text

telemetry: the process of recording and transmitting the readings of an instrument.
instrumentation: a collective term for measuring instruments that are used for indicating, measuring and recording physical quantities (of systems).

Slide 5

Slide 5 text

The Three Pillars, a Taxonomy Logs Metrics Traces Plaintext Structured Binary RED USE SLI SLO Violation Alerting Playbooks Recovery Tracing Exception Handling Debugging Profiling RCA RCA Audit Anomaly Capacity

Slide 6

Slide 6 text

Distributed Tracing: Logging + Context
● Assign UUID to Each Request
● Context = UUID + metadata
● Next Request = Payload + Context
● Baggage = Set(K1:V1, K2:V2, ...)
● Async capture:
  ○ Timing
  ○ Events
  ○ Tags
● Re-create call tree from store

Call tree: A calls B and C; C calls D and E. Context accumulated at each service:
A: service = A
B: service = A, service = B
C: service = A, service = C
D: service = A, service = C, service = D
E: service = A, service = C, service = E
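The propagation steps on this slide can be sketched in a few lines of Python. This is an illustrative model only, not the OpenTelemetry SDK: the `Context` fields, the in-memory `SPAN_STORE`, the helper names and the `tenant` baggage key are all assumptions made for the example, and asynchronous capture of timing, events and tags is left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    """Trace context carried with every request (hypothetical shape)."""
    trace_id: str                      # UUID assigned to the originating request
    span_id: str                       # ID of the current unit of work
    parent_id: Optional[str] = None    # span that issued this request
    baggage: dict = field(default_factory=dict)  # K1:V1, K2:V2, ...


SPAN_STORE = []  # stand-in for the store that spans are reported to


def start_span(service: str, parent: Optional[Context] = None) -> Context:
    """Derive a child context from `parent` and record a span for `service`."""
    ctx = Context(
        trace_id=parent.trace_id if parent else str(uuid.uuid4()),
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        baggage=dict(parent.baggage) if parent else {},
    )
    SPAN_STORE.append({"service": service, "span_id": ctx.span_id,
                       "parent_id": ctx.parent_id, "trace_id": ctx.trace_id})
    return ctx


def call_tree(spans, parent_id=None):
    """Rebuild the call tree from stored spans by following parent_id links."""
    return {
        s["service"]: call_tree(spans, s["span_id"])
        for s in spans
        if s["parent_id"] == parent_id
    }


# The slide's topology: A calls B and C; C calls D and E.
a = start_span("A")
a.baggage["tenant"] = "demo"  # hypothetical baggage, set before downstream calls
b = start_span("B", a)
c = start_span("C", a)
d = start_span("D", c)
e = start_span("E", c)

tree = call_tree(SPAN_STORE)
print(tree)  # {'A': {'B': {}, 'C': {'D': {}, 'E': {}}}}
```

Every span shares the trace ID minted at A, so the store can be queried by that ID and the tree rebuilt from the parent links alone.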

Slide 7

Slide 7 text

Lineage of an open standard

Slide 8

Slide 8 text

A brief history...

Slide 9

Slide 9 text

https://opentelemetry.devstats.cncf.io/d/4/company-statistics-by-repository-group?orgId=1&from=now-6M&to=now

Slide 10

Slide 10 text

Architecture

Slide 11

Slide 11 text

● SDKs
● Receivers
● Processors
● Exporters
● Data Sinks
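As a rough mental model of how these components chain together, the sketch below pushes received items through one processor and on to an exporter. It is plain Python rather than collector code: `batch`, `MemorySink` and `run_pipeline` are hypothetical names, and the batching rule (fixed size, flush the remainder) is a simplification chosen for the example.

```python
from typing import Callable, Iterable, List

Span = dict  # a telemetry item; a plain dict keeps the sketch small


def batch(size: int) -> Callable[[Iterable[Span]], Iterable[List[Span]]]:
    """Processor: group incoming items into batches of at most `size`."""
    def process(items):
        buf = []
        for item in items:
            buf.append(item)
            if len(buf) == size:
                yield buf
                buf = []
        if buf:  # flush the final partial batch
            yield buf
    return process


class MemorySink:
    """Exporter stand-in: collects batches instead of sending them to a backend."""
    def __init__(self):
        self.batches = []

    def export(self, batch_):
        self.batches.append(batch_)


def run_pipeline(received, processor, exporters):
    """Push everything a receiver produced through a processor to all exporters."""
    for out in processor(received):
        for exporter in exporters:
            exporter.export(out)


# "Receiver": five spans as they might arrive from an SDK.
received = [{"name": f"span-{i}"} for i in range(5)]
sink = MemorySink()
run_pipeline(received, batch(size=2), [sink])
print([len(b) for b in sink.batches])  # [2, 2, 1]
```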

Slide 12

Slide 12 text

Internal Component Layout

Slide 13

Slide 13 text

Green: core components. Red: contrib components.
The codebase needs to be re-compiled if you want to include assorted contrib components.
Only pre-compiled components can later be referenced in a pipeline.

Slide 14

Slide 14 text

Deploying to Kubernetes

Slide 15

Slide 15 text

● Load-generator
● Otel-collector (agent)
● Otel-collector (server)
● Data Sinks

Slide 16

Slide 16 text

Pipeline definition

receivers:
  opencensus:
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      thrift_http:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'load_generator_app'
          scrape_interval: 3s
          static_configs:
            - targets: ['load-generator:9001']

exporters:
  opencensus:
    endpoint: "otel-collector:55678"
    insecure: true
  logging:
    loglevel: debug

processors:
  batch:
  queued_retry:

service:
  pipelines:
    traces:
      receivers: [opencensus, jaeger, zipkin]
      processors: [batch, queued_retry]
      exporters: [opencensus, logging]
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [logging, opencensus]
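A pipeline can only name components that are declared in the top-level receivers, processors and exporters sections. The Python sketch below checks that rule against the slide's configuration, reduced to component names. It is a hypothetical helper written for illustration and is not the collector's own validation code; `undefined_components` is an invented name.

```python
def undefined_components(config: dict) -> list:
    """Return (pipeline, kind, name) for every reference to an undeclared component."""
    problems = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            declared = config.get(kind) or {}
            for name in spec.get(kind, []):
                if name not in declared:
                    problems.append((pipeline, kind, name))
    return problems


# The slide's configuration, reduced to component names (settings omitted).
config = {
    "receivers": {"opencensus": None, "zipkin": None, "jaeger": None, "prometheus": None},
    "exporters": {"opencensus": None, "logging": None},
    "processors": {"batch": None, "queued_retry": None},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["opencensus", "jaeger", "zipkin"],
                "processors": ["batch", "queued_retry"],
                "exporters": ["opencensus", "logging"],
            },
            "metrics": {
                "receivers": ["opencensus", "prometheus"],
                "exporters": ["logging", "opencensus"],
            },
        }
    },
}

before = undefined_components(config)
print(before)  # []

# Referencing a processor that was never declared is reported.
config["service"]["pipelines"]["metrics"]["processors"] = ["tail_sampling"]
after = undefined_components(config)
print(after)  # [('metrics', 'processors', 'tail_sampling')]
```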

Slide 17

Slide 17 text

Tail Sampling

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      [
        {
          name: sampleNoErrors,
          type: numeric_attribute,
          numeric_attribute: {key: status.code, min_value: 0, max_value: 0}
        },
        {
          name: sample200,
          type: string_attribute,
          string_attribute: {key: http.status_code, values: ["200"]}
        },
        {
          name: ratelimit35,
          type: rate_limiting,
          rate_limiting: {spans_per_second: 35}
        }
      ]
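To make the three policies concrete, here is a small Python model of how they might be evaluated once a whole trace has been collected. It is a sketch under stated assumptions, not the processor's implementation: it assumes a trace is kept when any policy matches, models a trace as a list of attribute dicts, treats the rate limit as a single one-second budget, and ignores decision_wait and num_traces entirely.

```python
def numeric_attribute(key, min_value, max_value):
    """Match a trace if any span has `key` within [min_value, max_value]."""
    def policy(trace):
        return any(
            key in span and min_value <= span[key] <= max_value
            for span in trace
        )
    return policy


def string_attribute(key, values):
    """Match a trace if any span has `key` equal to one of `values`."""
    def policy(trace):
        return any(span.get(key) in values for span in trace)
    return policy


def rate_limiting(spans_per_second):
    """Match traces until the span budget is spent (one-second window only)."""
    budget = {"left": spans_per_second}

    def policy(trace):
        if len(trace) <= budget["left"]:
            budget["left"] -= len(trace)
            return True
        return False
    return policy


policies = [
    numeric_attribute("status.code", 0, 0),          # sampleNoErrors
    string_attribute("http.status_code", ["200"]),   # sample200
    rate_limiting(35),                               # ratelimit35
]


def keep(trace):
    """Evaluate every policy, then keep the trace if any of them matched."""
    results = [policy(trace) for policy in policies]
    return any(results)


ok = [{"status.code": 0, "http.status_code": "200"}]
small_error = [{"status.code": 2, "http.status_code": "500"}]
large_error = [{"status.code": 2, "http.status_code": "500"}] * 40

decisions = [keep(t) for t in (ok, small_error, large_error)]
print(decisions)  # [True, True, False]
```

The successful trace matches the attribute policies, the small failing trace is still kept because it fits the remaining span budget, and the 40-span failing trace exceeds the budget and is dropped.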

Slide 18

Slide 18 text

Demo: https://github.com/Hashfyre/otel-k8s

Slide 19

Slide 19 text

Questions?

Slide 20

Slide 20 text

References
● https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36356.pdf
● https://opensource.googleblog.com/2018/01/opencensus.html
● https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html
● https://eng.uber.com/distributed-tracing/
● https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736
● https://medium.com/@AloisReitbauer/trace-context-and-the-road-toward-trace-tool-interoperability-d4d56932369c
● https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
● https://github.com/open-telemetry/opentelemetry-collector
● https://github.com/open-telemetry/opentelemetry-specification
● https://pastebin.com/8dYNk0sR

Slide 21

Slide 21 text

Thanks. @hashfyre signoz.io