Getting Started with OpenTelemetry on Kubernetes

- What is telemetry, and why does it need an open standard?
- The history and lineage of OpenTelemetry
- Architecture
- Component Layout
- Basic sampling: constant, probabilistic and tail sampling
- Caveats of tail sampling with respect to scalability
- Kubernetes deployment of the OpenTelemetry stack
- Quick demo of the open-source code
- Q&A

Joy Bhattacherjee

July 25, 2020

Transcript

  1. Getting Started with OpenTelemetry on Kubernetes
    @hashfyre
    Consultant @ signoz.io

  2. What is telemetry?
    Why does it need to be “Open”?

  3. Observability (control theory)
    The mathematical dual of controllability: a measure of how well the internal state of a complex system can be inferred from its external outputs.
    (Diagram: System, Visibility, Understanding, Architecture, Instrumentation)
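For a linear time-invariant system, the duality mentioned on this slide can be stated precisely with the standard Kalman rank conditions. This is textbook control theory added for illustration, not part of the original slide; n is the dimension of the state x.

```latex
\dot{x} = A x + B u, \qquad y = C x, \qquad x \in \mathbb{R}^{n}

\mathcal{O}(A, C) =
\begin{bmatrix} C \\ C A \\ \vdots \\ C A^{n-1} \end{bmatrix},
\qquad
\mathcal{K}(A, B) =
\begin{bmatrix} B & A B & \cdots & A^{n-1} B \end{bmatrix}

(A, C)\ \text{observable} \iff \operatorname{rank} \mathcal{O}(A, C) = n,
\qquad
(A, B)\ \text{controllable} \iff \operatorname{rank} \mathcal{K}(A, B) = n

\mathcal{O}(A, C)^{\top} = \mathcal{K}(A^{\top}, C^{\top})
\;\Longrightarrow\;
(A, C)\ \text{observable} \iff (A^{\top}, C^{\top})\ \text{controllable}
```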

  4. Telemetry: the process of recording and transmitting the readings of an instrument.
    Instrumentation: a collective term for measuring instruments that are used for indicating, measuring and recording physical quantities (of systems).
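As a toy illustration of these two definitions, the Python sketch below treats a timing decorator as the "instrument" and appending a reading to a list as "transmitting" it. The names Reading, TELEMETRY_SINK, transmit, instrumented and checkout.duration are hypothetical; this is not the OpenTelemetry API.

```python
import functools
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reading:
    """One telemetry reading: which instrument produced it and what it measured."""
    instrument: str
    value_seconds: float


# Hypothetical in-memory sink standing in for a real telemetry backend.
TELEMETRY_SINK: List[Reading] = []


def transmit(reading: Reading) -> None:
    """'Transmit' a reading; this sketch just appends it to the in-memory sink."""
    TELEMETRY_SINK.append(reading)


def instrumented(instrument: str) -> Callable:
    """Decorator that measures each call's wall-clock duration and transmits it."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the reading even when func raises.
                transmit(Reading(instrument, time.perf_counter() - start))
        return wrapper
    return decorator


@instrumented("checkout.duration")  # hypothetical instrument name
def checkout(items: int) -> int:
    return items * 10


checkout(3)
print(TELEMETRY_SINK[0].instrument)  # checkout.duration
```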

  5. The Three Pillars, a Taxonomy
    Logs: plaintext, structured, binary
    Metrics: RED, USE, SLI, SLO, violation, alerting, playbooks, recovery
    Traces: tracing, exception handling, debugging, profiling
    Uses: RCA, audit, anomaly, capacity
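The metrics branch can be made concrete with RED (Rate, Errors, Duration) and an SLI/SLO check. This is a minimal sketch under its own assumptions: a request counts as an error when it returns a 5xx status, duration is summarised as a nearest-rank p99, and the SLI is plain availability (1 minus the error ratio). Request, red_metrics and slo_violated are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    status_code: int


def red_metrics(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Rate, Errors and Duration for the requests seen in one observation window."""
    total = len(requests)
    if total == 0:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    # Assumption: a request "errors" when it returns a 5xx status.
    errors = sum(1 for r in requests if r.status_code >= 500)
    durations = sorted(r.duration_ms for r in requests)
    rank = (99 * total + 99) // 100  # nearest-rank p99: ceil(0.99 * total) in integer math
    return {
        "rate_rps": total / window_seconds,
        "error_ratio": errors / total,
        "p99_ms": durations[rank - 1],
    }


def slo_violated(sli: float, slo_target: float) -> bool:
    """The SLO is violated when the measured SLI drops below its target."""
    return sli < slo_target


# 100 requests in a 10 s window; requests 10 and 20 fail with HTTP 500.
reqs = [Request(float(i), 500 if i in (10, 20) else 200) for i in range(1, 101)]
m = red_metrics(reqs, window_seconds=10.0)
availability_sli = 1 - m["error_ratio"]
print(m, slo_violated(availability_sli, slo_target=0.99))
```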

  6. Distributed Tracing: Logging + Context
    ● Assign a UUID to each request
    ● Context = UUID + metadata
    ● Next request = payload + context
    ● Baggage = Set(K1:V1, K2:V2, ...)
    ● Async capture:
      ○ Timing
      ○ Events
      ○ Tags
    ● Re-create the call tree from the store
    (Diagram: call tree in which A calls B and C, and C calls D and E. The context propagated with each request accumulates the service path: A; A, B; A, C; A, C, D; A, C, E.)
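The scheme on this slide (one ID per request, context propagated with every downstream call, spans written to a store and re-assembled later) can be sketched in a few lines of Python. It is an illustration under stated assumptions, not how any particular tracer is implemented: spans are captured synchronously into an in-memory list, IDs come from Python's uuid module, and Context, Span, STORE, handle, call_tree and service_path are hypothetical names. The call graph is the one from the diagram.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Context:
    trace_id: str                  # UUID assigned once, at the first request
    parent_span_id: Optional[str]  # span that made this call (None at the root)
    baggage: Dict[str, str] = field(default_factory=dict)  # K:V pairs carried along


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    service: str
    baggage: Dict[str, str]


STORE: List[Span] = []  # stand-in for the trace store


def handle(service: str, ctx: Context, calls: Dict[str, List[str]]) -> None:
    """Record a span for `service`, then call its downstream services with propagated context."""
    span = Span(ctx.trace_id, uuid.uuid4().hex, ctx.parent_span_id, service, dict(ctx.baggage))
    STORE.append(span)  # captured synchronously here; real tracers export asynchronously
    for child in calls.get(service, []):
        # Next request = payload + context: same trace_id and baggage, new parent span.
        handle(child, Context(ctx.trace_id, span.span_id, dict(ctx.baggage)), calls)


def call_tree(trace_id: str) -> Dict[str, List[str]]:
    """Re-create the call tree (service -> services it called) from the store."""
    spans = [s for s in STORE if s.trace_id == trace_id]
    by_id = {s.span_id: s for s in spans}
    tree: Dict[str, List[str]] = defaultdict(list)
    for s in spans:
        if s.parent_span_id is not None:
            tree[by_id[s.parent_span_id].service].append(s.service)
    return dict(tree)


def service_path(span_id: str) -> List[str]:
    """Walk parent links from one span back to the root."""
    by_id = {s.span_id: s for s in STORE}
    path: List[str] = []
    current = by_id.get(span_id)
    while current is not None:
        path.append(current.service)
        current = by_id.get(current.parent_span_id) if current.parent_span_id else None
    return path[::-1]


CALLS = {"A": ["B", "C"], "C": ["D", "E"]}  # the call graph from the slide
root = Context(trace_id=str(uuid.uuid4()), parent_span_id=None, baggage={"K1": "V1"})
handle("A", root, CALLS)
print(call_tree(root.trace_id))         # {'A': ['B', 'C'], 'C': ['D', 'E']}
print(service_path(STORE[-1].span_id))  # ['A', 'C', 'E']
```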

  7. Lineage of an open-standard

  8. A brief history...

  9. https://opentelemetry.devstats.cncf.io/d/4/company-statistics-by-repository-group?orgId=1&from=now-6M&to=now

  10. Architecture

  11. ● SDKs
    ● Receivers
    ● Processors
    ● Exporters
    ● Data Sinks
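A rough mental model of how these pieces connect: an SDK emits spans, a receiver accepts them, processors transform them in order, and every exporter in the pipeline gets the processed result to forward to its data sink. The sketch below is a toy model of that flow, not collector code; the processors drop_health_checks and tag_environment, and the list standing in for a data sink, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Span = Dict[str, object]
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


class Pipeline:
    """Toy collector pipeline: receive -> process (in order) -> export (fan-out)."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]) -> None:
        self.processors = processors
        self.exporters = exporters

    def receive(self, spans: Iterable[Span]) -> None:
        """Entry point a receiver would call with decoded spans."""
        batch = list(spans)
        for process in self.processors:
            batch = process(batch)
        for export in self.exporters:
            export(batch)  # every exporter receives the same processed data


def drop_health_checks(spans: List[Span]) -> List[Span]:
    """Hypothetical processor: filter out spans for a /healthz endpoint."""
    return [s for s in spans if s.get("http.target") != "/healthz"]


def tag_environment(spans: List[Span]) -> List[Span]:
    """Hypothetical processor: add an attribute without mutating the input."""
    return [{**s, "deployment.environment": "demo"} for s in spans]


data_sink: List[Span] = []  # stands in for a real data sink
pipeline = Pipeline([drop_health_checks, tag_environment], [data_sink.extend])
pipeline.receive([{"name": "GET /cart", "http.target": "/cart"},
                  {"name": "GET /healthz", "http.target": "/healthz"}])
print(data_sink)
```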

  12. Internal Component Layout

  13. Green: core components. Red: contrib components.
    The codebase needs to be re-compiled if you want to include assorted contrib components. Only pre-compiled components can later be referenced in a pipeline.
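The practical consequence of the last point is that a collector's configuration is checked against the set of component types built into that particular binary. The sketch below imitates that check. The contents of COMPILED_IN and the component name example_contrib are hypothetical, and validate_pipeline is not a collector function; it does assume the collector's type/name convention (for example zipkin/2) for naming several instances of one component type.

```python
from typing import Dict, List, Set

# Hypothetical build: the component types compiled into this collector binary.
COMPILED_IN: Dict[str, Set[str]] = {
    "receivers": {"opencensus", "jaeger", "zipkin", "prometheus"},
    "processors": {"batch", "queued_retry"},
    "exporters": {"opencensus", "logging"},
}


def validate_pipeline(pipeline: Dict[str, List[str]]) -> List[str]:
    """Return one error per referenced component whose type is not compiled in."""
    errors = []
    for kind, names in pipeline.items():
        for name in names:
            component_type = name.split("/")[0]  # "zipkin/2" refers to type "zipkin"
            if component_type not in COMPILED_IN.get(kind, set()):
                errors.append(f"unknown {kind[:-1]} type {component_type!r}")
    return errors


ok = {"receivers": ["jaeger", "zipkin/2"], "processors": ["batch"], "exporters": ["logging"]}
bad = {"receivers": ["jaeger"], "processors": ["example_contrib"], "exporters": ["logging"]}
print(validate_pipeline(ok))   # []
print(validate_pipeline(bad))  # ["unknown processor type 'example_contrib'"]
```

Including such a component means rebuilding the binary so its type appears in the compiled-in set; editing the YAML alone is not enough.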

  14. Deploying to Kubernetes

  15. ● Load-generator
    ● Otel-collector (agent)
    ● Otel-collector (server)
    ● Data Sinks
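When the server-tier collector runs as several replicas and uses tail sampling (slide 17), every span of a trace has to reach the same replica. Otherwise no single replica sees the whole trace when it decides, which is the scalability caveat listed in the agenda. One way to get that affinity is to route spans by trace ID. The sketch below does this with hypothetical replica names and a plain hash modulo the replica count; it is an illustration, not collector code.

```python
import hashlib
from typing import Dict, List, Set


def pick_collector(trace_id: str, collectors: List[str]) -> str:
    """Map a trace ID to one server replica; the same ID always maps to the same replica."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


# Hypothetical replica names for the server-tier collectors.
SERVERS = ["otel-collector-0", "otel-collector-1", "otel-collector-2"]

# Spans from different agents, interleaved; span-1 and span-3 belong to the same trace.
spans = [("trace-a", "span-1"), ("trace-b", "span-2"), ("trace-a", "span-3")]
routes: Dict[str, Set[str]] = {}
for trace_id, _span_id in spans:
    routes.setdefault(trace_id, set()).add(pick_collector(trace_id, SERVERS))
print(routes)  # each trace ID maps to exactly one replica
```

Modulo hashing remaps most trace IDs whenever the replica count changes, so a consistent-hash ring is the usual refinement. A plain Kubernetes Service balances connections, not trace IDs, so it does not provide this affinity on its own.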

  16. Pipeline definition
```yaml
receivers:
  opencensus:
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      thrift_http:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'load_generator_app'
          scrape_interval: 3s
          static_configs:
            - targets: ['load-generator:9001']

exporters:
  opencensus:
    endpoint: "otel-collector:55678"
    insecure: true
  logging:
    loglevel: debug

processors:
  batch:
  queued_retry:

service:
  pipelines:
    traces:
      receivers: [opencensus, jaeger, zipkin]
      processors: [batch, queued_retry]
      exporters: [opencensus, logging]
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [logging, opencensus]
```
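The batch processor in this pipeline groups telemetry before handing it to the exporters, which cuts the number of outgoing requests. The sketch below shows the general size-or-timeout batching idea behind such a processor. It is not the collector's implementation, and Batcher, max_size, timeout_s and the injected clock are hypothetical names chosen for the example.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Flush when a batch reaches max_size, or when its oldest item has waited timeout_s."""

    def __init__(self, send: Callable[[List[dict]], None], max_size: int, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send
        self.max_size = max_size
        self.timeout_s = timeout_s
        self.clock = clock
        self.items: List[dict] = []
        self.first_at: Optional[float] = None  # arrival time of the oldest buffered item

    def add(self, item: dict) -> None:
        if not self.items:
            self.first_at = self.clock()
        self.items.append(item)
        if len(self.items) >= self.max_size:
            self.flush()  # size trigger

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is old enough (timeout trigger)."""
        if self.items and self.clock() - self.first_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if self.items:
            self.send(self.items)
            self.items = []  # rebind rather than clear, so the sent list stays intact
            self.first_at = None


# Usage with a fake clock so the timeout path is deterministic.
now = [0.0]
sent: List[List[dict]] = []
batcher = Batcher(sent.append, max_size=3, timeout_s=1.0, clock=lambda: now[0])
for i in range(4):
    batcher.add({"span": i})
now[0] = 1.5
batcher.tick()
print(sent)  # [[{'span': 0}, {'span': 1}, {'span': 2}], [{'span': 3}]]
```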

  17. Tail Sampling
```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      [
        {
          name: sampleNoErrors,
          type: numeric_attribute,
          numeric_attribute: {key: status.code, min_value: 0, max_value: 0}
        },
        {
          name: sample200,
          type: string_attribute,
          string_attribute: {key: http.status_code, values: ["200"]}
        },
        {
          name: ratelimit35,
          type: rate_limiting,
          rate_limiting: {spans_per_second: 35}
        }
      ]
```
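Tail sampling waits until a trace has probably finished before deciding whether to keep it, so the decision can look at every span of the trace. The sketch below models the two attribute policies from this config together with the decision_wait and num_traces settings. It is a simplification under assumptions: policies are combined with OR (a trace is kept if any policy matches), the oldest buffered trace is evicted once num_traces is reached, the rate_limiting policy and expected_new_traces_per_sec are left out, and TailSampler and the policy helpers are hypothetical names, not the processor's code.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Span:
    trace_id: str
    attributes: Dict[str, object] = field(default_factory=dict)


Policy = Callable[[List[Span]], bool]


def numeric_attribute(key: str, min_value: float, max_value: float) -> Policy:
    """Match a trace if any span has a numeric `key` within [min_value, max_value]."""
    def policy(spans: List[Span]) -> bool:
        return any(isinstance(s.attributes.get(key), (int, float))
                   and min_value <= s.attributes[key] <= max_value for s in spans)
    return policy


def string_attribute(key: str, values: Sequence[str]) -> Policy:
    """Match a trace if any span has `key` set to one of `values`."""
    def policy(spans: List[Span]) -> bool:
        return any(s.attributes.get(key) in values for s in spans)
    return policy


class TailSampler:
    """Buffer whole traces, then decide once decision_wait has elapsed."""

    def __init__(self, policies: List[Policy], decision_wait_s: float, num_traces: int) -> None:
        self.policies = policies
        self.decision_wait_s = decision_wait_s
        self.num_traces = num_traces  # upper bound on traces held in memory
        self.pending: "OrderedDict[str, Tuple[float, List[Span]]]" = OrderedDict()

    def add(self, span: Span, now: float) -> None:
        if span.trace_id not in self.pending:
            if len(self.pending) >= self.num_traces:
                self.pending.popitem(last=False)  # evict the oldest buffered trace
            self.pending[span.trace_id] = (now, [])
        self.pending[span.trace_id][1].append(span)

    def decide(self, now: float) -> List[str]:
        """Return IDs of traces whose wait is over and that match any policy."""
        sampled = []
        for trace_id in list(self.pending):
            first_seen, spans = self.pending[trace_id]
            if now - first_seen < self.decision_wait_s:
                continue
            del self.pending[trace_id]
            if any(policy(spans) for policy in self.policies):
                sampled.append(trace_id)
        return sampled


sampler = TailSampler(
    [numeric_attribute("status.code", 0, 0), string_attribute("http.status_code", ["200"])],
    decision_wait_s=10, num_traces=100)
sampler.add(Span("trace-ok", {"status.code": 0}), now=0)
sampler.add(Span("trace-err", {"status.code": 2}), now=0)
sampler.add(Span("trace-err", {"http.status_code": "500"}), now=1)
print(sampler.decide(now=5))   # []  (still waiting)
print(sampler.decide(now=10))  # ['trace-ok']
```

The pending buffer shows the memory cost: every in-flight trace is held for the full decision_wait, so num_traces has to be sized for the expected trace rate.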

  18. https://github.com/Hashfyre/otel-k8s
    Demo

  19. Questions?

  20. References
    ● https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36356.pdf
    ● https://opensource.googleblog.com/2018/01/opencensus.html
    ● https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html
    ● https://eng.uber.com/distributed-tracing/
    ● https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736
    ● https://medium.com/@AloisReitbauer/trace-context-and-the-road-toward-trace-tool-interoperability-d4d56932369c
    ● https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
    ● https://github.com/open-telemetry/opentelemetry-collector
    ● https://github.com/open-telemetry/opentelemetry-specification
    ● https://pastebin.com/8dYNk0sR

  21. Thanks.
    @hashfyre signoz.io
