OpenTelemetry at AWS

JBD
May 18, 2021

Register and watch this talk now! https://o11yfest.org/attend.

Transcript

  1. @rakyll
    OpenTelemetry
    at AWS
    Jaana Dogan
    Principal Engineer, AWS

  2. @rakyll
    Who?
    Jaana Dogan, AWS
    Explicit focus on instrumentation

  3. @rakyll
    Five AWS stories...
    Too many agents
    Too many formats
    Too little correlation
    Too many ways to propagate
    Too many products to support

  4. @rakyll
    Too many agents
    4-5 agents
    Friction in installation
    Operational burden
    Friction in configuration delivery
    Performance penalty

  5. @rakyll
    Too many formats
    Metrics: EMF, CloudWatch, Prometheus, statsd, vendor formats, ...
    Traces: X-Ray, Zipkin, Jaeger, vendor formats, ...

  6. @rakyll
    Too little correlation
    Tool fatigue
    Disjoint views
    Missing metadata
    Friction in troubleshooting

  7. @rakyll
    Too many ways to propagate
    Lack of end-to-end traces
    Missing label propagation
    No W3C TraceContext or B3 support
    No runtime propagation standards
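
To illustrate what a shared propagation format involves, here is a minimal Python sketch that parses a W3C Trace Context `traceparent` header into its fields. It only follows the version `00` header layout (`version-traceid-parentid-flags`) from the W3C specification. The names `TraceParent` and `parse_traceparent` are hypothetical, and this is not how any AWS SDK or OpenTelemetry library implements propagation.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 32 hex digits of trace ID, 16 of parent (span) ID, 2 of flags.
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):  # hypothetical container, for illustration only
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is not a valid version 00 value."""
    match = _TRACEPARENT_V00.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # The specification treats all-zero trace and parent IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A service that receives an unparseable header would typically start a new trace rather than fail the request; that policy is left out of the sketch.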

  8. @rakyll
    Too many products to support
    CloudWatch
    X-Ray
    Prometheus
    Elasticsearch/OpenSearch
    New Relic, Datadog, Splunk, Honeycomb, Lightstep, and more

  9. @rakyll
    What do we use?
    Specification
    Context Propagation
    Semantic Conventions
    Data Model
    Protocol (OTLP)
    Collector
    Client Libraries

  10. @rakyll
    What’s next?
    collector
    Managed on EC2, ECS, EKS, Lambda, etc.

  11. @rakyll
    What’s next?
    collector
    Managed on EC2, ECS, EKS, Lambda, etc.

  12. @rakyll
    What’s next?
    collector
    Managed on EC2, ECS, EKS, Lambda, etc.
    Inputs: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin

  13. @rakyll
    What’s next?
    collector
    Managed on EC2, ECS, EKS, Lambda, etc.
    Inputs: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin
    Outputs: CloudWatch, Prometheus, X-Ray, Elastic/OpenSearch, Jaeger, Zipkin, Vendors, Raw storage

  14. @rakyll
    What’s next?
    collector
    Managed on EC2, ECS, EKS, Lambda, etc.
    Inputs: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin
    Outputs: CloudWatch, Prometheus, X-Ray, Jaeger, Zipkin, Vendors, Raw storage
    enrich, transform, ...
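
The diagram this slide builds up can be read as a pipeline: telemetry arrives in one of several formats, passes through processing steps such as enrichment, and is fanned out to one or more backends. The Python sketch below is a toy model of that shape only. It is not the OpenTelemetry Collector's API (the Collector is written in Go and configured in YAML), and every name in it (`Pipeline`, `enrich_with`, the record layout) is hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]             # hypothetical, format-neutral telemetry record
Processor = Callable[[Record], Record]
Exporter = Callable[[Record], None]


def enrich_with(attrs: Dict[str, object]) -> Processor:
    """Build a processor that adds fixed attributes without overwriting existing keys."""
    def process(record: Record) -> Record:
        return {**attrs, **record}
    return process


class Pipeline:
    """Toy model: run each record through the processors, then fan out to every exporter."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]) -> None:
        self.processors = processors
        self.exporters = exporters

    def receive(self, record: Record) -> None:
        for process in self.processors:
            record = process(record)
        for export in self.exporters:
            export(dict(record))  # each exporter gets its own copy


# Usage: two in-memory lists stand in for two backends.
backend_a: List[Record] = []
backend_b: List[Record] = []
pipeline = Pipeline(
    processors=[enrich_with({"cloud.platform": "aws_ecs"})],
    exporters=[backend_a.append, backend_b.append],
)
pipeline.receive({"name": "http.requests", "value": 1})
```

Handing each exporter its own copy keeps one backend's changes from leaking into another, which is the property that makes adding a destination a configuration change rather than a code change.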

  15. @rakyll
    Container Insights now collected by OpenTelemetry.

  16. @rakyll
    What do we use?
    Specification
    Context Propagation
    Semantic Conventions
    Data Model
    Protocol (OTLP)
    Collector
    Client Libraries

  17. @rakyll
    What works well?
    Flexible
    Composable
    Lightweight enough
    Holistic
    Legacy protocol friendly
    Community

  18. @rakyll
    What challenges us?
    Stability
    Custom builds
    Compatibility (Prometheus & CloudWatch)
    Boilerplate in client libraries

  19. @rakyll
    What are we working on next?

  20. @rakyll
    Prometheus

  21. @rakyll
    Prometheus
    Drop-in replacement for Prometheus
    Data model changes
    Remote write compliance
    Discovery + scrape config compliance
    Kubernetes operator

  22. @rakyll
    Components
    Container Insights receivers and processors
    CloudWatch histogram compatibility
    CloudWatch Logs exporter
    S3 exporter

  23. @rakyll
    Propagation
    Adopting 128-bit trace IDs in X-Ray
    Context propagation in SQL
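
For context on the trace ID item: X-Ray's documented trace ID layout is `1-`, then 8 hex digits of epoch seconds, then `-`, then 24 hex digits, which is 32 hex digits in total, the same width as a 128-bit W3C-style trace ID. The Python sketch below only re-segments the digits between the two layouts. The function names are hypothetical, and the sketch assumes the first 8 hex digits can stand in for the timestamp field; whether X-Ray accepts arbitrary values there is the question this slide is about, not something the code settles.

```python
_HEX_DIGITS = set("0123456789abcdef")


def to_xray_trace_id(trace_id: str) -> str:
    """Render a 32-hex-digit trace ID in the 1-<8 hex>-<24 hex> layout."""
    if len(trace_id) != 32 or not set(trace_id) <= _HEX_DIGITS:
        raise ValueError("expected 32 lowercase hex digits")
    return f"1-{trace_id[:8]}-{trace_id[8:]}"


def from_xray_trace_id(xray_id: str) -> str:
    """Inverse of to_xray_trace_id: drop the version prefix and the separators."""
    version, epoch, unique = xray_id.split("-")  # raises ValueError on a wrong part count
    if version != "1" or len(epoch) != 8 or len(unique) != 24:
        raise ValueError("not a version 1 X-Ray trace ID")
    return epoch + unique
```

Because the conversion is a pure re-segmentation, a round trip through both functions returns the original ID unchanged.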

  24. @rakyll
    Platforms
    EC2
    ECS
    EKS
    Lambda
    (and control plane components...)

  25. @rakyll
    Lambda support

  26. @rakyll
    Others...
    eBPF
    Profiles
    Real time user monitoring
    Network diagnostics
    Database performance

  27. @rakyll
    One more thing...

  28. @rakyll
    Exporting to vendors?
    Vended data streams
    CloudWatch Metric Streams support OTLP
    CW Metrics
    S3 (in JSON or OTLP)
    Kinesis (in JSON or OTLP)

  29. @rakyll
    It’s not a fork. It’s a snapshot for security, performance, support.

  30. @rakyll
    Thank you
    Jaana Dogan
    Principal Engineer, AWS
