Slide 1

@rakyll OpenTelemetry at AWS. Jaana Dogan, Principal Engineer, AWS. [email protected]

Slide 2

Who? Jaana Dogan, AWS. Explicit focus on instrumentation.

Slide 3

Five AWS stories: too many agents; too many formats; too little correlation; too many ways to propagate; too many products to support.

Slide 4

Too many agents: 4-5 agents; friction in installation; operational burden; friction in configuration delivery; performance penalty.

Slide 5

Too many formats: EMF, CloudWatch, Prometheus, statsd, vendor formats, ...; X-Ray, Zipkin, Jaeger, vendor formats, ...

Slide 6

Too little correlation: tool fatigue; disjoint views; missing metadata; friction in troubleshooting.

Slide 7

Too many ways to propagate: lack of end-to-end traces; missing label propagation; no W3C TraceContext or B3 support; no runtime propagation standards.
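To make the W3C TraceContext item concrete, here is a minimal sketch of reading and writing the `traceparent` header that the standard defines. It handles only version `00` of the header, and the names `SpanContext`, `parse_traceparent`, `format_traceparent`, and `child_context` are hypothetical helpers for illustration, not part of any AWS or OpenTelemetry library.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    """Hypothetical container for the propagated fields."""
    trace_id: str  # 32 hex chars (128 bits)
    span_id: str   # 16 hex chars (64 bits)
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the propagated context, or None if the header is not valid version 00."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a context back into a version 00 traceparent header value."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def child_context(parent: SpanContext) -> SpanContext:
    """Keep the trace ID and sampling decision, generate a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

A service would parse the incoming header, create a child context for its own work, and send the formatted child header on outgoing calls so the trace stays connected end to end.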

Slide 8

Too many products to support: CloudWatch; X-Ray; Prometheus; Elasticsearch/OpenSearch; New Relic, Datadog, Splunk, Honeycomb, Lightstep and more.

Slide 9

What do we use? Specification; context propagation; semantic conventions; data model; protocol (OTLP); collector; client libraries.

Slide 10

What’s next? Collector: managed on EC2, ECS, EKS, Lambda, etc.

Slide 11

What’s next? Collector: managed on EC2, ECS, EKS, Lambda, etc.

Slide 12

What’s next? Collector: managed on EC2, ECS, EKS, Lambda, etc. Receives: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin.

Slide 13

What’s next? Collector: managed on EC2, ECS, EKS, Lambda, etc. Receives: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin. Exports to: CloudWatch, Prometheus, X-Ray, Elastic/OpenSearch, Jaeger, Zipkin, vendors, raw storage.

Slide 14

What’s next? Collector: managed on EC2, ECS, EKS, Lambda, etc. Receives: OTLP, Prometheus, statsd, X-Ray, Jaeger, Zipkin. Processing: enrich, transform, ... Exports to: CloudWatch, Prometheus, X-Ray, Jaeger, Zipkin, vendors, raw storage.
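The receive, enrich/transform, export flow on this slide can be sketched as a toy pipeline. This is not the OpenTelemetry Collector's implementation or API; `Record`, `parse_statsd`, `add_attributes`, and `run_pipeline` are hypothetical names, and the only wire format handled is the basic statsd line `name:value|type`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Record:
    """Hypothetical common representation every input format is converted into."""
    name: str
    value: float
    attributes: Dict[str, str] = field(default_factory=dict)


Processor = Callable[[Record], Record]
Exporter = Callable[[List[Record]], None]


def parse_statsd(line: str) -> Record:
    """Receiver side: turn a basic statsd line ("name:value|type") into a Record."""
    name, rest = line.split(":", 1)
    value, kind = rest.split("|")[:2]
    return Record(name, float(value), {"source": "statsd", "statsd.type": kind})


def add_attributes(extra: Dict[str, str]) -> Processor:
    """Processor side: enrich each record with extra metadata."""
    def process(record: Record) -> Record:
        return Record(record.name, record.value, {**record.attributes, **extra})
    return process


def run_pipeline(
    records: Iterable[Record],
    processors: List[Processor],
    exporters: List[Exporter],
) -> List[Record]:
    """Apply every processor in order, then hand the same batch to every exporter."""
    batch: List[Record] = []
    for record in records:
        for process in processors:
            record = process(record)
        batch.append(record)
    for export in exporters:
        export(batch)
    return batch
```

The point of the design is the fan-in and fan-out: many input formats are normalized once, enriched once, and then sent to any number of backends without the application knowing which ones are configured.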

Slide 15

Container Insights now collected by OpenTelemetry.

Slide 16

What do we use? Specification; context propagation; semantic conventions; data model; protocol (OTLP); collector; client libraries.

Slide 17

What works well? Flexible; composable; lightweight enough; holistic; legacy protocol friendly; community.

Slide 18

What challenges us? Stability; custom builds; compatibility (Prometheus & CloudWatch); boilerplate in client libraries.

Slide 19

What are we working on next?

Slide 20

Prometheus

Slide 21

Prometheus: drop-in replacement for Prometheus; data model changes; remote write compliance; discovery + scrape config compliance; Kubernetes operator.

Slide 22

Components: Container Insights receivers and processors; CloudWatch histogram compatibility; CloudWatch Logs exporter; S3 exporter.

Slide 23

Propagation: adopting 128-bit trace IDs in X-Ray; context propagation in SQL.
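As background for the trace ID item: X-Ray's documented trace ID format is `1-<8 hex digits of epoch seconds>-<24 hex digits>`, while W3C TraceContext uses a flat 32 hex digit (128-bit) ID. The sketch below shows the plain reformatting between the two and a time-prefixed ID that fits both shapes. The function names are hypothetical, and this illustrates the format relationship only, not how X-Ray itself adopts 128-bit IDs.

```python
import re
import secrets
import time
from typing import Optional

_W3C_ID = re.compile(r"[0-9a-f]{32}")
_XRAY_ID = re.compile(r"1-([0-9a-f]{8})-([0-9a-f]{24})")


def w3c_to_xray(trace_id: str) -> str:
    """Reformat a 32 hex digit trace ID as 1-<first 8 hex>-<remaining 24 hex>."""
    if _W3C_ID.fullmatch(trace_id) is None:
        raise ValueError(f"not a 128-bit hex trace ID: {trace_id!r}")
    return f"1-{trace_id[:8]}-{trace_id[8:]}"


def xray_to_w3c(xray_id: str) -> str:
    """Drop the version prefix and dashes to recover the flat 32 hex digit ID."""
    match = _XRAY_ID.fullmatch(xray_id)
    if match is None:
        raise ValueError(f"not an X-Ray trace ID: {xray_id!r}")
    return match.group(1) + match.group(2)


def new_time_prefixed_trace_id(now: Optional[float] = None) -> str:
    """Generate a 128-bit ID whose first 32 bits are the current epoch seconds.

    Assumption for illustration: a backend that expects a timestamp in the
    first segment can read one from an ID generated this way.
    """
    epoch = int(time.time() if now is None else now)
    return f"{epoch & 0xFFFFFFFF:08x}{secrets.token_hex(12)}"
```

A fully random W3C trace ID still reformats cleanly with `w3c_to_xray`, but its first segment is then no longer a meaningful timestamp, which is the gap that accepting arbitrary 128-bit IDs closes.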

Slide 24

Platforms: EC2, ECS, EKS, Lambda (and control plane components...).

Slide 25

Lambda support

Slide 26

Others... eBPF; profiles; real time user monitoring; network diagnostics; database performance.

Slide 27

One more thing...

Slide 28

Exporting to vendors? Vended data streams. CloudWatch Metric Streams support OTLP. CW Metrics; S3 (in JSON or OTLP); Kinesis (in JSON or OTLP).

Slide 29

It’s not a fork. It’s a snapshot for security, performance, support.

Slide 30

Thank you. Jaana Dogan, Principal Engineer, AWS. [email protected]