@rakyll
OpenTelemetry at AWS
Jaana Dogan
Principal Engineer, AWS
Slide 2
Who?
Jaana Dogan, AWS
Explicit focus on instrumentation
Slide 3
Five AWS stories...
Too many agents
Too many formats
Too little correlation
Too many ways to propagate
Too many products to support
Slide 4
Too many agents
4-5 agents
Friction in installation
Operational burden
Friction in configuration delivery
Performance penalty
Slide 5
Too many formats
EMF
CloudWatch
Prometheus
statsd
Vendor formats
...
X-Ray
Zipkin
Jaeger
Vendor formats
...
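As a rough illustration of the format sprawl listed above, the sketch below renders one counter sample in two of the named formats: a plain statsd line and a Prometheus text-exposition line. The function names, the metric name, and the label are hypothetical and are not taken from the talk or from any AWS agent.

```python
import re
from typing import Dict, Optional


def _escape(value: str) -> str:
    """Escape backslash, double quote, and newline in a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_statsd(name: str, value: int, kind: str = "c") -> str:
    """Render a sample as a plain statsd line: <name>:<value>|<type>."""
    return f"{name}:{value}|{kind}"


def to_prometheus(name: str, value: int,
                  labels: Optional[Dict[str, str]] = None) -> str:
    """Render a sample as one line of the Prometheus text exposition format."""
    # Prometheus metric names allow only [a-zA-Z0-9_:], so dots become underscores.
    safe_name = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    if not labels:
        return f"{safe_name} {value}"
    rendered = ",".join(
        '{}="{}"'.format(key, _escape(val)) for key, val in sorted(labels.items())
    )
    return f"{safe_name}{{{rendered}}} {value}"


if __name__ == "__main__":
    # Hypothetical sample: 42 requests counted for GET.
    print(to_statsd("api.requests", 42))
    print(to_prometheus("api.requests", 42, {"method": "GET"}))
```

The plain statsd line has no slot for labels, so a conversion in that direction has to drop them or fold them into the name; this is one small reason a shared data model is attractive.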
Slide 6
Too little correlation
Tool fatigue
Disjoint views
Missing metadata
Friction in troubleshooting
Slide 7
Too many ways to propagate
Lack of end-to-end traces
Missing label propagation
No W3C TraceContext or B3 support
No runtime propagation standards
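To make the two propagation formats named above concrete, here is a minimal sketch that parses a W3C TraceContext `traceparent` header and a B3 single-header value into one shared structure. The `SpanContext` type and the function names are hypothetical; the sketch handles only `traceparent` version 00 style values with exactly four fields and ignores `tracestate` and the multi-header B3 variant.

```python
import re
from typing import NamedTuple, Optional

# traceparent: <version>-<trace-id>-<parent-id>-<flags>, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    """Hypothetical container for the propagated identifiers."""
    trace_id: str   # 32 hex characters
    span_id: str    # 16 hex characters
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a W3C traceparent value; return None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero identifiers are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def parse_b3_single(header: str) -> Optional[SpanContext]:
    """Parse a B3 single header: <trace-id>-<span-id>[-<sampling>[-<parent>]]."""
    parts = header.strip().split("-")
    if len(parts) < 2:
        return None
    trace_id, span_id = parts[0], parts[1]
    if not re.fullmatch(r"[0-9a-f]{16}|[0-9a-f]{32}", trace_id):
        return None
    if not re.fullmatch(r"[0-9a-f]{16}", span_id):
        return None
    # "1" means sampled and "d" means debug; both are treated as sampled here.
    sampled = len(parts) > 2 and parts[2] in ("1", "d")
    # Left-pad 64-bit B3 trace ids so both formats share one 128-bit shape.
    return SpanContext(trace_id.rjust(32, "0"), span_id, sampled)


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a context as a version 00 traceparent value."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"
```

With both parsers returning the same shape, a service that receives a B3 header could re-emit it as `traceparent` for the next hop, which is the kind of bridging needed to get end-to-end traces.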
Slide 8
Too many products to support
CloudWatch
X-Ray
Prometheus
Elasticsearch/OpenSearch
New Relic, Datadog, Splunk, Honeycomb, Lightstep and more.
Slide 9
What do we use?
Specification
Context Propagation
Semantic Conventions
Data Model
Protocol (OTLP)
Collector
Client Libraries
Slide 10
What’s next?
collector
Managed on EC2, ECS, EKS, Lambda, etc.
Slide 12
What’s next?
collector
Managed on EC2, ECS, EKS, Lambda, etc.
OTLP
Prometheus
statsd
X-Ray
Jaeger
Zipkin
Slide 13
What’s next?
collector
Managed on EC2, ECS, EKS, Lambda, etc.
OTLP
Prometheus
statsd
X-Ray
Jaeger
Zipkin
CloudWatch
Prometheus
X-Ray
Elasticsearch/OpenSearch
Jaeger
Zipkin
Vendors
Raw storage
Slide 14
What’s next?
collector
Managed on EC2, ECS, EKS, Lambda, etc.
OTLP
Prometheus
statsd
X-Ray
Jaeger
Zipkin
CloudWatch
Prometheus
X-Ray
Jaeger
Zipkin
Vendors
Raw storage
enrich, transform, ...
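The diagram on this slide (many input formats, a collector that enriches and transforms, many destinations) can be modelled as a small fan-in, fan-out pipeline. The sketch below is a toy model under that assumption only; `Pipeline`, `enrich`, and the record shape are hypothetical names and do not reflect the OpenTelemetry Collector's actual Go interfaces or configuration.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Processor = Callable[[Record], Record]
Exporter = Callable[[List[Record]], None]


def enrich(extra: Record) -> Processor:
    """Build a processor that adds attributes without overwriting existing ones."""
    def process(record: Record) -> Record:
        # Keys already on the record win over the enrichment defaults.
        return {**extra, **record}
    return process


class Pipeline:
    """Toy pipeline: run each processor in order, then fan out to every exporter."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]):
        self.processors = processors
        self.exporters = exporters

    def consume(self, batch: List[Record]) -> List[Record]:
        for processor in self.processors:
            batch = [processor(record) for record in batch]
        for exporter in self.exporters:
            # Each exporter gets its own copy of the list so one cannot
            # reorder or truncate what the others see.
            exporter(list(batch))
        return batch


if __name__ == "__main__":
    # Two hypothetical destinations that just collect what they are sent.
    first_backend: List[Record] = []
    second_backend: List[Record] = []
    pipeline = Pipeline(
        processors=[enrich({"cloud.provider": "aws"})],
        exporters=[first_backend.extend, second_backend.extend],
    )
    pipeline.consume([{"name": "cpu.utilization"}])
    print(first_backend, second_backend)
```

The point of the shape is that adding a destination means adding one exporter, not installing another agent.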
Slide 15
Container Insights is now collected by OpenTelemetry.
Slide 16
What do we use?
Specification
Context Propagation
Semantic Conventions
Data Model
Protocol (OTLP)
Collector
Client Libraries
Slide 17
What works well?
Flexible
Composable
Lightweight enough
Holistic
Legacy protocol friendly
Community
Slide 18
What challenges us?
Stability
Custom builds
Compatibility (Prometheus & CloudWatch)
Boilerplate in client libraries
Slide 19
What are we working on next?
Slide 20
Prometheus
Slide 21
Prometheus
Drop-in replacement for Prometheus
Data model changes
Remote write compliance
Discovery + scrape config compliance
Kubernetes operator