Making Service Launches Boring with Distributed Tracing

Karthik Kumar (@k4rkum), LightStep Inc.

“My team has Distributed Tracing, I just don’t know when it’s useful”
- Francisco (last month’s SRE meetup)

Distributed Tracing
- What is tracing?
- What are some best practices?
- When is it useful?

Disclaimer

What are traces?
- Logs
- Traces = CONTEXT
- Span
- Trace
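To make the span/trace vocabulary concrete, here is a minimal sketch of the data model. The `Span` class and its field names are hypothetical and not taken from any particular tracer; the point is only that every span in a request carries the same `trace_id`, which is the context that plain log lines lack.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation. Hypothetical model, not a real tracer's API."""
    name: str
    trace_id: str                      # shared by every span in the same request
    parent_id: Optional[str] = None    # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        # The child inherits trace_id and points back at its parent.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        self.end = time.time()


# One request: a root span plus a child span for a database call.
root = Span(name="GET /checkout", trace_id=uuid.uuid4().hex)
db = root.child("SELECT cart")
db.finish()
root.finish()

trace = [root, db]  # a trace is the set of spans sharing one trace_id
```

Grouping spans by `trace_id` and following `parent_id` links rebuilds the request tree.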

How do I get traces?

What can I do with traces?

When is tracing useful?

Scenario: launching a new service

Scenario: my service is crashing

Scenario: my service isn’t doing its job

Scenario: my service is getting DDoS-ed

Scenario: I want to track service SLOs
- my service’s SLO
- my provider’s SLO
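As a rough illustration of tracking both SLOs from trace data, the sketch below computes the fraction of spans that finished without error and within a latency threshold. The function name, the thresholds, and the sample durations are all made up for illustration; a real setup would pull spans from the tracing backend.

```python
from typing import Iterable, Tuple


def slo_attainment(spans: Iterable[Tuple[float, bool]], threshold_ms: float) -> float:
    """Fraction of spans that succeeded within threshold_ms.

    Each span is a (duration_ms, had_error) pair.
    """
    spans = list(spans)
    if not spans:
        return 1.0  # no traffic means nothing violated the objective
    good = sum(1 for duration, had_error in spans
               if not had_error and duration <= threshold_ms)
    return good / len(spans)


# Illustrative values only: spans for my service's endpoint and for the
# calls it makes to a provider's API.
my_service_spans = [(120.0, False), (340.0, False), (95.0, False), (80.0, True)]
provider_spans = [(40.0, False), (45.0, False), (900.0, False), (50.0, False)]

my_slo = slo_attainment(my_service_spans, threshold_ms=300.0)      # 0.5
provider_slo = slo_attainment(provider_spans, threshold_ms=100.0)  # 0.75
```

Because the provider calls appear as child spans in the same traces, the same calculation covers both sides of the dependency.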

Scenario: I want to improve instrumentation

Tracing Best Practices

Tip #1: Trace close to business value
- Identify important transactions
- Identify critical path of request
- Add traces!

Critical path exposes bottlenecks
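One way to see why the critical path exposes bottlenecks is to compute it from a span tree. The sketch below is a simplified approach under stated assumptions: spans have accurate start and end times, and the `Node` type is hypothetical. It walks each span's children from latest-finishing to earliest and keeps only those that could have blocked the parent from finishing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    start: float
    end: float
    children: List["Node"] = field(default_factory=list)


def critical_path(span: Node) -> List[str]:
    """Names of spans on the critical path, latest-finishing first."""
    path = [span.name]
    cursor = span.end
    for child in sorted(span.children, key=lambda c: c.end, reverse=True):
        # A child is on the path only if it finished before the work
        # that was already selected began.
        if child.end <= cursor:
            path.extend(critical_path(child))
            cursor = child.start
    return path


# Illustrative timings in milliseconds. "ads" runs in parallel with
# "cart" and finishes early, so it is not on the critical path.
request = Node("checkout", 0, 100, [
    Node("auth", 0, 20),
    Node("cart", 20, 90, [Node("db", 30, 85)]),
    Node("ads", 20, 40),
])

print(critical_path(request))  # ['checkout', 'cart', 'db', 'auth']
```

Speeding up `ads` would not shorten the request here, whereas speeding up `db` would.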

Tip #2: Trace communication libraries
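Tracing a communication library usually means wrapping its send path once so that every caller gets spans for free. The following is a minimal sketch, not a real library's API: `traced_client`, `FINISHED_SPANS`, and the `x-trace-id` / `x-parent-span-id` header names are all hypothetical. Production systems typically use a standard propagation format such as W3C Trace Context.

```python
import time
import uuid
from typing import Callable, Dict, List

FINISHED_SPANS: List[dict] = []  # stand-in for a tracer's exporter

SendFn = Callable[[str, Dict[str, str]], int]


def traced_client(send: SendFn) -> SendFn:
    """Wrap a send(url, headers) -> status function with a client span."""
    def wrapper(url: str, headers: Dict[str, str]) -> int:
        span = {
            "name": f"HTTP {url}",
            # Continue the caller's trace if there is one, else start a new one.
            "trace_id": headers.get("x-trace-id") or uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex[:16],
            "start": time.time(),
        }
        # Propagate context so the downstream service can attach its spans.
        outgoing = {**headers,
                    "x-trace-id": span["trace_id"],
                    "x-parent-span-id": span["span_id"]}
        try:
            status = send(url, outgoing)
            span["status"] = status
            return status
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["end"] = time.time()
            FINISHED_SPANS.append(span)
    return wrapper


received: Dict[str, str] = {}


def fake_send(url: str, headers: Dict[str, str]) -> int:
    # Placeholder transport so the sketch runs without a network.
    received.update(headers)
    return 200


send = traced_client(fake_send)
send("http://inventory.internal/items", {})
```

Instrumenting the shared client once tends to give broader coverage than adding spans at each call site.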

Tip #3: Trace your dependencies
- Identify downstream services, data store drivers, platform SDKs, and 3rd party APIs
- Add or adopt traces!

PaaS Down Detector

Learn from our mistakes

Tip #4: Add useful tags
- Version tags to track deployments
- Feature flag tags to track config changes
- Host identifier tags to correlate with machine metrics
- Customer identifier tags to correlate with business value
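The four kinds of tags can be attached in one place so that every span carries them consistently. This is a sketch under assumptions: the `standard_tags` helper, the tag key names, and the `SERVICE_VERSION` environment variable are hypothetical, and the span is a plain dict rather than a real tracer object.

```python
import os
import socket
from typing import Dict


def standard_tags(customer_id: str, flags: Dict[str, bool]) -> Dict[str, str]:
    """Build the tags suggested above; key names are illustrative."""
    tags = {
        # Version tag: lets you compare spans before and after a deployment.
        "service.version": os.environ.get("SERVICE_VERSION", "unknown"),
        # Host tag: lets you join spans with machine metrics.
        "host.name": socket.gethostname(),
        # Customer tag: ties latency and errors back to business value.
        "customer.id": customer_id,
    }
    # Feature flag tags: show which config a request ran under.
    for flag, enabled in sorted(flags.items()):
        tags[f"feature_flag.{flag}"] = str(enabled).lower()
    return tags


span = {"name": "POST /orders", "tags": {}}
span["tags"].update(standard_tags("cust-123", {"new_checkout": True}))
```

With these tags in place, a latency regression can be filtered by version, flag state, host, or customer.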

Summary: When is distributed tracing valuable?
- Rolling out new services
- Finding and fixing issues
- Identifying optimizations

Summary: What are some distributed tracing best practices?
- Trace close to your caller
- Trace communication layer
- Trace dependencies
- Add useful tags

Questions? https://lightstep.com/careers @k4rkum

Appendix

Metrics

Analyzing Metrics
- Set thresholds and alerts
- Look for correlated variance across metrics
- Aggregate across dimensions*

Analyzing Logs
- Complex, full-text search
- Granular analysis of rare events

Metrics & Logs are not enough
Metrics
- Scoped to a single system
- Vulnerable to high-cardinality tags
Logs
- Scoped to a single system
- Cost scales with usage

Life is a box of trade-offs

                                          Logs   Metrics   Traces
Cost scales gracefully                     –       ✓         ✓
Accounts for all data (i.e., unsampled)    ✓       ✓         –
Immune to cardinality                      ✓       –         ✓
