Distributed Tracing at UBER Scale
Yuri Shkuro
May 24, 2017
Presented at Monitorama PDX 2017.
Video:
https://vimeo.com/221070602
Transcript
Distributed Tracing at UBER Scale
Creating a treasure map for your monitoring data
Yuri Shkuro, UBER Technologies
ABOUT ME
• Software Engineer on the Observability team in NYC
• Working on the open source distributed tracing system Jaeger
• Co-founded the OpenTracing project
• Banking industry survivor
• GitHub: yurishkuro
• Twitter: @yurishkuro
Would You Like Some Tracing with Your Monitoring?
What does it take to roll it out?
Why Distributed Tracing
• Distributed transaction monitoring
• Performance / latency optimization
• Root cause analysis
• Service dependency analysis
• Distributed context propagation (“baggage”)
JAEGER, Distributed Tracing
• Open Source
• OpenTracing inside
• In active development
• PRs are welcome
• Zipkin compatible
• github.com/uber/jaeger
Who Thinks Tracing is Awesome?
Why Doesn’t Everyone Do Tracing?
Tracing Instrumentation is HARD EXPENSIVE BORING
Instrumentation
• Metrics and logging are not new
• Tracing is both new and harder
Context Propagation
[Diagram: services A, B, C, D, E pass {context} along on each call; at the edge service, Unique ID → {context}]
Context Propagation
[Diagram: APPLICATION / MICROSERVICE. Inbound HTTP Request (Headers: ... Trace ID ...) → Instrumentation → Handler, Context [Span] → Client, Context [Span] → Instrumentation → Outbound HTTP Request (Headers: ... Trace ID ...)]
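The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not Jaeger's or OpenTracing's API: the header name `x-trace-id` and the functions `extract`, `inject`, and `handle` are hypothetical names chosen for the example.

```python
import uuid

# Hypothetical header name for illustration; not Jaeger's wire format.
TRACE_HEADER = "x-trace-id"


def extract(headers):
    """Read the trace ID from inbound headers, or start a new trace at the edge."""
    return headers.get(TRACE_HEADER) or uuid.uuid4().hex


def inject(trace_id, headers):
    """Return a copy of the outbound headers that carries the trace ID."""
    out = dict(headers)
    out[TRACE_HEADER] = trace_id
    return out


def handle(inbound_headers):
    """Handler: extract the context, then attach it to a downstream request."""
    trace_id = extract(inbound_headers)
    outbound_headers = inject(trace_id, {"accept": "application/json"})
    return trace_id, outbound_headers
```

Every service on the path has to repeat the extract and inject steps; if one service skips them, the trace breaks at that hop.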
In-Process Context Propagation
Implicit, via Thread-Locals (but: thread pools, futures)
Explicit
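A small sketch of why thread pools and futures are a problem for the implicit approach. It uses only the Python standard library; the function names and the `trace_id` attribute are hypothetical. A value stored in a thread-local on the request thread is not visible to a pool worker, while a value passed explicitly survives the hand-off.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # implicit, per-thread storage


def implicit_trace_id():
    """Look the trace ID up in the thread-local of whichever thread runs this."""
    return getattr(_local, "trace_id", None)


def explicit_trace_id(trace_id):
    """Receive the trace ID as an ordinary argument."""
    return trace_id


def run_demo():
    _local.trace_id = "abc123"  # set on the calling (request) thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The worker thread has its own, empty thread-local storage.
        lost = pool.submit(implicit_trace_id).result()
        # Passing the value explicitly carries it across the thread boundary.
        kept = pool.submit(explicit_trace_id, _local.trace_id).result()
    return lost, kept
```

Here `run_demo()` returns `(None, "abc123")`: the implicit lookup loses the context in the worker, the explicit argument keeps it.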
It’s Also the Frameworks
• Go: stdlib, gorilla, …
• Java: jaxrs2, okhttp, ApacheHttpClient, …
• Python: Flask, Django, Tornado, urllib2, …
• Node.js – who knows…
OpenTracing to the Rescue
No Help With In-Process Propagation
• Must be done manually
• UBER has 2000-3000 microservices
• Resources of the tracing team are limited
• Developers must instrument their code!
BITE MAKE ME! How do we mobilize the org?
Traveling Salesman Problem, 2017 edition
They Must Want Your Product, or Sticks and Carrots
Recap: Why Distributed Tracing
• Distributed transaction monitoring
• Performance / latency optimization
• Root cause analysis
• Service dependency analysis
• Distributed context propagation (“baggage”)
Service Dependency Analysis
• Explain to us what we just built
• Who are my dependencies
• Workflow analysis
• Where is all this traffic coming from?
• Service tiers
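As a rough illustration of how "who are my dependencies" can fall out of trace data, the sketch below counts caller → callee edges from parent/child span links. The span layout (`span_id`, `parent_id`, `service`) is a simplified assumption for the example, not Jaeger's data model.

```python
from collections import Counter


def dependency_edges(spans):
    """Count (caller service, callee service) pairs from parent/child span links."""
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Only count links that cross a service boundary.
        if parent and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


# Hypothetical spans: api calls users twice, users calls db once.
SPANS = [
    {"span_id": 1, "parent_id": None, "service": "api"},
    {"span_id": 2, "parent_id": 1, "service": "users"},
    {"span_id": 3, "parent_id": 2, "service": "db"},
    {"span_id": 4, "parent_id": 1, "service": "users"},
]
```

Aggregating these edges over many traces gives a dependency graph with call counts, which is the raw material for the questions on this slide.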
Baggage
• Tenancy, test or production
  – Set at the top
  – Used at the storage layer, prod or test DB
• Authentication tokens
  – Signed user or service identity
  – Checked at multiple levels
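The tenancy case can be sketched as follows. Baggage is modeled as a plain dict that travels with the request; the function names, the `tenancy` key, and the database names are hypothetical, and a real system would carry the baggage inside the trace context rather than as a bare argument.

```python
def start_request(is_test):
    """Edge: set the tenancy once, at the top of the call graph."""
    return {"tenancy": "test" if is_test else "production"}


def call_downstream(baggage):
    """Intermediate service: forwards the baggage without inspecting it."""
    return choose_database(dict(baggage))


def choose_database(baggage):
    """Storage layer: pick the database from the propagated tenancy."""
    tenancy = baggage.get("tenancy")
    if tenancy == "production":
        return "prod-db"
    if tenancy == "test":
        return "test-db"
    # Assumption for this sketch: refuse unknown tenancy instead of defaulting to prod.
    raise ValueError(f"unknown tenancy: {tenancy!r}")
```

The intermediate services need no knowledge of tenancy at all, which is the appeal of baggage: only the top and the storage layer agree on the key.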
Sticks and Carrots
• Get other teams to build features on top
  – Performance team
  – Capacity & cost accounting
  – Baggage
• More carrots
• Eventually they become sticks (peer pressure)
Each Organization is Different: find what works best
How to Measure Adoption? Measure everything
Does Service X Report Traces?
• Daily aggregation job
• Auto-book tickets
• Build a dashboard
• Pass/Fail: too easy to pass
Trace Quality Score
• Inspect traces
  – See a caller, but no spans
• Join with other data
  – Routing logs
• Auto-book tickets (carefully, not for everyone)
  – With detailed report
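One way to turn "see a caller, but no spans" into a number is sketched below. This is an assumed scoring scheme for illustration, not the metric used at UBER: for each service that appears as the target of a client span, the score is the fraction of those calls for which the service reported a matching server span. The span fields (`kind`, `peer`, and so on) are hypothetical.

```python
from collections import defaultdict


def quality_scores(spans):
    """Return {service: fraction of inbound calls that have a matching server span}."""
    # A server span matches a call when its parent is the caller's client span.
    server_parents = {
        (s["service"], s["parent_id"]) for s in spans if s["kind"] == "server"
    }
    calls = defaultdict(int)
    traced = defaultdict(int)
    for s in spans:
        if s["kind"] != "client":
            continue
        callee = s["peer"]  # the service the caller says it called
        calls[callee] += 1
        if (callee, s["span_id"]) in server_parents:
            traced[callee] += 1
    return {svc: traced[svc] / calls[svc] for svc in calls}


# Hypothetical data: users answers one of two calls with a span, billing none.
SAMPLE = [
    {"span_id": "c1", "parent_id": None, "service": "api", "kind": "client", "peer": "users"},
    {"span_id": "s1", "parent_id": "c1", "service": "users", "kind": "server", "peer": None},
    {"span_id": "c2", "parent_id": None, "service": "api", "kind": "client", "peer": "billing"},
    {"span_id": "c3", "parent_id": None, "service": "api", "kind": "client", "peer": "users"},
]
```

A graded score like this is harder to pass by accident than the pass/fail check on the previous slide, and it gives the auto-booked ticket something concrete to report.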
Trace Quality Metrics by Service
Thank You
• Jaeger
  – https://github.com/uber/jaeger
  – Blog: Evolving Distributed Tracing at UBER
  – Blog: Take OpenTracing for a HotROD Ride
• OpenTracing: http://opentracing.io/
• We are hiring
• @yurishkuro