Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Distributed Tracing at UBER Scale
Search
Yuri Shkuro
May 24, 2017
Programming
1
440
Distributed Tracing at UBER Scale
Presented at Monitorama-PDF 2017.
Video:
https://vimeo.com/221070602
Yuri Shkuro
May 24, 2017
Tweet
Share
More Decks by Yuri Shkuro
See All by Yuri Shkuro
TEMPLE: Six Pillars of Telemetry
yurishkuro
0
740
Schema-first application telemetry
yurishkuro
0
370
CNCF Webinar Series - Introducing Jaeger 1.0
yurishkuro
1
350
Would You Like Some Tracing With Your Monitoring?
yurishkuro
0
470
From zero to distributed traces: an OpenTracing tutorial
yurishkuro
1
870
Other Decks in Programming
See All in Programming
re:Invent 2025 のイケてるサービスを紹介する
maroon1st
0
170
余白を設計しフロントエンド開発を 加速させる
tsukuha
7
2k
Grafana:建立系統全知視角的捷徑
blueswen
0
310
Denoのセキュリティに関する仕組みの紹介 (toranoana.deno #23)
uki00a
0
260
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
920
今こそ知るべき耐量子計算機暗号(PQC)入門 / PQC: What You Need to Know Now
mackey0225
3
350
OSSとなったswift-buildで Xcodeのビルドを差し替えられるため 自分でXcodeを直せる時代になっている ダイアモンド問題編
yimajo
3
550
ゆくKotlin くるRust
exoego
1
210
CSC307 Lecture 07
javiergs
PRO
0
520
例外処理とどう使い分ける?Result型を使ったエラー設計 #burikaigi
kajitack
16
5.8k
Unicodeどうしてる? PHPから見たUnicode対応と他言語での対応についてのお伺い
youkidearitai
PRO
0
990
AI Agent の開発と運用を支える Durable Execution #AgentsInProd
izumin5210
7
2.2k
Featured
See All Featured
The Cost Of JavaScript in 2023
addyosmani
55
9.4k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
1
43
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
57
From π to Pie charts
rasagy
0
120
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
270
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
570
Testing 201, or: Great Expectations
jmmastey
46
8k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Ethics towards AI in product and experience design
skipperchong
2
180
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
0
3.4k
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
0
98
Transcript
Distributed Tracing at UBER Scale Crea7ng a treasure map for
your monitoring data Yuri Shkuro, UBER Technologies
ABOUT ME • SoAware Engineer on the Observability team in
NYC • Working on the open source distributed tracing system Jaeger • Co-founded the OpenTracing project • Banking industry survivor • Github: yurishkuro • TwiLer: @yurishkuro
Would You Like Some Tracing with Your Monitoring? What does
it take to roll it out?
Why Distributed Tracing • Distributed transac7on monitoring • Performance /
latency op7miza7on • Root cause analysis • Service dependency analysis • Distributed context propaga7on (“baggage”)
JAEGER, Distributed Tracing • Open Source • OpenTracing inside •
In ac7ve development • PRs are welcome • Zipkin compa7ble • github.com/uber/jaeger
Who Thinks Tracing is Awesome?
None
None
Why Doesn’t Everyone Do Tracing?
Tracing Instrumenta7on is HARD EXPENSIVE BORING
Instrumenta7on • Metrics and logging are not new • Tracing
is both new and harder
Context Propaga7on A B C D E {context} {context} {context}
{context} Unique ID → {context} Edge service
Headers: . . . Trace ID . . . Instrumentation
APPLICATION / MICROSERVICE Handler Context [Span] Client Context [Span] Inbound HTTP Request Instrumentation Headers: . . . Trace ID . . . Outbound HTTP Request Context Propaga7on
In-Process Context Propaga7on Implicit, via Thread-Locals but: thread pools, futures
Explicit
It’s Also the Frameworks • Go: stdlib, gorilla, … •
Java: jaxrs2, okhLp, ApacheHLpClient, … • Python: Flask, Django, Tornado, urllib2, … • Node.js – who knows…
OpenTracing to the Rescue
No Help With In-Process Propaga7on • Must be done manually
• UBER has 2000-3000 microservices • Resources of the tracing team are limited • Developers must instrument their code!
BITE MAKE ME! How do we mobilize the org?
Traveling Salesman Problem 2017 edi7on
They Must Want Your Product or S7cks and Carrots
Recap: Why Distributed Tracing • Distributed transac7on monitoring • Performance
/ latency op7miza7on • Root cause analysis • Service dependency analysis • Distributed context propaga7on (“baggage”)
Service Dependency Analysis • Explain to us what we just
built • Who are my dependencies • Workflow analysis • Where is all this traffic coming from? • Service 7ers
Baggage • Tenancy, test or produc7on – Set at the top
– Used at the storage layer, prod or test DB • Authen7ca7on tokens – Signed user or service iden7ty – Checked at mul7ple levels
S7cks and Carrots • Get other teams build features on
top – Performance team – Capacity & cost accoun7ng – Baggage • More carrots • Eventually they become s7cks (peer pressure)
Each Organiza7on is Different Find what works best
How to Measure Adop7on? Measure everything
Does Service X Report Traces? • Daily aggrega7on job •
Auto-book 7ckets • Build a dashboard • Pass/Fail: too easy to pass
Trace Quality Score • Inspect traces – See a caller, but
no spans • Join with other data – Rou7ng logs • Auto-book 7ckets (carefully, not for everyone) – With detailed report
Trace Quality Metrics by Service
Thank You • Jaeger – hLps://github.com/uber/jaeger – Blog: Evolving Distributed
Tracing at UBER – Blog: Take OpenTracing for a HotROD Ride • OpenTracing: hLp://opentracing.io/ • We are hiring • @yurishkuro