Distributed Tracing at UBER Scale
Yuri Shkuro
May 24, 2017
Presented at Monitorama-PDX 2017.
Video: https://vimeo.com/221070602
Transcript
Distributed Tracing at UBER Scale
Creating a treasure map for your monitoring data
Yuri Shkuro, UBER Technologies
ABOUT ME
• Software Engineer on the Observability team in NYC
• Working on the open source distributed tracing system Jaeger
• Co-founded the OpenTracing project
• Banking industry survivor
• Github: yurishkuro
• Twitter: @yurishkuro
Would You Like Some Tracing with Your Monitoring?
What does it take to roll it out?
Why Distributed Tracing
• Distributed transaction monitoring
• Performance / latency optimization
• Root cause analysis
• Service dependency analysis
• Distributed context propagation (“baggage”)
JAEGER, Distributed Tracing
• Open Source
• OpenTracing inside
• In active development
• PRs are welcome
• Zipkin compatible
• github.com/uber/jaeger
Who Thinks Tracing is Awesome?
Why Doesn’t Everyone Do Tracing?
Tracing Instrumentation is HARD EXPENSIVE BORING
Instrumentation
• Metrics and logging are not new
• Tracing is both new and harder
Context Propagation
[Diagram: services A, B, C, D and E, with {context} passed along on each call between them; at the edge service, a unique ID becomes the {context}]
Context Propagation
[Diagram: inside an APPLICATION / MICROSERVICE, instrumentation reads the Trace ID from the headers of the inbound HTTP request into the handler's Context [Span]; on the way out, instrumentation writes the Trace ID from the client's Context [Span] into the headers of the outbound HTTP request]
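The header-based hand-off in this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the header name `x-trace-id` and the three helper functions are hypothetical and are not taken from the talk, from Jaeger, or from OpenTracing.

```python
import uuid

# Hypothetical header name, chosen for illustration only.
TRACE_HEADER = "x-trace-id"


def extract_trace_id(inbound_headers):
    """Return the caller's trace ID, or start a new trace at the edge."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in inbound_headers.items()}
    return normalized.get(TRACE_HEADER) or uuid.uuid4().hex


def inject_trace_id(trace_id, outbound_headers):
    """Return a copy of the outbound headers that carries the trace ID."""
    headers = dict(outbound_headers)
    headers[TRACE_HEADER] = trace_id
    return headers


def handle_request(inbound_headers):
    """Handle one inbound request and build headers for a downstream call."""
    trace_id = extract_trace_id(inbound_headers)
    return inject_trace_id(trace_id, {"accept": "application/json"})
```

A service in the middle of the call chain reuses the ID it received, while a request that arrives without the header gets a fresh ID, which is the role the edge service plays in the previous slide.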
In-Process Context Propagation
Implicit, via Thread-Locals
but: thread pools, futures
Explicit
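The trade-off between implicit and explicit propagation can be shown with a small sketch. It assumes a trace ID kept in a `threading.local` on the request thread; all names here are hypothetical. Work handed to a thread pool runs on a different thread, so the implicit lookup finds nothing, while an explicitly passed argument survives the hand-off.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage: each thread sees its own attributes.
_local = threading.local()


def read_implicit():
    """Look up the trace ID in the current thread's local storage."""
    return getattr(_local, "trace_id", None)


def read_explicit(trace_id):
    """Receive the trace ID as an ordinary argument."""
    return trace_id


def run_demo():
    """Compare both styles when work moves to a pool thread."""
    _local.trace_id = "abc123"  # set on the request thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The pool thread never had trace_id set, so this yields None.
        implicit = pool.submit(read_implicit).result()
        # The argument is evaluated on the request thread and passed along.
        explicit = pool.submit(read_explicit, _local.trace_id).result()
    return implicit, explicit
```

Calling `run_demo()` returns `(None, "abc123")`: the thread-local value is lost on the pool thread, and the explicit argument is not.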
It’s Also the Frameworks
• Go: stdlib, gorilla, …
• Java: jaxrs2, okhttp, ApacheHttpClient, …
• Python: Flask, Django, Tornado, urllib2, …
• Node.js – who knows…
OpenTracing to the Rescue
No Help With In-Process Propagation
• Must be done manually
• UBER has 2000-3000 microservices
• Resources of the tracing team are limited
• Developers must instrument their code!
BITE MAKE ME! How do we mobilize the org?
Traveling Salesman Problem, 2017 edition
They Must Want Your Product or Sticks and Carrots
Recap: Why Distributed Tracing
• Distributed transaction monitoring
• Performance / latency optimization
• Root cause analysis
• Service dependency analysis
• Distributed context propagation (“baggage”)
Service Dependency Analysis
• Explain to us what we just built
• Who are my dependencies
• Workflow analysis
• Where is all this traffic coming from?
• Service tiers
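One way to answer "who are my dependencies" from trace data is to count caller-to-callee edges between parent and child spans. The sketch below is an assumption about how such an analysis might look, not the method described in the talk. The span fields `span_id`, `parent_id` and `service` are hypothetical, and span IDs are assumed to be unique across the input.

```python
from collections import Counter


def dependency_edges(spans):
    """Count (caller, callee) service pairs from parent/child spans."""
    service_by_span = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for span in spans:
        # Root spans have no parent, so the lookup returns None.
        parent_service = service_by_span.get(span.get("parent_id"))
        # Skip root spans and calls that stay inside one service.
        if parent_service is not None and parent_service != span["service"]:
            edges[(parent_service, span["service"])] += 1
    return edges
```

Reading the edges in reverse, grouped by callee, gives a rough answer to "where is all this traffic coming from?".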
Baggage
• Tenancy, test or production
  – Set at the top
  – Used at the storage layer, prod or test DB
• Authentication tokens
  – Signed user or service identity
  – Checked at multiple levels
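The tenancy use case can be sketched as a value that is set once at the top of the call chain and read only at the bottom. Everything in this sketch is hypothetical: the `tenancy` key, the database names, and the choice to fall back to production when the key is missing. Baggage is modeled as a plain dictionary for simplicity, not as a real tracing API.

```python
# Hypothetical database names, for illustration only.
PROD_DB = "prod-db"
TEST_DB = "test-db"


def edge_handler(is_test_request):
    """Set the tenancy once, at the top of the call chain."""
    baggage = {"tenancy": "test" if is_test_request else "production"}
    return business_logic(baggage)


def business_logic(baggage):
    """Middle layers forward the baggage without inspecting it."""
    return storage_layer(baggage)


def storage_layer(baggage):
    """Pick the database from the tenancy carried in the baggage."""
    # Assumption: requests without a tenancy are treated as production.
    tenancy = baggage.get("tenancy", "production")
    return TEST_DB if tenancy == "test" else PROD_DB
```

The middle layer never needs to know what tenancy means, which is what makes baggage useful for cross-cutting values like this one.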
Sticks and Carrots
• Get other teams to build features on top
  – Performance team
  – Capacity & cost accounting
  – Baggage
• More carrots
• Eventually they become sticks (peer pressure)
Each Organization is Different
Find what works best
How to Measure Adoption?
Measure everything
Does Service X Report Traces?
• Daily aggregation job
• Auto-book tickets
• Build a dashboard
• Pass/Fail: too easy to pass
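A pass/fail adoption check of this kind might look like the following sketch. It assumes the daily job has a list of known services and one day of spans, each tagged with a hypothetical `service` field; the talk does not describe the actual job.

```python
def tracing_report(known_services, spans_today):
    """Map each known service to True if it reported any span today."""
    reporting = {span["service"] for span in spans_today}
    return {svc: svc in reporting for svc in sorted(known_services)}


def services_needing_tickets(report):
    """List the services that failed the check, in report order."""
    return [svc for svc, passed in report.items() if not passed]
```

A single span is enough to pass, which illustrates why the slide calls this check too easy to pass.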
Trace Quality Score
• Inspect traces
  – See a caller, but no spans
• Join with other data
  – Routing logs
• Auto-book tickets (carefully, not for everyone)
  – With detailed report
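The "see a caller, but no spans" check can be read as follows: a span names the service that called it, yet that caller contributed no spans of its own to the trace. The sketch below follows that reading, which is an interpretation and not something the slides spell out. The `service` and `caller` fields and the scoring formula are hypothetical.

```python
def missing_caller_spans(trace_spans):
    """Return callers that a span names but that emitted no spans."""
    emitting = {s["service"] for s in trace_spans}
    named = {s["caller"] for s in trace_spans if s.get("caller")}
    return sorted(named - emitting)


def quality_score(traces):
    """Fraction of traces in which every named caller emitted spans."""
    if not traces:
        return None  # no data, so no score
    complete = sum(1 for t in traces if not missing_caller_spans(t))
    return complete / len(traces)
```

The list returned by `missing_caller_spans` could feed the detailed report attached to an auto-booked ticket.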
Trace Quality Metrics by Service
Thank You
• Jaeger
  – https://github.com/uber/jaeger
  – Blog: Evolving Distributed Tracing at UBER
  – Blog: Take OpenTracing for a HotROD Ride
• OpenTracing: http://opentracing.io/
• We are hiring
• @yurishkuro