Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
590
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
640
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
KoogではじめるAIエージェント開発
hiroaki404
1
160
Vue 3.6 時代のリアクティビティ最前線 〜Vapor/alien-signals の実践とパフォーマンス最適化〜
hiranuma
2
260
O Que É e Como Funciona o PHP-FPM?
marcelgsantos
0
230
Vueのバリデーション、結局どれを選べばいい? ― 自作バリデーションの限界と、脱却までの道のり ― / Which Vue Validation Library Should We Really Use? The Limits of Self-Made Validation and How I Finally Moved On
neginasu
2
1.7k
なんでRustの環境構築してないのにRust製のツールが動くの? / Why Do Rust-Based Tools Run Without a Rust Environment?
ssssota
14
47k
20251016_Rails News ~Rails 8.1の足音を聴く~
morimorihoge
3
880
SODA - FACT BOOK(JP)
sodainc
1
8.9k
スマホから Youtube Shortsを見られないようにする
lemolatoon
27
34k
三者三様 宣言的UI
kkagurazaka
0
280
Node-REDのノードの開発・活用事例とコミュニティとの関わり(Node-RED Con Nagoya 2025)
404background
0
100
contribution to astral-sh/uv
shunsock
0
560
CSC305 Lecture 09
javiergs
PRO
0
320
Featured
See All Featured
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
We Have a Design System, Now What?
morganepeng
53
7.8k
Leading Effective Engineering Teams in the AI Era
addyosmani
7
670
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.1k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
640
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Fireside Chat
paigeccino
41
3.7k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.7k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Code Reviewing Like a Champion
maltzj
526
40k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
34
2.3k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll