Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
550
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
240
Critical Path Analysis
rakyll
0
560
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
Ruby on cygwin 2025-02
fd0
0
180
推しメソッドsource_locationのしくみを探る - はじめてRubyのコードを読んでみた
nobu09
2
280
1年目の私に伝えたい!テストコードを怖がらなくなるためのヒント/Tips for not being afraid of test code
push_gawa
1
570
Djangoにおける複数ユーザー種別認証の設計アプローチ@DjangoCongress JP 2025
delhi09
PRO
4
470
PHPのバージョンアップ時にも役立ったAST
matsuo_atsushi
0
230
GoとPHPのインターフェイスの違い
shimabox
2
210
仕様変更に耐えるための"今の"DRY原則を考える
mkmk884
9
3.2k
pylint custom ruleで始めるレビュー自動化
shogoujiie
0
150
Honoをフロントエンドで使う 3つのやり方
yusukebe
7
3.5k
苦しいTiDBへの移行を乗り越えて快適な運用を目指す
leveragestech
0
1k
Djangoアプリケーション 運用のリアル 〜問題発生から可視化、最適化への道〜 #pyconshizu
kashewnuts
1
260
はじめての Go * WASM *OCR
sgash708
1
100
Featured
See All Featured
Designing for humans not robots
tammielis
250
25k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.3k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
Automating Front-end Workflow
addyosmani
1368
200k
Typedesign – Prime Four
hannesfritz
40
2.5k
Fontdeck: Realign not Redesign
paulrobertlloyd
83
5.4k
Building an army of robots
kneath
303
45k
YesSQL, Process and Tooling at Scale
rocio
172
14k
Scaling GitHub
holman
459
140k
It's Worth the Effort
3n
184
28k
How to Ace a Technical Interview
jacobian
276
23k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll