$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
600
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
270
Critical Path Analysis
rakyll
0
660
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
チームをチームにするEM
hitode909
0
340
大体よく分かるscala.collection.immutable.HashMap ~ Compressed Hash-Array Mapped Prefix-tree (CHAMP) ~
matsu_chara
2
220
マスタデータ問題、マイクロサービスでどう解くか
kts
0
100
実は歴史的なアップデートだと思う AWS Interconnect - multicloud
maroon1st
0
200
【CA.ai #3】Google ADKを活用したAI Agent開発と運用知見
harappa80
0
310
認証・認可の基本を学ぼう前編
kouyuume
0
250
愛される翻訳の秘訣
kishikawakatsumi
3
330
Cell-Based Architecture
larchanjo
0
130
SwiftUIで本格音ゲー実装してみた
hypebeans
0
390
Flutter On-device AI로 완성하는 오프라인 앱, 박제창 @DevFest INCHEON 2025
itsmedreamwalker
1
110
FluorTracer / RayTracingCamp11
kugimasa
0
230
ZOZOにおけるAI活用の現在 ~モバイルアプリ開発でのAI活用状況と事例~
zozotech
PRO
9
5.7k
Featured
See All Featured
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
[SF Ruby Conf 2025] Rails X
palkan
0
530
How STYLIGHT went responsive
nonsquared
100
6k
How GitHub (no longer) Works
holman
316
140k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.6k
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.6k
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Scaling GitHub
holman
464
140k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll