Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
590
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
640
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
マンガアプリViewerの大画面対応を考える
kk__777
0
450
なんでRustの環境構築してないのにRust製のツールが動くの? / Why Do Rust-Based Tools Run Without a Rust Environment?
ssssota
14
47k
kiroとCodexで最高のSpec駆動開発を!!数時間で web3ネイティブなミニゲームを作ってみたよ!
mashharuki
0
1.1k
マイベストのシンプルなデータ基盤の話 - Googleスイートとのつき合い方 / mybest-simple-data-architecture-google-nized
snhryt
0
120
MCPサーバー「モディフィウス」で変更容易性の向上をスケールする / modifius
minodriven
4
680
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
500
SidekiqでAIに商品説明を生成させてみた
akinko_0915
0
110
開発組織の戦略的な役割と 設計スキル向上の効果
masuda220
PRO
10
2k
AIのバカさ加減に怒る前にやっておくこと
blueeventhorizon
0
140
外接に惑わされない自システムの処理時間SLIをOpenTelemetryで実現した話
kotaro7750
0
160
When Dependencies Fail: Building Antifragile Applications in a Fragile World
selcukusta
0
120
Amazon ECS Managed Instances が リリースされた!キャッチアップしよう!! / Let's catch up Amazon ECS Managed Instances
cocoeyes02
0
130
Featured
See All Featured
Site-Speed That Sticks
csswizardry
13
940
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Producing Creativity
orderedlist
PRO
348
40k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
How to Think Like a Performance Engineer
csswizardry
27
2.2k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
KATA
mclloyd
PRO
32
15k
Writing Fast Ruby
sferik
630
62k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.8k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll