Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
550
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
570
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
ML.NETで始める機械学習
ymd65536
0
240
良いコードレビューとは
danimal141
7
1.8k
推しメソッドsource_locationのしくみを探る - はじめてRubyのコードを読んでみた
nobu09
2
340
The Clean ArchitectureがWebフロントエンドでしっくりこないのは何故か / Why The Clean Architecture does not fit with Web Frontend
twada
PRO
3
820
Serverless Rust: Your Low-Risk Entry Point to Rust in Production (and the benefits are huge)
lmammino
1
160
AWS Step Functions は CDK で書こう!
konokenj
4
640
Jakarta EE meets AI
ivargrimstad
0
540
Rubyと自由とAIと
yotii23
6
1.8k
楽しく向き合う例外対応
okutsu
0
710
PHPカンファレンス名古屋2025 タスク分解の試行錯誤〜レビュー負荷を下げるために〜
soichi
1
730
Go 1.24でジェネリックになった型エイリアスの紹介
syumai
2
300
1年目の私に伝えたい!テストコードを怖がらなくなるためのヒント/Tips for not being afraid of test code
push_gawa
1
630
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
The Pragmatic Product Professional
lauravandoore
32
6.4k
A better future with KSS
kneath
238
17k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
134
33k
Testing 201, or: Great Expectations
jmmastey
42
7.2k
BBQ
matthewcrist
87
9.5k
Designing Experiences People Love
moore
140
23k
A Modern Web Designer's Workflow
chriscoyier
693
190k
Code Reviewing Like a Champion
maltzj
521
39k
GraphQLの誤解/rethinking-graphql
sonatard
69
10k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
A designer walks into a library…
pauljervisheath
205
24k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google jbd@google.com @rakyll