RPC Metrics at Google
JBD
August 09, 2018
Programming
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that there was an outage."
@rakyll SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
@rakyll [Diagram: Users, Analytics frontend server, Authentication, Reporting, ..., Spanner, Blob Store]
@rakyll Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What’s the impact of a product on infra?
• How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality: breaking down the metrics data...
@rakyll Query the collected data in various ways:
• Latency distribution for RPCs originated at Google Analytics.
• Requests that took more than 100ms for customer #123.
• Compare the latency of requests initiated at the web vs. mobile frontend.
@rakyll [Diagram: Users, Analytics frontend server, Authentication, Reporting, ..., Spanner, Blob Store; requests carry the tag originator=analytics; ...]
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
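The split between recording and aggregation can be sketched as follows: instrumented code only records raw measurements with their tags, and the aggregation (a count per tag, a latency histogram, and so on) is picked later. This is an illustrative sketch with made-up type names and bucket bounds, not the implementation described in the talk.

```go
package main

import (
	"fmt"
	"sort"
)

// Measurement is one raw recorded data point: a latency and its tags.
type Measurement struct {
	LatencyMs float64
	Tags      map[string]string
}

// Aggregation turns raw measurements into a summary. It is chosen at
// query or export time rather than at the recording site.
type Aggregation func(ms []Measurement) map[string]float64

// CountBy counts measurements per value of the given tag key.
func CountBy(key string) Aggregation {
	return func(ms []Measurement) map[string]float64 {
		out := make(map[string]float64)
		for _, m := range ms {
			out[m.Tags[key]]++
		}
		return out
	}
}

// Histogram buckets latencies by inclusive upper bounds (in ms).
// Values above the largest bound are counted under "+Inf".
func Histogram(bounds []float64) Aggregation {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return func(ms []Measurement) map[string]float64 {
		out := make(map[string]float64)
		for _, m := range ms {
			// Index of the first bound that is >= the latency.
			i := sort.SearchFloat64s(sorted, m.LatencyMs)
			label := "+Inf"
			if i < len(sorted) {
				label = fmt.Sprintf("<=%g", sorted[i])
			}
			out[label]++
		}
		return out
	}
}

func main() {
	recorded := []Measurement{
		{LatencyMs: 8, Tags: map[string]string{"originator": "analytics"}},
		{LatencyMs: 120, Tags: map[string]string{"originator": "analytics"}},
		{LatencyMs: 40, Tags: map[string]string{"originator": "reporting"}},
	}
	// The same recorded data, aggregated two different ways.
	fmt.Println(CountBy("originator")(recorded))
	fmt.Println(Histogram([]float64{10, 100})(recorded))
}
```

Because the recording site does not commit to an aggregation, a new breakdown can be added without touching the instrumented code.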
@rakyll Exemplars
@rakyll /rpcz and /statz
@rakyll http://server:7777/debug/rpcz
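A debug page like the one at this URL can be approximated with a small in-process registry exposed over HTTP. The sketch below uses only the Go standard library; the Registry type, its fields, and the plain-text output format are assumptions for illustration and do not reproduce the actual /debug/rpcz page. It exercises the handler with an in-memory recorder so it runs without binding a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// MethodStats holds per-RPC-method counters (illustrative fields only).
type MethodStats struct {
	Count  int64
	Errors int64
}

// Registry is a hypothetical in-process store of RPC stats.
type Registry struct {
	mu    sync.Mutex
	stats map[string]*MethodStats
}

func NewRegistry() *Registry {
	return &Registry{stats: make(map[string]*MethodStats)}
}

// Observe records one finished RPC for the given method.
func (r *Registry) Observe(method string, failed bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.stats[method]
	if !ok {
		s = &MethodStats{}
		r.stats[method] = s
	}
	s.Count++
	if failed {
		s.Errors++
	}
}

// Render formats the stats as plain text, one method per line, sorted by name.
func (r *Registry) Render() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.stats))
	for name := range r.stats {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		s := r.stats[name]
		fmt.Fprintf(&b, "%s count=%d errors=%d\n", name, s.Count, s.Errors)
	}
	return b.String()
}

// ServeHTTP lets the registry be mounted directly on a mux.
func (r *Registry) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, r.Render())
}

func main() {
	reg := NewRegistry()
	reg.Observe("BlobStore.Read", false)
	reg.Observe("BlobStore.Read", true)

	mux := http.NewServeMux()
	mux.Handle("/debug/rpcz", reg)
	// A real server would call http.ListenAndServe(":7777", mux) here.
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest("GET", "/debug/rpcz", nil))
	fmt.Print(rec.Body.String())
}
```

Serving the page from the process itself means the stats are available for debugging even when no metrics backend is configured.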
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import "cloud.google.com/go/pubsub"
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll