Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
520
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.6k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.7k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
240
Critical Path Analysis
rakyll
0
510
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
Pinia Colada が実現するスマートな非同期処理
naokihaba
2
160
OpenTelemetryでRailsのパフォーマンス分析を始めてみよう(KoR2024)
ymtdzzz
4
1.6k
色々なIaCツールを実際に触って比較してみる
iriikeita
0
280
役立つログに取り組もう
irof
27
8.7k
C#/.NETのこれまでのふりかえり
tomokusaba
1
160
デプロイを任されたので、教わった通りにデプロイしたら障害になった件 ~俺のやらかしを越えてゆけ~
techouse
52
32k
Jakarta Concurrencyによる並行処理プログラミングの始め方 (JJUG CCC 2024 Fall)
tnagao7
1
240
Vitest Browser Mode への期待 / Vitest Browser Mode
odanado
PRO
2
1.7k
Java ジェネリクス入門 2024
nagise
0
610
のびしろを広げる巻き込まれ力:偶然を活かすキャリアの作り方/oso2024
takahashiikki
1
410
Googleのテストサイズを活用したテスト環境の構築
toms74209200
0
280
PLoP 2024: The evolution of the microservice architecture pattern language
cer
PRO
0
1.7k
Featured
See All Featured
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
191
16k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
Automating Front-end Workflow
addyosmani
1365
200k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.5k
How To Stay Up To Date on Web Technology
chriscoyier
788
250k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Bootstrapping a Software Product
garrettdimon
PRO
305
110k
Ruby is Unlike a Banana
tanoku
96
11k
10 Git Anti Patterns You Should be Aware of
lemiorhan
654
59k
The World Runs on Bad Software
bkeepers
PRO
65
11k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
25
1.8k
KATA
mclloyd
29
13k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll