Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Observability_at_Google_--_OSCON.pdf
Search
JBD
July 23, 2018
Programming
1
230
Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
570
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
TCAを用いたAmebaのリアーキテクチャ
dazy
0
210
Rubyと自由とAIと
yotii23
6
1.8k
たのしいSocketのしくみ / Socket Under a Microscope
coe401_
8
1.4k
15分で学ぶDuckDBの可愛い使い方 DuckDBの最近の更新
notrogue
3
750
機能が複雑化しても 頼りになる FactoryBotの話
tamikof
1
220
Introduction to kotlinx.rpc
arawn
0
770
ナレッジイネイブリングにAIを活用してみる ゆるSRE勉強会 #9
nealle
0
160
Bedrock Agentsレスポンス解析によるAgentのOps
licux
3
930
新宿駅構内を三人称視点で探索してみる
satoshi7190
2
120
Swift Testingのモチベを上げたい
stoticdev
2
150
Ça bouge du côté des animations CSS !
goetter
2
160
Honoとフロントエンドの 型安全性について
yodaka
7
1.5k
Featured
See All Featured
Visualization
eitanlees
146
15k
How STYLIGHT went responsive
nonsquared
99
5.4k
Statistics for Hackers
jakevdp
797
220k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
4
440
Raft: Consensus for Rubyists
vanstee
137
6.8k
A Philosophy of Restraint
colly
203
16k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
114
51k
GraphQLの誤解/rethinking-graphql
sonatard
69
10k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
46
2.4k
Adopting Sorbet at Scale
ufuk
75
9.2k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
2.1k
Mobile First: as difficult as doing things right
swwweet
223
9.5k
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History Long history of distributed systems 10ks of different
services built by 100s of teams Many backends/analysis tools invented here ™
@rakyll
@rakyll 100% availability (is a lie)
“ @rakyll A service is available if users cannot tell
there is an outage.
“ @rakyll Google Load Balancers are available if users cannot
tell there is an outage.
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How? Observe by collecting signals Export them to analysis
tools Correlate and analyze to find root cause
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll This is hard Must have integrations for web, RPC,
and storage clients Must support all languages Must be context aware (e.g. canary vs prod) Must support many analysis tools Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll Z-Pages • Allows processes report their own dashboards. •
Z-Pages have no sampling.
@rakyll Try! import “go.opencensus.io/plugin/ocgrpc” s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})) if err :=
s.Serve(lis); err != nil { log.Fatalf("Failed to serve: %v", err) }
@rakyll import ( “go.opencensus.io/stats/view” “go.opencensus.io/trace” “contrib.go.opencensus.io/exporter/stackdriver” ) exporter, err :=
stackdriver.NewExporter(stackdriver.Options{ … }) if err != nil { log.Fatal(err) } view.RegisterExporter(exporter) trace.RegisterExporter(exporter)
@rakyll
@rakyll
@rakyll Roadmap Stable libraries in 8+ languages Exporter daemon Cluster-wide
Z-Pages Smart sampling Exemplars Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google jbd@google.com @rakyll