Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Observability_at_Google_--_OSCON.pdf
Search
JBD
July 23, 2018
Programming
1
260
Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
630
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
Go言語の特性を活かした公式MCP SDKの設計
hond0413
1
230
「ちょっと古いから」って避けてた技術書、今だからこそ読もう
mottyzzz
10
6.6k
XP, Testing and ninja testing ZOZ5
m_seki
3
610
技術的負債の正体を知って向き合う / Facing Technical Debt
irof
0
150
SpecKitでどこまでできる? コストはどれくらい?
leveragestech
0
690
オープンソースソフトウェアへの解像度🔬
utam0k
12
2.5k
非同期jobをtransaction内で 呼ぶなよ!絶対に呼ぶなよ!
alstrocrack
0
700
Advance Your Career with Open Source
ivargrimstad
0
480
All About Angular's New Signal Forms
manfredsteyer
PRO
0
110
バッチ処理を「状態の記録」から「事実の記録」へ
panda728
PRO
0
150
TFLintカスタムプラグインで始める Terraformコード品質管理
bells17
2
140
Serena MCPのすすめ
wadakatu
4
980
Featured
See All Featured
Gamification - CAS2011
davidbonilla
81
5.5k
A better future with KSS
kneath
239
18k
Facilitating Awesome Meetings
lara
56
6.6k
For a Future-Friendly Web
brad_frost
180
9.9k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Visualization
eitanlees
149
16k
Navigating Team Friction
lara
190
15k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.9k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History Long history of distributed systems 10ks of different
services built by 100s of teams Many backends/analysis tools invented here ™
@rakyll
@rakyll 100% availability (is a lie)
“ @rakyll A service is available if users cannot tell
there is an outage.
“ @rakyll Google Load Balancers are available if users cannot
tell there is an outage.
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How? Observe by collecting signals Export them to analysis
tools Correlate and analyze to find root cause
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll This is hard Must have integrations for web, RPC,
and storage clients Must support all languages Must be context aware (e.g. canary vs prod) Must support many analysis tools Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll Z-Pages • Allows processes report their own dashboards. •
Z-Pages have no sampling.
@rakyll Try! import “go.opencensus.io/plugin/ocgrpc” s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})) if err :=
s.Serve(lis); err != nil { log.Fatalf("Failed to serve: %v", err) }
@rakyll import ( “go.opencensus.io/stats/view” “go.opencensus.io/trace” “contrib.go.opencensus.io/exporter/stackdriver” ) exporter, err :=
stackdriver.NewExporter(stackdriver.Options{ … }) if err != nil { log.Fatal(err) } view.RegisterExporter(exporter) trace.RegisterExporter(exporter)
@rakyll
@rakyll
@rakyll Roadmap Stable libraries in 8+ languages Exporter daemon Cluster-wide
Z-Pages Smart sampling Exemplars Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google
[email protected]
@rakyll