Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Programming
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History
• Long history of distributed systems
• 10ks of different services built by 100s of teams
• Many backends/analysis tools invented here™
@rakyll 100% availability (is a lie)
@rakyll “A service is available if users cannot tell there is an outage.”
@rakyll “Google Load Balancers are available if users cannot tell there is an outage.”
@rakyll SLOs
Principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How?
• Observe by collecting signals
• Export them to analysis tools
• Correlate and analyze to find root cause
@rakyll This is hard
• Must have integrations for web, RPC, and storage clients
• Must support all languages
• Must be context aware (e.g. canary vs prod)
• Must support many analysis tools
• Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll Z-Pages
• Allow processes to report their own dashboards.
• Z-Pages have no sampling.
@rakyll Try!
    import (
        "log"
        "go.opencensus.io/plugin/ocgrpc"
        "google.golang.org/grpc"
    )
    // lis is a net.Listener created elsewhere.
    s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{}))
    if err := s.Serve(lis); err != nil {
        log.Fatalf("Failed to serve: %v", err)
    }
@rakyll
    import (
        "log"

        "contrib.go.opencensus.io/exporter/stackdriver"
        "go.opencensus.io/stats/view"
        "go.opencensus.io/trace"
    )

    exporter, err := stackdriver.NewExporter(stackdriver.Options{ … })
    if err != nil {
        log.Fatal(err)
    }
    // Register the same exporter for both stats views and traces.
    view.RegisterExporter(exporter)
    trace.RegisterExporter(exporter)
@rakyll Roadmap
• Stable libraries in 8+ languages
• Exporter daemon
• Cluster-wide Z-Pages
• Smart sampling
• Exemplars
• Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google
@rakyll