Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Transcript
Observability at Google JBD, Google (@rakyll)
History
• Long history of distributed systems
• 10ks of different services built by 100s of teams
• Many backends/analysis tools invented here™
100% availability (is a lie)
"A service is available if users cannot tell there is an outage."
"Google Load Balancers are available if users cannot tell there is an outage."
SLOs
Principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
An observable system tells more than its availability.
Context, status, expectations, debuggability
How?
• Observe by collecting signals
• Export them to analysis tools
• Correlate and analyze to find root cause
This is hard
• Must have integrations for web, RPC, and storage clients
• Must support all languages
• Must be context aware (e.g. canary vs prod)
• Must support many analysis tools
• Developers need to add custom instrumentation
This is too hard!
Borg, Stubby, Census
opencensus.io
Z-Pages
• Allow processes to report their own dashboards.
• Z-Pages have no sampling.
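Try!
    import (
        "log"
        "go.opencensus.io/plugin/ocgrpc"
        "google.golang.org/grpc"
    )
    // Install the OpenCensus handler so the server records stats and traces per RPC.
    s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{}))
    if err := s.Serve(lis); err != nil { // lis is the server's net.Listener
        log.Fatalf("Failed to serve: %v", err)
    }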
    import (
        "log"
        "contrib.go.opencensus.io/exporter/stackdriver"
        "go.opencensus.io/stats/view"
        "go.opencensus.io/trace"
    )
    exporter, err := stackdriver.NewExporter(stackdriver.Options{ … })
    if err != nil {
        log.Fatal(err)
    }
    // Register the same exporter for both stats views and traces.
    view.RegisterExporter(exporter)
    trace.RegisterExporter(exporter)
Roadmap
• Stable libraries in 8+ languages
• Exporter daemon
• Cluster-wide Z-Pages
• Smart sampling
• Exemplars
• Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google jbd@google.com @rakyll