Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Observability_at_Google_--_OSCON.pdf
Search
JBD
July 23, 2018
Programming
1
250
Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
620
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
Constant integer division faster than compiler-generated code
herumi
2
700
Laravel Boost 超入門
fire_arlo
1
120
Oracle Database Technology Night 92 Database Connection control FAN-AC
oracle4engineer
PRO
1
200
ワープロって実は計算機で
pepepper
2
1.4k
コーディングエージェント時代のNeovim
key60228
1
100
オープンセミナー2025@広島LT技術ブログを続けるには
satoshi256kbyte
0
120
未来を拓くAI技術〜エージェント開発とAI駆動開発〜
leveragestech
2
180
兎に角、コードレビュー
mitohato14
0
150
AWS Serverless Application Model入門_20250708
smatsuzaki
0
130
KessokuでDIでもgoroutineを活用する / Go Connect #6
mazrean
0
110
LLMOpsのパフォーマンスを支える技術と現場で実践した改善
po3rin
8
980
AI OCR API on Lambdaを Datadogで可視化してみた
nealle
0
180
Featured
See All Featured
Unsuck your backbone
ammeep
671
58k
How GitHub (no longer) Works
holman
315
140k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
480
Typedesign – Prime Four
hannesfritz
42
2.8k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.4k
Code Reviewing Like a Champion
maltzj
525
40k
Statistics for Hackers
jakevdp
799
220k
The Cost Of JavaScript in 2023
addyosmani
53
8.8k
We Have a Design System, Now What?
morganepeng
53
7.7k
[RailsConf 2023] Rails as a piece of cake
palkan
56
5.8k
Facilitating Awesome Meetings
lara
55
6.5k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History Long history of distributed systems 10ks of different
services built by 100s of teams Many backends/analysis tools invented here ™
@rakyll
@rakyll 100% availability (is a lie)
“ @rakyll A service is available if users cannot tell
there is an outage.
“ @rakyll Google Load Balancers are available if users cannot
tell there is an outage.
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How? Observe by collecting signals Export them to analysis
tools Correlate and analyze to find root cause
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll This is hard Must have integrations for web, RPC,
and storage clients Must support all languages Must be context aware (e.g. canary vs prod) Must support many analysis tools Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll Z-Pages • Allows processes report their own dashboards. •
Z-Pages have no sampling.
@rakyll Try! import “go.opencensus.io/plugin/ocgrpc” s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})) if err :=
s.Serve(lis); err != nil { log.Fatalf("Failed to serve: %v", err) }
@rakyll import ( “go.opencensus.io/stats/view” “go.opencensus.io/trace” “contrib.go.opencensus.io/exporter/stackdriver” ) exporter, err :=
stackdriver.NewExporter(stackdriver.Options{ … }) if err != nil { log.Fatal(err) } view.RegisterExporter(exporter) trace.RegisterExporter(exporter)
@rakyll
@rakyll
@rakyll Roadmap Stable libraries in 8+ languages Exporter daemon Cluster-wide
Z-Pages Smart sampling Exemplars Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google
[email protected]
@rakyll