Slide 1

Slide 1 text

Observability at Google
JBD, Google (@rakyll)

Slide 2

Slide 2 text

History
Long history of distributed systems
10ks of different services built by 100s of teams
Many backends/analysis tools invented here™

Slide 3

Slide 3 text

Slide 4

Slide 4 text

100% availability (is a lie)

Slide 5

Slide 5 text

“A service is available if users cannot tell there is an outage.”

Slide 6

Slide 6 text

“Google Load Balancers are available if users cannot tell there is an outage.”

Slide 7

Slide 7 text

SLOs
Principled way of saying what level of downtime is acceptable.
● Error rate
● Latency expectations

Slide 8

Slide 8 text

An observable system tells more than its availability.

Slide 9

Slide 9 text

Context, status, expectations, debuggability

Slide 10

Slide 10 text

How?
Observe by collecting signals
Export them to analysis tools
Correlate and analyze to find root cause
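The three steps above can be sketched with the standard library alone. Everything in this block is hypothetical and simplified for illustration: the `Signal`, `Exporter`, and `Collector` types, the in-memory exporter, and the `meanByTag` helper are assumptions, not the API of any real library.

```go
package main

import (
	"fmt"
	"time"
)

// Signal is a hypothetical, simplified measurement. Real libraries
// distinguish metrics, traces, and logs.
type Signal struct {
	Name  string
	Value float64
	Tags  map[string]string // context, e.g. "env": "canary"
	Time  time.Time
}

// Exporter is a hypothetical interface for shipping signals to an analysis tool.
type Exporter interface {
	Export(s Signal)
}

// memoryExporter stands in for a real backend by keeping signals in memory.
type memoryExporter struct{ signals []Signal }

func (m *memoryExporter) Export(s Signal) { m.signals = append(m.signals, s) }

// Collector fans each recorded signal out to every registered exporter.
type Collector struct{ exporters []Exporter }

func (c *Collector) Register(e Exporter) { c.exporters = append(c.exporters, e) }

// Record is step 1 (collect) and step 2 (export) in one call.
func (c *Collector) Record(name string, value float64, tags map[string]string) {
	s := Signal{Name: name, Value: value, Tags: tags, Time: time.Now()}
	for _, e := range c.exporters {
		e.Export(s)
	}
}

// meanByTag is step 3 (correlate): it averages one signal per tag value.
func meanByTag(signals []Signal, name, tag string) map[string]float64 {
	sum := map[string]float64{}
	n := map[string]float64{}
	for _, s := range signals {
		if s.Name != name {
			continue
		}
		key := s.Tags[tag]
		sum[key] += s.Value
		n[key]++
	}
	out := make(map[string]float64, len(sum))
	for key := range sum {
		out[key] = sum[key] / n[key]
	}
	return out
}

func main() {
	mem := &memoryExporter{}
	c := &Collector{}
	c.Register(mem)

	c.Record("latency_ms", 12, map[string]string{"env": "prod"})
	c.Record("latency_ms", 18, map[string]string{"env": "prod"})
	c.Record("latency_ms", 95, map[string]string{"env": "canary"})

	// Comparing canary against prod hints at where to look for a root cause.
	fmt.Println(meanByTag(mem.signals, "latency_ms", "env"))
}
```

Keeping the exporter behind an interface is one way to let the same instrumentation feed several analysis tools.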

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

This is hard
Must have integrations for web, RPC, and storage clients
Must support all languages
Must be context aware (e.g. canary vs prod)
Must support many analysis tools
Developers need to add custom instrumentation

Slide 16

Slide 16 text

This is too hard!

Slide 17

Slide 17 text

Borg Stubby Census

Slide 18

Slide 18 text

opencensus.io

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Z-Pages
● Allow processes to report their own dashboards.
● Z-Pages have no sampling.
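To illustrate the idea of a process reporting its own dashboard without sampling, here is a minimal standard-library sketch. It is not the OpenCensus Z-Pages implementation: the `statusPage` type, its methods, and the path and port in the comment are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// statusPage is a hypothetical in-process dashboard that counts every
// call it is told about, with no sampling.
type statusPage struct {
	mu     sync.Mutex
	counts map[string]int
}

func newStatusPage() *statusPage {
	return &statusPage{counts: map[string]int{}}
}

// Observe records one call to the named method.
func (p *statusPage) Observe(method string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.counts[method]++
}

// Render returns one "method count" line per method, sorted by name.
func (p *statusPage) Render() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	names := make([]string, 0, len(p.counts))
	for name := range p.counts {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %d\n", name, p.counts[name])
	}
	return b.String()
}

// ServeHTTP lets the process serve the page itself.
func (p *statusPage) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, p.Render())
}

func main() {
	page := newStatusPage()
	page.Observe("Get")
	page.Observe("Get")
	page.Observe("Put")
	fmt.Print(page.Render())

	// To expose it from a running process (path and port are assumptions):
	//   http.Handle("/debug/statusz", page)
	//   http.ListenAndServe("localhost:8081", nil)
}
```

Because the data lives in the process, the page works even when no external backend is configured.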

Slide 24

Slide 24 text

Try!

import (
	"log"

	"go.opencensus.io/plugin/ocgrpc"
	"google.golang.org/grpc"
)

// lis is a net.Listener created elsewhere.
s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{}))
if err := s.Serve(lis); err != nil {
	log.Fatalf("Failed to serve: %v", err)
}

Slide 25

Slide 25 text

import (
	"log"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/trace"
)

exporter, err := stackdriver.NewExporter(stackdriver.Options{ … })
if err != nil {
	log.Fatal(err)
}
// Register the same exporter for both stats and traces.
view.RegisterExporter(exporter)
trace.RegisterExporter(exporter)

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Roadmap
Stable libraries in 8+ languages
Exporter daemon
Cluster-wide Z-Pages
Smart sampling
Exemplars
Framework, database, MQ integrations

Slide 29

Slide 29 text

opencensus.io

Slide 30

Slide 30 text

Thank you!
opencensus.io
JBD, Google
@rakyll