Observability_at_Google_--_OSCON.pdf

JBD
July 23, 2018


Transcript

  1. @rakyll History
    • Long history of distributed systems
    • 10ks of different services built by 100s of teams
    • Many backends/analysis tools invented here™
  2. SLOs
    Principled way of saying what level of downtime is acceptable.
    • Error rate
    • Latency expectations
  3. How?
    • Observe by collecting signals
    • Export them to analysis tools
    • Correlate and analyze to find root cause
  4. This is hard
    • Must have integrations for web, RPC, and storage clients
    • Must support all languages
    • Must be context aware (e.g. canary vs prod)
    • Must support many analysis tools
    • Developers need to add custom instrumentation
  5. package main

    import (
        "log"

        "contrib.go.opencensus.io/exporter/stackdriver"
        "go.opencensus.io/stats/view"
        "go.opencensus.io/trace"
    )

    func main() {
        // Exporter options are elided on the slide.
        exporter, err := stackdriver.NewExporter(stackdriver.Options{ … })
        if err != nil {
            log.Fatal(err)
        }
        // The same exporter receives both stats (views) and traces.
        view.RegisterExporter(exporter)
        trace.RegisterExporter(exporter)
    }
  6. Roadmap
    • Stable libraries in 8+ languages
    • Exporter daemon
    • Cluster-wide Z-Pages
    • Smart sampling
    • Exemplars
    • Framework, database, MQ integrations