
Observability_at_Google_--_OSCON.pdf

JBD
July 23, 2018


Transcript

  1. Observability at Google JBD, Google (@rakyll)

  2. @rakyll History
    Long history of distributed systems
    10ks of different services built by 100s of teams
    Many backends/analysis tools invented here™
  3. @rakyll

  4. @rakyll 100% availability (is a lie)

  5. @rakyll “A service is available if users cannot tell there is an outage.”
  6. @rakyll “Google Load Balancers are available if users cannot tell there is an outage.”
  7. @rakyll SLOs
    Principled way of saying what level of downtime is acceptable.
    • Error rate
    • Latency expectations
  8. @rakyll An observable system tells more than its availability.

  9. @rakyll Context, status, expectations, debuggability

  10. @rakyll How?
    Observe by collecting signals
    Export them to analysis tools
    Correlate and analyze to find root cause
  11. @rakyll

  12. @rakyll

  13. @rakyll

  14. @rakyll

  15. @rakyll This is hard
    Must have integrations for web, RPC, and storage clients
    Must support all languages
    Must be context aware (e.g. canary vs prod)
    Must support many analysis tools
    Developers need to add custom instrumentation
  16. @rakyll This is too hard!

  17. @rakyll Borg Stubby Census

  18. opencensus.io

  19. @rakyll

  20. @rakyll

  21. @rakyll

  22. @rakyll

  23. @rakyll Z-Pages
    • Allow processes to report their own dashboards.
    • Z-Pages have no sampling.
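  24. @rakyll Try!
    import "go.opencensus.io/plugin/ocgrpc"

    // Record stats and traces for every RPC this server handles.
    s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{}))
    // lis is a net.Listener created elsewhere.
    if err := s.Serve(lis); err != nil {
        log.Fatalf("Failed to serve: %v", err)
    }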
  25. @rakyll
    import (
        "go.opencensus.io/stats/view"
        "go.opencensus.io/trace"
        "contrib.go.opencensus.io/exporter/stackdriver"
    )

    exporter, err := stackdriver.NewExporter(stackdriver.Options{ … })
    if err != nil {
        log.Fatal(err)
    }
    // Send both metrics (views) and traces to the same exporter.
    view.RegisterExporter(exporter)
    trace.RegisterExporter(exporter)
  26. @rakyll

  27. @rakyll

  28. @rakyll Roadmap
    Stable libraries in 8+ languages
    Exporter daemon
    Cluster-wide Z-Pages
    Smart sampling
    Exemplars
    Framework, database, MQ integrations
  29. opencensus.io

  30. Thank you! opencensus.io JBD, Google jbd@google.com @rakyll