Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Tracing for Granularity

E7526ec3e801f8ba99f6746498a154a6?s=47 JBD
June 02, 2018

Tracing for Granularity



June 02, 2018


  1. tracing for granularity JBD, Google (@rakyll)

  2. @rakyll

  3. @rakyll tracing? What is tracing and why do we trace?

  4. @rakyll

  5. @rakyll clogged?

  6. @rakyll leaking?

  7. @rakyll path and direction?

  8. @rakyll 100% availability (is a lie)

  9. “ @rakyll A service is available if users cannot tell

    there was an outage.
  10. @rakyll Without an SLO, your team has no principled way

    of saying what level of downtime is acceptable. • Error rate • Latency or throughput expectations Service Level Objectives (SLOs)
  11. @rakyll 28 ms 100 ms 172 ms 56 ms 356

    ms what user sees what else we can see sec.Check auth.AccessToken cache.Lookup spanner.Query GET /messages
  12. @rakyll 182 ms 56 ms 245 ms what user sees

    what else we can see sec.Check auth.AccessToken GET /messages 7 ms cache.Lookup
  13. @rakyll latency...

  14. @rakyll Go is the language to write servers. Many runtime

    activities occur during the program execution: • scheduling • memory allocation • garbage collection Hard to associate a request with its impact on the runtime.
  15. @rakyll clogged?

  16. “ @rakyll There is no easy way to tell why

    latency is high for certain requests. Is it due to GC, scheduler or syscalls? Can you review the code and tell us why? -SRE
  17. @rakyll Execution tracer $ go tool trace • Reports fine-grained

    runtime events in the lifetime of a goroutine. • Reports utilization of CPU cores. But cannot easily tell how handling a request impacts the runtime.
  18. @rakyll 28 ms 100 ms 172 ms 56 ms 356

    ms GET /messages auth.AccessToken cache.Lookup spanner.Query GET /messages
  19. @rakyll 5 68µs 8 123µs networking serialization + deserialization garbage

    collection blocking syscall what actually happens 172 ms auth.AccessToken
  20. @rakyll 5 68µs 8 123µs epoll executing sys gc netwrite

  21. @rakyll How? • Mark sections in code using runtime/trace. •

    Enable execution tracer temporarily and record data. • Examine the recorded data.
  22. @rakyll Go 1.11 introduces... • User regions, tasks and annotations.

    • Association between user code and runtime. • Association with distributed traces.
  23. @rakyll Go 1.11 runtime/trace import “runtime/trace” ctx, task := trace.NewTask(ctx,

    “myHandler”) defer task.End() // Handler code here....
  24. @rakyll region #1 task #1 Go 1.11 runtime/trace region #2

    region #3 region #4 region #5 goroutine #1 goroutine #4 goroutine #5
  25. @rakyll import _ "net/http/pprof" go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()

  26. @rakyll $ curl http://server:6060/debug/pprof/trace?seconds=5 -o trace.out $ go tool trace

    trace.out 2018/05/04 10:39:59 Parsing trace... 2018/05/04 10:39:59 Splitting trace... 2018/05/04 10:39:59 Opening browser. Trace viewer is listening on
  27. Execution tracer tasks for RPCs (/usertasks)

  28. Execution tracer tasks for RPCs (/usertasks)

  29. RPCs overlapping with garbage collection

  30. Execution tracer regions (/userregions)

  31. Region summary for conn.ready

  32. @rakyll Record in production $ curl http://server/debug/pprof/trace?seconds=5 -o trace.out $

    go tool trace trace.out
  33. @rakyll Try It! Install the Go 1.11 beta1! golang.org/dl

  34. @rakyll $ go get go.opencensus.io/trace import rt “runtime/trace” ctx, span

    := trace.StartSpan(ctx, “/messages”) defer span.End() rt.WithRegion(ctx, “foo”, func(ctx) { // Do something... })
  35. @rakyll Limitations • Execution tracer cannot do accounting for cross-goroutine

    operations automatically. • Exposition format is hard to parse if `go trace tool` is not used.
  36. thank you! JBD, Google jbd@google.com @rakyll