Upgrade to Pro — share decks privately, control downloads, hide ads and more …

An Introduction to "go tool trace"

An Introduction to "go tool trace"

The execution tracer (go tool trace) is one of Go's most powerful diagnosis tools, but its power and complexity make it difficult to learn. See how it can help you understand goroutines' interactions in great detail, uncovering latency bugs and logical races.

Rhys Hiltner

July 14, 2017
Tweet

More Decks by Rhys Hiltner

Other Decks in Technology

Transcript

  1. go tool trace
    for correct and effective concurrency

    View Slide

  2. Let's talk about Go

    View Slide

  3. Let's talk about go

    View Slide

  4. A special tool for Go's needs
    * and Chromium

    View Slide

  5. goroutine scheduling
    instrumented

    View Slide

  6. Concurrency,
    and how to manage it

    View Slide

  7. Parallelism,
    and how to exploit it

    View Slide

  8. $ go doc runtime/trace
    package trace // import "runtime/trace"
    Go execution tracer. The tracer captures
    a wide range of execution events ...
    A trace can be analyzed later with
    'go tool trace' command.

    View Slide

  9. $ go doc cmd/trace
    Trace is a tool for viewing trace files.
    ...
    View the trace in a web browser:
    go tool trace trace.out

    View Slide

  10. pp-quiet-10.first.trace.redacted.html

    View Slide

  11. grpc.pre-1005-5.exectrace

    View Slide

  12. The tool in three demos:
    1. A timing-dependent bug
    2. What it doesn't show
    3. Latency during GC

    View Slide

  13. #1: A race condition

    View Slide

  14. go test -race
    go build -race
    go install -race

    View Slide

  15. ==================
    WARNING: DATA RACE
    Read at 0x00c420141500 by goroutine 27:
    ...
    Previous write at 0x00c420141500 by
    goroutine 26:
    ...

    View Slide

  16. a logical race, not a data race

    View Slide

  17. gRPC, HTTP/2,
    and flow control

    View Slide

  18. grpc.pre-1005-5.exectrace
    stuff
    happening
    stuff
    not happening

    View Slide

  19. $ go tool trace trace.out
    $ go test -trace=trace.out

    View Slide

  20. "View trace"

    View Slide

  21. grpc.pre-1005-5.exectrace
    1s
    (from context.WithTimeout)

    View Slide

  22. grpc.pre-1005-5.exectrace

    View Slide

  23. grpc.pre-1005-5.exectrace
    Garbage
    collection
    20ms
    Goroutines running
    on OS threads

    View Slide

  24. "Sync blocking profile"

    View Slide

  25. grpc.pre-1005-5.exectrace

    View Slide

  26. grpc.pre-1005-5.exectrace
    one-second
    unexpected delay

    View Slide

  27. grpc.pre-1005-5.exectrace
    goroutine
    "name"

    View Slide

  28. "Goroutine analysis"

    View Slide

  29. grpc.pre-1005-5.exectrace

    View Slide

  30. grpc.pre-1005-5.exectrace

    View Slide

  31. grpc.pre-1005-5.exectrace
    1s

    View Slide

  32. grpc.pre-1005-5.exectrace

    View Slide

  33. grpc.pre-1005-5.exectrace

    View Slide

  34. func wait(ctx context.Context, ...,
    proceed select {
    case // ...
    case ...: // stream closed, etc
    case i := return i, nil
    }
    }

    View Slide

  35. t.sendQuotaPool.add(0)
    tq, err := wait(s.ctx, ...,
    t.sendQuotaPool.acquire())
    if err != nil && ... {
    t.sendQuotaPool.cancel()
    return
    }
    // calculate payload size ...
    t.sendQuotaPool.add(tq-ps)

    View Slide

  36. type quotaPool struct {
    c chan int
    mu sync.Mutex
    quota int
    }

    View Slide

  37. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c } else {
    qb.quota = q
    }
    return qb
    }

    View Slide

  38. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c } else {
    qb.quota = q
    }
    return qb
    }

    View Slide

  39. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c } else {
    qb.quota = q
    }
    return qb
    }

    View Slide

  40. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c } else {
    qb.quota = q
    }
    return qb
    }

    View Slide

  41. type quotaPool struct {
    c chan int // positive quota here
    mu sync.Mutex
    quota int // zero or negative quota
    }

    View Slide

  42. func (qb *quotaPool) acquire()
    return qb.c
    }

    View Slide

  43. func (qb *quotaPool) add(n int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    if qb.quota += n; qb.quota <= 0 {
    return
    }
    select {
    case qb.c qb.quota = 0
    default:
    }
    }

    View Slide

  44. func (qb *quotaPool) cancel() {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := qb.quota += n
    default:
    }
    }

    View Slide

  45. Goroutine 10:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View Slide

  46. Goroutine 300:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()
    Goroutine 400:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View Slide

  47. Goroutine 5000:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)
    Goroutine 6000:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View Slide

  48. grpc.pre-1005-5.exectrace
    G8006, about to
    stall for one second
    G7962, another
    goroutine running
    the same function
    36µs

    View Slide

  49. Goroutine 7962:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()
    Goroutine 8006:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()

    View Slide

  50. - func (qb *quotaPool) cancel() {
    - qb.mu.Lock(); defer qb.mu.Unlock()
    -
    - select {
    - case n := - qb.quota += n
    - default:
    - }
    - }

    View Slide

  51. func (qb *quotaPool) add(v int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := qb.quota += n
    default:
    }
    if qb.quota += v; qb.quota > 0 {
    qb.c qb.quota = 0
    }
    }

    View Slide

  52. func (qb *quotaPool) add(v int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := qb.quota += n
    default:
    }
    if qb.quota += v; qb.quota > 0 {
    qb.c qb.quota = 0
    }
    }

    View Slide

  53. #2: It's not a panacea

    View Slide

  54. Three ways to get data:
    1. Testing with -trace flag
    2. Direct runtime/trace use
    3. net/http/pprof handlers

    View Slide

  55. pp-quiet-10.first.trace.redacted.html
    suspicious gaps
    every five seconds
    garbage
    collection
    unlike other
    goroutines

    View Slide

  56. pp-quiet-10.first.trace.redacted.html
    nothing else runs
    during these 90ms

    View Slide

  57. runtime.ReadMemStats
    • and Go 1.8 or earlier
    • and large (40GB) heap
    • leads to long pauses
    (fixed in Go 1.9)

    View Slide

  58. pp-quiet-10.first.trace.redacted.html
    what does this goroutine do?
    how does it spend its CPU time?
    can it run more efficiently?
    can it finish sooner (parallelism)?

    View Slide

  59. pp-quiet-10.first.trace.redacted.html
    net/http.(*conn).serve
    what request did it serve?
    which handler func ran?
    what status code was returned?

    View Slide

  60. #3: The GC is still improving

    View Slide

  61. tc.1.third.trace.redacted.html
    patchy network
    long run queue
    some behavior is
    different during GC
    * this is probably
    bad, but how good
    should we expect?

    View Slide

  62. Go garbage collection timeline:
    Go 1: program fully stopped
    Go 1.1: GC uses parallel threads
    Go 1.4: precise collection
    Go 1.5: global pauses <10ms
    Go 1.8: goroutine pauses <100µs

    View Slide

  63. garbage
    collection
    Go 1 GC
    whole program
    stopped ("STW")
    user code ("mutator")

    View Slide

  64. user code ("mutator")
    garbage
    collection
    Go 1.1 GC
    whole program
    stopped ("STW")

    View Slide

  65. garbage
    collection
    Go 1.5 GC
    user code ("mutator")
    whole program
    stopped ("STW")

    View Slide

  66. garbage
    collection
    Go 1.8 GC
    whole program
    stopped ("STW")
    user code ("mutator")

    View Slide

  67. #3A: Stop-The-World pauses

    View Slide

  68. Everything stops when the GC
    begins and ends mark phase
    Stopping everything takes time

    View Slide

  69. ta.nopreempt-2.fourth.trace.redacted.html
    GC begins
    stragglers
    GC really begins
    3.6ms
    unused CPU

    View Slide

  70. Goroutines can stop when...
    ✓ allocating memory
    ✓ calling functions
    ✓ communicating
    ✗ crunching numbers in a loop

    View Slide

  71. Seen in:
    • encoding/base64
    • encoding/json
    • .../golang/protobuf/proto
    it's measurable for 1MB values
    ... if you're looking for it
    * check code lines before "End Stack Trace"

    View Slide

  72. go func() {
    for i := 0; i < 1e9; i++ {
    }
    }
    Or write your own tight loop:

    View Slide

  73. golang.org/issue/10958
    The Go 1.10 compiler should
    have a general, permanent fix
    Workaround available now

    View Slide

  74. ta.preempt-2.second.trace.redacted.html
    quick pauses, less waste

    View Slide

  75. #3B: Other awkward pauses

    View Slide

  76. A mark/sweep GC:
    • "mark" finds in-use memory
    • "sweep" reclaims the rest

    View Slide

  77. GC needs to make progress
    User code works against that
    User code is forced to help out

    View Slide

  78. tc.1.third.trace.redacted.html
    read(2) syscall
    * hold 'shift' to draw
    a selection box
    user code in goroutines
    runtime
    work below
    helping
    the GC

    View Slide

  79. tc.1.third.trace.redacted.html
    this assist ran for 4.4ms

    View Slide

  80. tc.1.third.trace.redacted.html
    this assist ran for 15.6ms
    ... and didn't even finish

    View Slide

  81. Most assists are well-deserved
    But they start suddenly
    Sweeping requires assists too
    Don't allocate in critical paths?

    View Slide

  82. go tool trace
    http://.../debug/pprof/trace

    View Slide

  83. How can the tool help you?
    1. See time-dependent issues
    2. Complements other profiles
    3. Find latency improvements

    View Slide

  84. Be prepared:
    practice using the tools
    Thank you!
    @rhyshiltner

    View Slide