Upgrade to Pro — share decks privately, control downloads, hide ads and more …

An Introduction to "go tool trace"

An Introduction to "go tool trace"

The execution tracer (go tool trace) is one of Go's most powerful diagnosis tools, but its power and complexity make it difficult to learn. See how it can help you understand goroutines' interactions in great detail, uncovering latency bugs and logical races.

Rhys Hiltner

July 14, 2017
Tweet

More Decks by Rhys Hiltner

Other Decks in Technology

Transcript

  1. go tool trace
    for correct and effective concurrency

    View full-size slide

  2. Let's talk about Go

    View full-size slide

  3. Let's talk about go

    View full-size slide

  4. A special tool for Go's needs
    * and Chromium

    View full-size slide

  5. goroutine scheduling
    instrumented

    View full-size slide

  6. Concurrency,
    and how to manage it

    View full-size slide

  7. Parallelism,
    and how to exploit it

    View full-size slide

  8. $ go doc runtime/trace
    package trace // import "runtime/trace"
    Go execution tracer. The tracer captures
    a wide range of execution events ...
    A trace can be analyzed later with
    'go tool trace' command.

    View full-size slide

  9. $ go doc cmd/trace
    Trace is a tool for viewing trace files.
    ...
    View the trace in a web browser:
    go tool trace trace.out

    View full-size slide

  10. pp-quiet-10.first.trace.redacted.html

    View full-size slide

  11. grpc.pre-1005-5.exectrace

    View full-size slide

  12. The tool in three demos:
    1. A timing-dependent bug
    2. What it doesn't show
    3. Latency during GC

    View full-size slide

  13. #1: A race condition

    View full-size slide

  14. go test -race
    go build -race
    go install -race

    View full-size slide

  15. ==================
    WARNING: DATA RACE
    Read at 0x00c420141500 by goroutine 27:
    ...
    Previous write at 0x00c420141500 by
    goroutine 26:
    ...

    View full-size slide

  16. a logical race, not a data race

    View full-size slide

  17. gRPC, HTTP/2,
    and flow control

    View full-size slide

  18. grpc.pre-1005-5.exectrace
    stuff
    happening
    stuff
    not happening

    View full-size slide

  19. $ go tool trace trace.out
    $ go test -trace=trace.out

    View full-size slide

  20. "View trace"

    View full-size slide

  21. grpc.pre-1005-5.exectrace
    1s
    (from context.WithTimeout)

    View full-size slide

  22. grpc.pre-1005-5.exectrace

    View full-size slide

  23. grpc.pre-1005-5.exectrace
    Garbage
    collection
    20ms
    Goroutines running
    on OS threads

    View full-size slide

  24. "Sync blocking profile"

    View full-size slide

  25. grpc.pre-1005-5.exectrace

    View full-size slide

  26. grpc.pre-1005-5.exectrace
    one-second
    unexpected delay

    View full-size slide

  27. grpc.pre-1005-5.exectrace
    goroutine
    "name"

    View full-size slide

  28. "Goroutine analysis"

    View full-size slide

  29. grpc.pre-1005-5.exectrace

    View full-size slide

  30. grpc.pre-1005-5.exectrace

    View full-size slide

  31. grpc.pre-1005-5.exectrace
    1s

    View full-size slide

  32. grpc.pre-1005-5.exectrace

    View full-size slide

  33. grpc.pre-1005-5.exectrace

    View full-size slide

  34. func wait(ctx context.Context, ...,
    proceed <-chan int) (int, error) {
    select {
    case <-ctx.Done():
    // ...
    case ...: // stream closed, etc
    case i := <-proceed:
    return i, nil
    }
    }

    View full-size slide

  35. t.sendQuotaPool.add(0)
    tq, err := wait(s.ctx, ...,
    t.sendQuotaPool.acquire())
    if err != nil && ... {
    t.sendQuotaPool.cancel()
    return
    }
    // calculate payload size ...
    t.sendQuotaPool.add(tq-ps)

    View full-size slide

  36. type quotaPool struct {
    c chan int
    mu sync.Mutex
    quota int
    }

    View full-size slide

  37. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c <- q
    } else {
    qb.quota = q
    }
    return qb
    }

    View full-size slide

  38. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c <- q
    } else {
    qb.quota = q
    }
    return qb
    }

    View full-size slide

  39. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c <- q
    } else {
    qb.quota = q
    }
    return qb
    }

    View full-size slide

  40. func newQuotaPool(q int) *quotaPool {
    qb := &quotaPool{
    c: make(chan int, 1),
    }
    if q > 0 {
    qb.c <- q
    } else {
    qb.quota = q
    }
    return qb
    }

    View full-size slide

  41. type quotaPool struct {
    c chan int // positive quota here
    mu sync.Mutex
    quota int // zero or negative quota
    }

    View full-size slide

  42. func (qb *quotaPool) acquire()
    <-chan int {
    return qb.c
    }

    View full-size slide

  43. func (qb *quotaPool) add(n int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    if qb.quota += n; qb.quota <= 0 {
    return
    }
    select {
    case qb.c <- qb.quota:
    qb.quota = 0
    default:
    }
    }

    View full-size slide

  44. func (qb *quotaPool) cancel() {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := <-qb.c:
    qb.quota += n
    default:
    }
    }

    View full-size slide

  45. Goroutine 10:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View full-size slide

  46. Goroutine 300:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()
    Goroutine 400:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View full-size slide

  47. Goroutine 5000:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)
    Goroutine 6000:
    pool.add(0)
    wait(..., pool.aq())
    pool.add(tq-ps)

    View full-size slide

  48. grpc.pre-1005-5.exectrace
    G8006, about to
    stall for one second
    G7962, another
    goroutine running
    the same function
    36µs

    View full-size slide

  49. Goroutine 7962:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()
    Goroutine 8006:
    pool.add(0)
    wait(..., pool.aq())
    pool.cancel()

    View full-size slide

  50. - func (qb *quotaPool) cancel() {
    - qb.mu.Lock(); defer qb.mu.Unlock()
    -
    - select {
    - case n := <-qb.c:
    - qb.quota += n
    - default:
    - }
    - }

    View full-size slide

  51. func (qb *quotaPool) add(v int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := <-qb.c:
    qb.quota += n
    default:
    }
    if qb.quota += v; qb.quota > 0 {
    qb.c <- qb.quota
    qb.quota = 0
    }
    }

    View full-size slide

  52. func (qb *quotaPool) add(v int) {
    qb.mu.Lock(); defer qb.mu.Unlock()
    select {
    case n := <-qb.c:
    qb.quota += n
    default:
    }
    if qb.quota += v; qb.quota > 0 {
    qb.c <- qb.quota
    qb.quota = 0
    }
    }

    View full-size slide

  53. #2: It's not a panacea

    View full-size slide

  54. Three ways to get data:
    1. Testing with -trace flag
    2. Direct runtime/trace use
    3. net/http/pprof handlers

    View full-size slide

  55. pp-quiet-10.first.trace.redacted.html
    suspicious gaps
    every five seconds
    garbage
    collection
    unlike other
    goroutines

    View full-size slide

  56. pp-quiet-10.first.trace.redacted.html
    nothing else runs
    during these 90ms

    View full-size slide

  57. runtime.ReadMemStats
    • and Go 1.8 or earlier
    • and large (40GB) heap
    • leads to long pauses
    (fixed in Go 1.9)

    View full-size slide

  58. pp-quiet-10.first.trace.redacted.html
    what does this goroutine do?
    how does it spend its CPU time?
    can it run more efficiently?
    can it finish sooner (parallelism)?

    View full-size slide

  59. pp-quiet-10.first.trace.redacted.html
    net/http.(*conn).serve
    what request did it serve?
    which handler func ran?
    what status code was returned?

    View full-size slide

  60. #3: The GC is still improving

    View full-size slide

  61. tc.1.third.trace.redacted.html
    patchy network
    long run queue
    some behavior is
    different during GC
    * this is probably
    bad, but how good
    should we expect?

    View full-size slide

  62. Go garbage collection timeline:
    Go 1: program fully stopped
    Go 1.1: GC uses parallel threads
    Go 1.4: precise collection
    Go 1.5: global pauses <10ms
    Go 1.8: goroutine pauses <100µs

    View full-size slide

  63. garbage
    collection
    Go 1 GC
    whole program
    stopped ("STW")
    user code ("mutator")

    View full-size slide

  64. user code ("mutator")
    garbage
    collection
    Go 1.1 GC
    whole program
    stopped ("STW")

    View full-size slide

  65. garbage
    collection
    Go 1.5 GC
    user code ("mutator")
    whole program
    stopped ("STW")

    View full-size slide

  66. garbage
    collection
    Go 1.8 GC
    whole program
    stopped ("STW")
    user code ("mutator")

    View full-size slide

  67. #3A: Stop-The-World pauses

    View full-size slide

  68. Everything stops when the GC
    begins and ends mark phase
    Stopping everything takes time

    View full-size slide

  69. ta.nopreempt-2.fourth.trace.redacted.html
    GC begins
    stragglers
    GC really begins
    3.6ms
    unused CPU

    View full-size slide

  70. Goroutines can stop when...
    ✓ allocating memory
    ✓ calling functions
    ✓ communicating
    ✗ crunching numbers in a loop

    View full-size slide

  71. Seen in:
    • encoding/base64
    • encoding/json
    • .../golang/protobuf/proto
    it's measurable for 1MB values
    ... if you're looking for it
    * check code lines before "End Stack Trace"

    View full-size slide

  72. go func() {
    for i := 0; i < 1e9; i++ {
    }
    }
    Or write your own tight loop:

    View full-size slide

  73. golang.org/issue/10958
    The Go 1.10 compiler should
    have a general, permanent fix
    Workaround available now

    View full-size slide

  74. ta.preempt-2.second.trace.redacted.html
    quick pauses, less waste

    View full-size slide

  75. #3B: Other awkward pauses

    View full-size slide

  76. A mark/sweep GC:
    • "mark" finds in-use memory
    • "sweep" reclaims the rest

    View full-size slide

  77. GC needs to make progress
    User code works against that
    User code is forced to help out

    View full-size slide

  78. tc.1.third.trace.redacted.html
    read(2) syscall
    * hold 'shift' to draw
    a selection box
    user code in goroutines
    runtime
    work below
    helping
    the GC

    View full-size slide

  79. tc.1.third.trace.redacted.html
    this assist ran for 4.4ms

    View full-size slide

  80. tc.1.third.trace.redacted.html
    this assist ran for 15.6ms
    ... and didn't even finish

    View full-size slide

  81. Most assists are well-deserved
    But they start suddenly
    Sweeping requires assists too
    Don't allocate in critical paths?

    View full-size slide

  82. go tool trace
    http://.../debug/pprof/trace

    View full-size slide

  83. How can the tool help you?
    1. See time-dependent issues
    2. Complements other profiles
    3. Find latency improvements

    View full-size slide

  84. Be prepared:
    practice using the tools
    Thank you!
    @rhyshiltner

    View full-size slide