Slide 1

Slide 1 text

go tool trace for correct and effective concurrency

Slide 2

Slide 2 text

Let's talk about Go

Slide 3

Slide 3 text

Let's talk about go

Slide 4

Slide 4 text

A special tool for Go's needs * and Chromium

Slide 5

Slide 5 text

goroutine scheduling instrumented

Slide 6

Slide 6 text

Concurrency, and how to manage it

Slide 7

Slide 7 text

Parallelism, and how to exploit it

Slide 8

Slide 8 text

$ go doc runtime/trace package trace // import "runtime/trace" Go execution tracer. The tracer captures a wide range of execution events ... A trace can be analyzed later with 'go tool trace' command.

Slide 9

Slide 9 text

$ go doc cmd/trace Trace is a tool for viewing trace files. ... View the trace in a web browser: go tool trace trace.out

Slide 10

Slide 10 text

pp-quiet-10.first.trace.redacted.html

Slide 11

Slide 11 text

grpc.pre-1005-5.exectrace

Slide 12

Slide 12 text

The tool in three demos: 1. A timing-dependent bug 2. What it doesn't show 3. Latency during GC

Slide 13

Slide 13 text

#1: A race condition

Slide 14

Slide 14 text

go test -race go build -race go install -race

Slide 15

Slide 15 text

================== WARNING: DATA RACE Read at 0x00c420141500 by goroutine 27: ... Previous write at 0x00c420141500 by goroutine 26: ...

Slide 16

Slide 16 text

a logical race, not a data race

Slide 17

Slide 17 text

gRPC, HTTP/2, and flow control

Slide 18

Slide 18 text

grpc.pre-1005-5.exectrace stuff happening stuff not happening

Slide 19

Slide 19 text

$ go tool trace trace.out $ go test -trace=trace.out

Slide 20

Slide 20 text

"View trace"

Slide 21

Slide 21 text

grpc.pre-1005-5.exectrace 1s (from context.WithTimeout)

Slide 22

Slide 22 text

grpc.pre-1005-5.exectrace

Slide 23

Slide 23 text

grpc.pre-1005-5.exectrace Garbage collection 20ms Goroutines running on OS threads

Slide 24

Slide 24 text

"Sync blocking profile"

Slide 25

Slide 25 text

grpc.pre-1005-5.exectrace

Slide 26

Slide 26 text

grpc.pre-1005-5.exectrace one-second unexpected delay

Slide 27

Slide 27 text

grpc.pre-1005-5.exectrace goroutine "name"

Slide 28

Slide 28 text

"Goroutine analysis"

Slide 29

Slide 29 text

grpc.pre-1005-5.exectrace

Slide 30

Slide 30 text

grpc.pre-1005-5.exectrace

Slide 31

Slide 31 text

grpc.pre-1005-5.exectrace 1s

Slide 32

Slide 32 text

grpc.pre-1005-5.exectrace

Slide 33

Slide 33 text

grpc.pre-1005-5.exectrace

Slide 34

Slide 34 text

func wait(ctx context.Context, ..., proceed <-chan int) (int, error) { select { case <-ctx.Done(): // ... case ...: // stream closed, etc case i := <-proceed: return i, nil } }

Slide 35

Slide 35 text

t.sendQuotaPool.add(0) tq, err := wait(s.ctx, ..., t.sendQuotaPool.acquire()) if err != nil && ... { t.sendQuotaPool.cancel() return } // calculate payload size ... t.sendQuotaPool.add(tq-ps)

Slide 36

Slide 36 text

type quotaPool struct { c chan int mu sync.Mutex quota int }

Slide 37

Slide 37 text

func newQuotaPool(q int) *quotaPool { qb := "aPool{ c: make(chan int, 1), } if q > 0 { qb.c <- q } else { qb.quota = q } return qb }

Slide 38

Slide 38 text

func newQuotaPool(q int) *quotaPool { qb := "aPool{ c: make(chan int, 1), } if q > 0 { qb.c <- q } else { qb.quota = q } return qb }

Slide 39

Slide 39 text

func newQuotaPool(q int) *quotaPool { qb := "aPool{ c: make(chan int, 1), } if q > 0 { qb.c <- q } else { qb.quota = q } return qb }

Slide 40

Slide 40 text

func newQuotaPool(q int) *quotaPool { qb := "aPool{ c: make(chan int, 1), } if q > 0 { qb.c <- q } else { qb.quota = q } return qb }

Slide 41

Slide 41 text

type quotaPool struct { c chan int // positive quota here mu sync.Mutex quota int // zero or negative quota }

Slide 42

Slide 42 text

func (qb *quotaPool) acquire() <-chan int { return qb.c }

Slide 43

Slide 43 text

func (qb *quotaPool) add(n int) { qb.mu.Lock(); defer qb.mu.Unlock() if qb.quota += n; qb.quota <= 0 { return } select { case qb.c <- qb.quota: qb.quota = 0 default: } }

Slide 44

Slide 44 text

func (qb *quotaPool) cancel() { qb.mu.Lock(); defer qb.mu.Unlock() select { case n := <-qb.c: qb.quota += n default: } }

Slide 45

Slide 45 text

Goroutine 10: pool.add(0) wait(..., pool.aq()) pool.add(tq-ps)

Slide 46

Slide 46 text

Goroutine 300: pool.add(0) wait(..., pool.aq()) pool.cancel() Goroutine 400: pool.add(0) wait(..., pool.aq()) pool.add(tq-ps)

Slide 47

Slide 47 text

Goroutine 5000: pool.add(0) wait(..., pool.aq()) pool.add(tq-ps) Goroutine 6000: pool.add(0) wait(..., pool.aq()) pool.add(tq-ps)

Slide 48

Slide 48 text

grpc.pre-1005-5.exectrace G8006, about to stall for one second G7962, another goroutine running the same function 36µs

Slide 49

Slide 49 text

Goroutine 7962: pool.add(0) wait(..., pool.aq()) pool.cancel() Goroutine 8006: pool.add(0) wait(..., pool.aq()) pool.cancel()

Slide 50

Slide 50 text

- func (qb *quotaPool) cancel() { - qb.mu.Lock(); defer qb.mu.Unlock() - - select { - case n := <-qb.c: - qb.quota += n - default: - } - }

Slide 51

Slide 51 text

func (qb *quotaPool) add(v int) { qb.mu.Lock(); defer qb.mu.Unlock() select { case n := <-qb.c: qb.quota += n default: } if qb.quota += v; qb.quota > 0 { qb.c <- qb.quota qb.quota = 0 } }

Slide 52

Slide 52 text

func (qb *quotaPool) add(v int) { qb.mu.Lock(); defer qb.mu.Unlock() select { case n := <-qb.c: qb.quota += n default: } if qb.quota += v; qb.quota > 0 { qb.c <- qb.quota qb.quota = 0 } }

Slide 53

Slide 53 text

#2: It's not a panacea

Slide 54

Slide 54 text

Three ways to get data: 1. Testing with -trace flag 2. Direct runtime/trace use 3. net/http/pprof handlers

Slide 55

Slide 55 text

pp-quiet-10.first.trace.redacted.html suspicious gaps every five seconds garbage collection unlike other goroutines

Slide 56

Slide 56 text

pp-quiet-10.first.trace.redacted.html nothing else runs during these 90ms

Slide 57

Slide 57 text

runtime.ReadMemStats • and Go 1.8 or earlier • and large (40GB) heap • leads to long pauses (fixed in Go 1.9)

Slide 58

Slide 58 text

pp-quiet-10.first.trace.redacted.html what does this goroutine do? how does it spend its CPU time? can it run more efficiently? can it finish sooner (parallelism)?

Slide 59

Slide 59 text

pp-quiet-10.first.trace.redacted.html net/http.(*conn).serve what request did it serve? which handler func ran? what status code was returned?

Slide 60

Slide 60 text

#3: The GC is still improving

Slide 61

Slide 61 text

tc.1.third.trace.redacted.html patchy network long run queue some behavior is different during GC * this is probably bad, but how good should we expect?

Slide 62

Slide 62 text

Go garbage collection timeline: Go 1: program fully stopped Go 1.1: GC uses parallel threads Go 1.4: precise collection Go 1.5: global pauses <10ms Go 1.8: goroutine pauses <100µs

Slide 63

Slide 63 text

garbage collection Go 1 GC whole program stopped ("STW") user code ("mutator")

Slide 64

Slide 64 text

user code ("mutator") garbage collection Go 1.1 GC whole program stopped ("STW")

Slide 65

Slide 65 text

garbage collection Go 1.5 GC user code ("mutator") whole program stopped ("STW")

Slide 66

Slide 66 text

garbage collection Go 1.8 GC whole program stopped ("STW") user code ("mutator")

Slide 67

Slide 67 text

#3A: Stop-The-World pauses

Slide 68

Slide 68 text

Everything stops when the GC begins and ends mark phase Stopping everything takes time

Slide 69

Slide 69 text

ta.nopreempt-2.fourth.trace.redacted.html GC begins stragglers GC really begins 3.6ms unused CPU

Slide 70

Slide 70 text

Goroutines can stop when... ✓ allocating memory ✓ calling functions ✓ communicating ✗ crunching numbers in a loop

Slide 71

Slide 71 text

Seen in: • encoding/base64 • encoding/json • .../golang/protobuf/proto it's measurable for 1MB values ... if you're looking for it * check code lines before "End Stack Trace"

Slide 72

Slide 72 text

go func() { for i := 0; i < 1e9; i++ { } } Or write your own tight loop:

Slide 73

Slide 73 text

golang.org/issue/10958 The Go 1.10 compiler should have a general, permanent fix Workaround available now

Slide 74

Slide 74 text

ta.preempt-2.second.trace.redacted.html quick pauses, less waste

Slide 75

Slide 75 text

#3B: Other awkward pauses

Slide 76

Slide 76 text

A mark/sweep GC: • "mark" finds in-use memory • "sweep" reclaims the rest

Slide 77

Slide 77 text

GC needs to make progress User code works against that User code is forced to help out

Slide 78

Slide 78 text

tc.1.third.trace.redacted.html read(2) syscall * hold 'shift' to draw a selection box user code in goroutines runtime work below helping the GC

Slide 79

Slide 79 text

tc.1.third.trace.redacted.html this assist ran for 4.4ms

Slide 80

Slide 80 text

tc.1.third.trace.redacted.html this assist ran for 15.6ms ... and didn't even finish

Slide 81

Slide 81 text

Most assists are well-deserved But they start suddenly Sweeping requires assists too Don't allocate in critical paths?

Slide 82

Slide 82 text

go tool trace http://.../debug/pprof/trace

Slide 83

Slide 83 text

How can the tool help you? 1. See time-dependent issues 2. Complements other profiles 3. Find latency improvements

Slide 84

Slide 84 text

Be prepared: practice using the tools Thank you! @rhyshiltner