Slide 1

Slide 1 text

Profiling and Optimizing Go Programs
Alexey Palazhchenko
21 July 2017, GoWayFest

Slide 8

Slide 8 text

The Most Important Thing

It is better to profile your code before you have problems, to learn how it works normally.

Slide 9

Slide 9 text

type Cache interface {
	Get(id string) interface{}
	Set(id string, value interface{})
	Len() int
}

Slide 10

Slide 10 text

func (s *Slice) Get(id string) interface{} {
	for _, it := range s.items {
		if it.id == id {
			return it.value
		}
	}
	return nil
}

Slide 11

Slide 11 text

func (s *Slice) Set(id string, value interface{}) {
	for i, it := range s.items {
		if it.id == id {
			s.items[i].value = value
			return
		}
	}
	s.items = append(s.items, item{id, value})
}

Slide 12

Slide 12 text

b.ResetTimer()
for i := 0; i < b.N; i++ {
	for _, id := range ids {
		Sink = c.Get(id)
	}
}
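
The loop above omits its setup. A minimal self-contained sketch of the whole benchmark might look like the following. The unsynchronized `Map` type, the `NewMap` and `makeIDs` helpers, the `benchGet` wrapper, and the cache size of 10 are assumptions for illustration, and `testing.Benchmark` stands in for `go test -bench` so that the sketch runs as an ordinary program.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// Cache is the interface from the slides.
type Cache interface {
	Get(id string) interface{}
	Set(id string, value interface{})
	Len() int
}

type item struct {
	id    string
	value interface{}
}

// Slice is the linear-scan implementation from the slides.
type Slice struct{ items []item }

func (s *Slice) Get(id string) interface{} {
	for _, it := range s.items {
		if it.id == id {
			return it.value
		}
	}
	return nil
}

func (s *Slice) Set(id string, value interface{}) {
	for i, it := range s.items {
		if it.id == id {
			s.items[i].value = value
			return
		}
	}
	s.items = append(s.items, item{id, value})
}

func (s *Slice) Len() int { return len(s.items) }

// Map is a hypothetical map-backed implementation without locking.
type Map struct{ items map[string]interface{} }

func NewMap() *Map                               { return &Map{items: make(map[string]interface{})} }
func (m *Map) Get(id string) interface{}         { return m.items[id] }
func (m *Map) Set(id string, value interface{})  { m.items[id] = value }
func (m *Map) Len() int                          { return len(m.items) }

// Sink keeps results alive so the compiler cannot drop the Get calls.
var Sink interface{}

// makeIDs is a hypothetical helper that returns n distinct ids.
func makeIDs(n int) []string {
	ids := make([]string, n)
	for i := range ids {
		ids[i] = "id" + strconv.Itoa(i)
	}
	return ids
}

// benchGet fills c with n items and measures looking all of them up.
func benchGet(c Cache, n int) testing.BenchmarkResult {
	ids := makeIDs(n)
	for _, id := range ids {
		c.Set(id, id)
	}
	return testing.Benchmark(func(b *testing.B) {
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			for _, id := range ids {
				Sink = c.Get(id)
			}
		}
	})
}

func main() {
	const n = 10 // assumed cache size
	fmt.Printf("slice n=%d: %s\n", n, benchGet(&Slice{}, n))
	fmt.Printf("map   n=%d: %s\n", n, benchGet(NewMap(), n))
}
```

Changing `n` shows where one implementation overtakes the other on a given machine.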

Slide 13

Slide 13 text

[Chart: Slice vs. Map]

"Fancy algorithms are slow when n is small, and n is usually small." - Rob Pike

Slide 14

Slide 14 text

Sink

func popcnt(x uint64) int {
	var res uint64
	for ; x > 0; x >>= 1 {
		res += x & 1
	}
	return int(res)
}

Slide 15

Slide 15 text

Sink

const m1 = 0x5555555555555555
const m2 = 0x3333333333333333
const m4 = 0x0f0f0f0f0f0f0f0f
const h01 = 0x0101010101010101

func popcnt2(x uint64) int {
	x -= (x >> 1) & m1
	x = (x & m2) + ((x >> 2) & m2)
	x = (x + (x >> 4)) & m4
	return int((x * h01) >> 56)
}
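
Before comparing the speed of the two functions, it helps to confirm that they return the same results. The sketch below does that for a few inputs and also prints the result of `math/bits.OnesCount64` as a reference. The input values are arbitrary, and `math/bits` is an addition that assumes Go 1.9 or later.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcnt counts set bits one at a time.
func popcnt(x uint64) int {
	var res uint64
	for ; x > 0; x >>= 1 {
		res += x & 1
	}
	return int(res)
}

const m1 = 0x5555555555555555
const m2 = 0x3333333333333333
const m4 = 0x0f0f0f0f0f0f0f0f
const h01 = 0x0101010101010101

// popcnt2 counts set bits in parallel: pairs, then nibbles, then bytes,
// and finally sums the byte counts with one multiplication.
func popcnt2(x uint64) int {
	x -= (x >> 1) & m1
	x = (x & m2) + ((x >> 2) & m2)
	x = (x + (x >> 4)) & m4
	return int((x * h01) >> 56)
}

func main() {
	for _, x := range []uint64{0, 1, 0xff, 1 << 63, ^uint64(0)} {
		fmt.Printf("%#x: popcnt=%d popcnt2=%d bits.OnesCount64=%d\n",
			x, popcnt(x), popcnt2(x), bits.OnesCount64(x))
	}
}
```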

Slide 16

Slide 16 text

Sink

func BenchmarkPopcnt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		popcnt(uint64(i))
	}
}

func BenchmarkPopcnt2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		popcnt2(uint64(i))
	}
}

Slide 17

Slide 17 text

Sink

go test -v -bench=.
BenchmarkPopcnt-4     100000000     15.5 ns/op
BenchmarkPopcnt2-4    2000000000    0.34 ns/op

Slide 19

Slide 19 text

Sink

• go doc compile
• go test -bench=. -gcflags "-S"

Slide 20

Slide 20 text

Sink

popcnt_test.go:14  MOVQ  "".b+8(FP), AX
popcnt_test.go:14  MOVQ  $0, CX
popcnt_test.go:14  MOVQ  200(AX), DX
popcnt_test.go:14  CMPQ  CX, DX
popcnt_test.go:14  JGE   $0, 34
popcnt_test.go:14  INCQ  CX
popcnt_test.go:14  MOVQ  200(AX), DX
popcnt_test.go:14  CMPQ  CX, DX
popcnt_test.go:14  JLT   $0, 19
popcnt_test.go:17  RET

Slide 21

Slide 21 text

Sink

func BenchmarkPopcnt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Sink = popcnt(uint64(i))
	}
}

func BenchmarkPopcnt2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Sink = popcnt2(uint64(i))
	}
}
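
The slides never show how `Sink` is declared. A plausible reading is that it is a package-level variable: a store to it is a side effect the compiler has to keep, so the call can no longer be dropped as dead code. The sketch below assumes `var Sink int` and uses `testing.Benchmark` so that it runs outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// Sink is assumed to be a package-level variable. Assigning the result
// to it stops the compiler from discarding the call to popcnt.
var Sink int

func popcnt(x uint64) int {
	var res uint64
	for ; x > 0; x >>= 1 {
		res += x & 1
	}
	return int(res)
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Sink = popcnt(uint64(i))
		}
	})
	fmt.Println("popcnt:", res)
}
```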

Slide 22

Slide 22 text

Sink

go test -v -bench=.
BenchmarkPopcnt-4     50000000    39.3 ns/op
BenchmarkPopcnt2-4    50000000    26.8 ns/op

Slide 23

Slide 23 text

Sink

env GOSSAFUNC=BenchmarkPopcnt2 go test -bench=.

Slide 24

Slide 24 text

Benchmarks

• Do not use a virtual machine
• Do not touch the machine while benchmarks run
• Disable CPU throttling
• rsc.io/benchstat

Slide 25

Slide 25 text

pprof

• runtime/pprof
• net/http/pprof
• go test
• Do not enable more than one at once!

Slide 26

Slide 26 text

pprof: CPU

• setitimer(2), ITIMER_PROF, SIGPROF
• Up to 500 Hz (100 by default)
• runtime.SetCPUProfileRate(hz)
• go test -bench=XXX -cpuprofile=XXX.pprof
• go tool pprof -svg -output=XXX.svg cache.test XXX.pprof

Slide 29

Slide 29 text

pprof: mem/block/mutex

• runtime.MemProfileRate = bytes
• runtime.SetBlockProfileRate(ns)
• runtime.SetMutexProfileFraction(rate)

Slide 30

Slide 30 text

type Map struct {
	m     sync.Mutex
	items map[string]interface{}
}

func (m *Map) Get(id string) interface{} {
	m.m.Lock()
	defer m.m.Unlock()
	return m.items[id]
}

Slide 31

Slide 31 text

pprof: block

• go test -v -bench=XXX -blockprofile=XXX.pprof
• go tool pprof -svg -lines -output=XXX.svg cache.test XXX.pprof

Slide 33

Slide 33 text

type Map struct {
	m     sync.RWMutex
	items map[string]interface{}
}

func (m *Map) Get(id string) interface{} {
	m.m.RLock()
	v := m.items[id]
	m.m.RUnlock()
	return v
}
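
The slide shows only `Get`. A runnable sketch of the whole type could look like this. `NewMap`, `Set`, and `Len` are assumptions about the rest of the implementation: writers take the exclusive lock, while readers share the read lock.

```go
package main

import (
	"fmt"
	"sync"
)

type Map struct {
	m     sync.RWMutex
	items map[string]interface{}
}

// NewMap is a hypothetical constructor.
func NewMap() *Map {
	return &Map{items: make(map[string]interface{})}
}

// Get takes the read lock, so concurrent readers do not block each other.
func (m *Map) Get(id string) interface{} {
	m.m.RLock()
	v := m.items[id]
	m.m.RUnlock()
	return v
}

// Set takes the exclusive lock (assumed, not shown in the slides).
func (m *Map) Set(id string, value interface{}) {
	m.m.Lock()
	m.items[id] = value
	m.m.Unlock()
}

// Len reports the number of items under the read lock (assumed).
func (m *Map) Len() int {
	m.m.RLock()
	n := len(m.items)
	m.m.RUnlock()
	return n
}

func main() {
	c := NewMap()
	c.Set("a", 1)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.Get("a") // readers run concurrently
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("a"), c.Len())
}
```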

Slide 34

Slide 34 text

pprof: custom profiles

• Useful when you want stack traces for your own events or resources
• Integrated with go tool pprof
• Example: connection opening and closing
• pprof.NewProfile, pprof.Lookup
• Profile.Add, Remove

Slide 35

Slide 35 text

pprof: custom labels

• Adds context to current stack
• Allows filtering in CLI and better visualisation
• Example: full request URL, SQL query
• pprof.Do, pprof.WithLabels
• To be released in several days in Go 1.9

Slide 36

Slide 36 text

pprof: UI in Go 1.10

Slide 38

Slide 38 text

Execution tracer

• Goroutines starting, stopping, switching
• Blocking on channel operations or select
• Blocking on mutexes
• Blocking on syscalls

Slide 39

Slide 39 text

Execution tracer

• All events with full context
• Large files with all symbols, binary not required
• ~25% slowdown
• CPU pprof is for tracking throughput, tracer is for tracking latency

Slide 40

Slide 40 text

Execution tracer

• import _ "net/http/pprof"
• http://127.0.0.1:8080/debug/pprof
• redis-benchmark -r 100000 -e -l -t set,get
• go tool trace trace.out

Slide 41

Slide 41 text

Execution tracer

• go tool trace -pprof=TYPE trace.out > TYPE.pprof
  • net
  • sync
  • syscall
  • sched

Slide 42

Slide 42 text

Linux

• perf (perf_events)
• SystemTap
• BPF (eBPF)

Slide 43

Slide 43 text

Environment variables

• GOGC (100, off)
• GODEBUG
  • gctrace=1
  • allocfreetrace=1
  • schedtrace=1000
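
`GOGC` also has a run-time counterpart: `debug.SetGCPercent` takes the same percentage, and a negative value corresponds to `GOGC=off`. The sketch below uses it to switch the collector off for one section of code. The `withGC` helper is hypothetical.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withGC runs fn with the GC target percentage set to percent, restores
// the previous setting afterwards, and returns that previous setting.
func withGC(percent int, fn func()) int {
	prev := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(prev)
	fn()
	return prev
}

func main() {
	prev := withGC(-1, func() { // -1 has the same effect as GOGC=off
		bufs := make([][]byte, 0, 1000)
		for i := 0; i < 1000; i++ {
			bufs = append(bufs, make([]byte, 1024))
		}
		fmt.Println("allocated", len(bufs), "buffers with the collector off")
	})
	fmt.Println("GC percent before the call:", prev)
}
```

The `GODEBUG` settings have to be present in the environment when the process starts, for example `GODEBUG=gctrace=1 ./app`.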

Slide 44

Slide 44 text

Simple optimizations

• struct{}
• m[string(b)]
• for i, c := range []byte(s)
• for i := range s { a[i] = }
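
Each bullet names an idiom for which, as far as I know, the gc compiler has a special case. A sketch of all four is below. The function names are hypothetical, and the zero value assigned in `zero` is an assumption, since the slide leaves the right-hand side out.

```go
package main

import "fmt"

// set uses struct{} values, which take no space, instead of bool.
type set map[string]struct{}

func (s set) add(k string) { s[k] = struct{}{} }

func (s set) has(k string) bool {
	_, ok := s[k]
	return ok
}

// lookup indexes a string-keyed map with a []byte. The conversion inside
// the index expression does not allocate a temporary string.
func lookup(m map[string]int, b []byte) int {
	return m[string(b)]
}

// countSpaces ranges over []byte(s). The compiler can iterate over the
// string's bytes without copying them first.
func countSpaces(s string) int {
	n := 0
	for _, c := range []byte(s) {
		if c == ' ' {
			n++
		}
	}
	return n
}

// zero clears a slice. A loop of exactly this shape can be compiled into
// a single memory-clearing call.
func zero(a []int) {
	for i := range a {
		a[i] = 0
	}
}

func main() {
	s := set{}
	s.add("go")
	fmt.Println(s.has("go"), s.has("rust"))

	m := map[string]int{"key": 7}
	fmt.Println(lookup(m, []byte("key")))

	fmt.Println(countSpaces("a b c"))

	a := []int{1, 2, 3}
	zero(a)
	fmt.Println(a)
}
```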

Slide 45

Slide 45 text

Simple optimizations

type Key [64]byte

type Value struct {
	Name      [32]byte
	Balance   uint64
	Timestamp int64
}

m := make(map[Key]Value, 1e8)
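
The slide does not say why this layout helps. The likely point is that neither `Key` nor `Value` contains pointers, so the garbage collector does not have to scan the contents of a very large map. A scaled-down sketch is below. The `makeKey` helper, the sample data, and the capacity of 1000 are assumptions that keep it cheap to run.

```go
package main

import "fmt"

// Key and Value hold only fixed-size arrays and integers: no pointers,
// strings, or slices.
type Key [64]byte

type Value struct {
	Name      [32]byte
	Balance   uint64
	Timestamp int64
}

// makeKey is a hypothetical helper that copies s into a fixed-size key,
// truncating it to 64 bytes if necessary.
func makeKey(s string) Key {
	var k Key
	copy(k[:], s)
	return k
}

func main() {
	m := make(map[Key]Value, 1000) // the slide uses 1e8

	var v Value
	copy(v.Name[:], "alice")
	v.Balance = 100
	m[makeKey("acct-1")] = v

	got := m[makeKey("acct-1")]
	fmt.Println(string(got.Name[:5]), got.Balance)
}
```

The cost is that variable-length data has to be copied into fixed-size arrays, which wastes space for short values and truncates long ones.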

Slide 46

Slide 46 text

Profile your code today before it’s too late.

• golang-ru @ Google Groups
• 4gophers.ru/slack
• golangshow.com