Profiling and Optimizing Go Programs

Profiling and Optimizing Go Programs

Over the years Go compiler become smarter and standard library become faster, yet sometimes one still needs to fix a bottleneck, to resolve a locking problem and understand garbage collector pauses. My talk is about all this. We will learn to write proper benchmarks, use pprof and execution tracer, will look into runtime debug information and will touch a Linux performance tools. I will also mention a few of Go compiler optimizations and how not to accidentally disable them.

http://goway.io

4618c5e97c59abd315cc2d7dc809f8c8?s=128

Alexey Palazhchenko

July 21, 2017
Tweet

Transcript

  1. Profiling and Optimizing Go Programs Alexey Palazhchenko 21 July 2017,

    GoWayFest
  2. Profiling and Optimizing Go Programs Alexey Palazhchenko 21 July 2017,

    GoWayFest
  3. Profiling and Optimizing Go Programs Alexey Palazhchenko 21 July 2017,

    GoWayFest
  4. None
  5. None
  6. None
  7. None
  8. The Most Important Thing It is better to profile your

    code
 before you have problems
 to learn how it works normally.
  9. type Cache interface { Get(id string) interface{} Set(id string, value

    interface{}) Len() int }
  10. func (s *Slice) Get(id string) interface{} { for _, it

    := range s.items { if it.id == id { return it.value } } return nil }
  11. func (s *Slice) Set(id string, value i{}) { for i,

    it := range s.items { if it.id == id { s.items[i].value = value return } } s.items = append(s.items, item{id, value}) }
  12. b.ResetTimer() for i := 0; i < b.N; i++ {

    for _, id := range ids { Sink = c.Get(id) } }
  13. 100 200 300 400 0 1 2 3 4 5

    6 7 8 9 10 Slice Map Fancy algorithms are slow when n is small, and n is usually small. - Rob Pike
  14. Sink func popcnt(x uint64) int { var res uint64 for

    ; x > 0; x >>= 1 { res += x & 1 } return int(res) }
  15. Sink const m1 = 0x5555555555555555 const m2 = 0x3333333333333333 const

    m4 = 0x0f0f0f0f0f0f0f0f const h01 = 0x0101010101010101 func popcnt2(x uint64) int { x -= (x >> 1) & m1 x = (x & m2) + ((x >> 2) & m2) x = (x + (x >> 4)) & m4 return int((x * h01) >> 56) }
  16. Sink func BenchmarkPopcnt(b *testing.B) { for i := 0; i

    < b.N; i++ { popcnt(uint64(i)) } } func BenchmarkPopcnt2(b *testing.B) { for i := 0; i < b.N; i++ { popcnt2(uint64(i)) } }
  17. Sink go test -v -bench=. BenchmarkPopcnt-4 100000000 15.5 ns/op BenchmarkPopcnt2-4

    2000000000 0.34 ns/op
  18. Sink go test -v -bench=. BenchmarkPopcnt-4 100000000 15.5 ns/op BenchmarkPopcnt2-4

    2000000000 0.34 ns/op
  19. Sink • go doc compile • go test -bench=. -gcflags

    "-S"
  20. Sink popcnt_test.go:14 MOVQ "".b+8(FP), AX popcnt_test.go:14 MOVQ $0, CX popcnt_test.go:14

    MOVQ 200(AX), DX popcnt_test.go:14 CMPQ CX, DX popcnt_test.go:14 JGE $0, 34 popcnt_test.go:14 INCQ CX popcnt_test.go:14 MOVQ 200(AX), DX popcnt_test.go:14 CMPQ CX, DX popcnt_test.go:14 JLT $0, 19 popcnt_test.go:17 RET
  21. Sink func BenchmarkPopcnt(b *testing.B) { for i := 0; i

    < b.N; i++ { Sink = popcnt(uint64(i)) } } func BenchmarkPopcnt2(b *testing.B) { for i := 0; i < b.N; i++ { Sink = popcnt2(uint64(i)) } }
  22. Sink go test -v -bench=. BenchmarkPopcnt-4 50000000 39.3 ns/op BenchmarkPopcnt2-4

    50000000 26.8 ns/op
  23. Sink env GOSSAFUNC=BenchmarkPopcnt2 go test -bench=.

  24. Benchmarks • Do not use virtual machine • Do not

    touch • Disable CPU throttling • rsc.io/benchstat
  25. pprof • runtime/pprof • net/http/pprof • go test • Do

    not enable more than one at once!
  26. pprof: CPU • setitimer(2), ITIMER_PROF, SIGPROF • Up to 500

    Hz (100 by default) • SetCPUProfileRate(hz) • go test -bench=XXX -cpuprofile=XXX.pprof • go tool pprof -svg -output=XXX.svg cache.test XXX.pprof
  27. None
  28. None
  29. pprof: mem/block/mutex • pprof.MemProfileRate = bytes • pprof.SetBlockProfileRate(ns) • pprof.SetMutexProfileFraction(rate)

  30. type Map struct { m sync.Mutex items map[string]interface{} } func

    (m *Map) Get(id string) interface{} { m.m.Lock() defer m.m.Unlock() return m.items[id] }
  31. pprof: block • go test -v -bench=XXX -blockprofile=XXX.pprof • go

    tool pprof -svg -lines -output=XXX.svg ccache.test XXX.pprof
  32. None
  33. type Map struct { m sync.RWMutex items map[string]interface{} } func

    (m *Map) Get(id string) interface{} { m.m.RLock() v := m.items[id] m.m.RUnlock() return v }
  34. pprof: custom profiles • Useful when stack traces are useful

    • Integrated with go tool pprof • Example: connection opening and closing • pprof.NewProfile, pprof.Lookup • Profile.Add, Remove
  35. pprof: custom labels • Adds context to current stack •

    Allows filtering in CLI and better visualisation • Example: full request URL, SQL query • pprof.Do, pprof.WithLabels • To be released in several days in Go 1.9
  36. pprof: UI in Go 1.10

  37. pprof: UI in Go 1.10

  38. Execution tracer • Goroutines starting, stopping, switching • Blocking on

    channel operations or select • Blocking on mutexes • Blocking on syscalls
  39. Execution tracer • All events with full context • Large

    files with all symbols, binary not required • ~25% slowdown • CPU pprof is for tracking throughput,
 tracer is for tracking latency
  40. Execution tracer • import _ "net/http/pprof" • http://127.0.0.1:8080/debug/pprof • redis-benchmark

    -r 100000 -e -l -t set,get • go tool trace trace.out
  41. Execution tracer • go tool trace -pprof=TYPE trace.out > TYPE.pprof

    • net • sync • syscall • sched
  42. Linux • perf (perf_events) • SystemTap • BPF (eBPF)

  43. Environment variables • GOGC (100, off) • GODEBUG • gctrace=1

    • allocfreetrace=1 • schedtrace=1000
  44. Simple optimizations • struct{} • m[string(b)] • for i, c

    := range []byte(s) • for i := range s { a[i] = <zero value> }
  45. Simple optimizations type Key [64]byte type Value struct { Name

    [32]byte Balance uint64 Timestamp int64 } m := make(map[Key]Value, 1e8)
  46. Profile your code today
 before it’s too late. • golang-ru

    @ Google Groups • 4gophers.ru/slack • golangshow.com