Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Profiling and Optimizing Go Programs

Profiling and Optimizing Go Programs

Over the years Go compiler become smarter and standard library become faster, yet sometimes one still needs to fix a bottleneck, to resolve a locking problem and understand garbage collector pauses. My talk is about all this. We will learn to write proper benchmarks, use pprof and execution tracer, will look into runtime debug information and will touch a Linux performance tools. I will also mention a few of Go compiler optimizations and how not to accidentally disable them.

http://goway.io

Alexey Palazhchenko

July 21, 2017
Tweet

More Decks by Alexey Palazhchenko

Other Decks in Programming

Transcript

  1. The Most Important Thing It is better to profile your

    code
 before you have problems
 to learn how it works normally.
  2. func (s *Slice) Get(id string) interface{} { for _, it

    := range s.items { if it.id == id { return it.value } } return nil }
  3. func (s *Slice) Set(id string, value i{}) { for i,

    it := range s.items { if it.id == id { s.items[i].value = value return } } s.items = append(s.items, item{id, value}) }
  4. b.ResetTimer() for i := 0; i < b.N; i++ {

    for _, id := range ids { Sink = c.Get(id) } }
  5. 100 200 300 400 0 1 2 3 4 5

    6 7 8 9 10 Slice Map Fancy algorithms are slow when n is small, and n is usually small. - Rob Pike
  6. Sink func popcnt(x uint64) int { var res uint64 for

    ; x > 0; x >>= 1 { res += x & 1 } return int(res) }
  7. Sink const m1 = 0x5555555555555555 const m2 = 0x3333333333333333 const

    m4 = 0x0f0f0f0f0f0f0f0f const h01 = 0x0101010101010101 func popcnt2(x uint64) int { x -= (x >> 1) & m1 x = (x & m2) + ((x >> 2) & m2) x = (x + (x >> 4)) & m4 return int((x * h01) >> 56) }
  8. Sink func BenchmarkPopcnt(b *testing.B) { for i := 0; i

    < b.N; i++ { popcnt(uint64(i)) } } func BenchmarkPopcnt2(b *testing.B) { for i := 0; i < b.N; i++ { popcnt2(uint64(i)) } }
  9. Sink popcnt_test.go:14 MOVQ "".b+8(FP), AX popcnt_test.go:14 MOVQ $0, CX popcnt_test.go:14

    MOVQ 200(AX), DX popcnt_test.go:14 CMPQ CX, DX popcnt_test.go:14 JGE $0, 34 popcnt_test.go:14 INCQ CX popcnt_test.go:14 MOVQ 200(AX), DX popcnt_test.go:14 CMPQ CX, DX popcnt_test.go:14 JLT $0, 19 popcnt_test.go:17 RET
  10. Sink func BenchmarkPopcnt(b *testing.B) { for i := 0; i

    < b.N; i++ { Sink = popcnt(uint64(i)) } } func BenchmarkPopcnt2(b *testing.B) { for i := 0; i < b.N; i++ { Sink = popcnt2(uint64(i)) } }
  11. Benchmarks • Do not use virtual machine • Do not

    touch • Disable CPU throttling • rsc.io/benchstat
  12. pprof: CPU • setitimer(2), ITIMER_PROF, SIGPROF • Up to 500

    Hz (100 by default) • SetCPUProfileRate(hz) • go test -bench=XXX -cpuprofile=XXX.pprof • go tool pprof -svg -output=XXX.svg cache.test XXX.pprof
  13. type Map struct { m sync.Mutex items map[string]interface{} } func

    (m *Map) Get(id string) interface{} { m.m.Lock() defer m.m.Unlock() return m.items[id] }
  14. pprof: block • go test -v -bench=XXX -blockprofile=XXX.pprof • go

    tool pprof -svg -lines -output=XXX.svg ccache.test XXX.pprof
  15. type Map struct { m sync.RWMutex items map[string]interface{} } func

    (m *Map) Get(id string) interface{} { m.m.RLock() v := m.items[id] m.m.RUnlock() return v }
  16. pprof: custom profiles • Useful when stack traces are useful

    • Integrated with go tool pprof • Example: connection opening and closing • pprof.NewProfile, pprof.Lookup • Profile.Add, Remove
  17. pprof: custom labels • Adds context to current stack •

    Allows filtering in CLI and better visualisation • Example: full request URL, SQL query • pprof.Do, pprof.WithLabels • To be released in several days in Go 1.9
  18. Execution tracer • Goroutines starting, stopping, switching • Blocking on

    channel operations or select • Blocking on mutexes • Blocking on syscalls
  19. Execution tracer • All events with full context • Large

    files with all symbols, binary not required • ~25% slowdown • CPU pprof is for tracking throughput,
 tracer is for tracking latency
  20. Simple optimizations • struct{} • m[string(b)] • for i, c

    := range []byte(s) • for i := range s { a[i] = <zero value> }
  21. Simple optimizations type Key [64]byte type Value struct { Name

    [32]byte Balance uint64 Timestamp int64 } m := make(map[Key]Value, 1e8)
  22. Profile your code today
 before it’s too late. • golang-ru

    @ Google Groups • 4gophers.ru/slack • golangshow.com