Slide 1

Slide 1 text

So You Wanna Go Fast? Strange Loop 2017 @tyler_treat

Slide 2

Slide 2 text

@tyler_treat @tyler_treat

Slide 3

Slide 3 text

@tyler_treat Make your code faster with this one weird trick

Slide 4

Slide 4 text

@tyler_treat Make your code faster with this one weird trick

Slide 5

Slide 5 text

@tyler_treat So You Wanna Subvert Go?

Slide 6

Slide 6 text

@tyler_treat Spoiler Alert:
 Go is not a
 systems language…

Slide 7

Slide 7 text

@tyler_treat but that doesn’t mean you can’t build internet-scale systems with it.

Slide 8

Slide 8 text

@tyler_treat

Slide 9

Slide 9 text

@tyler_treat This is a talk about how to write terrible Go code.

Slide 10

Slide 10 text

@tyler_treat @tyler_treat

Slide 11

Slide 11 text

@tyler_treat Because this is a talk about trade-offs.

Slide 12

Slide 12 text

@tyler_treat Tyler Treat
 - Messaging Nerd @ Apcera
 - Working on nats.io
 - Distributed systems
 - bravenewgeek.com

Slide 13

Slide 13 text

@tyler_treat @tyler_treat

Slide 14

Slide 14 text

@tyler_treat Why does this talk matter?

Slide 15

Slide 15 text

@tyler_treat The compiler isn’t magic.

Slide 16

Slide 16 text

@tyler_treat The compiler isn’t magic.

Slide 17

Slide 17 text

@tyler_treat You have to be
 mindful of performance
 when it matters.

Slide 18

Slide 18 text

@tyler_treat @tyler_treat Where bad things hide

Slide 19

Slide 19 text

@tyler_treat @tyler_treat Where bad things hide Where we’re usually looking

Slide 20

Slide 20 text

@tyler_treat Tire fires at scale @tyler_treat

Slide 21

Slide 21 text

@tyler_treat @tyler_treat @tyler_treat

Slide 22

Slide 22 text

@tyler_treat @tyler_treat @tyler_treat

Slide 23

Slide 23 text

@tyler_treat @tyler_treat @tyler_treat

Slide 24

Slide 24 text

@tyler_treat Overview
 - Measuring performance
 - Language features
 - Memory management
 - Concurrency and multi-core

Slide 25

Slide 25 text

@tyler_treat Overview
 - Measuring performance
 - Language features
 - Memory management
 - Concurrency and multi-core

Slide 26

Slide 26 text

@tyler_treat Disclaimer:
 Don’t blindly apply optimizations presented.

Slide 27

Slide 27 text

@tyler_treat tl;dr of this talk is
 “IT DEPENDS!”

Slide 28

Slide 28 text

@tyler_treat Measure Optimize

Slide 29

Slide 29 text

@tyler_treat Measurement Techniques
 - pprof
   - memory
   - cpu
   - blocking
 - GODEBUG
   - gctrace
   - schedtrace
   - allocfreetrace
 - Benchmarking
   - Code-level: testing.B
   - System-level: HdrHistogram (https://github.com/codahale/hdrhistogram),
     bench (https://github.com/tylertreat/bench)
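A minimal sketch of code-level benchmarking, not taken from the slides. It uses testing.Benchmark so it runs as an ordinary program; the concat and build workloads are hypothetical stand-ins for whatever code is being measured.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concat builds a string by repeated concatenation (hypothetical workload).
func concat(n int) string {
	s := ""
	for i := 0; i < n; i++ {
		s += "x"
	}
	return s
}

// build produces the same string with strings.Builder.
func build(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		sb.WriteString("x")
	}
	return sb.String()
}

// bench runs fn under the testing package's benchmark driver and
// reports time and allocations per operation.
func bench(fn func(int) string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = fn(1000)
		}
	})
}

func main() {
	c, s := bench(concat), bench(build)
	fmt.Println("concat: ", c.String(), c.MemString())
	fmt.Println("builder:", s.String(), s.MemString())
}
```

The same loop body can live in a BenchmarkXxx function in a _test.go file and be run with go test -bench=. instead. Absolute numbers depend on the machine and Go version, so compare results only within one run.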

Slide 30

Slide 30 text

@tyler_treat @tyler_treat

Slide 31

Slide 31 text

@tyler_treat The only way to get good at something is to be really fucking bad at it
 for a long time.

Slide 32

Slide 32 text

@tyler_treat Benchmarking… a great way to rattle the
 Hacker News fart chamber.

Slide 33

Slide 33 text

@tyler_treat Overview
 - Measuring performance
 - Language features
 - Memory management
 - Concurrency and multi-core

Slide 34

Slide 34 text

@tyler_treat channels

Slide 35

Slide 35 text

@tyler_treat “Instead of explicitly using locks to mediate access to shared data, Go encourages the use of channels to pass references to data between goroutines.” https://blog.golang.org/share-memory-by-communicating

Slide 36

Slide 36 text

@tyler_treat @tyler_treat

Slide 37

Slide 37 text

@tyler_treat @tyler_treat USE CHANNELS TO COORDINATE, NOT SYNCHRONIZE.
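One way to read "coordinate, not synchronize", sketched under my own assumptions rather than from the slide images: a mutex guards the hot shared counter, and a channel is used only once, to signal that all workers have finished. The function name countWithMutex is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from several goroutines.
// The mutex synchronizes access to the counter; the done channel only
// coordinates, telling the caller when every worker has finished.
func countWithMutex(workers, perWorker int) int {
	var (
		mu    sync.Mutex
		total int
		wg    sync.WaitGroup
	)
	done := make(chan struct{})

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}

	// Close the channel once all workers are done.
	go func() {
		wg.Wait()
		close(done)
	}()

	<-done
	return total
}

func main() {
	fmt.Println(countWithMutex(8, 1000)) // prints 8000
}
```

Sending every increment over a channel would also be correct, but each send and receive costs more than an uncontended lock, which is the trade-off the slide points at.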

Slide 38

Slide 38 text

@tyler_treat @tyler_treat

Slide 39

Slide 39 text

@tyler_treat @tyler_treat

Slide 40

Slide 40 text

@tyler_treat defer

Slide 41

Slide 41 text

@tyler_treat @tyler_treat

Slide 42

Slide 42 text

@tyler_treat Is defer still slow?
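A sketch for answering that question on your own toolchain, not the benchmark from the talk. The counter type and its two methods are hypothetical; the only difference between them is whether Unlock is deferred.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

type counter struct {
	mu sync.Mutex
	n  int
}

// incDefer releases the lock with defer.
func (c *counter) incDefer() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// incDirect releases the lock with an explicit call.
func (c *counter) incDirect() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func main() {
	var c counter
	withDefer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			c.incDefer()
		}
	})
	direct := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			c.incDirect()
		}
	})
	fmt.Println("defer: ", withDefer)
	fmt.Println("direct:", direct)
}
```

The gap depends heavily on the Go release: Go 1.14 introduced open-coded defers, which made simple cases like this one much cheaper than they were when this talk was given.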

Slide 43

Slide 43 text

@tyler_treat @tyler_treat

Slide 44

Slide 44 text

@tyler_treat The Secret Life of interface{}

Slide 45

Slide 45 text

@tyler_treat
type Stringer interface {
    String() string
}
https://research.swtch.com/interfaces

Slide 46

Slide 46 text

@tyler_treat
type Stringer interface {
    String() string
}
type Binary uint64
https://research.swtch.com/interfaces

Slide 47

Slide 47 text

@tyler_treat
type Stringer interface {
    String() string
}
type Binary uint64
b := Binary(200)
[Diagram: b holds the value 200]
https://research.swtch.com/interfaces

Slide 48

Slide 48 text

@tyler_treat
type Stringer interface {
    String() string
}
type Binary uint64
func (i Binary) String() string {
    return strconv.FormatUint(uint64(i), 2)
}
b := Binary(200)
[Diagram: b holds the value 200]
https://research.swtch.com/interfaces

Slide 49

Slide 49 text

@tyler_treat
type Stringer interface {
    String() string
}
s := Stringer(b)
[Diagram: the Stringer value s consists of two words, tab and data]
https://research.swtch.com/interfaces

Slide 50

Slide 50 text

@tyler_treat
type Stringer interface {
    String() string
}
s := Stringer(b)
[Diagram: the Stringer value s consists of two words, tab and data; tab points to itable(Stringer, Binary), which holds type = type(Binary) and fun[0] = (*Binary).String]
https://research.swtch.com/interfaces

Slide 51

Slide 51 text

@tyler_treat
type Stringer interface {
    String() string
}
s := Stringer(b)
[Diagram: the Stringer value s consists of two words, tab and data; tab points to itable(Stringer, Binary), which holds type = type(Binary) and fun[0] = (*Binary).String; data points to a Binary holding 200]
https://research.swtch.com/interfaces
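The fragments on these slides assemble into a small runnable program. The type assertion at the end is my addition, to show that the concrete value is still recoverable from the interface value.

```go
package main

import (
	"fmt"
	"strconv"
)

type Stringer interface {
	String() string
}

type Binary uint64

// String formats the value in base 2.
func (i Binary) String() string {
	return strconv.FormatUint(uint64(i), 2)
}

func main() {
	b := Binary(200)
	s := Stringer(b) // s now carries a type/method table word and a data word

	// The call goes through the method table stored with s.
	fmt.Println(s.String()) // prints 11001000

	// A type assertion checks the dynamic type and extracts the value.
	if v, ok := s.(Binary); ok {
		fmt.Println(uint64(v)) // prints 200
	}
}
```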

Slide 52

Slide 52 text

@tyler_treat

Slide 53

Slide 53 text

@tyler_treat So what?

Slide 54

Slide 54 text

@tyler_treat @tyler_treat

Slide 55

Slide 55 text

@tyler_treat @tyler_treat

Slide 56

Slide 56 text

@tyler_treat @tyler_treat Sorting 100M Interfaces

Slide 57

Slide 57 text

@tyler_treat @tyler_treat Sorting 100M Interfaces

Slide 58

Slide 58 text

@tyler_treat @tyler_treat Sorting 100M Structs

Slide 59

Slide 59 text

@tyler_treat @tyler_treat Sorting 100M Structs

Slide 60

Slide 60 text

@tyler_treat $ go test -bench=. -gcflags="-m"

Slide 61

Slide 61 text

@tyler_treat $ go test -bench=. -gcflags="-m"

Slide 62

Slide 62 text

@tyler_treat @tyler_treat

Slide 63

Slide 63 text

@tyler_treat $ go test -bench=. -gcflags="-l"

Slide 64

Slide 64 text

@tyler_treat [Chart: Struct (No Inlining) vs. Interface (No Inlining)]

Slide 65

Slide 65 text

@tyler_treat [Chart: Struct (No Inlining) vs. Interface (No Inlining)]

Slide 66

Slide 66 text

@tyler_treat [Chart: Struct (No Inlining) vs. Interface (No Inlining)]

Slide 67

Slide 67 text

@tyler_treat @tyler_treat

Slide 68

Slide 68 text

@tyler_treat [Chart annotation: x.(*T) inlined]

Slide 69

Slide 69 text

@tyler_treat [Chart annotations: x.(*T) inlined; SSA backend & remaining type conversions inlined]

Slide 70

Slide 70 text

@tyler_treat @tyler_treat

Slide 71

Slide 71 text

@tyler_treat

Slide 72

Slide 72 text

@tyler_treat @tyler_treat Struct Interface

Slide 73

Slide 73 text

@tyler_treat @tyler_treat Struct Interface

Slide 74

Slide 74 text

@tyler_treat @tyler_treat

Slide 75

Slide 75 text

@tyler_treat $ go test -bench=. -gcflags="-S"

Slide 76

Slide 76 text

@tyler_treat $ go test -bench=. -gcflags="-S"

Slide 77

Slide 77 text

@tyler_treat $ go test -bench=. -gcflags="-S"

Slide 78

Slide 78 text

@tyler_treat Key Insight: If performance matters,
 write type-specific code.
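A small sketch of what "type-specific" can mean, using a sum instead of the talk's 100M-element sort so it runs quickly. Valuer, item, and both sum functions are hypothetical names.

```go
package main

import (
	"fmt"
	"testing"
)

// Valuer is a hypothetical interface used for the generic path.
type Valuer interface{ Value() int }

type item struct{ v int }

func (it item) Value() int { return it.v }

// sumInterface calls Value through the interface on every element.
func sumInterface(xs []Valuer) int {
	total := 0
	for _, x := range xs {
		total += x.Value()
	}
	return total
}

// sumConcrete works on the concrete type directly.
func sumConcrete(xs []item) int {
	total := 0
	for _, x := range xs {
		total += x.v
	}
	return total
}

// makeInputs builds the same data in boxed and concrete form.
func makeInputs(n int) ([]Valuer, []item) {
	boxed := make([]Valuer, n)
	plain := make([]item, n)
	for i := range plain {
		plain[i] = item{v: i}
		boxed[i] = plain[i]
	}
	return boxed, plain
}

var sink int // keeps the compiler from discarding the results

func main() {
	boxed, plain := makeInputs(1 << 16)
	viaInterface := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumInterface(boxed)
		}
	})
	concrete := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumConcrete(plain)
		}
	})
	fmt.Println("interface:", viaInterface)
	fmt.Println("concrete: ", concrete)
}
```

Running it with -gcflags="-m", as on the earlier slides, shows which calls the compiler managed to inline on your Go version.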

Slide 79

Slide 79 text

@tyler_treat Overview
 - Measuring performance
 - Language features
 - Memory management
 - Concurrency and multi-core

Slide 80

Slide 80 text

@tyler_treat []byte to string
 conversions
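The slide images are not in this transcript, so the following is only a guess at the kind of measurement involved. It counts heap allocations for a plain string(b) conversion and for a map lookup keyed by string(b), a form the standard gc compiler can usually serve without copying. The key contents and function names are hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

var (
	// key is long enough that its string copy cannot live in a small stack buffer.
	key  = []byte("a-key-long-enough-to-need-a-heap-allocation-when-copied")
	m    = map[string]int{string(key): 1}
	sink string
)

// convAllocs reports allocations per plain conversion. Assigning to a
// package-level variable forces the new string onto the heap.
func convAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = string(key)
	})
}

// lookupAllocs reports allocations per map lookup that converts inline.
func lookupAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		_ = m[string(key)]
	})
}

func main() {
	fmt.Println("allocs per string(key):   ", convAllocs())
	fmt.Println("allocs per m[string(key)]:", lookupAllocs())
}
```

Because strings are immutable and byte slices are not, the general conversion has to copy. Whether a particular call site avoids the copy is a compiler detail, so measure rather than assume.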

Slide 81

Slide 81 text

@tyler_treat

Slide 82

Slide 82 text

@tyler_treat @tyler_treat

Slide 83

Slide 83 text

@tyler_treat @tyler_treat

Slide 84

Slide 84 text

@tyler_treat What’s going on here?

Slide 85

Slide 85 text

@tyler_treat @tyler_treat

Slide 86

Slide 86 text

@tyler_treat memory allocation

Slide 87

Slide 87 text

@tyler_treat @tyler_treat

Slide 88

Slide 88 text

@tyler_treat How is sync.Pool so fast?

Slide 89

Slide 89 text

@tyler_treat Per-CPU storage!
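A minimal sync.Pool usage sketch, assuming a hypothetical render function that needs a scratch buffer on every call. The pool keeps per-P (roughly per-CPU) caches internally, so Get and Put usually avoid contended locks.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render formats a greeting using a pooled scratch buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // copies the bytes out before the buffer is returned
}

func main() {
	fmt.Println(render("gopher")) // prints hello, gopher
}
```

Pooled objects can be dropped at any garbage collection, so a pool suits short-lived scratch space, not state that must persist.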

Slide 90

Slide 90 text

@tyler_treat @tyler_treat https://golang.org/src/sync/pool.go

Slide 91

Slide 91 text

@tyler_treat @tyler_treat https://golang.org/src/sync/pool.go

Slide 92

Slide 92 text

@tyler_treat @tyler_treat

Slide 93

Slide 93 text

@tyler_treat Overview
 - Measuring performance
 - Language features
 - Memory management
 - Concurrency and multi-core

Slide 94

Slide 94 text

@tyler_treat “We generally don’t want sync/atomic to be used at all…Experience has shown us again and again that very very few people are capable of writing correct code that uses atomic operations…” —Ian Lance Taylor

Slide 95

Slide 95 text

@tyler_treat

Slide 96

Slide 96 text

@tyler_treat @tyler_treat Subscribers Messages Fast Topic Matching http://bravenewgeek.com/fast-topic-matching/

Slide 97

Slide 97 text

@tyler_treat @tyler_treat Subscribers Messages Fast Topic Matching http://bravenewgeek.com/fast-topic-matching/

Slide 98

Slide 98 text

@tyler_treat @tyler_treat Fast Topic Matching

Slide 99

Slide 99 text

@tyler_treat @tyler_treat Fast Topic Matching

Slide 100

Slide 100 text

@tyler_treat @tyler_treat

Slide 101

Slide 101 text

@tyler_treat @tyler_treat Fast Topic Matching

Slide 102

Slide 102 text

@tyler_treat @tyler_treat Concurrent
 80,000 inserts
 80,000 lookups


Slide 103

Slide 103 text

@tyler_treat @tyler_treat Ctrie

Slide 104

Slide 104 text

@tyler_treat Ctrie
 1. Assign a generation, G1, to each I-node (empty struct).
 [Diagram: I-nodes labeled G1]

Slide 105

Slide 105 text

@tyler_treat Ctrie
 1. Assign a generation, G1, to each I-node (empty struct).
 2. Add new node by copying I-node with updated branch and generation, then GCAS, i.e. atomically:
    - compare I-nodes to detect tree mutations.
    - compare root generations to detect snapshots.
 [Diagram: a copied I-node labeled G2 alongside an I-node labeled G1]
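This is not a Ctrie and not GCAS. It is a much simpler stand-in that shows only the copy-then-compare-and-swap step from item 2: build a modified copy with a new generation, then publish it atomically, retrying if another writer got there first. The snapshot and store types are hypothetical, and atomic.Pointer assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// snapshot is an immutable version of the data plus a generation number.
type snapshot struct {
	gen   int
	items []string
}

// store publishes the current snapshot through an atomic pointer.
type store struct {
	cur atomic.Pointer[snapshot]
}

func newStore() *store {
	s := &store{}
	s.cur.Store(&snapshot{})
	return s
}

// add copies the current snapshot, appends the item, bumps the
// generation, and installs the copy with compare-and-swap.
func (s *store) add(item string) {
	for {
		old := s.cur.Load()
		next := &snapshot{
			gen:   old.gen + 1,
			items: append(append([]string(nil), old.items...), item),
		}
		if s.cur.CompareAndSwap(old, next) {
			return
		}
		// Another writer published first; retry against the new snapshot.
	}
}

func main() {
	s := newStore()
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				s.add("x")
			}
		}()
	}
	wg.Wait()
	final := s.cur.Load()
	fmt.Println(final.gen, len(final.items)) // prints 800 800
}
```

Readers call Load and see a consistent snapshot without locking. The real Ctrie applies this idea per I-node and additionally checks the root generation, which this sketch leaves out.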

Slide 106

Slide 106 text

@tyler_treat @tyler_treat

Slide 107

Slide 107 text

@tyler_treat @tyler_treat

Slide 108

Slide 108 text

@tyler_treat The Go race detector
 doesn’t protect you from
 doing dumb stuff.

Slide 109

Slide 109 text

@tyler_treat @tyler_treat

Slide 110

Slide 110 text

@tyler_treat @tyler_treat

Slide 111

Slide 111 text

@tyler_treat @tyler_treat

Slide 112

Slide 112 text

@tyler_treat Side note:
 unsafe is, in fact, unsafe.

Slide 113

Slide 113 text

@tyler_treat “Packages that import unsafe may depend on internal properties of the Go implementation. We reserve the right to make changes to the implementation that may break such programs.” https://golang.org/doc/go1compat

Slide 114

Slide 114 text

@tyler_treat

Slide 115

Slide 115 text

@tyler_treat Key Insight: Struct layout can make
 a big difference.

Slide 116

Slide 116 text

@tyler_treat @tyler_treat Mechanical Sympathy

Slide 117

Slide 117 text

@tyler_treat https://github.com/Workiva/go-datastructures/blob/master/queue/ring.go @tyler_treat

Slide 118

Slide 118 text

@tyler_treat @tyler_treat

Slide 119

Slide 119 text

@tyler_treat @tyler_treat

Slide 120

Slide 120 text

@tyler_treat @tyler_treat

Slide 121

Slide 121 text

@tyler_treat @tyler_treat https://golang.org/src/sync/rwmutex.go

Slide 122

Slide 122 text

@tyler_treat @tyler_treat https://golang.org/src/sync/rwmutex.go

Slide 123

Slide 123 text

@tyler_treat CPU reader reader reader RWMutex

Slide 124

Slide 124 text

@tyler_treat CPU reader reader CPU reader reader reader RWMutex

Slide 125

Slide 125 text

@tyler_treat CPU reader reader CPU reader reader reader reader CPU reader reader reader RWMutex

Slide 126

Slide 126 text

@tyler_treat CPU reader reader CPU reader reader reader reader CPU reader reader CPU reader reader reader RWMutex

Slide 127

Slide 127 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 128

Slide 128 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 129

Slide 129 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 130

Slide 130 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 131

Slide 131 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 132

Slide 132 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader

Slide 133

Slide 133 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 134

Slide 134 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 135

Slide 135 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 136

Slide 136 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 137

Slide 137 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 138

Slide 138 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 139

Slide 139 text

@tyler_treat RWMutex CPU reader reader CPU reader reader reader reader reader writer CPU reader reader reader reader CPU reader writer reader reader RWMutex RWMutex RWMutex

Slide 140

Slide 140 text

@tyler_treat [Diagram: many overlapping CPU boxes, each with its own readers and writers]

Slide 141

Slide 141 text

@tyler_treat @tyler_treat

Slide 142

Slide 142 text

@tyler_treat How to create
 CPU->RWMutex
 mapping?

Slide 143

Slide 143 text

@tyler_treat @tyler_treat https://github.com/jonhoo/drwmutex/blob/master/cpu_amd64.s

Slide 144

Slide 144 text

@tyler_treat /proc/cpuinfo

Slide 145

Slide 145 text

@tyler_treat @tyler_treat

Slide 146

Slide 146 text

@tyler_treat memory RWMutex1 24 bytes

Slide 147

Slide 147 text

@tyler_treat RWMutex1 RWMutex2 memory 24 bytes

Slide 148

Slide 148 text

@tyler_treat RWMutex1 RWMutex2 RWMutex3 memory 24 bytes

Slide 149

Slide 149 text

@tyler_treat RWMutex1 RWMutex2 RWMutex3 RWMutexN … memory 24 bytes

Slide 150

Slide 150 text

@tyler_treat RWMutex1 RWMutex2 RWMutex3 RWMutexN … memory 24 bytes 64 bytes (cache line size)

Slide 151

Slide 151 text

@tyler_treat Cache rules everything around me
 [Diagram: RWMutex1, RWMutex2, RWMutex3, … RWMutexN laid out contiguously in memory, 24 bytes each, against a 64-byte cache line]

Slide 152

Slide 152 text

@tyler_treat https://github.com/jonhoo/drwmutex/blob/master/drwmutex.go @tyler_treat

Slide 153

Slide 153 text

@tyler_treat https://github.com/jonhoo/drwmutex/blob/master/drwmutex.go @tyler_treat

Slide 154

Slide 154 text

@tyler_treat Cache rules everything around me
 [Diagram: in memory, RWMutex1 (24 bytes) followed by padding … up to 64 bytes (cache line size)]
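A sketch of the padding idea, loosely modeled on the sharded lock described on these slides but not copied from drwmutex. The 64-byte line size is an assumption, all type and function names are hypothetical, and the caller passes the shard index explicitly because Go has no portable way to ask which CPU it is running on.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

const cacheLine = 64 // assumed cache line size in bytes

// paddedRWMutex fills out each lock to a full cache line so that
// neighbouring locks in a slice do not share one.
type paddedRWMutex struct {
	sync.RWMutex
	_ [cacheLine - unsafe.Sizeof(sync.RWMutex{})]byte
}

// shardedRWMutex holds one padded lock per shard.
type shardedRWMutex []paddedRWMutex

func newSharded(n int) shardedRWMutex { return make(shardedRWMutex, n) }

// RLock and RUnlock touch only the caller's shard.
func (s shardedRWMutex) RLock(shard int)   { s[shard%len(s)].RLock() }
func (s shardedRWMutex) RUnlock(shard int) { s[shard%len(s)].RUnlock() }

// Lock and Unlock take every shard, so writers pay for the readers' speed.
func (s shardedRWMutex) Lock() {
	for i := range s {
		s[i].Lock()
	}
}

func (s shardedRWMutex) Unlock() {
	for i := range s {
		s[i].Unlock()
	}
}

// paddedSize reports the size of one padded lock in bytes.
func paddedSize() uintptr { return unsafe.Sizeof(paddedRWMutex{}) }

func main() {
	m := newSharded(4)
	m.RLock(1)
	m.RUnlock(1)
	m.Lock()
	m.Unlock()
	fmt.Println("bytes per shard:", paddedSize())
}
```

Padding fixes the spacing between locks but does not by itself guarantee that the slice starts on a cache-line boundary, so measure before relying on it.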

Slide 155

Slide 155 text

@tyler_treat @tyler_treat

Slide 156

Slide 156 text

@tyler_treat @tyler_treat

Slide 157

Slide 157 text

@tyler_treat @tyler_treat

Slide 158

Slide 158 text

@tyler_treat Go makes concurrency
 easy enough to be dangerous.

Slide 159

Slide 159 text

@tyler_treat Conclusions

Slide 160

Slide 160 text

@tyler_treat 1. The standard library provides general solutions (and they’re generally what you should use).

Slide 161

Slide 161 text

@tyler_treat 2. Seemingly small, idiomatic decisions can have profound performance implications.

Slide 162

Slide 162 text

@tyler_treat 3. The Go toolchain has lots of tools for analyzing your code: learn them.

Slide 163

Slide 163 text

@tyler_treat 4. Go’s compiler and runtime continue to improve.

Slide 164

Slide 164 text

@tyler_treat 5. Performance profile can change dramatically between releases.

Slide 165

Slide 165 text

@tyler_treat 6. Relying on assumptions can be fatal.

Slide 166

Slide 166 text

@tyler_treat 7. Code is marginal, architecture is material.

Slide 167

Slide 167 text

@tyler_treat 8. Peeking behind the curtains can pay dividends.

Slide 168

Slide 168 text

@tyler_treat 9. Above all, optimize for the right trade-off.

Slide 169

Slide 169 text

@tyler_treat Thanks!