Slide 1

Slide 1 text

Taming memory: Performance-tuning a (Crystal) application
Denis Defreyne / SoundCloud HQ / November 24, 2015

Slide 2

Slide 2 text

DISCLAIMER: The contents of this talk aren’t particularly revolutionary.

Slide 3

Slide 3 text

3

Slide 4

Slide 4 text

4

Slide 5

Slide 5 text

C

Slide 6

Slide 6 text

Gosu (Ruby) ✂

Slide 7

Slide 7 text

LÖVE (Lua) ✂

Slide 8

Slide 8 text

8

Slide 9

Slide 9 text

9

Slide 10

Slide 10 text

Rust

Slide 11

Slide 11 text

Crystal ✂

Slide 12

Slide 12 text

DISCLAIMER: I don’t know much about game development.

Slide 13

Slide 13 text

13

Slide 14

Slide 14 text

14

Slide 15

Slide 15 text

15

Slide 16

Slide 16 text

16

Slide 17

Slide 17 text

17

Slide 18

Slide 18 text

17

Slide 19

Slide 19 text

memory, the game
memory, the computer thingie

Slide 20

Slide 20 text

Allocating objects

Slide 21

Slide 21 text

donkey = Donkey.new(3, "grey")

Slide 22

Slide 22 text

donkey = Donkey.allocate
donkey.initialize(3, "grey")

Slide 23

Slide 23 text

donkey = malloc(6).cast(Donkey)
donkey.initialize(3, "grey")

Slide 24

Slide 24 text

What is memory?

Slide 25

Slide 25 text

24

Slide 26

Slide 26 text

Addresses: 0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 27

Slide 27 text

Addresses: 0 1 2 3 4 5 6 7 [ 6 bytes reserved ] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 28

Slide 28 text

Addresses: 0 1 2 3 4 5 6 7 [ 3 G R E Y Ø ] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 29

Slide 29 text

Freeing memory

Slide 30

Slide 30 text

donkey = malloc(6).cast(Donkey)
donkey.initialize(3, "grey")
free(donkey)
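The malloc/free pseudocode above can be approximated in real Crystal. This is only a sketch: it assumes the C library bindings that ship with Crystal (LibC.malloc and LibC.free), and it lays out the six bytes by hand instead of casting to a Donkey, which is a simplification and not how the compiler allocates objects.

```crystal
# Reserve 6 raw bytes that the garbage collector knows nothing about.
ptr = LibC.malloc(6).as(UInt8*)
raise "out of memory" if ptr.null?

# Lay out the data as in the memory diagram: one byte for the number,
# four bytes for the name, one terminating zero byte.
ptr[0] = 3_u8
"grey".to_slice.copy_to(ptr + 1, 4)
ptr[5] = 0_u8

age = ptr[0]
name = String.new(ptr + 1) # copies the bytes up to the zero terminator

puts "#{age} #{name}"

# Hand the block back. ptr must not be used after this point.
LibC.free(ptr.as(Void*))
```

Forgetting the final call leaks the block, and using ptr after it is undefined behaviour; that is the bookkeeping a garbage collector takes over.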

Slide 31

Slide 31 text

Garbage collection ✨

Slide 32

Slide 32 text

Calling functions

Slide 33

Slide 33 text

32

Slide 34

Slide 34 text

10 LET S = 0
15 MAT INPUT V
20 LET N = NUM
30 IF N = 0 THEN 99
40 FOR I = 1 TO N
45 LET S = S + V(I)
50 NEXT I
60 PRINT S/N
70 GO TO 10
99 END

Slide 35

Slide 35 text

10 LET S = 0
15 MAT INPUT V
20 LET N = NUM
30 IF N = 0 THEN 99
40 FOR I = 1 TO N
45 LET S = S + V(I)
50 NEXT I
60 PRINT S/N
70 GO TO 10
99 END

Slide 36

Slide 36 text

Structured programming [makes] extensive use of subroutines […]

Slide 37

Slide 37 text

return

Slide 38

Slide 38 text

9074: eb 4a                  jmp   90c0 <_mysql_init_character_set+0x124>
9076: 8b 43 f8               mov   -0x8(%rbx),%eax
9079: 83 f8 01               cmp   $0x1,%eax
907c: 74 04                  je    9082 <_mysql_init_character_set+0xe6>
907e: 85 c0                  test  %eax,%eax
9080: 75 06                  jne   9088 <_mysql_init_character_set+0xec>
9082: 4c 8b 7b f0            mov   -0x10(%rbx),%r15
9086: eb 38                  jmp   90c0 <_mysql_init_character_set+0x124>
9088: 48 8b 4b f0            mov   -0x10(%rbx),%rcx
908c: 48 8d 35 15 a3 03 00   lea   0x3a315(%rip),%rsi # 433a8 <_zcfree+0x1fd6>
9093: bf 51 04 00 00         mov   $0x451,%edi
9098: 31 d2                  xor   %edx,%edx
909a: 31 c0                  xor   %eax,%eax
909c: e8 72 76 02 00         callq 30713 <_my_printf_error>
90a1: 48 8d 35 56 a3 03 00   lea   0x3a356(%rip),%rsi # 433fe <_zcfree+0x202c>
90a8: 4c 8d 3d 61 b9 03 00   lea   0x3b961(%rip),%r15 # 44a10 <_zcfree+0x363e>
90af: bf 51 04 00 00         mov   $0x451,%edi
90b4: 31 d2                  xor   %edx,%edx
90b6: 31 c0                  xor   %eax,%eax
90b8: 4c 89 f9               mov   %r15,%rcx

Slide 39

Slide 39 text

0000000000030713 <_my_printf_error>:
30713: 55                     push  %rbp
30714: 48 89 e5               mov   %rsp,%rbp
30717: 41 57                  push  %r15
30719: 41 56                  push  %r14
3071b: 41 54                  push  %r12
3071d: 53                     push  %rbx
3071e: 48 81 ec d0 02 00 00   sub   $0x2d0,%rsp
…
30932: 4c 3b 75 e8            cmp   -0x18(%rbp),%r14
30936: 75 0c                  jne   30944 <_my_printf_warning+0xda>
30938: 48 81 c4 d0 02 00 00   add   $0x2d0,%rsp
3093f: 5b                     pop   %rbx
30940: 41 5e                  pop   %r14
30942: 5d                     pop   %rbp
30943: c3                     retq

Slide 40

Slide 40 text

[Diagram: main]

Slide 41

Slide 41 text

[Diagram: main calls my_printf_error]

Slide 42

Slide 42 text

[Diagram: main calls my_printf_error. STACK (4 byte elements): empty]

Slide 43

Slide 43 text

[Diagram: main calls my_printf_error. STACK (4 byte elements): return address]

Slide 44

Slide 44 text

[Diagram: main calls my_printf_error, which calls my_vsnprintf_ex. STACK (4 byte elements): return address]

Slide 45

Slide 45 text

[Diagram: main calls my_printf_error, which calls my_vsnprintf_ex. STACK (4 byte elements): return address, return address]

Slide 46

Slide 46 text

[Diagram: main calls my_printf_error, which calls my_vsnprintf_ex; my_vsnprintf_ex returns (ret). STACK (4 byte elements): return address]

Slide 47

Slide 47 text

[Diagram: main calls my_printf_error, which calls my_vsnprintf_ex; both return (ret). STACK (4 byte elements): empty]

Slide 48

Slide 48 text

Passing arguments

Slide 49

Slide 49 text

41

Slide 50

Slide 50 text

Stack: …

Slide 51

Slide 51 text

Stack: … | param 1

Slide 52

Slide 52 text

Stack: … | param 1 | param 2

Slide 53

Slide 53 text

Stack: … | param 1 | param 2 | return address

Slide 54

Slide 54 text

Stack: … | param 1 | param 2

Slide 55

Slide 55 text

Stack: …

Slide 56

Slide 56 text

42

Slide 57

Slide 57 text

Stack: …

Slide 58

Slide 58 text

Stack: … | param | param | return address

Slide 59

Slide 59 text

Stack: … | param | param | return address | param | return address

Slide 60

Slide 60 text

Stack: … | param | param | return address | param | return address | return address

Slide 61

Slide 61 text

Stack: … | param | param | return address | param | return address

Slide 62

Slide 62 text

Stack: … | param | param | return address

Slide 63

Slide 63 text

Stack: …

Slide 64

Slide 64 text

Storing local variables

Slide 65

Slide 65 text

44

Slide 66

Slide 66 text

Stack: …

Slide 67

Slide 67 text

Stack: … | param | param | return address

Slide 68

Slide 68 text

Stack: … | param | param | return address | local variable | local variable

Slide 69

Slide 69 text

Stack: … | param | param | return address

Slide 70

Slide 70 text

Stack: …

Slide 71

Slide 71 text

45

Slide 72

Slide 72 text

Collecting garbage

Slide 73

Slide 73 text

[Diagram: param 1 | param 2 | return address | local variable 1 | local variable 2 | local variable 3 | local variable | local variable]

Slide 74

Slide 74 text

[Diagram: param 1 | param 2 | return address | local variable 1 | local variable 2 | local variable 3 | local variable | local variable]

Slide 75

Slide 75 text

[Diagram: local variable, local variable, ?, ?]

Slide 76

Slide 76 text

[Diagram: local variable, local variable, ?, ?]

Slide 77

Slide 77 text

[Diagram: Mark phase. local variable, local variable, ?, ?]

Slide 78

Slide 78 text

[Diagram: Sweep phase. local variable, local variable]

Slide 79

Slide 79 text

Mark and sweep: the simplest GC technique
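To make the two phases concrete, here is a toy simulation in Crystal. It is a sketch and not how Crystal's actual collector works: Node, mark, and sweep are hypothetical names, the "heap" is a plain array, and the "roots" stand in for the local variables on the stack.

```crystal
# A heap object that may reference other heap objects.
class Node
  getter name : String
  getter refs = [] of Node
  property? marked = false

  def initialize(@name : String)
  end
end

# Mark phase: flag everything reachable from a root.
def mark(node : Node) : Nil
  return if node.marked?
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Sweep phase: keep the flagged objects, drop the rest,
# and clear the flags for the next collection cycle.
def sweep(heap : Array(Node)) : Array(Node)
  live = heap.select(&.marked?)
  live.each { |node| node.marked = false }
  live
end

a = Node.new("a")
b = Node.new("b")
c = Node.new("c")
a.refs << b
c.refs << a # c points at a, but nothing points at c

heap = [a, b, c]
roots = [a] # stand-in for local variables on the stack

roots.each { |root| mark(root) }
heap = sweep(heap)
puts heap.map(&.name).join(", ") # prints "a, b"
```

Both phases walk objects, so the more objects a program allocates, the more work each collection has to do.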

Slide 80

Slide 80 text

52

Slide 81

Slide 81 text

79% of time spent in the garbage collector‽

Slide 82

Slide 82 text

(‽ = interrobang, U+203D)

Slide 83

Slide 83 text

Stop this GC madness!

Slide 84

Slide 84 text

Avoiding garbage collection: three techniques

Slide 85

Slide 85 text

Avoiding garbage collection through explicit reuse

Slide 86

Slide 86 text

1_000_000.times do
  point = Point.new(random(width), random(height))
  draw(point)
end

Slide 87

Slide 87 text

point = Point.new
1_000_000.times do
  point.x = random(width)
  point.y = random(height)
  draw(point)
end

Slide 88

Slide 88 text

- only works for a single instance
- mutating state can lead to bugs

Slide 89

Slide 89 text

Avoiding garbage collection through memory pooling

Slide 90

Slide 90 text

62

Slide 91

Slide 91 text

pool = Pool(Entity).new(1000)

Slide 92

Slide 92 text

pool = Pool(Entity).new(1000)
entity = pool.acquire

Slide 93

Slide 93 text

pool = Pool(Entity).new(1000)
entity = pool.acquire
pool.release(entity)

Slide 94

Slide 94 text

+ can reuse multiple objects
- memory management is more manual
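The slides only show how Pool is used, not how it is built. One possible minimal implementation is sketched below; the Entity fields, the free-list array, and the available method are assumptions made for illustration, and the pool assumes T can be constructed without arguments.

```crystal
# Hypothetical entity with default values, so Entity.new takes no arguments.
class Entity
  property x = 0.0
  property y = 0.0
end

class Pool(T)
  @free : Array(T)

  # Allocate every object up front, so nothing is allocated later.
  def initialize(capacity : Int32)
    @free = Array(T).new(capacity) { T.new }
  end

  # Hand out a preallocated object instead of creating a new one.
  def acquire : T
    @free.pop? || raise "pool exhausted"
  end

  # Return an object to the pool. It keeps its old state,
  # so the next user has to reset it.
  def release(object : T) : Nil
    @free.push(object)
  end

  def available : Int32
    @free.size
  end
end

pool = Pool(Entity).new(1000)
entity = pool.acquire
entity.x = 12.5
pool.release(entity)
puts pool.available # prints 1000
```

This is where the "more manual" drawback shows up: forgetting to call release drains the pool, and releasing an object that is still in use lets two callers share it.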

Slide 95

Slide 95 text

Avoiding garbage collection through stack allocation

Slide 96

Slide 96 text

65

Slide 97

Slide 97 text

Stack: …

Slide 98

Slide 98 text

Stack: … | param | param | return address

Slide 99

Slide 99 text

Stack: … | param | param | return address | local variable | local variable

Slide 100

Slide 100 text

Stack: … | param | param | return address

Slide 101

Slide 101 text

Stack: …

Slide 102

Slide 102 text

class Point
  getter :x, :y

  def initialize(@x, @y)
  end
end

Slide 103

Slide 103 text

struct Point
  getter :x, :y

  def initialize(@x, @y)
  end
end

Slide 104

Slide 104 text

1_000_000.times do
  point = Point.new(random(width), random(height))
  draw(point)
end

Slide 105

Slide 105 text

+ no explicit memory management
- only usable for local variables
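A visible consequence of switching from class to struct is that values are copied instead of shared. The sketch below uses two hypothetical types, PointRef and PointVal, so both variants can sit side by side; the explicit Int32 type restrictions are an addition that recent Crystal compilers need.

```crystal
# A class instance lives on the heap; variables hold a reference to it.
class PointRef
  property x, y

  def initialize(@x : Int32, @y : Int32)
  end
end

# A struct instance is a plain value stored directly in the variable.
struct PointVal
  property x, y

  def initialize(@x : Int32, @y : Int32)
  end
end

ref_a = PointRef.new(1, 2)
ref_b = ref_a # both names refer to the same heap object
ref_b.x = 10
puts ref_a.x # prints 10

val_a = PointVal.new(1, 2)
val_b = val_a # the whole value is copied
val_b.x = 10
puts val_a.x # prints 1
```

Because a struct is copied on assignment and when passed to a method, mutating a copy never affects the original, which is worth keeping in mind when converting an existing class.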

Slide 106

Slide 106 text

Don’t destroy the CPU cache!

Slide 107

Slide 107 text

71

Slide 108

Slide 108 text

HD

Slide 109

Slide 109 text

HD | RAM

Slide 110

Slide 110 text

HD | RAM | Cache

Slide 111

Slide 111 text

HD (~ 1 000 000 ns) | RAM | Cache

Slide 112

Slide 112 text

HD (~ 1 000 000 ns) | RAM (~ 500 ns) | Cache

Slide 113

Slide 113 text

HD (~ 1 000 000 ns) | RAM (~ 500 ns) | Cache (~ 1-10 ns)

Slide 114

Slide 114 text

72

Slide 115

Slide 115 text

73

Slide 116

Slide 116 text

If data is available in the cache, we have a cache hit. If it’s not available in the cache, we have a cache miss.

Slide 117

Slide 117 text

To avoid cache misses, keep similar data together in memory.

Slide 118

Slide 118 text

position | velocity | rotation | armor | shield

Slide 119

Slide 119 text

position | velocity (for movement) | rotation | armor | shield

Slide 120

Slide 120 text

position | velocity (for movement) | rotation | armor | shield
14 used bytes / 32 total bytes = 44% efficiency

Slide 121

Slide 121 text

78

Slide 122

Slide 122 text

79

Slide 123

Slide 123 text

Store similar data in contiguous arrays.

Slide 124

Slide 124 text

positions = Pool(Position).new(1000)
velocities = Pool(Velocity).new(1000)
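A movement update over two such collections might look like the sketch below. It makes several assumptions: plain Arrays of structs are used in place of the Pool from the slides (Crystal stores struct elements inline in the array's buffer), and the field names, the move method, and the dt parameter are all hypothetical.

```crystal
struct Position
  getter x, y

  def initialize(@x : Float64, @y : Float64)
  end
end

struct Velocity
  getter dx, dy

  def initialize(@dx : Float64, @dy : Float64)
  end
end

# The movement step touches only positions and velocities, so rotation,
# armor and shield data never has to be pulled into the cache for it.
def move(positions : Array(Position), velocities : Array(Velocity), dt : Float64) : Nil
  positions.size.times do |i|
    pos = positions[i] # reading a struct element yields a copy
    vel = velocities[i]
    # Write the updated value back; changing the copy would not be enough.
    positions[i] = Position.new(pos.x + vel.dx * dt, pos.y + vel.dy * dt)
  end
end

count = 1000
positions = Array(Position).new(count) { Position.new(0.0, 0.0) }
velocities = Array(Velocity).new(count) { |i| Velocity.new(i.to_f, 1.0) }

move(positions, velocities, 0.5)
puts positions[2].x # prints 1.0
puts positions[2].y # prints 0.5
```

Each loop iteration reads the next element of both arrays, so the access pattern is sequential and every byte loaded into the cache belongs to data the loop actually uses.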

Slide 125

Slide 125 text

Demo

Slide 126

Slide 126 text

Demo: Stack allocation

Slide 127

Slide 127 text

[Diagram: circle with radius r]

Slide 128

Slide 128 text

[Diagram: circle with radius r]

Slide 129

Slide 129 text

$\frac{\#\text{inside}}{\#\text{total}} \approx \frac{\pi \times r^2}{(2 \times r)^2}$

Slide 130

Slide 130 text

$\frac{\#\text{inside}}{\#\text{total}} \approx \frac{\pi \times r^2}{(2 \times r)^2}$

Slide 131

Slide 131 text

$\frac{\#\text{inside}}{\#\text{total}} \approx \frac{\pi}{4}$

Slide 132

Slide 132 text

$\pi \approx \frac{4 \times \#\text{inside}}{\#\text{total}}$
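The estimate π ≈ 4 × #inside / #total can be tried out with a short program. This is a sketch and not the demo source from the talk: it assumes r = 1, and the names Point, inside_unit_circle?, and estimate_pi, the sample count, and the seed are all made up. Point is a struct, so the loop creates a million points without giving the garbage collector any work.

```crystal
struct Point
  getter x, y

  def initialize(@x : Float64, @y : Float64)
  end

  # True when the point lies within the circle of radius 1 around the origin.
  def inside_unit_circle? : Bool
    x * x + y * y <= 1.0
  end
end

def estimate_pi(samples : Int32, rng : Random) : Float64
  inside = 0
  samples.times do
    # Pick a random point in the square from (-1, -1) to (1, 1).
    point = Point.new(rng.rand * 2 - 1, rng.rand * 2 - 1)
    inside += 1 if point.inside_unit_circle?
  end
  4.0 * inside / samples
end

estimate = estimate_pi(1_000_000, Random.new(42))
puts estimate
```

Changing struct to class keeps the result the same but makes every iteration allocate on the heap, which is the difference the stack allocation demo is about.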

Slide 133

Slide 133 text

(Use the source, Luke)

Slide 134

Slide 134 text

Demo: Cache grinding

Slide 135

Slide 135 text

(Use the source, Luke)

Slide 136

Slide 136 text

91

Slide 137

Slide 137 text

Denis Defreyne
slack @denis / mail [email protected]
Ask me about anything but potatoes.

Slide 138

Slide 138 text

Extra slides

Slide 139

Slide 139 text

lldb, gdb: debugger
Instruments (Mac OS X): performance analyser and visualiser
dtrace: dynamic tracing framework