Slide 1

Slide 1 text

Taming memory: Performance-tuning a (Crystal) application
Denis Defreyne / RUG::B / December 3, 2015

Slide 2

Slide 2 text


Slide 3

Slide 3 text

DISCLAIMER: The contents of this talk aren’t particularly revolutionary.

Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

Crystal

Slide 7

Slide 7 text

DISCLAIMER: I don’t know much about game development.

Slide 8

Slide 8 text


Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text


Slide 12

Slide 12 text


Slide 13

Slide 13 text


Slide 14

Slide 14 text

memory, the game
memory, the computer thingie

Slide 15

Slide 15 text

Allocating objects

Slide 16

Slide 16 text

donkey = Donkey.new(3, "grey")

Slide 17

Slide 17 text

donkey = Donkey.allocate
donkey.initialize(3, "grey")
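The split between allocating and initializing can be tried out in runnable Crystal. This is a minimal sketch, not the compiler's actual definition of `new`: the field names `number` and `color` and the method name `build` are assumptions made for illustration. Since `initialize` is protected in Crystal, the two-step version has to live inside the class.

```crystal
class Donkey
  getter number : Int32
  getter color : String

  def initialize(@number : Int32, @color : String)
  end

  # Hypothetical stand-in for `new`: reserve memory first, then fill it in.
  def self.build(number : Int32, color : String) : Donkey
    donkey = Donkey.allocate         # memory for one instance, not yet initialized
    donkey.initialize(number, color) # run the constructor body on it
    donkey
  end
end

donkey = Donkey.build(3, "grey")
puts "#{donkey.number} #{donkey.color}"
```

Calling `Donkey.new(3, "grey")` directly gives an equivalent object; the sketch only makes the two steps visible.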

Slide 18

Slide 18 text

donkey = malloc(6).cast(Donkey)
donkey.initialize(3, "grey")

Slide 19

Slide 19 text

What is memory?

Slide 20

Slide 20 text


Slide 21

Slide 21 text

0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 22

Slide 22 text

0 1 2 3 4 5 6 7 [8 to D: 6 bytes allocated] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 23

Slide 23 text

0 1 2 3 4 5 6 7 [3 G R E Y Ø] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 …

Slide 24

Slide 24 text

Freeing memory

Slide 25

Slide 25 text

donkey = malloc(6).cast(Donkey)
donkey.initialize(3, "grey")
free(donkey)
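The pseudocode above can be approximated in real Crystal through the C library bindings. This is a rough sketch under assumptions: it stores raw bytes rather than a real `Donkey` instance, and the 6-byte layout (one byte for the number, four for the text, one zero terminator) simply mirrors the memory diagram.

```crystal
# Reserve 6 bytes by hand, outside the garbage collector's control.
memory = LibC.malloc(6).as(UInt8*)

# Lay out the data as on the diagram: the number, then "grey", then a zero byte.
memory[0] = 3_u8
"grey".bytes.each_with_index { |byte, i| memory[i + 1] = byte }
memory[5] = 0_u8

# String.new copies the bytes up to the zero terminator.
text = String.new(memory + 1)
puts text

# Nothing frees this block for us; leaving this line out leaks the 6 bytes.
LibC.free(memory.as(Void*))
```

Using `memory` after the `free` call would be a bug, which is exactly the kind of mistake a garbage collector is meant to rule out.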

Slide 26

Slide 26 text

Garbage collection ✨

Slide 27

Slide 27 text

Calling functions

Slide 28

Slide 28 text


Slide 29

Slide 29 text

10 LET S = 0
15 MAT INPUT V
20 LET N = NUM
30 IF N = 0 THEN 99
40 FOR I = 1 TO N
45 LET S = S + V(I)
50 NEXT I
60 PRINT S/N
70 GO TO 10
99 END

Slide 30

Slide 30 text

10 LET S = 0
15 MAT INPUT V
20 LET N = NUM
30 IF N = 0 THEN 99
40 FOR I = 1 TO N
45 LET S = S + V(I)
50 NEXT I
60 PRINT S/N
70 GO TO 10
99 END

Slide 31

Slide 31 text

Structured programming [makes] extensive use of subroutines […]

Slide 32

Slide 32 text

return

Slide 33

Slide 33 text

9074: eb 4a  jmp 90c0 <_mysql_init_character_set+0x124>
9076: 8b 43 f8  mov -0x8(%rbx),%eax
9079: 83 f8 01  cmp $0x1,%eax
907c: 74 04  je 9082 <_mysql_init_character_set+0xe6>
907e: 85 c0  test %eax,%eax
9080: 75 06  jne 9088 <_mysql_init_character_set+0xec>
9082: 4c 8b 7b f0  mov -0x10(%rbx),%r15
9086: eb 38  jmp 90c0 <_mysql_init_character_set+0x124>
9088: 48 8b 4b f0  mov -0x10(%rbx),%rcx
908c: 48 8d 35 15 a3 03 00  lea 0x3a315(%rip),%rsi # 433a8 <_zcfree+0x1fd6>
9093: bf 51 04 00 00  mov $0x451,%edi
9098: 31 d2  xor %edx,%edx
909a: 31 c0  xor %eax,%eax
909c: e8 72 76 02 00  callq 30713 <_my_printf_error>
90a1: 48 8d 35 56 a3 03 00  lea 0x3a356(%rip),%rsi # 433fe <_zcfree+0x202c>
90a8: 4c 8d 3d 61 b9 03 00  lea 0x3b961(%rip),%r15 # 44a10 <_zcfree+0x363e>
90af: bf 51 04 00 00  mov $0x451,%edi
90b4: 31 d2  xor %edx,%edx
90b6: 31 c0  xor %eax,%eax
90b8: 4c 89 f9  mov %r15,%rcx

Slide 34

Slide 34 text

0000000000030713 <_my_printf_error>:
30713: 55  push %rbp
30714: 48 89 e5  mov %rsp,%rbp
30717: 41 57  push %r15
30719: 41 56  push %r14
3071b: 41 54  push %r12
3071d: 53  push %rbx
3071e: 48 81 ec d0 02 00 00  sub $0x2d0,%rsp
…
30932: 4c 3b 75 e8  cmp -0x18(%rbp),%r14
30936: 75 0c  jne 30944 <_my_printf_warning+0xda>
30938: 48 81 c4 d0 02 00 00  add $0x2d0,%rsp
3093f: 5b  pop %rbx
30940: 41 5e  pop %r14
30942: 5d  pop %rbp
30943: c3  retq

Slide 35

Slide 35 text

[Diagram: STACK (4 byte elements). main calls my_printf_error, which calls my_vsnprintf_ex. Each call pushes a return address onto the stack; each ret pops it again.]

Slide 36

Slide 36 text

Passing arguments

Slide 37

Slide 37 text

param 1 | param 2 | return address | …

Slide 38

Slide 38 text

param | param | return address | … | param | return address | return address

Slide 39

Slide 39 text

Storing local variables

Slide 40

Slide 40 text

param | param | return address | … | local variable | local variable

Slide 41

Slide 41 text


Slide 42

Slide 42 text

Collecting garbage

Slide 43

Slide 43 text

param 1 | param 2 | return address | local variable 1 | local variable 2 | local variable 3
? | ? | local variable | local variable

Slide 44

Slide 44 text

? | ? | local variable | local variable

Slide 45

Slide 45 text

Mark phase
? | ? | local variable | local variable

Slide 46

Slide 46 text

Sweep phase
local variable | local variable

Slide 47

Slide 47 text

Mark and sweep: the simplest GC technique
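The two phases can be shown on a toy heap. This is an illustrative sketch only, not how Crystal's actual collector is implemented: the `Cell` class, the explicit `heap` array and the `roots` list are all assumptions standing in for real objects, real memory and the local variables on the stack.

```crystal
# A toy heap object: it can reference other cells and carries a mark bit.
class Cell
  getter name : String
  getter refs = [] of Cell
  property? marked = false

  def initialize(@name : String)
  end
end

# Mark phase: flag every cell reachable from the given one.
def mark(cell : Cell) : Nil
  return if cell.marked?
  cell.marked = true
  cell.refs.each { |ref| mark(ref) }
end

# Sweep phase: keep the marked cells, drop the rest,
# and clear the marks so the next cycle starts clean.
def sweep(heap : Array(Cell)) : Array(Cell)
  survivors = heap.select(&.marked?)
  survivors.each { |cell| cell.marked = false }
  survivors
end

a = Cell.new("a")
b = Cell.new("b")
c = Cell.new("c") # nothing references c, so it is garbage
a.refs << b

heap = [a, b, c]
roots = [a] # stands in for the local variables on the stack

roots.each { |root| mark(root) }
heap = sweep(heap)
puts heap.map(&.name).join(", ") # prints "a, b"
```

Both phases walk memory: marking touches every live object and sweeping touches every object, which is why heavy allocation turns into heavy collector work.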

Slide 48

Slide 48 text


Slide 49

Slide 49 text

79% of time spent in the garbage collector‽

Slide 50

Slide 50 text

(‽ = interrobang, U+203D)

Slide 51

Slide 51 text

Stop this GC madness!

Slide 52

Slide 52 text

Avoiding garbage collection: three techniques

Slide 53

Slide 53 text

Avoiding garbage collection through explicit reuse

Slide 54

Slide 54 text

1_000_000.times do
  point = Point.new(random(width), random(height))
  draw(point)
end

Slide 55

Slide 55 text

point = Point.new
1_000_000.times do
  point.x = random(width)
  point.y = random(height)
  draw(point)
end

Slide 56

Slide 56 text

- only works for a single instance
- mutating state can lead to bugs
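The second drawback is easy to trigger by accident. In this small sketch (the `Point` class with `Int32` fields and the `points` array are assumptions for illustration), the reused object is stored while the loop keeps mutating it:

```crystal
class Point
  property x : Int32
  property y : Int32

  def initialize(@x = 0, @y = 0)
  end
end

points = [] of Point
point = Point.new

3.times do |i|
  point.x = i
  point.y = i * 10
  points << point # stores a reference to the same object every time
end

puts points.map(&.x).join(", ") # prints "2, 2, 2", not "0, 1, 2"
```

Reuse is only safe when nothing holds on to the object between iterations, as in the `draw(point)` loop.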

Slide 57

Slide 57 text

Avoiding garbage collection through memory pooling

Slide 58

Slide 58 text

pool = Pool(Entity).new(1000)
entity = pool.acquire
pool.release(entity)

Slide 59

Slide 59 text

+ can reuse multiple objects
- memory management is more manual
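The slides only show how a pool is used. One possible implementation is sketched below. The `acquire`/`release` interface comes from the slide; the `Entity` fields, the `available` helper and the choice to raise when the pool runs dry are assumptions, and the pool used in the talk's demo may work differently.

```crystal
class Entity
  property x = 0.0
  property y = 0.0
end

class Pool(T)
  @free : Array(T)

  # Allocate every object up front so the hot loop allocates nothing.
  def initialize(capacity : Int32)
    @free = Array(T).new(capacity) { T.new }
  end

  # Hand out an unused object.
  def acquire : T
    @free.pop? || raise "pool exhausted"
  end

  # Return an object to the pool. It keeps whatever state it had.
  def release(object : T) : Nil
    @free.push(object)
  end

  def available : Int32
    @free.size
  end
end

pool = Pool(Entity).new(1000)
entity = pool.acquire
entity.x = 3.0
pool.release(entity) # forgetting this call slowly drains the pool
puts pool.available  # prints 1000
```

The manual part is visible here: the caller must release objects exactly once and must reset any stale state after acquiring one.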

Slide 60

Slide 60 text

Avoiding garbage collection through stack allocation

Slide 61

Slide 61 text

param | param | return address | … | local variable | local variable

Slide 62

Slide 62 text

class Point
  getter :x, :y

  def initialize(@x, @y)
  end
end

Slide 63

Slide 63 text

struct Point
  getter :x, :y

  def initialize(@x, @y)
  end
end

Slide 64

Slide 64 text

1_000_000.times do
  point = Point.new(random(width), random(height))
  draw(point)
end

Slide 65

Slide 65 text

+ no explicit memory management
- only usable for local variables
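A struct is a value: it sits directly in the variable that holds it and is copied when passed around, so no heap object exists for the collector to track. The sketch below illustrates that copy behaviour. The `Int32` field types and the `shifted` helper are assumptions; current Crystal compilers also need the explicit type annotations that the slide version leaves out.

```crystal
struct Point
  property x : Int32
  property y : Int32

  def initialize(@x : Int32, @y : Int32)
  end
end

# The argument arrives as a copy, so the caller's value is untouched.
def shifted(point : Point) : Point
  point.x += 10 # mutates the local copy only
  point
end

original = Point.new(1, 2)
moved = shifted(original)

puts original.x # prints 1
puts moved.x    # prints 11
```

Changing `struct` to `class` in this sketch makes both lines print 11, because the method would then receive a reference to the same heap object.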

Slide 66

Slide 66 text

Don’t destroy the CPU cache!

Slide 67

Slide 67 text

HD: ~ 1 000 000 ns
RAM: ~ 500 ns
Cache: ~ 1-10 ns

Slide 68

Slide 68 text


Slide 69

Slide 69 text


Slide 70

Slide 70 text

For maximum speed, keep similar data together in memory.

Slide 71

Slide 71 text

position | velocity | rotation | health | AI

Slide 72

Slide 72 text

position | velocity | rotation | health | AI
12 used bytes / 32 total bytes = 37.5% efficiency (for movement)

Slide 73

Slide 73 text


Slide 74

Slide 74 text


Slide 75

Slide 75 text

Store similar data in contiguous arrays.

Slide 76

Slide 76 text

positions = Pool(Position).new(1000)
velocities = Pool(Velocity).new(1000)
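The layout idea can be sketched with plain arrays of structs standing in for the pools on the slide. Everything here is an assumption made for illustration: the `Float32` fields, the `move` method and the use of `Array` instead of `Pool`. An `Array` of a struct type keeps its elements inline in one contiguous buffer, so a movement pass reads only position and velocity bytes.

```crystal
struct Position
  getter x : Float32
  getter y : Float32

  def initialize(@x : Float32, @y : Float32)
  end
end

struct Velocity
  getter dx : Float32
  getter dy : Float32

  def initialize(@dx : Float32, @dy : Float32)
  end
end

# One array per kind of data; entity i is the pair (positions[i], velocities[i]).
positions = Array(Position).new(1000) { Position.new(0_f32, 0_f32) }
velocities = Array(Velocity).new(1000) { Velocity.new(1_f32, 0.5_f32) }

# The movement pass touches these two arrays and nothing else.
def move(positions : Array(Position), velocities : Array(Velocity), dt : Float32) : Nil
  positions.size.times do |i|
    pos = positions[i]
    vel = velocities[i]
    positions[i] = Position.new(pos.x + vel.dx * dt, pos.y + vel.dy * dt)
  end
end

move(positions, velocities, 2_f32)
puts "#{positions[0].x} #{positions[0].y}" # prints "2.0 1.0"
```

Rotation, health and AI data would live in their own arrays and never enter the cache during movement updates.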

Slide 77

Slide 77 text

Demo

Slide 78

Slide 78 text

Demo: Stack allocation

Slide 79

Slide 79 text

(shamelessly copied from Tobi)

Slide 80

Slide 80 text

r

Slide 81

Slide 81 text

$$\frac{\#\text{total}}{\#\text{inside}} \approx \frac{(2 \times r)^2}{\pi \times r^2}$$

Slide 82

Slide 82 text

$$\frac{\#\text{total}}{\#\text{inside}} \approx \frac{4}{\pi}$$

Slide 83

Slide 83 text

$$\pi \approx \frac{4 \times \#\text{inside}}{\#\text{total}}$$
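The estimate can be turned into a short program: scatter random points over the square around a circle of radius 1 and count how many land inside the circle. This is a hedged sketch and not the demo's actual source; `estimate_pi`, `inside_unit_circle?` and the sample count are assumed names and values. Because `Point` is a struct, the loop creates a million points without heap allocations.

```crystal
struct Point
  getter x : Float64
  getter y : Float64

  def initialize(@x : Float64, @y : Float64)
  end

  def inside_unit_circle? : Bool
    x * x + y * y <= 1.0
  end
end

def estimate_pi(samples : Int32, rng = Random.new) : Float64
  inside = 0
  samples.times do
    # Uniform point in the square [-1, 1) x [-1, 1), so r = 1.
    point = Point.new(rng.rand * 2 - 1, rng.rand * 2 - 1)
    inside += 1 if point.inside_unit_circle?
  end
  4.0 * inside / samples
end

estimate = estimate_pi(1_000_000)
puts estimate
```

The result varies from run to run; with a million samples it typically agrees with π to two or three decimal places.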

Slide 84

Slide 84 text

(use the source, luke)

Slide 85

Slide 85 text

Demo: Cache grinding

Slide 86

Slide 86 text

(use the source, luke)

Slide 87

Slide 87 text


Slide 88

Slide 88 text

I’ll question your answers now.
Denis Defreyne
@ddfreyne
denis@stoneship.org