Taming memory: performance-tuning a (Crystal) application [RUG::B edition]

When developing a game, you need to pay attention to performance. After all, a game needs to run fast and keep a predictable frame rate; stuttering will throw people off.

I’ve had performance issues even in Crystal, a fast, compiled, statically-typed language with a syntax inspired by Ruby. As it turns out, the way a program handles memory can have a huge impact on performance. Luckily, Crystal gives a great deal of control over how this can be done. It’s also possible to use familiar tools with Crystal to debug issues and identify bottlenecks.

In this talk, I’ll share what I’ve learnt about memory and performance tuning, and give an introduction to several powerful tools for identifying performance issues.

Denis Defreyne

December 03, 2015

Transcript

  1. Taming memory:
    Performance-tuning
    a (Crystal) application
    Denis Defreyne / RUG::B / December 3, 2015

  3. DISCLAIMER: The contents of this talk
    aren’t particularly revolutionary.

  6. Crystal

  7. DISCLAIMER: I don’t know much
    about game development.

  14. memory, the game
    memory, the computer thingie

  15. Allocating objects

  16. donkey = Donkey.new(3, "grey")

  17. donkey = Donkey.allocate
    donkey.initialize(3, "grey")

  18. donkey = malloc(6).cast(Donkey)
    donkey.initialize(3, "grey")

  19. What is memory?

  21. 0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 …

  22. 0 1 2 3 4 5 6 7 [six allocated bytes, 8 to D] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 …

  23. 0 1 2 3 4 5 6 7 [3 G R E Y Ø] E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 …

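Slides 21 to 23 can be reproduced by hand. The sketch below is an illustration only, not code from the talk: it reserves six bytes with Crystal's `Pointer(UInt8).malloc` and fills them with the values shown on slide 23. A real `Donkey` instance would not be laid out like this (a Crystal `String` field is stored as a reference, for one), so treat the six-byte layout as the slide's simplification.

```crystal
# Hypothetical illustration: lay out a "donkey" by hand in six raw bytes,
# mirroring slide 23 (one byte for the age, then "GREY", then a zero byte).
bytes = Pointer(UInt8).malloc(6) # six zero-initialized bytes

bytes[0] = 3_u8 # age
"GREY".bytes.each_with_index do |byte, i|
  bytes[1 + i] = byte # color, one ASCII byte per character
end
bytes[5] = 0_u8 # terminating zero byte (the Ø on the slide)

# View the raw memory as an array of integers.
layout = bytes.to_slice(6).to_a
puts layout.inspect
```

The printed array holds the age followed by the ASCII codes of the four letters and the trailing zero.
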
  24. Freeing memory

  25. donkey = malloc(6).cast(Donkey)
    donkey.initialize(3, "grey")
    free(donkey)

  26. Garbage collection

  27. Calling functions

  29. 10 LET S = 0
    15 MAT INPUT V
    20 LET N = NUM
    30 IF N = 0 THEN 99
    40 FOR I = 1 TO N
    45 LET S = S + V(I)
    50 NEXT I
    60 PRINT S/N
    70 GO TO 10
    99 END

  31. Structured programming [makes]
    extensive use of subroutines […]

  32. return

  33. 9074: eb 4a jmp 90c0 <_mysql_init_character_set+0x124>
    9076: 8b 43 f8 mov -0x8(%rbx),%eax
    9079: 83 f8 01 cmp $0x1,%eax
    907c: 74 04 je 9082 <_mysql_init_character_set+0xe6>
    907e: 85 c0 test %eax,%eax
    9080: 75 06 jne 9088 <_mysql_init_character_set+0xec>
    9082: 4c 8b 7b f0 mov -0x10(%rbx),%r15
    9086: eb 38 jmp 90c0 <_mysql_init_character_set+0x124>
    9088: 48 8b 4b f0 mov -0x10(%rbx),%rcx
    908c: 48 8d 35 15 a3 03 00 lea 0x3a315(%rip),%rsi # 433a8 <_zcfree+0x1fd6>
    9093: bf 51 04 00 00 mov $0x451,%edi
    9098: 31 d2 xor %edx,%edx
    909a: 31 c0 xor %eax,%eax
    909c: e8 72 76 02 00 callq 30713 <_my_printf_error>
    90a1: 48 8d 35 56 a3 03 00 lea 0x3a356(%rip),%rsi # 433fe <_zcfree+0x202c>
    90a8: 4c 8d 3d 61 b9 03 00 lea 0x3b961(%rip),%r15 # 44a10 <_zcfree+0x363e>
    90af: bf 51 04 00 00 mov $0x451,%edi
    90b4: 31 d2 xor %edx,%edx
    90b6: 31 c0 xor %eax,%eax
    90b8: 4c 89 f9 mov %r15,%rcx

  34. 0000000000030713 <_my_printf_error>:
    30713: 55 push %rbp
    30714: 48 89 e5 mov %rsp,%rbp
    30717: 41 57 push %r15
    30719: 41 56 push %r14
    3071b: 41 54 push %r12
    3071d: 53 push %rbx
    3071e: 48 81 ec d0 02 00 00 sub $0x2d0,%rsp

    30932: 4c 3b 75 e8 cmp -0x18(%rbp),%r14
    30936: 75 0c jne 30944 <_my_printf_warning+0xda>
    30938: 48 81 c4 d0 02 00 00 add $0x2d0,%rsp
    3093f: 5b pop %rbx
    30940: 41 5e pop %r14
    30942: 5d pop %rbp
    30943: c3 retq

  35. [Diagram: the STACK (4 byte elements). main calls my_printf_error,
    which calls my_vsnprintf_ex. Each call pushes a return address;
    each ret pops it again.]

  36. Passing arguments

  37. param 1
    param 2
    return address

  38. param
    param
    return address

    param
    return address
    return address

  39. Storing local variables

  40. param
    param
    return address

    local variable
    local variable

  42. Collecting garbage

  43. param 1
    param 2
    return address
    local variable 1
    local variable 2
    local variable 3
    ?
    ?
    local variable
    local variable

  44. ?
    ?
    local variable
    local variable

  45. Mark phase
    ?
    ?
    local variable
    local variable

  46. Sweep phase
    local variable
    local variable

  47. Mark and sweep:
    the simplest GC technique

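The mark and sweep phases of slides 45 and 46 can be sketched as a toy collector. This is a teaching model with hypothetical names (`Node`, `mark`, `sweep`), not a description of how Crystal's actual collector is implemented: `roots` stands in for the local variables on the stack, and `heap` for every allocated object.

```crystal
# A toy heap object: a name, outgoing references, and a mark bit.
class Node
  getter name : String
  getter refs = [] of Node
  property? marked = false

  def initialize(@name : String)
  end
end

# Mark phase: flag everything reachable from the given node.
def mark(node : Node) : Nil
  return if node.marked? # already visited, which also stops cycles
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Sweep phase: keep only marked nodes and clear their marks for the next cycle.
def sweep(heap : Array(Node)) : Array(Node)
  survivors = heap.select(&.marked?)
  survivors.each { |node| node.marked = false }
  survivors
end

a = Node.new("a")
b = Node.new("b")
c = Node.new("c") # never referenced from a root
d = Node.new("d")
a.refs << b
c.refs << d # d is only reachable through garbage
d.refs << c # a cycle among unreachable objects is still collected

heap = [a, b, c, d]
roots = [a] # stands in for the local variables on the stack

roots.each { |root| mark(root) }
heap = sweep(heap)
puts heap.map(&.name).join(", ") # a, b
```

Every collection has to walk all reachable objects and then the whole heap, which is why a program that allocates heavily can end up spending most of its time here.
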
  49. 79% of time spent in
    the garbage collector‽

  50. interrobang (‽): U+203D

  51. Stop this GC madness!

  52. Avoiding garbage collection:
    three techniques

  53. Avoiding garbage collection
    through explicit reuse

  54. 1_000_000.times do
      point = Point.new(random(width), random(height))
      draw(point)
    end

  55. point = Point.new
    1_000_000.times do
      point.x = random(width)
      point.y = random(height)
      draw(point)
    end

  56. - only works for a single instance
    - mutating state can lead to bugs

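A runnable sketch of the two loops on slides 54 and 55, with an allocation counter added to make the difference visible. The counter, the `draw` stub, and the use of Crystal's built-in `rand` in place of the slides' `random` are assumptions made for illustration.

```crystal
class Point
  @@allocations = 0 # hypothetical counter, only here to observe allocations

  property x : Int32
  property y : Int32

  def initialize(@x = 0, @y = 0)
    @@allocations += 1
  end

  def self.allocations
    @@allocations
  end

  def self.reset_allocations
    @@allocations = 0
  end
end

# Stand-in for the draw call on the slides; it only reads the point.
def draw(point : Point) : Int32
  point.x + point.y
end

width = 800
height = 600
iterations = 100_000

# Slide 54: a fresh heap object on every iteration.
iterations.times do
  point = Point.new(rand(width), rand(height))
  draw(point)
end
fresh = Point.allocations

Point.reset_allocations

# Slide 55: one object, mutated in place.
point = Point.new
iterations.times do
  point.x = rand(width)
  point.y = rand(height)
  draw(point)
end
reused = Point.allocations

puts "fresh: #{fresh} allocations, reused: #{reused} allocation"
```

The second loop gives the garbage collector nothing to clean up, at the price of the shared mutable state that slide 56 warns about.
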
  57. Avoiding garbage collection
    through memory pooling

  58. pool = Pool(Entity).new(1000)

    entity = pool.acquire

    pool.release(entity)

  59. + can reuse multiple objects
    - memory management is more manual

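The talk does not show how `Pool` is implemented. One possible minimal version is sketched below, assuming that `T` has an argument-less constructor and that `acquire` should raise when the pool is empty. Both are assumptions, as are the `Entity` fields and the `available` helper; only `new`, `acquire` and `release` appear on slide 58.

```crystal
# Hypothetical entity type with default values, so Entity.new takes no arguments.
class Entity
  property x = 0.0
  property y = 0.0
end

# Hypothetical pool: allocates all objects up front and recycles them.
class Pool(T)
  @free : Array(T)

  def initialize(capacity : Int32)
    # Every object is created here; acquire never allocates.
    @free = Array(T).new(capacity) { T.new }
  end

  # Hands out a pre-allocated object, or raises if none are left.
  def acquire : T
    @free.pop? || raise "pool exhausted"
  end

  # Returns an object to the pool so a later acquire can reuse it.
  def release(object : T) : Nil
    @free.push(object)
  end

  # Number of objects currently waiting in the pool.
  def available : Int32
    @free.size
  end
end

pool = Pool(Entity).new(1000)

entity = pool.acquire
entity.x = 12.5
pool.release(entity)

again = pool.acquire
puts again.same?(entity) # true: the very same object is handed out again
puts again.x             # 12.5: released objects keep their old state
```

In this sketch, resetting a recycled object is the caller's job, which is one face of the "more manual" memory management noted on slide 59.
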
  60. Avoiding garbage collection
    through stack allocation

  61. param
    param
    return address

    local variable
    local variable

  62. class Point
      getter :x, :y
      def initialize(@x, @y)
      end
    end

  63. struct Point
      getter :x, :y
      def initialize(@x, @y)
      end
    end

  64. 1_000_000.times do
      point = Point.new(random(width), random(height))
      draw(point)
    end

  65. + no explicit memory management
    - only usable for local variables

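The difference between slides 62 and 63 is a single keyword. The following sketch shows the observable consequence: a class instance is passed around by reference, a struct by value. The names `HeapPoint` and `StackPoint` are hypothetical, and the type annotations are added because recent Crystal versions require instance variable types to be declared or inferable.

```crystal
# Reference type: instances live on the heap and are garbage collected.
class HeapPoint
  property x : Int32
  property y : Int32

  def initialize(@x, @y)
  end
end

# Value type: instances are copied on assignment and can live on the stack.
struct StackPoint
  property x : Int32
  property y : Int32

  def initialize(@x, @y)
  end
end

heap_a = HeapPoint.new(1, 2)
heap_b = heap_a # copies the reference; both names refer to one object
heap_b.x = 99

stack_a = StackPoint.new(1, 2)
stack_b = stack_a # copies the value itself
stack_b.x = 99

puts heap_a.x                    # 99: changed through heap_b
puts stack_a.x                   # 1: the original is untouched
puts heap_a.is_a?(Reference)     # true
puts stack_a.is_a?(Value)        # true
```

Copy-on-assignment is what makes the loop on slide 64 cheap, and it is also why mutating a struct you got from somewhere else tends to surprise people.
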
  66. Don’t destroy the CPU cache!

  67. HD: ~ 1 000 000 ns
    RAM: ~ 500 ns
    Cache: ~ 1-10 ns

  70. For maximum speed,
    keep similar data together
    in memory.

  71. position velocity rotation health AI

  72. position velocity rotation health AI
    (for movement, only position and velocity are used)
    12 used bytes / 32 total bytes = 37.5% efficiency

  75. Store similar data
    in contiguous arrays.

  76. positions = Pool(Position).new(1000)
    velocities = Pool(Velocity).new(1000)

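A sketch of the layout from slides 75 and 76, using plain `Array`s of structs rather than the talk's `Pool` (whose implementation is not shown). `Position`, `Velocity` and `move` are illustrative names. Because structs are stored inline in an `Array`, the movement pass reads and writes two dense blocks of memory and never touches rotation, health or AI data.

```crystal
# `record` defines small immutable structs with getters.
record Position, x : Float32, y : Float32
record Velocity, dx : Float32, dy : Float32

# Movement pass: walks the two arrays it needs, in order, and nothing else.
def move(positions : Array(Position), velocities : Array(Velocity), dt : Float32) : Nil
  positions.size.times do |i|
    pos = positions[i]
    vel = velocities[i]
    # Structs are values, so write the updated copy back into the array.
    positions[i] = Position.new(pos.x + vel.dx * dt, pos.y + vel.dy * dt)
  end
end

count = 1000
positions = Array(Position).new(count) { Position.new(0_f32, 0_f32) }
velocities = Array(Velocity).new(count) { |i| Velocity.new(1_f32, i.to_f32) }

move(positions, velocities, 0.5_f32)
puts positions[2]
```

Entity number `i` is simply index `i` in every array, so no per-entity object holding all five components is needed for this pass.
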
  77. Demo

  78. Demo:
    Stack allocation

  79. (shamelessly copied from Tobi)

  80. [Diagram: a circle of radius r inscribed in a square with sides of 2 × r]

  81. #total ∝ (2 × r)²
    #inside ∝ π × r²

  82. #total ∝ 4
    #inside ∝ π

  83. π ≈ (4 × #inside) / #total

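Slides 80 to 83 translate into a few lines of Crystal. This is a sketch and not the demo's source: `Sample` and `estimate_pi` are made-up names. `Sample` is a struct, so the loop should be able to create a million of them without giving the garbage collector any work.

```crystal
# A random point; a struct, so it is a plain value rather than a heap object.
struct Sample
  getter x : Float64
  getter y : Float64

  def initialize(@x, @y)
  end

  # True when the point falls inside the circle of radius r around the origin.
  def inside?(r : Float64) : Bool
    x * x + y * y <= r * r
  end
end

# Estimates π as 4 × #inside / #total (slide 83).
def estimate_pi(samples : Int32, r : Float64 = 1.0) : Float64
  inside = 0
  samples.times do
    u = rand # uniform Float64 in [0, 1)
    v = rand
    # Scale to a uniform point in the square from -r to r on both axes.
    sample = Sample.new((2 * u - 1) * r, (2 * v - 1) * r)
    inside += 1 if sample.inside?(r)
  end
  4.0 * inside / samples
end

pi_estimate = estimate_pi(1_000_000)
puts pi_estimate
```

The result varies from run to run because the samples are random; with a million samples it usually agrees with π to two or three decimal places.
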
  84. (use the source, luke)

  85. Demo:
    Cache grinding

  86. (use the source, luke)

  88. Denis Defreyne
    @ddfreyne
    I’ll question your answers now.