Overview of Ruby memory management

Ruby is a wonderful language that aims to make us developers happy. After working with Ruby for a long time, you may be curious about what Ruby does under the hood to achieve its magic. In this talk, I'll focus on how Ruby manages its objects and how Ruby's garbage collector works.

Quang-Minh Nguyen

April 18, 2020

Transcript

  1. About me

    - Software Engineer @ Employment Hero
    - Married to Ruby, but have an affair with C, Go, Java, etc.
    - Newbie Open Source contributor
  2. Agenda

    - How does Ruby structure and manage memory?
    - How does the garbage collector work in Ruby?
    - How has the garbage collector evolved over time?
    - Q & A
  3. If you don’t know Ruby

    “A dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.”
  4. If you don’t know Ruby

    - Ruby is an object-oriented, dynamically typed language
  5. How does Ruby construct an object?

    - Ruby is written in C, so the type system of Ruby is a reflection of C
    - All objects share the same underlying struct, regardless of their class
    - When we assign an object to a variable, the variable actually contains a *pointer* to a struct called *RVALUE*

    (Diagram: variable `a` points to an RVALUE; `typedef uintptr_t VALUE`.)
  6. Structure of RVALUE

    - Each object in Ruby has a fixed size of *40 bytes*, which is the size of the biggest member of the union.
    - When it is used, the VALUE pointer is cast to a pointer of the desired type
  7. Structure of RVALUE

    - All union members in RVALUE have a common field called RBasic.
    - It defines the class and type flags of the current object

    (Diagram: variable `a` points to an RVALUE; `typedef uintptr_t VALUE`. RFloat: `RBasic basic`, `double float_value`. RArray: `RBasic basic`, `long len`, `long capa`, `VALUE * ptr`.)
  8. How does Ruby manage the object?

    - All objects in Ruby are managed in a centralized structure called the Object Space

    (Diagram: the Ruby VM contains the Object Space, which holds Malloc Params, GC Stats, GC Control, the Eden Heap and the Tomb Heap. Objects are stored in the heaps.)
  9. Ruby’s Heap structure

    - The objects are stored in small units called Heap Pages
    - The heap keeps track of a linked list of Heap Pages, and a linked list of free slots.

    (Diagram: the Eden Heap with Page 1 (Free Slot 1, Object #1, Free Slot 2, Object #2, Object #3), Page 2 (Object #4, Object #5, Free Slot 3, Free Slot 4, Free Slot 5), Page 3 and Page 4; the free list links the free slots.)
  10. What happens when we create an object?

    - The Heap is checked to ensure there is a slot in the free list
    - If there is no slot in the free list, or the heap is under *stress*, a GC is triggered
      - The unused objects are cleaned
      - The free list is rebuilt
    - If there is still no free slot in the free list, a new page is allocated
      - Recycle a page from the Tomb Heap
      - Or malloc an actual new page
    - Take the first slot in the free list and convert it to an object of the desired type
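The allocation steps on this slide can be sketched as a toy model. This is not CRuby's implementation: `ToyHeap`, its tiny page size, and the `release` method (which stands in for "the program no longer references this object") are all made up for illustration, and the Tomb Heap is left out.

```ruby
# Toy model of slot allocation from a free list (not CRuby's real code).
class ToyHeap
  SLOTS_PER_PAGE = 4 # tiny on purpose; the slides quote about 409

  attr_reader :pages, :gc_runs

  def initialize
    @pages = []      # each page is an array of slots (nil = free)
    @free_list = []  # [page_index, slot_index] pairs
    @live = {}       # object => true while the program still uses it
    @gc_runs = 0
  end

  def allocate(obj)
    if @free_list.empty?
      collect                        # no free slot: try a GC first
      add_page if @free_list.empty?  # still nothing: add a new page
    end
    page, slot = @free_list.shift    # take the first free slot
    @pages[page][slot] = obj
    @live[obj] = true
    obj
  end

  # Hypothetical helper: pretend the program dropped its last reference.
  def release(obj)
    @live.delete(obj)
  end

  private

  # Clean unused objects and rebuild the free list.
  def collect
    @gc_runs += 1
    @pages.each_with_index do |page, p|
      page.each_with_index do |obj, s|
        next if obj.nil? || @live[obj]
        page[s] = nil
        @free_list << [p, s]
      end
    end
  end

  def add_page
    @pages << Array.new(SLOTS_PER_PAGE)
    index = @pages.size - 1
    SLOTS_PER_PAGE.times { |s| @free_list << [index, s] }
  end
end

heap = ToyHeap.new
4.times { |i| heap.allocate("obj#{i}") } # fills page 0
heap.release("obj1")                     # obj1 becomes garbage
heap.allocate("obj4")                    # GC frees a slot, no new page needed
heap.allocate("obj5")                    # GC finds nothing, so a page is added
puts "pages=#{heap.pages.size} gc_runs=#{heap.gc_runs}"
```

In this sketch a GC also runs on the very first allocation, because the free list starts out empty.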
  11. Ruby’s Page structure

    - Each page is allocated with around *16 KB* of memory
    - Each page contains around *409* objects (40 bytes each)
    - Why “around”?
      - malloc has an overhead of some bytes
      - Ruby wants the object pointer address to be a multiple of 40
      - Ruby wants to align the allocated memory with the 4 KB OS page
    - Therefore, the actual size of an allocated page and the number of objects in a page are calculated from the alignment, and differ between pages
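The "around 409" figure follows from simple integer arithmetic. The sketch below only reproduces that arithmetic; the 64-byte overhead is a made-up number to show how header and alignment bytes reduce the slot count, not a value taken from CRuby.

```ruby
PAGE_BYTES = 16 * 1024 # "around 16 KB" per page
SLOT_BYTES = 40        # size of one object slot on the slides

upper_bound = PAGE_BYTES / SLOT_BYTES # integer division
leftover    = PAGE_BYTES - upper_bound * SLOT_BYTES

# Hypothetical overhead, only to show the effect on the slot count.
assumed_overhead = 64
with_overhead = (PAGE_BYTES - assumed_overhead) / SLOT_BYTES

puts "at most #{upper_bound} slots, #{leftover} bytes left over"
puts "with #{assumed_overhead} bytes of overhead: #{with_overhead} slots"
```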
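  12. Ruby’s Page structure

    (Diagram: a `heap_page` struct with the fields `body *`, `freelist *`, `total_slots`, `free_slots`, `mark_bits`, `next *`, ... Its `body` points to a page body of exactly 16 KB in memory, made of malloc overhead bytes, a `header *`, alignment bytes, then Slot 1, Slot 2, Slot 3, …)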
  13. Object addresses and tagged pointers

    - Thanks to page alignment, all object addresses in Ruby are multiples of 40
      - For example: 11111111110010010110100100000010110111111110000
    - If an address doesn’t satisfy this rule, it is not a normal object.
    - Ruby uses the address to differentiate special objects:
      - False: 00000000000000000000000000000000
      - True: 00000000000000000000000000000010
      - Fixnum: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1
      - …
    - Those special pointers are called tagged pointers
    - Integers are recognized by the last bit. The value is decoded by shifting the address one bit to the right
      - Address: 11110111 => Integer b1111011 => Integer 123
    - This optimization speeds up calculation and reduces object allocation
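The Fixnum tagging on this slide can be reproduced with plain integer arithmetic. This is a sketch of the bit trick only; the method names are made up, and it does not touch CRuby's internals.

```ruby
FIXNUM_FLAG = 0b1 # lowest bit set means "this VALUE is an integer"

# Hypothetical helpers that mirror the encoding described on the slide.
def encode_fixnum(n)
  (n << 1) | FIXNUM_FLAG
end

def fixnum?(value)
  (value & FIXNUM_FLAG) == FIXNUM_FLAG
end

def decode_fixnum(value)
  value >> 1
end

tagged = encode_fixnum(123)
puts tagged.to_s(2)        # 11110111
puts fixnum?(tagged)       # true
puts decode_fixnum(tagged) # 123
puts fixnum?(40 * 1000)    # false: a multiple of 40 never has the low bit set
```

Because every normal object address is even, the lowest bit is free to carry the integer flag.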
  14. What about the structural objects?

    - If every object is 40 bytes, what about more complicated objects?
    - An Array stores only pointers, but it still needs space for them
    - Or a String: if every character were an object, each would take 40 bytes (insane)
  15. What about the structural objects?

    - If the inner data is small enough to fit into 40 bytes, it is embedded right into the object.
    - If the inner data exceeds a threshold, the object acquires the memory from the OS directly, then reports it to the GC
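On CRuby, the effect of the threshold can be observed with `ObjectSpace.memsize_of` from the `objspace` standard library. The exact numbers depend on the Ruby version and platform, so the sketch below only compares a short string with a long one and makes no claim about specific sizes.

```ruby
require "objspace"

small = "a" * 10     # short enough to be embedded on typical CRuby builds
large = "a" * 10_000 # far too big for a slot, so it needs extra memory

small_bytes = ObjectSpace.memsize_of(small)
large_bytes = ObjectSpace.memsize_of(large)

puts "small string: #{small_bytes} bytes"
puts "large string: #{large_bytes} bytes"
```

The large string reports its externally allocated buffer in addition to its slot, which is why its size grows with its length while the small one stays near the slot size.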
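  16. What about the structural objects?

    (Diagram: the Eden Heap, Page 1, with Free Slot 1, Array #1, Free Slot 2, Hash #1 and Object #3. Array #1 points to an external Items block (Item 1, Item 2, Item 3, Item 4, ...). Hash #1 points to external Keys (Key 1, Key 2, Key 3, ...) and Values (Value 1, Value 2, Value 3, ...) blocks.)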
  17. What about the structural objects?

    - When the inner data changes, the object will *free* and *malloc* again, depending on the situation
    - Some objects, like Array and String, have a “shared” nature: two arrays / strings can share the same underlying memory allocation.
  18. That’s the paradox

    - Even if the heap is full of free slots, when we create a big string, it still acquires more memory.
  19. The basics of Ruby GC

    - The main GC algorithm in Ruby is Mark-and-Sweep
    - Ruby follows the Stop-the-world approach
      - It means that when GC runs, all other processing is stopped
    - There are two separate steps:
      - Marking step: mark an object if it’s still in use (there is a link to it from another object)
      - Sweeping step: clean up all unmarked objects, and return the memory to the OS where possible
  20. Marking phase

    - Although the objects are structured into heap pages, the marking process is not based on the heap pages.
    - GC traverses the object graph using DFS, with a stack, starting from special objects called root objects.
    - All initial objects must have a link with the root objects.
    - At each node, it follows the links to other objects, depending on the node type
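A minimal sketch of the marking traversal, assuming the object graph is a plain Hash from each node to the nodes it references. The node names are made up, and a real collector decides which links to follow from each object's type rather than from a lookup table.

```ruby
# Toy object graph: each node lists the nodes it references.
GRAPH = {
  root:    [:array1, :inst_a],
  array1:  [:obj1, :obj2],
  inst_a:  [:class_a, :ivar1],
  class_a: [:class_b],
  class_b: [],
  obj1:    [],
  obj2:    [:obj1],
  ivar1:   [],
  garbage: [:obj1] # references a live object, but nothing references it
}.freeze

# Depth-first traversal with an explicit marking stack.
def mark(graph, roots)
  marked = {}
  stack = roots.dup
  until stack.empty?
    node = stack.pop
    next if marked[node]
    marked[node] = true
    graph.fetch(node).each { |child| stack.push(child) unless marked[child] }
  end
  marked.keys
end

live = mark(GRAPH, [:root])
dead = GRAPH.keys - live
puts "live: #{live.sort.inspect}"
puts "dead: #{dead.inspect}"
```

Note that `garbage` is collected even though it points at a live object: only incoming links from reachable objects keep an object alive.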
  21. Marking phase: Array

    (Diagram: Array 1 is on the marking stack. It references Object at #1, Object at #2, Object at #3 and Object at #4.)
  22. Marking phase: Array

    (Diagram: Array 1 with Object at #1, Object at #2, Object at #3 and Object at #4; the four objects now also appear on the marking stack.)
  23. Marking phase: Instance

    (Diagram: Instance a is on the marking stack. It references Class A, Parent class B, Instance variable 1 and Instance variable 2.)
  24. Marking phase: Instance

    (Diagram: Instance a with Class A, Parent class B, Instance variable 1 and Instance variable 2; the four referenced objects now also appear on the marking stack.)
  25. Marking phase

    - Before version 2.0, the mark flag was stored in the object struct, but that is not copy-on-write (COW) friendly
    - Since then, the marking result is stored in the mark_bits of the heap page
    - A full mark process stops when there is nothing more to traverse

    (Diagram: a `heap_page` struct with `body *`, `freelist *`, `total_slots`, `free_slots`, `mark_bits = 0001010001000`, `next *`, ...)
  26. Sweeping phase

    - GC loops through all the pages. In each page, it loops through all the bits in the mark bitmap

    (Diagram: three linked `heap_page` structs, each with `body *`, `freelist *`, `total_slots`, `free_slots`, `mark_bits = 0001010001000`, `next *`, ...)
  27. Sweeping phase

    - If an object is not marked, it is cleaned up
    - If it’s a structural object, *free* is triggered
    - Then, the slot in the page is overwritten and flagged as a free slot
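A toy version of the sweep loop, assuming each page keeps its mark bits in an Integer where bit `i` belongs to slot `i`. The `Page` struct and the object names are invented for the sketch; freeing the external memory of structural objects is not modelled.

```ruby
# Toy page: slots hold object names (nil = already free);
# bit i of mark_bits is 1 when slot i was marked.
Page = Struct.new(:slots, :mark_bits)

def sweep(pages)
  free_list = []
  pages.each_with_index do |page, p|
    page.slots.each_index do |s|
      marked = page.mark_bits[s] == 1   # Integer#[] reads bit s
      page.slots[s] = nil unless marked # unmarked slot becomes a free slot
      free_list << [p, s] if page.slots[s].nil?
    end
  end
  free_list
end

pages = [
  Page.new(["obj1", nil, "obj2", "obj3"], 0b1001), # obj1 and obj3 are marked
  Page.new(["obj4", "obj5", nil, nil],    0b0000)  # nothing is marked
]
free_list = sweep(pages)
empty_pages = pages.each_index.select { |p| pages[p].slots.all?(&:nil?) }

puts "free list: #{free_list.inspect}"
puts "completely free pages: #{empty_pages.inspect}"
```

Keeping the bits outside the slots means the sweep decides what to free from the bitmap alone, without reading a flag inside each object.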
  28. Sweeping phase

    - If an object is not marked, it is cleaned up
    - If it’s a structural object, *free* is triggered
    - Then, the slot in the page is overwritten and flagged as a free slot

    (Diagram, before sweeping: the Eden Heap with Page 1 (Free Slot, Object #1, Free Slot, Unmarked, Object #3), Page 2 (Object #4, Object #5, Free Slot, Unmarked, Unmarked), Page 3 (five Unmarked slots) and Page 4, plus the free list. One unmarked object owns an external Items block (Item 1, Item 2, Item 3, Item 4, ...).)
  29. Sweeping phase

    - If an object is not marked, it is cleaned up
    - If it’s a structural object, *free* is triggered
    - Then, the slot in the page is overwritten and flagged as a free slot

    (Diagram, after sweeping: the Eden Heap with Page 1 (Free Slot, Object #1, Free Slot, Free Slot, Object #3), Page 2 (Object #4, Object #5, Free Slot, Free Slot, Free Slot), Page 3 (five Free Slots) and Page 4, plus the free list.)
  30. Sweeping phase

    - If a page is 100% free, it is unlinked and moved to the Tomb Heap
    - The pages in the Tomb Heap will be recycled, or released via *free*

    (Diagram: the Eden Heap with Page 1 (Free Slot, Object #1, Free Slot, Free Slot, Object #3), Page 2 (Object #4, Object #5, Free Slot, Free Slot, Free Slot), Page 3 (five Free Slots) and Page 4, plus the free list.)
  31. Sweeping phase

    - If a page is 100% free, it is unlinked and moved to the Tomb Heap
    - The pages in the Tomb Heap will be recycled, or released via *free*

    (Diagram: the Eden Heap now holds Page 1 (Free Slot, Object #1, Free Slot, Free Slot, Object #3), Page 2 (Object #4, Object #5, Free Slot, Free Slot, Free Slot) and Page 4, plus the free list. Page 3 (five Free Slots) has moved to the Tomb Heap.)
  32. Sweeping phase

    - Then, the free list is rebuilt

    (Diagram: the Eden Heap with Page 1 (Free Slot, Object #1, Free Slot, Free Slot, Object #3), Page 2 (Object #4, Object #5, Free Slot, Free Slot, Free Slot) and Page 4, with the free list linking the free slots. Page 3 (five Free Slots) sits in the Tomb Heap.)
  33. Stop-the-world GC is slow

    - A GC can run before any memory allocation. So, when memory is under pressure, there are usually hiccups in performance
    - In old versions of Ruby, the lag could range from 100ms to 500ms, or even seconds, depending on the heap size.
    - That’s awful, especially when I just want to create a single string.

    (Diagram: timeline of Mark, Sweep, Mark, Sweep.)
  34. First optimization: Lazy Sweep

    - I just want to allocate a single object, so why do I need to wait for the GC to sweep all objects?
    - Solution: mark once, then sweep only until there is enough space for the allocation
    - Available since version 1.9.3
    - The total marking time is the same
    - The total sweeping time is the same
    - But the stop time is reduced, because it is split into smaller pauses.

    (Diagram: timeline of Mark, Sweep, Mark, Sweep, Sweep.)
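The idea of lazy sweeping can be sketched as a sweeper that remembers where it stopped and resumes only when an allocation finds the free list empty. `LazySweeper` and its page layout are hypothetical and only illustrate the control flow; marking is assumed to have already happened.

```ruby
class LazySweeper
  attr_reader :pages_swept

  def initialize(pages)
    @pages = pages   # each slot is a [name, marked] pair, or nil when free
    @cursor = 0      # index of the next page that still needs sweeping
    @free_list = []
    @pages_swept = 0
  end

  # Sweep one page at a time, and only until a free slot exists.
  def take_free_slot
    sweep_next_page while @free_list.empty? && @cursor < @pages.size
    @free_list.shift # nil when every page was swept and nothing is free
  end

  private

  def sweep_next_page
    page = @pages[@cursor]
    page.each_with_index do |slot, s|
      page[s] = nil if slot && !slot[1] # unmarked object: free its slot
      @free_list << [@cursor, s] if page[s].nil?
    end
    @cursor += 1
    @pages_swept += 1
  end
end

pages = [
  [["a", true], ["b", false]],  # page 0: one dead object
  [["c", false], ["d", false]], # page 1: two dead objects
  [["e", true], ["f", true]]    # page 2: nothing to free
]
sweeper = LazySweeper.new(pages)
first = sweeper.take_free_slot
after_first = sweeper.pages_swept
second = sweeper.take_free_slot
after_second = sweeper.pages_swept

puts "first slot #{first.inspect} after sweeping #{after_first} page(s)"
puts "second slot #{second.inspect} after sweeping #{after_second} page(s)"
```

Each allocation pays for at most the pages it needs, so the same total sweeping work is spread over several short pauses.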
  35. Second optimization: Generational GC

    - A famous GC philosophy, applied in a lot of languages
    - Based on a hypothesis: “Most objects die young”
  36. Marking with Generational GC

    - GC classifies the objects into two generations: old and young.
    - Objects are in the young generation by default.
    - Objects are promoted to the old generation when they survive 1 to 3 GC runs (depending on the Ruby version)
    - There are two types of GC now:
      - Minor GC: the marking process stops traversing when it reaches an object that is already marked
      - Major GC: the same Mark-and-sweep mechanism as before
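On CRuby, the two kinds of GC are visible through `GC.stat`, which exposes separate counters for minor and major runs. The sketch below assumes a CRuby version where `GC.stat` accepts the `:minor_gc_count` and `:major_gc_count` keys; the absolute values depend on what the process did before.

```ruby
major_before = GC.stat(:major_gc_count)
GC.start # by default this requests a full (major) collection
major_after = GC.stat(:major_gc_count)

puts "minor GCs so far: #{GC.stat(:minor_gc_count)}"
puts "major GCs so far: #{major_after}"
puts "major GCs caused by GC.start: #{major_after - major_before}"
```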
  37. Marking problem with Generational GC

    (Diagram: a Root with six Old objects and New objects during Minor GC 3; one New object, referenced only from an Old object, is labelled Forgotten.)
  38. Marking problem with Generational GC

    - If an old object has a reference to a new one, the new object is forgotten by the minor GC and wrongly swept
    - Solution: add the old object to a list, called the Remember set, whenever a reference from that old object to a new object is created.
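The forgotten-object problem and the Remember set fix can be sketched with a toy heap. `ToyGenHeap`, `write_ref` and the `barrier:` switch are invented for this illustration: with the barrier off, the old-to-new link is missed by the minor mark; with it on, the old object is remembered and rescanned.

```ruby
require "set"

class ToyGenHeap
  attr_reader :old, :remember_set

  def initialize
    @refs = Hash.new { |h, k| h[k] = [] } # object => referenced objects
    @old = Set.new
    @remember_set = Set.new
  end

  # Every reference write goes through here; the barrier records
  # old objects that start pointing at young objects.
  def write_ref(from, to, barrier: true)
    @refs[from] << to
    return unless barrier
    @remember_set << from if @old.include?(from) && !@old.include?(to)
  end

  # Minor mark: old objects count as already marked and are not traversed,
  # except the ones in the remember set.
  def minor_mark(roots)
    marked = @old.dup
    marked.merge(roots)
    stack = roots + @remember_set.to_a
    until stack.empty?
      node = stack.pop
      @refs.fetch(node, []).each do |child|
        next if marked.include?(child)
        marked << child
        stack << child
      end
    end
    marked
  end
end

def build_heap(barrier:)
  heap = ToyGenHeap.new
  heap.old << :old1
  heap.write_ref(:root, :old1, barrier: barrier)
  heap.write_ref(:root, :new1, barrier: barrier)
  heap.write_ref(:old1, :new2, barrier: barrier) # old -> new link
  heap
end

without_barrier = build_heap(barrier: false).minor_mark([:root])
with_barrier    = build_heap(barrier: true).minor_mark([:root])

puts "without barrier, new2 marked? #{without_barrier.include?(:new2)}"
puts "with barrier, new2 marked?    #{with_barrier.include?(:new2)}"
```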
  39. Marking problem with Generational GC

    (Diagram: the same graph of a Root, six Old objects and New objects during Minor GC 3; the Old object that references a New object is now in the Remember Set.)
  40. Write-barrier problems

    - The mechanism that detects a new link from an old object to a new object is called a “write-barrier”
    - The link itself is an edge in the object graph
    - The problem is that, before this optimization, the write-barrier didn’t exist yet
    - So, the Ruby core team had to change the object creation API to insert the write-barrier
  41. Ruby’s C-friendly problem

    - Inserting write-barriers in Ruby internally is easy, although time-consuming
    - But Ruby supports and depends a lot on C-extensions, and C-extensions can allocate objects freely and store the pointers anywhere.
    - Therefore, Ruby cannot insert the write-barriers for them
    - If Ruby forced everyone to use the new API and removed the old one, it would break most of the gems that use C-extensions
    - Remember the Python 2 and Python 3 story?
  42. Protected and Unprotected write-barrier

    - The solution is to split write-barriers into two types
    - Protected write-barriers: protected by Ruby. They ensure that the connection is created by Ruby core.
    - Unprotected write-barriers: the ones which are not protected
    - The objects with unprotected write-barriers are always added to the Remember set, and are always marked.
  43. Second optimization: Restricted Generational GC

    - It’s not a true Generational GC
    - It has some disadvantages: if the Ruby process uses a lot of C-extensions and allocates a lot of objects, a minor GC costs as much as a major GC
    - But still, good enough
    - Smaller marking time (major still the same)
    - Total sweeping time still the same

    (Diagram: timeline of Minor, Sweep, Major, Sweep, Sweep, Minor, Sweep.)
  44. Third optimization: Restricted Incremental GC

    - Idea: instead of doing one big marking process, GC splits it into smaller ones.
    - Reduces the stop period of a major GC, so fewer performance hiccups
    - The total time for a major GC is still the same
    - The time for a minor GC is still the same

    (Diagram: timeline of alternating Mark and Sweep steps, with marking split into three smaller Mark steps.)
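Splitting the marking work can be sketched as a marker that keeps its stack between calls and handles only a fixed number of objects per step. `IncrementalMarker` and the `budget` parameter are hypothetical. The sketch also assumes the program does not change the graph between steps; a real incremental collector has to cope with such changes.

```ruby
class IncrementalMarker
  attr_reader :marked

  def initialize(graph, roots)
    @graph = graph
    @stack = roots.dup # survives between steps
    @marked = {}
  end

  def done?
    @stack.empty?
  end

  # Mark at most `budget` objects, then hand control back to the program.
  def step(budget)
    visited = 0
    while visited < budget && !@stack.empty?
      node = @stack.pop
      next if @marked[node]
      @marked[node] = true
      visited += 1
      @graph.fetch(node, []).each { |child| @stack.push(child) }
    end
    visited
  end
end

graph = { root: [:a, :b], a: [:c], b: [:c, :d], c: [], d: [] }
marker = IncrementalMarker.new(graph, [:root])
pauses = []
pauses << marker.step(2) until marker.done?

puts "objects marked per pause: #{pauses.inspect}"
puts "total marked: #{marker.marked.size}"
```

The total work is unchanged, but no single pause marks more than `budget` objects.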
  45. Other minor optimizations

    - Symbol GC
    - Grow the number of free slots by a factor
    - Reduce the number of GC runs
  46. The future of GC in Ruby

    - Heap compaction
      - An object can be referenced from a C-extension
      - So it’s impossible to move an object around within a page or between pages
      - This creates fragmentation, and a page may never be cleaned up.
      - So, usually, Ruby barely frees any page to return the memory to the OS
    - Parallel GC
      - No matter how much the marking / sweeping process is optimized, it’s still stop-the-world
      - Moving the whole GC, or at least the marking process, into a dedicated thread will be a huge game changer
  47. References

    - https://github.com/ruby/ruby/blob/trunk/gc.c
    - Talk of ko1 about the RGenGC
    - Talk of ko1 about the RInGC
    - Talk of nari3 about the Symbol GC
    - The garbage collection handbook
  48. Fin