Compacting GC for MRI

Implementation of manual heap compaction for MRI

Aaron Patterson

November 20, 2019

Transcript

  1. Ruby Cores

        [aaron@TC ~]$ ls /cores
        core.22948  core.32321  core.44049  core.73547  core.76951
        core.31231  core.36549  core.73064  core.73848  core.77093
        core.31784  core.36550  core.73479  core.76655  core.82911
        core.31802  core.36605  core.73493  core.76710  core.86743
  2. Pipeline Operator

        1..100 |> map { |x| x.to_s } |> sort |> reverse |> take 5 |> display
  3. CoW Friendliness

    [Diagram labels: Computer Memory; Allocated Memory and Free Memory regions; Parent Process; Child Process]
  4. CoW Friendliness

    [Diagram labels: Computer Memory; Allocated Memory and Free Memory regions; Parent Process; Child Process]
  5. CoW Friendliness

    [Diagram labels: Computer Memory; Allocated Memory and Free Memory regions; Parent Process; Child Process]
  6. CoW Friendliness

    [Diagram labels: Computer Memory; Allocated Memory and Free Memory regions; Parent Process; Child Process]
  7. Ruby Heaps

    [Diagram labels: System Memory; Malloc Heap; Ruby’s Object Heap; String.new; "The Quick Brown Fox Jumps Over The Lazy Dog"]
  8. Ruby’s Heap Layout

    40 bytes
    Each chunk is a "slot"

    [Diagram: a row of slots marked Empty or Filled; legend: Empty, Filled, Moved]
  9. Ruby’s Heap Layout

    Each slot has a unique address

    [Diagram: slots numbered 1 through 30]
  10. Moving Objects

    [Diagram: objects A B C D E F in slots 1 through 10, with "Free" and "Scan" pointers; legend: Empty, Filled, Moved; labels 4 and 5]
  11. Updating References

    Before Moving Objects

    [Diagram: objects A B C D E F in slots 1 through 10; legend: Empty, Filled, Moved]
  12. Updating References

    After Moving Objects

    [Diagram: slots 1 through 10 with contents A B C D 5 4, and E and F relocated; legend: Empty, Filled, Moved]
  13. Updating References

    After Moving Objects

    [Diagram: slots 1 through 10 with contents A B C D 5 4, and E and F relocated; legend: Empty, Filled, Moved]
  14. Object Movement

        def compact
          heap = [ ... ] # many slots
          left = 0
          right = heap.length - 1

          while left < right
            left_slot = heap[left]
            right_slot = heap[right]

            if is_empty?(left_slot) && !is_empty?(right_slot) && can_move?(right_slot)
              swap(left, right)
              heap[right] = T_MOVED.new(left) # leave forwarding address
            end

            while !is_empty?(heap[left])
              left += 1
            end

            while is_empty?(heap[right]) || !can_move?(heap[right])
              right -= 1
            end
          end
        end

    [Annotations: Pointers Met?; Copy / Forward; Advance "free"; Retreat "scan"]
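
    A minimal runnable sketch of the move step above, using a plain Ruby array as the heap. The `Moved` struct, `nil` for an empty slot, the `movable?` helper, and the `pinned:` argument are illustrative assumptions, not MRI's data structures. Unlike the slide's pseudocode, the inner loops here also stop once the two pointers meet.

    ```ruby
    # Hypothetical toy model: nil is an empty slot, Moved is a forwarding address.
    Moved = Struct.new(:new_location)

    # A slot can be moved if it holds a live object that is not pinned.
    def movable?(slot, pinned)
      !slot.nil? && !slot.is_a?(Moved) && !pinned.include?(slot)
    end

    def compact(heap, pinned: [])
      free = 0                # walks forward looking for empty slots
      scan = heap.length - 1  # walks backward looking for movable objects

      loop do
        # Advance "free" to the next empty slot.
        free += 1 while free < scan && !heap[free].nil?
        # Retreat "scan" to the next movable object.
        scan -= 1 while free < scan && !movable?(heap[scan], pinned)
        break if free >= scan # pointers met

        heap[free] = heap[scan]      # copy the object into the empty slot
        heap[scan] = Moved.new(free) # leave a forwarding address behind
      end
      heap
    end

    heap = [:a, nil, :b, nil, nil, :c, :d, nil, :e, :f]
    compact(heap)
    p heap.first(6) # => [:a, :f, :b, :e, :d, :c]
    ```

    After the call, the old slots of `:f`, `:e`, and `:d` hold `Moved` entries pointing at slots 1, 3, and 4, which is the information the reference-updating step needs.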
  15. Reference Updating

        def update_references
          heap.each do |slot|
            next if is_empty?(slot) || is_moved?(slot)

            slot.references.each_with_index do |child, i|
              if is_moved?(child)
                slot.set_reference(i, child.new_location)
              end
            end
          end
        end

    How are references stored?
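
    The reference-updating pass can be tried on a toy heap as well. This is only a sketch under assumptions: `Obj`, `Moved`, and storing references as slot addresses in a `refs` array are hypothetical stand-ins for however each real type stores its references.

    ```ruby
    # Hypothetical toy model: references are stored as slot addresses.
    Moved = Struct.new(:new_location) # forwarding address left by the move step
    Obj   = Struct.new(:name, :refs)  # refs: addresses of the objects it points at

    def update_references(heap)
      heap.each do |slot|
        next if slot.nil? || slot.is_a?(Moved) # skip empty and forwarding slots

        slot.refs.each_with_index do |addr, i|
          target = heap[addr]
          # If the referenced slot now holds a forwarding address, follow it.
          slot.refs[i] = target.new_location if target.is_a?(Moved)
        end
      end
    end

    # State after a move step: "f" was moved from slot 5 to slot 1.
    heap = [
      Obj.new("a", [5]), # still points at the old address of "f"
      Obj.new("f", []),
      nil, nil, nil,
      Moved.new(1)
    ]

    update_references(heap)
    p heap[0].refs # => [1]
    ```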
  16. Finding References

    • How do Hashes hold references?
    • How do Arrays hold references?
    • How do Objects hold references?
    • …
    • …
    • …
  17. Reference Updating

        static void
        gc_ref_update_array(rb_objspace_t * objspace, VALUE v)
        {
            long i, len;

            if (FL_TEST(v, ELTS_SHARED))
                return;

            len = RARRAY_LEN(v);
            if (len > 0) {
                VALUE *ptr = (VALUE *)RARRAY_CONST_PTR_TRANSIENT(v);
                for(i = 0; i < len; i++) {
                    UPDATE_IF_MOVED(objspace, ptr[i]);
                }
            }
        }

        static void
        gc_ref_update_object(rb_objspace_t * objspace, VALUE v)
        {
            uint32_t i, len = ROBJECT_NUMIV(v);
            VALUE *ptr = ROBJECT_IVPTR(v);
            for (i = 0; i < len; i++) {
                UPDATE_IF_MOVED(objspace, ptr[i]);
            }
        }

        static int
        hash_replace_ref(st_data_t *key, st_data_t *value, st_data_t argp, int existing)
  18. Yajl

    C Code (yajl_ext.h)

        typedef struct {
            VALUE builderStack;
            VALUE parse_complete_callback;
            int nestedArrayLevel;
            int nestedHashLevel;
            int objectsFound;
            int symbolizeKeys;
            yajl_handle parser;
        } yajl_parser_wrapper;

    [Diagram labels: Ruby Heap: Ruby Object (T_DATA), Ruby Object, Ruby Object; Malloc Heap: malloc(yajl_parser_wrapper) with builderStack and parse_complete_callback]
  19. Yajl

    C Code (yajl_ext.h)

        typedef struct {
            VALUE builderStack;
            VALUE parse_complete_callback;
            int nestedArrayLevel;
            int nestedHashLevel;
            int objectsFound;
            int symbolizeKeys;
            yajl_handle parser;
        } yajl_parser_wrapper;

    [Diagram labels: Ruby Heap: Ruby Object (T_DATA), Ruby Object, Ruby Object; Malloc Heap: malloc(yajl_parser_wrapper) with builderStack and parse_complete_callback; GC: "idk, "; MOVED!]
  20. Yajl Mark Function

        void yajl_parser_wrapper_mark(void * wrapper) {
            yajl_parser_wrapper * w = wrapper;
            if (w) {
                rb_gc_mark(w->builderStack);
                rb_gc_mark(w->parse_complete_callback);
            }
        }

    [Diagram labels: Ruby Heap: Ruby Object (T_DATA), Ruby Object, Ruby Object; Malloc Heap: malloc(yajl_parser_wrapper); edges: rb_gc_mark(builderStack), rb_gc_mark(parse_complete_callback)]
  21. Pinning Bits

    Ruby Code

        x = [ "foo", "bar" ]
        y = Yajl.new

    [Diagram: table with rows Address (1 through 10), Content (Yajl, [ ], ?, "foo", "bar", ?), and Pinned; marking edges labeled rb_gc_mark, rb_gc_mark, gc_mark_no_pin, gc_mark_no_pin]
  22. Pinning Bits

    Ruby Code

        x = [ "foo", "bar" ]
        y = Yajl.new

    Move Step

    [Diagram: table with rows Address (1 through 10), Content (Yajl, [ ], ?, ?), and Pinned; "Free" and "Scan" pointers; "bar" and "foo" relocated; labels 4 and 5]
  23. Pinning Bits

    Ruby Code

        x = [ "foo", "bar" ]
        y = Yajl.new

    Reference Update Step

    [Diagram: table with rows Address (1 through 10), Content (Yajl, [ ], ?, ?), and Pinned; "bar" and "foo" relocated; labels 4 and 5; annotation: Update]
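
    The effect of the two marking functions can be imitated in a few lines of Ruby. This is a toy model under stated assumptions: the slot layout, the `Slot` struct, and the `marks_with_pin` flag are all hypothetical. Children reached from a C-implemented object are treated as if marked with `rb_gc_mark` (pinned); children reached from an Array are treated as if marked with `gc_mark_no_pin` (movable).

    ```ruby
    require "set"

    # Hypothetical slot: marks_with_pin is true for objects whose mark function
    # uses rb_gc_mark, so the GC cannot later fix up their references.
    Slot = Struct.new(:name, :children, :marks_with_pin)

    def mark(heap, roots)
      marked = Set.new
      pinned = Set.new
      stack  = roots.dup

      until stack.empty?
        addr = stack.pop
        next unless marked.add?(addr) # add? returns nil if already visited

        slot = heap[addr]
        slot.children.each do |child|
          pinned << child if slot.marks_with_pin # pin what C code points at
          stack << child
        end
      end

      [marked, pinned]
    end

    # Assumed layout: a Yajl parser in slot 1, an Array in slot 2.
    heap = {
      1  => Slot.new("Yajl",  [7, 10], true),
      2  => Slot.new("[ ]",   [8, 9],  false),
      7  => Slot.new("?",     [],      false),
      8  => Slot.new("foo",   [],      false),
      9  => Slot.new("bar",   [],      false),
      10 => Slot.new("?",     [],      false)
    }

    marked, pinned = mark(heap, [1, 2])
    p pinned.sort            # => [7, 10]
    p (marked - pinned).sort # => [1, 2, 8, 9]
    ```

    In this model only the two objects referenced by the parser stay put; the strings held by the Array remain candidates for the move step.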
  24. Compaction Callback

    No Compaction Callback

        static const rb_data_type_t yajl_parser_type = {
            "Yajl/parser",
            {yajl_parser_wrapper_mark, yajl_parser_wrapper_free, NULL,},
            0, 0,
            RUBY_TYPED_FREE_IMMEDIATELY,
        };

    With Compaction Callback

        static const rb_data_type_t yajl_parser_type = {
            "Yajl/parser",
            {yajl_parser_wrapper_mark, yajl_parser_wrapper_free, NULL, yajl_parser_compact},
            0, 0,
            RUBY_TYPED_FREE_IMMEDIATELY,
        };

    [Annotations: Mark; Compact; Sweep]
  25. "No Pin" Marking

    No Compaction Support

        void yajl_parser_wrapper_mark(void * wrapper) {
            yajl_parser_wrapper * w = wrapper;
            if (w) {
                rb_gc_mark(w->builderStack);
                rb_gc_mark(w->parse_complete_callback);
            }
        }

    With Compaction Support

        void yajl_parser_wrapper_mark(void * wrapper) {
            yajl_parser_wrapper * w = wrapper;
            if (w) {
                rb_gc_mark_movable(w->builderStack);
                rb_gc_mark_movable(w->parse_complete_callback);
            }
        }
  26. Compaction Callback

        void yajl_parser_compact(void *wrapper) {
            yajl_parser_wrapper * w = wrapper;
            if (w) {
                w->builderStack = rb_gc_location(w->builderStack);
                w->parse_complete_callback = rb_gc_location(w->parse_complete_callback);
            }
        }

    [Annotation: New Location]
  27. Problem Object Graph

    [Diagram labels: Object Implemented in Ruby; Object Implemented in C; Some Object; Automatically Marked!! (gc_mark_no_pin); Not Marked]
  28. Compaction

    [Diagram: slots 1 through 10 holding Ruby Obj, C Obj, and ?; labels 4, 5, 3]
  29. RubyVM Instruction Sequence

    Code

        def foo
          "bar"
        end

    [Diagram labels: ISeq Object (in C); Mark Array (Ruby); "bar"; Marked; Marked; NOT Marked]
  30. "Direct Marking" in Ruby 2.6

    Code

        def foo
          "bar"
        end

    [Diagram labels: ISeq Object (in C); "bar"; Marked]

    https://bugs.ruby-lang.org/issues/14370
  31. Ruby AST

    Code

        def foo
          "bar"
        end

    [Diagram labels: AST Object (in C); Mark Array (Ruby); "bar"; Marked; Marked; NOT Marked]
  32. Direct AST Marking

    Code

        def foo
          "bar"
        end

    [Diagram labels: AST Object (in C); "bar"; Marked]

    https://github.com/ruby/ruby/pull/2414
  33. Ruby IR

    Code

        def foo
          "bar"
        end

    [Diagram labels: IR Object (in C); Mark Array (Ruby); "bar"; Marked; Marked; NOT Marked]
  34. JSON

    [Diagram labels: Object Implemented in Ruby; Object Implemented in C; Some Object; Automatically Marked!! (gc_mark); Not Marked]
  35. Problem Code

        static VALUE CNaN, CInfinity, CMinusInfinity;

        void Init_parser(void)
        {
        #undef rb_intern
            CNaN = rb_const_get(mJSON, rb_intern("NaN"));
            CInfinity = rb_const_get(mJSON, rb_intern("Infinity"));
            CMinusInfinity = rb_const_get(mJSON, rb_intern("MinusInfinity"));
        }
  36. Potential Crasher

        require 'json'

        JSON.send :remove_const, :NaN
        GC.start

        json = '{ "foo": NaN }'
        JSON.load(json, nil, :allow_nan => true)['foo'].nan?

    [Annotation: Cut the reference]
  37. Fix

        diff --git a/ext/json/parser/parser.c b/ext/json/parser/parser.c
        index 0bd328ca42..6f0d31c2eb 100644
        --- a/ext/json/parser/parser.c
        +++ b/ext/json/parser/parser.c
        @@ -2099,8 +2099,13 @@ void Init_parser(void)
             rb_define_method(cParser, "source", cParser_source, 0);
         
             CNaN = rb_const_get(mJSON, rb_intern("NaN"));
        +    rb_gc_register_mark_object(CNaN);
        +
             CInfinity = rb_const_get(mJSON, rb_intern("Infinity"));
        +    rb_gc_register_mark_object(CInfinity);
        +
             CMinusInfinity = rb_const_get(mJSON, rb_intern("MinusInfinity"));
        +    rb_gc_register_mark_object(CMinusInfinity);
         
             i_json_creatable_p = rb_intern("json_creatable?");
             i_json_create = rb_intern("json_create");
  38. MsgPack

    [Diagram labels: Object Implemented in Ruby; Object Implemented in C; Some Object; Automatically Marked!! (gc_mark); Not Marked]
  39. object_id is based on location

    [Diagram: slots 1 through 10 holding three Ruby Obj entries; label 5]

        object#object_id => 1
        object#object_id => 2
        object#object_id => 9
        object#object_id => ?
  40. "Seen" Object IDs

        $seen_object_id = {}

        class Object
          def object_id
            $seen_object_id[memory_location] ||= memory_location
          end
        end
  41. Object ID After Move

        x = Object.new
        x.object_id
        GC.compact
        x.object_id

    [Diagram: heap slots 1 through 4, with X]

    Object ID Table

        Memory Location   Object ID
        4                 4

    Updated Object ID Table

        Memory Location   Object ID
        1                 4
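
    The property this slide describes can be checked from Ruby itself. A small sketch, assuming a Ruby where `GC.compact` is available (2.7 or later); the guard and `rescue` are there because some platforms and builds do not implement compaction, in which case the script simply skips the compaction call.

    ```ruby
    x = Object.new
    before = x.object_id

    begin
      # Compact the heap if this Ruby supports it; x may or may not move.
      GC.compact if GC.respond_to?(:compact)
    rescue NotImplementedError
      # Compaction is unavailable on this platform; nothing moved.
    end

    after = x.object_id
    puts before == after # the ID must not change, whether or not x moved
    ```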
  42. Object ID Collisions

        x = Object.new
        x.object_id
        GC.compact
        x.object_id
        y = Object.new
        y.object_id

    [Diagram: heap slots 1 through 4, with X and Y]

    Object ID Table

        Memory Location   Object ID
        4                 4

    Updated Object ID Table

        Memory Location   Object ID
        1                 4

        y.object_id => ???
        x.object_id => 4
  43. Collision Resolution

        $seen_object_id = {}
        $location_to_object_id = {}

        class Object
          def object_id
            id = memory_location

            while $seen_object_id[id]
              id += 1
            end

            $seen_object_id[id] = id
            $location_to_object_id[memory_location] = id
          end
        end
  44. Object ID Collisions

        x = Object.new
        x.object_id
        GC.compact
        x.object_id
        y = Object.new
        y.object_id

    [Diagram: heap slots 1 through 4, with X and Y]

    Object ID Table

        Memory Location   Object ID
        4                 4

    Updated Object ID Table

        Memory Location   Object ID
        1                 4

        y.object_id => 5
        x.object_id => 4

    Updated Object ID Table

        Memory Location   Object ID
        1                 4
        4                 5
  45. GC Cleanup

        $seen_object_id = {}
        $location_to_object_id = {}

        def free_obj(obj)
          if $location_to_object_id[obj.memory_location]
            id = $location_to_object_id.delete(obj.memory_location)
            $seen_object_id.delete(id)
          end
        end
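
    The two tables and the cleanup hook can be exercised together in a self-contained toy. This is a sketch under assumptions: `ToyObjectIds` and its `move` method are hypothetical, and objects are represented only by their slot address. It also adds one step the slide pseudocode leaves out: returning an already assigned ID before probing for a new one.

    ```ruby
    # Hypothetical model of the slides' two tables, keyed by slot address.
    class ToyObjectIds
      def initialize
        @seen_object_id = {}        # every ID currently in use
        @location_to_object_id = {} # current slot address => ID
      end

      def object_id(location)
        # Assumption: an object that already has an ID keeps it.
        existing = @location_to_object_id[location]
        return existing if existing

        id = location
        id += 1 while @seen_object_id[id] # resolve collisions by probing upward
        @seen_object_id[id] = true
        @location_to_object_id[location] = id
      end

      # Called when compaction moves an object: the ID follows the object.
      def move(from, to)
        id = @location_to_object_id.delete(from)
        @location_to_object_id[to] = id if id
      end

      # Called when an object is freed: release its ID for reuse.
      def free(location)
        id = @location_to_object_id.delete(location)
        @seen_object_id.delete(id) if id
      end
    end

    ids = ToyObjectIds.new
    p ids.object_id(4) # x lives in slot 4        => 4
    ids.move(4, 1)     # compaction moves x to slot 1
    p ids.object_id(1) # x keeps its ID           => 4
    p ids.object_id(4) # y is allocated in slot 4 => 5
    ids.free(1)        # x is collected, ID 4 is released
    ```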
  46. Monotonic IDs

        irb(main):001:0> Object.new.object_id
        => 480
        irb(main):002:0> Object.new.object_id
        => 500
        irb(main):003:0> Object.new.object_id
        => 520
        irb(main):004:0> Object.new
        => #<Object:0x00007fdab60dcf30>
        irb(main):005:0> Object.new.object_id
        => 540
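
    The irb session suggests that an ID is taken from a counter only when `object_id` is first called, since the bare `Object.new` on line 004 does not consume one. A minimal sketch of that idea follows; `MonotonicIds` is hypothetical, the step of 20 is chosen only to mirror the output above, and the table here holds strong references, which a real collector would need to avoid.

    ```ruby
    # Hypothetical lazily assigned, monotonically increasing IDs.
    class MonotonicIds
      STEP = 20 # assumption: mirrors the spacing seen in the irb output

      def initialize(start = 0)
        @next_id = start
        @ids = {}.compare_by_identity # object => ID, keyed by identity
      end

      def id_for(obj)
        # Assign the next counter value on first request, then reuse it.
        @ids[obj] ||= (@next_id += STEP)
      end
    end

    ids = MonotonicIds.new
    a = Object.new
    b = Object.new # never asked for an ID, so it consumes none
    c = Object.new

    p ids.id_for(a) # => 20
    p ids.id_for(c) # => 40
    p ids.id_for(a) # => 20
    ```

    Because the ID no longer derives from the address, moving an object during compaction cannot change it or cause a collision.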
  47. Inspect vs Object ID

    Ruby 2.6

        $ ruby -v -e'x = Object.new; p x; p x.object_id'
        ruby 2.6.4p104 (2019-08-28 revision 67798) [x86_64-darwin18]
        #<Object:0x00007f9f7a13c948>
        70161462322340

    Related!

    Ruby 2.7

        $ ruby -v -e'x = Object.new; p x; p x.object_id'
        ruby 2.7.0dev (2019-11-15T09:07:34Z master 11ae47c266) [x86_64-darwin19]
        #<Object:0x00007fd708886878>
        40

    No Relationship!
  48. Sliding Compaction

    [Diagram: table with rows Address (1 through 10) and Content (Yajl, [ ], ?, "foo", "bar", ?)]
  49. Problem Code

        /* static value pointing at Foo constant */
        static VALUE foo = rb_const_get("Foo");
  50. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo; labels 4, 5, 3]
  51. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo and Bar; labels 4, 5, 3]
  52. Heap Doubling

    [Diagram: slots 1 through 10 with labels 4 and 5, followed by slots 11 through 20 with labels 18, 19, 20]
  53. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo; labels 4, 5, 3]
  54. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo and Bar; labels 4, 5, 3]
  55. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo and Bar; labels 4, 5, 3]
  56. Compaction Bug

        static VALUE foo = rb_const_get("Foo");

    [Diagram: slots 1 through 10 with Foo and Bar; labels 4, 5, 3]