RubyKaigi2025_martin_okoshi

joetake

June 10, 2025

Transcript

  1. Write You a Barrier for Great Good: Automatic Checking and Insertion of Write Barriers in Ruby C Extensions

    Joichiro Okoshi (大越 丈逸朗) and Martin J. Dürst
    Aoyama Gakuin University (青山学院大学)
  2. Outline

    • Introduction
    • Memory management in MRI
    • Extending Ruby with C
    • Introduction of our tool: wb_check
    • Implementation of wb_check
  3. Title – The Long Version

    • Short Title (RubyKaigi submission limit): Write You a Barrier - Automatic Insertion of Write Barriers
    • Original Title: Write You a Barrier for Great Good - Automatic Checking and Insertion of Write Barriers in Ruby C Extensions
  4. Speaker Introduction

    Joichiro Okoshi (大越 丈逸朗)
    • 1st year of Master's program, Intelligence and Information Course, Graduate School of Science and Engineering, Aoyama Gakuin University (青山学院大学大学院理工学研究科知能情報コース M1)
    • Favorites: fishing, traveling, taking photos, ...
    • The tool was developed for the senior thesis

    Martin J. Dürst
    • Professor, Department of Integrated Information Technology, College of Science and Engineering, Aoyama Gakuin University (青山学院大学理工学部情報テクノロジー学科教授)
  5. Our Lab's Past Contributions to Ruby

    • Character encoding conversion (String#encode, 2007)
    • Unicode normalization (String#unicode_normalize, 2014)
    • Unicode upcase/downcase (String#upcase, …, 2016)
    • Unicode version updates (up to Unicode 15.0.0)
  6. Main Keywords

    • Garbage collection (GC)
      Not: collecting garbage
      But: recycling memory
    • Write barrier (WB)
      Not: forbidding writes
      But: when writing, do something additional
  7. New tool wb_check

    • Checks/inserts write barriers for C extension code
    • Uses static analysis based on type information
    • Write barriers are a feature of the garbage collection (GC) of MRI
    • wb_check is still experimental
    • Code is at https://github.com/joetake/wb_check
    • Please provide comments
  8. Motivation

    • Using T_DATA objects in C extensions is useful
    • Using T_DATA objects without write barriers is slower
    • Inserting write barriers is complex and error-prone
  9. Outline

    • Introduction
    • Memory management in MRI
    • Extending Ruby with C
    • Introduction of our tool: wb_check
    • Implementation of wb_check
  10. Dynamic Memory Allocation

    [Figure: the heap area for Ruby objects consists of heap pages (64KB each); each heap page holds slots.]
  11. Generational GC

    [Figure: the same object graph (Roots; Object A, old; Object B, young; Object C, young; Object D, old; Object E, young) shown twice, once for a major (full) GC and once for a minor GC.]
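
The difference between minor and major GC can be observed from Ruby itself. The following is a minimal sketch, assuming CRuby (MRI), where `GC.stat` exposes the `:minor_gc_count` and `:major_gc_count` counters. A request for a minor GC may still be escalated to a major one by the runtime, so the exact split between the two counters is not guaranteed.

```ruby
# Observe minor and major GC runs on CRuby (MRI) through GC.stat.
before_minor = GC.stat(:minor_gc_count)
before_major = GC.stat(:major_gc_count)

GC.start(full_mark: false) # request a minor GC (the runtime may escalate it)
GC.start                   # request a major (full) GC

minor_runs = GC.stat(:minor_gc_count) - before_minor
major_runs = GC.stat(:major_gc_count) - before_major
puts "minor GC runs: #{minor_runs}, major GC runs: #{major_runs}"
```
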
  12. Concept of Write Barrier

    [Figure labels: Object A (young), Object B (old), Object C (young), remembered set, write barrier (shown twice).]
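
The role of the remembered set can be illustrated with a toy model. The sketch below is not how MRI implements its GC: `ToyObject` and `ToyHeap` are hypothetical names introduced only for illustration. It shows that a minor GC, which does not traverse old objects, finds a young object referenced from an old object only if a write barrier recorded that old object in the remembered set.

```ruby
require 'set'

# Toy model of a generational heap. Illustration only, not MRI's implementation.
class ToyObject
  attr_reader :name, :refs, :old

  def initialize(name, old: false)
    @name = name
    @old = old
    @refs = []
  end
end

class ToyHeap
  attr_reader :remembered_set

  def initialize
    @remembered_set = Set.new
  end

  # Write with a barrier: when an old object starts to reference a young
  # object, remember the old object so that a minor GC rescans it.
  def write(from, to)
    from.refs << to
    @remembered_set << from if from.old && !to.old
  end

  # Write without a barrier (what a missing write barrier amounts to).
  def write_unprotected(from, to)
    from.refs << to
  end

  # Minor GC marking: start from the roots and from the children of the
  # remembered old objects; old objects themselves are not traversed.
  def minor_mark(roots)
    marked = Set.new
    stack = roots.dup
    @remembered_set.each { |obj| stack.concat(obj.refs) }
    until stack.empty?
      obj = stack.pop
      next if obj.old              # treated as live, children not visited
      next unless marked.add?(obj) # skip objects that are already marked
      stack.concat(obj.refs)
    end
    marked
  end
end

b = ToyObject.new("B", old: true)
c = ToyObject.new("C")
protected_heap = ToyHeap.new
protected_heap.write(b, c)
with_barrier = protected_heap.minor_mark([b]).map(&:name)

b2 = ToyObject.new("B", old: true)
c2 = ToyObject.new("C")
unprotected_heap = ToyHeap.new
unprotected_heap.write_unprotected(b2, c2)
without_barrier = unprotected_heap.minor_mark([b2]).map(&:name)

puts "marked with barrier: #{with_barrier.inspect}"
puts "marked without barrier: #{without_barrier.inspect}"
```

In the second case the young object is never marked, which in a real collector would mean it is freed while still referenced.
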
  13. Details of Write Barrier

    Implementation of Array's '<<' method in ruby/array.c

        /* Method that pushes the object 'item' onto the array object 'ary' */
        VALUE rb_ary_push(VALUE ary, VALUE item)
        {
            …
            /* Get new space at the end of the array */
            VALUE target_ary = ary_ensure_room_for_push(ary, 1);
            RARRAY_PTR_USE(ary, ptr, {
                /* Write the new reference and call the write barrier */
                RB_OBJ_WRITE(target_ary, &ptr[idx], item);
                …
            });
        }
  14. WB-unprotected

    • MRI allows WB-unprotected objects
    • MRI knows which objects are unprotected

        struct heap_page {
            ...
            bits_t wb_unprotected_bits[HEAP_PAGE_BITMAP_LIMIT];
            ...
        };
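
Whether an individual object is WB-protected can also be inspected from Ruby. This is a minimal sketch, assuming CRuby (MRI) and its bundled `objspace` debugging library, whose `ObjectSpace.dump` output includes a `flags` entry with a `wb_protected` key for protected objects. The helper name `gc_flags` is hypothetical.

```ruby
require 'objspace'
require 'json'

# Ask CRuby (MRI) for the GC-related flags of one object.
# Returns an empty hash when the dump carries no "flags" entry.
def gc_flags(obj)
  JSON.parse(ObjectSpace.dump(obj)).fetch("flags", {})
end

flags = gc_flags(String.new("RubyKaigi"))
puts flags.inspect
```

Built-in strings are WB-protected on MRI, so the flags for the string above are expected to contain `wb_protected`.
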
  15. How GC Handles WB-unprotected Objects

    • WB-unprotected objects are never promoted to the old generation
    • They stay in the young generation eternally

    [Figure labels: Young Object, WB-unprotected Object, OK.]
  16. Incremental Marking

    • The marking phase of GC is performed in turns with the execution of the program
    • Re-traverse from all unprotected objects with stop-the-world

    [Figure: timeline alternating "Marking" and "Execution of program".]
  17. Danger in WB-protected

    • A WB-protected object is expected to have write barriers inserted completely
    • If not? Critical bugs
    • A WB-protected object must have write barriers inserted completely
  18. Outline

    • Introduction
    • Memory management in MRI
    • Extending Ruby with C
    • Introduction of our tool: wb_check
    • Implementation of wb_check
  19. C Extension

    • Implemented in C, invoked from Ruby
    • Interacts with MRI via the C API

    C:

        #include "ruby.h"
        #include "extconf.h"

        /* A method defined with argc 0 receives only the receiver */
        VALUE rb_hello(VALUE self)
        {
            VALUE hello = rb_str_new2("hello, from C");
            return hello;
        }

        void Init_hello(void)
        {
            VALUE module = rb_define_module("Greeting");
            rb_define_method(module, "hello", rb_hello, 0);
        }

    Ruby:

        require './hello'
        include Greeting
        puts hello() # hello, from C
  20. VALUE type

    Ruby values are represented as the VALUE type in C

        VALUE number = INT2NUM(3);               /* Fixnum 3 */
        VALUE string = rb_str_new2("RubyKaigi"); /* String 'RubyKaigi' */

    The VALUE data type is one of
    • Reference to RVALUE data (the object's data)
    • Immediate value (nil, true, 'small' numbers, ...)

    The VALUE data type knows what type the object is

        TYPE(obj) /* T_NIL, T_ARRAY, T_OBJECT, … */
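
The distinction between immediate values and references is visible from Ruby as well. This is a minimal sketch, assuming CRuby (MRI), where the `object_id` of a small integer n is derived from its VALUE bits and equals 2n + 1. This relation is an implementation detail, not part of the language.

```ruby
# On CRuby, small integers are immediate values: the VALUE itself encodes
# the number, and object_id reflects that encoding (2n + 1).
small_ids = (0..3).map(&:object_id)
puts small_ids.inspect

# Heap-allocated objects are references: two equal strings are
# still two different objects.
a = String.new("RubyKaigi")
b = String.new("RubyKaigi")
same_object = a.object_id == b.object_id
puts same_object
```

On CRuby the first line is expected to print `[1, 3, 5, 7]`.
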
  21. Write Barriers in C Extensions

    • In general, there is no need to care about write barriers
    • Write barriers are required only in specific cases → T_DATA (Typed Data)

    [Figure labels: C Extension Code, Ruby Object, via C API, Use write barrier if needed.]
  22. T_DATA

    A Ruby object that wraps C data

        /* Allocate a struct SomeStruct and wrap it in obj */
        VALUE obj = TypedData_Make_Struct(klass, struct SomeStruct, &data_type, str);

        struct SomeStruct *ptr;
        TypedData_Get_Struct(obj, …, ptr);
  23. Protected T_DATA Object

        static const rb_data_type_t data_type = {
            "SomeStruct",
            {d_lite_gc_mark, RUBY_TYPED_DEFAULT_FREE, d_lite_memsize,},
            0, 0,
            RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_WB_PROTECTED | RUBY_TYPED_FROZEN_SHAREABLE,
        };

    Highlighted flag: RUBY_TYPED_WB_PROTECTED
  24. T_DATA Object with Write Barriers

    When write barriers can be used
    • The T_DATA object holds VALUE fields inside its C structure

    When inserting write barriers has reasonable merit
    • The object may be long-lived
    • The objects may be numerous
    • The object may have many references to other objects
  25. Macros for Write Barriers

    • Declare a write barrier: RB_OBJ_WRITTEN(old, oldv, young)
    • Assign a young object to a slot, and declare a write barrier: RB_OBJ_WRITE(old, *slot, young)
  26. Outline

    • Introduction
    • Memory management in MRI
    • Extending Ruby with C
    • Introduction of our tool: wb_check
    • Implementation of wb_check
  27. What is wb_check?

    • Static analysis of C extension code
    • Finds places where a write barrier is required
    • Inserts write barriers automatically
    • Focuses on T_DATA objects
    • Uses tree-sitter to parse C code
  28. tree-sitter

    • Parser generator and incremental parsing library
    • Builds a concrete syntax tree
    • Uses an open source C99 grammar

    tree-sitter: https://tree-sitter.github.io/tree-sitter/
    C99 grammar: https://github.com/tree-sitter/tree-sitter-c
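  29. Preprocessor Directive Expansion

    • Directive expansion (#if, #ifdef, #define, …): reduces the complexity of the code
    • #include expansion: includes header files into a single source file

        gcc -E main.c

    [Figure: a source file containing "#include <main.h> …" and the header main.h containing "int hoge() …".]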
  30. Generating Syntax Tree

    • Use a parser generated from tree-sitter and the C99 grammar
    • Access the syntax tree with the ruby-tree-sitter binding

    C99 grammar: https://github.com/tree-sitter/tree-sitter-c
    Ruby binding: https://github.com/Faveod/ruby-tree-sitter
  31. Syntax Tree Example

    Syntax tree for "int c = a + b;"

        :declaration
          type: :primitive
          declarator: :init_declarator
            declarator: :identifier
            value: :binary_expression
              left: :identifier
              right: :identifier
  32. Analysis Strategy

    • First pass (shallow traversal): collect information about struct definitions, global variables, function signatures, …
    • Second pass (deep traversal): collect more information about locations where WBs are required
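
The two passes can be sketched on a toy tree. Everything here is hypothetical and simplified: `TreeNode`, its `attrs` hash, and the node types `:struct_definition` and `:field_assignment` are stand-ins introduced for illustration, whereas wb_check works on tree-sitter nodes. The sketch only shows why a shallow pass is run first: the deep pass needs to know which struct fields have type VALUE.

```ruby
# Hypothetical, simplified node type standing in for tree-sitter nodes.
TreeNode = Struct.new(:type, :attrs, :children)

# First pass (shallow): look only at the top-level nodes and collect
# struct definitions (with their VALUE fields) and function names.
def first_pass(root)
  info = { value_fields: {}, functions: [] }
  root.children.each do |node|
    case node.type
    when :struct_definition
      fields = node.attrs[:fields].select { |_name, type| type == "VALUE" }.keys
      info[:value_fields][node.attrs[:name]] = fields
    when :function_definition
      info[:functions] << node.attrs[:name]
    end
  end
  info
end

# Second pass (deep): visit every node and collect the lines of field
# assignments whose target is a VALUE field found by the first pass.
def second_pass(node, info, found = [])
  if node.type == :field_assignment
    fields = info[:value_fields].fetch(node.attrs[:struct], [])
    found << node.attrs[:line] if fields.include?(node.attrs[:field])
  end
  node.children.each { |child| second_pass(child, info, found) }
  found
end

root = TreeNode.new(:root, {}, [
  TreeNode.new(:struct_definition,
               { name: "SomeStruct", fields: { "count" => "int", "str" => "VALUE" } }, []),
  TreeNode.new(:function_definition, { name: "set_str" }, [
    TreeNode.new(:block, {}, [
      TreeNode.new(:field_assignment, { struct: "SomeStruct", field: "count", line: 10 }, []),
      TreeNode.new(:field_assignment, { struct: "SomeStruct", field: "str", line: 11 }, [])
    ])
  ])
])

info = first_pass(root)
wb_lines = second_pass(root, info)
puts "functions: #{info[:functions].inspect}, lines needing a WB: #{wb_lines.inspect}"
```
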
  33. Analysis Scope

    [Figure: tree of the syntax elements that are analyzed. Nodes: Root, structure definition, global variable declaration, function declaration, function definition, parameter list, body (function body), block, local variable declaration, expression, func/macro call.]
  34. When are WBs required?

    • When a new reference from a T_DATA object to another object is made
    • That is, when a VALUE type field in the structure contained in a T_DATA object is changed

    1. Locate pointers to the structure that a T_DATA object has

        #define TypedData_Get_Struct(obj,type,data_type,sval) \
            ((sval) = RBIMPL_CAST((type *)rb_check_typeddata((obj), (data_type))))

       ruby/include/ruby/internal/core/rtypeddata.h, line 515

    2. Locate where a VALUE type field is changed
       • Assignments
       • Function calls / macros
  35. Outline

    • Introduction
    • Memory management in MRI
    • Extending Ruby with C
    • Introduction of our tool: wb_check
    • Implementation of wb_check
  36. Manage Variables and Fields

    Represent these as CVar instances
    • Variables
    • Fields of structures

        class CVar
          def initialize(...)
            @type = type
            @name = name
            @pointer_count = pointer_count
            @parenthesis_count = parenthesis_count
            @is_typeddata = false
            @parent_obj = nil
          end
        end
  37. Scope of Local Variables

    1. Variables are block-scoped
    2. Outer variables can be accessed from inner blocks
    3. Inner variables take precedence over outer variables
    4. After exiting a block, the variables inside the block can't be accessed anymore

        def analyze_block(node, function_signature, _vars_in_local)
          # Make a shallow copy of the outer block's map when a new block starts
          vars_in_local = _vars_in_local.clone
          ...
          node.each_named do |child|
            case child.type
            when :declaration
              ...
              # Overwrite if names are identical
              vars.each {|var| vars_in_local[var.name] = var}
              ...
            when :if_statement, ...
              # Pass the variable list when a new block is found
              analyze_block(child, function_signature, vars_in_local)
              ...
            end
          end
        end
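
The four scoping rules follow from copying the variable map on block entry. The sketch below is a self-contained toy version under stated assumptions: `ScopeNode` and the helpers `decl_node`, `use_node`, and `block_node` are hypothetical stand-ins for tree-sitter nodes, and the map stores only a type string per variable name.

```ruby
# Hypothetical, simplified node type standing in for tree-sitter nodes.
ScopeNode = Struct.new(:type, :name, :var_type, :children)

def decl_node(name, var_type)
  ScopeNode.new(:declaration, name, var_type, [])
end

def use_node(name)
  ScopeNode.new(:use, name, nil, [])
end

def block_node(*children)
  ScopeNode.new(:block, nil, nil, children)
end

# Records, for every :use node, the type of the variable visible at that
# point (nil when the name is not in scope).
def analyze_block(node, outer_vars, uses = [])
  vars_in_local = outer_vars.clone # shallow copy: inner changes stay local
  node.children.each do |child|
    case child.type
    when :declaration
      vars_in_local[child.name] = child.var_type # overwrite if names are identical
    when :block
      analyze_block(child, vars_in_local, uses)  # inner block sees outer variables
    when :use
      uses << [child.name, vars_in_local[child.name]]
    end
  end
  uses
end

# Corresponds to:
#   { int x; VALUE y;
#     { struct SomeStruct *x; use x; use y; }
#     use x;
#     { VALUE z; }
#     use z; }
tree = block_node(
  decl_node("x", "int"),
  decl_node("y", "VALUE"),
  block_node(decl_node("x", "struct SomeStruct *"), use_node("x"), use_node("y")),
  use_node("x"),
  block_node(decl_node("z", "VALUE")),
  use_node("z")
)

uses = analyze_block(tree, {})
uses.each { |name, type| puts "#{name}: #{type.inspect}" }
```

Because each block works on its own copy, the inner `x` shadows the outer one only inside the inner block, and `z` is no longer visible after its block ends.
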
  38. Catch Pointer Assignments

    Locate pointers to the structures that T_DATA objects contain

        ((ptr) = ((struct SomeStruct *)rb_check_typeddata(...)));

    1. Locate assignment expressions
    2. Check if the rhs includes 'rb_check_typeddata()'
    3. Mark the assigned ptr variable
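
The three steps can be approximated on preprocessed text. Note the swapped-in technique: this sketch uses a regular expression over source text rather than the syntax tree traversal the slides describe, so it is cruder (for example, it would also report the last identifier of a field expression). The constant `TYPEDDATA_ASSIGN` and the method `typeddata_pointers` are hypothetical names.

```ruby
# Swapped-in technique: a regular expression over preprocessed C text.
# Matches "<identifier> = ... rb_check_typeddata(" within one statement;
# "[^;=]*" keeps the match inside a single assignment and excludes "==".
TYPEDDATA_ASSIGN =
  /\(?\s*\(?\s*(?<var>[A-Za-z_]\w*)\s*\)?\s*=\s*[^;=]*\brb_check_typeddata\s*\(/

# Returns the names of variables assigned from rb_check_typeddata().
def typeddata_pointers(source)
  source.scan(TYPEDDATA_ASSIGN).flatten.uniq
end

source = [
  "((ptr) = ((struct SomeStruct *)rb_check_typeddata((obj), (&data_type))));",
  "other = malloc(10);",
  "dat = (struct Other *)rb_check_typeddata(self, &other_type);"
].join("\n") + "\n"

marked = typeddata_pointers(source)
puts marked.inspect
```
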
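  39. Catch Reference Change

    [Figure: the traversal goes from one function body (assignments, function calls) into the bodies of the called functions (assignments, function calls).]

    Finish if
    • A function whose body is not defined in the code is called
    • The traversal falls into a loop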
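
The two termination conditions can be sketched as a recursive walk over a call graph. The graph representation (a hash from function name to the names it calls) and the method name `reachable_functions` are assumptions made for illustration; functions that are called but not defined in the analyzed code simply have no entry.

```ruby
# Hypothetical call graph walk with the two stop conditions.
# call_graph: function name => names of the functions called in its body.
# visiting:   functions on the current call path (used to detect loops).
def reachable_functions(call_graph, start, visiting = [], visited = [])
  return visited unless call_graph.key?(start) # body not defined in the code
  return visited if visiting.include?(start)   # traversal fell into a loop
  visited << start unless visited.include?(start)
  call_graph[start].each do |callee|
    reachable_functions(call_graph, callee, visiting + [start], visited)
  end
  visited
end

call_graph = {
  "set_field" => ["update", "rb_str_new"],
  "update"    => ["helper", "set_field"], # calls back into set_field (loop)
  "helper"    => ["memcpy"]
}

analyzed = reachable_functions(call_graph, "set_field")
puts analyzed.inspect
```
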
  40. Analysis with Context

    [Figure: Context 1, Context 2, Context 3; each "Analyze function call" step has its own context.]

        class Context
          attr_accessor ...
          def initialize(marked_parameter_index)
            @marked_argument_index = marked_p...
            @current_function_params = Array.new
            @changed_parameter_index = Array.new
          end
        end
  41. Analyze Assignments

    • Traverse the lhs of assignment nodes recursively
    • Return intermediate analysis information

    node type                  description             further recursive calls
    :identifier                identifier of variable  ✕
    :parenthesized_expression  parenthesis             ◯
    :assignment_expression     assignment              ◯
    :pointer_expression        dereference             ◯
    :field_expression          access to field         ◯

        return { type_name: type_name, is_pointer_access: false, is_typeddata: cvar.is_typeddata, needWB: false }
  42. Analyze Field Access

    A operator B (for operators '->', '.')

    Conditions on A
    • A[:is_typeddata] == true
    • operator == '->' || operator == '.' && A[:is_pointer_access]
      Example: *a.B

    If B is a structure
    • If that structure has a VALUE type field, a write barrier is needed

    If B isn't a structure
    • If B is a VALUE type field, a write barrier is needed
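
The field access rule reads naturally as a predicate. This is a minimal sketch: the keys `:is_typeddata` and `:is_pointer_access` come from the slides, while the method name `field_access_needs_wb?` and the description of B (a hash with `:type`, plus `:struct_field_types` when B is itself a structure) are hypothetical.

```ruby
# a: intermediate analysis info for the left operand A.
# b: hypothetical description of field B:
#      { type: "VALUE" }                                   for a plain field
#      { type: "struct Inner", struct_field_types: [...] } for a structure
def field_access_needs_wb?(a, operator, b)
  return false unless a[:is_typeddata]
  return false unless operator == '->' || (operator == '.' && a[:is_pointer_access])

  if b[:struct_field_types]
    b[:struct_field_types].include?("VALUE") # B is a structure with a VALUE field
  else
    b[:type] == "VALUE"                      # B itself is a VALUE field
  end
end

typed_ptr   = { is_typeddata: true, is_pointer_access: false }
typed_deref = { is_typeddata: true, is_pointer_access: true }
plain       = { is_typeddata: false, is_pointer_access: false }

arrow_value  = field_access_needs_wb?(typed_ptr, '->', { type: "VALUE" })
arrow_int    = field_access_needs_wb?(typed_ptr, '->', { type: "int" })
dot_no_deref = field_access_needs_wb?(typed_ptr, '.', { type: "VALUE" })
dot_struct   = field_access_needs_wb?(typed_deref, '.',
                                      { type: "struct Inner", struct_field_types: %w[int VALUE] })
not_typed    = field_access_needs_wb?(plain, '->', { type: "VALUE" })

puts [arrow_value, arrow_int, dot_no_deref, dot_struct, not_typed].inspect
```
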
  43. Check Known APIs

    Some external functions may change references
    • C APIs
    • memcpy
    • strcpy
    • memset
    • …
  44. Insert Write Barrier

    Insert a write barrier for each changed field

        changed_fields.each do |changed_field|
          count = count + 1
          old = "(VALUE)#{cvar.parent_obj}"
          oldv = '(VALUE)(((VALUE)RUBY_Qundef))'
          young = changed_field
          filename = '"auto_insertion"'
          line = -1
          new_line << "line #{w.line_number}: rb_obj_written(#{old}, #{oldv}, #{young}, #{filename}, #{line})"
        end
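
How such reports could be turned into patched source text is sketched below. This is an assumption about one possible output step, not wb_check's actual behavior: the method `insert_write_barriers`, the shape of the `changes` entries, and the choice to emit the `RB_OBJ_WRITTEN` macro with `Qundef` as the old value are all introduced for illustration.

```ruby
# Hypothetical helper: insert an RB_OBJ_WRITTEN line after every source
# line that changes a VALUE field.
# changes: array of { line: 1-based line number, parent: C expression for
#          the T_DATA object, field: C expression for the changed field }.
def insert_write_barriers(source, changes)
  by_line = changes.group_by { |change| change[:line] }
  source.lines.each_with_index.flat_map do |text, index|
    indent = text[/\A[ \t]*/] # reuse the indentation of the changed line
    barriers = by_line.fetch(index + 1, []).map do |change|
      "#{indent}RB_OBJ_WRITTEN((VALUE)#{change[:parent]}, Qundef, #{change[:field]});\n"
    end
    [text, *barriers]
  end.join
end

source = [
  "static VALUE set_str(VALUE self, VALUE str)",
  "{",
  "    ptr->str = str;",
  "    return self;",
  "}"
].join("\n") + "\n"

changes = [{ line: 3, parent: "self", field: "ptr->str" }]
patched = insert_write_barriers(source, changes)
puts patched
```
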
  45. Accuracy

    • Recall: detected / total reference changes
    • False positives: number of detected but unnecessary change points

    library   file         recall (detected / total)  unnecessarily checked
    date      date_core.c  3 / 3                      7
    json      generator.c  1 / 1                      4
    json      parser.c     1 / 1                      0
    stringio  stringio.c   0 / 1                      0
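
The per-file figures can be combined into overall totals. The sketch below only sums the numbers reported on the slide; the hash layout and variable names are hypothetical.

```ruby
# Figures as reported on the slide, keyed by library/file.
results = {
  "date/date_core.c"    => { detected: 3, total: 3, unnecessary: 7 },
  "json/generator.c"    => { detected: 1, total: 1, unnecessary: 4 },
  "json/parser.c"       => { detected: 1, total: 1, unnecessary: 0 },
  "stringio/stringio.c" => { detected: 0, total: 1, unnecessary: 0 }
}

detected    = results.values.sum { |r| r[:detected] }
total       = results.values.sum { |r| r[:total] }
unnecessary = results.values.sum { |r| r[:unnecessary] }
overall_recall = Rational(detected, total)

puts "overall recall: #{detected} / #{total} (#{(overall_recall * 100).round(1)}%)"
puts "unnecessarily checked points: #{unnecessary}"
```

Across the four files this gives 5 of 6 reference changes detected and 11 unnecessarily checked points.
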
  46. Future Work

    • Perform preprocessing automatically
    • Support cases other than assignments to variables
    • Support known APIs such as memcpy, strcpy, etc.
    • Reduce unnecessary insertions

    Feedback will help me a lot!
  47. Summary

    • Perform static analysis on C extension code
    • Detect reference changes: assignments, function calls
    • Insert write barrier code: assignments