Slide 1

Slide 1 text

Hello!

Slide 2

Slide 2 text

Aaron Patterson @tenderlove

Slide 3

Slide 3 text

Speeding up Class#new
A new approach

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

I LOVE Ruby!! It’s a Fact

Slide 6

Slide 6 text

Can Ruby be faster than C? Answer: Yes. But how?

Slide 7

Slide 7 text

Can Class#new be Ruby? Answer: Yes. But is it fast enough?

Slide 8

Slide 8 text

Study Class#new implementation
We can’t speed it up if we don’t know how it works!

Slide 9

Slide 9 text

Inline Caches
What they are, how they work.

Slide 10

Slide 10 text

Calling Conventions
What they are, and their impact on method calls.

Slide 11

Slide 11 text

Class#new implementation
Implement a new “new” in Ruby and performance-test it

Slide 12

Slide 12 text

Why Class#new?

Slide 13

Slide 13 text

It’s implemented in C

Slide 14

Slide 14 text

It’s very popular

Slide 15

Slide 15 text

Examples of using Class#new
Creating new objects

User.new(name: "Aaron")
# haha business
YourBusinessObject.new(1, 2)

Slide 16

Slide 16 text

100 Objects?

Slide 17

Slide 17 text

100,000 objects?

Slide 18

Slide 18 text

500 trillion objects!!!

Slide 19

Slide 19 text

Nobody complains if it’s faster

Slide 20

Slide 20 text

Class#new implementation

Slide 21

Slide 21 text

Allocate an instance
Pass parameters to initialize
Return the instance

Slide 22

Slide 22 text

Possible Implementation
First, buggy implementation

class Class
  def new(...)
    allocate.initialize(...)
  end
end

class Foo
  def initialize(a, b)
    :hello
  end
end

Foo.new(1, 2) # => ??? It returns :hello, the value of initialize, not the new instance

Slide 23

Slide 23 text

Possible Implementation
Second, less buggy implementation

class Class
  def new(...)
    obj = allocate
    obj.initialize(...)
    obj
  end
end

class Foo
  def initialize(a, b)
    :hello
  end
end

Foo.new(1, 2) # => a new Foo instance
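To see the fixed behavior without replacing the real Class#new, here is a runnable sketch that puts the same three steps in a hypothetical `build` class method (module `SketchNew` is an illustrative name). It uses `__send__` to reach the private initialize, which is an assumption for the demo and not how CRuby does it; leading arguments before `...` need Ruby 3.0 or later.

```ruby
# Hypothetical module: a Ruby-level stand-in for Class#new, for illustration only.
module SketchNew
  def build(...)
    obj = allocate                  # 1. allocate an instance
    obj.__send__(:initialize, ...)  # 2. pass the arguments to the private initialize
    obj                             # 3. return the instance, not initialize's result
  end
end

class Foo
  extend SketchNew

  attr_reader :a, :b

  def initialize(a, b)
    @a = a
    @b = b
    :hello # this return value is ignored by build
  end
end

foo = Foo.build(1, 2)
p foo.a # => 1
```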

Slide 24

Slide 24 text

Possible Implementation
Things we need to know about

class Class
  def new(...)
    obj = allocate        # method call
    obj.initialize(...)   # method call
    obj
  end
end

class Foo
  def initialize(a, b)
    :hello
  end
end

Foo.new(1, 2) # => a new Foo instance

Two method calls mean two inline caches, and passing the arguments along depends on calling conventions.

Slide 25

Slide 25 text

Actual Implementation
Class#new implementation in C

VALUE
rb_class_new_instance_pass_kw(int argc, const VALUE *argv, VALUE klass)
{
    VALUE obj;

    obj = rb_class_alloc(klass);   /* Allocation happens here */
    rb_obj_call_init_kw(obj, argc, argv, RB_PASS_CALLED_KEYWORDS);

    return obj;
}

class Foo
  def initialize(a, b)
    :hello
  end
end

Foo.new(1, 2) # => a new Foo instance

Slide 26

Slide 26 text

Actual Implementation
Class#new implementation in C (same code as Slide 25)

Slide 27

Slide 27 text

Actual Implementation
Class#new implementation in C (same code as Slide 25)

Slide 28

Slide 28 text

Actual Implementation
Class#new implementation in C (same code as Slide 25)

Slide 29

Slide 29 text

Actual Implementation
Class#new implementation in C (same code as Slide 25)

Foo.new(1, 2) crosses languages: the caller is Ruby, Class#new is C, and Foo#initialize is Ruby again (Ruby -> C -> Ruby).

Slide 30

Slide 30 text

Different Calling Conventions
Crossing the language border requires translation
Ruby -> C: translate to the C calling convention
C -> Ruby: translate back to the Ruby calling convention

Slide 31

Slide 31 text

Crossing Language Barrier Can Cost
Language barrier means “translation”
(Diagram: a Ruby -> C -> Ruby call chain next to an all-Ruby call chain)

Slide 32

Slide 32 text

Crossing Language Barrier Can Cost
Language barrier means “translation”
(Diagram: the same two call chains on a time axis; the all-Ruby chain finishes sooner, labeled “Savings”)

Slide 33

Slide 33 text

Where is time spent?
Arrows are impacted by calling convention
(Diagram: the same two call chains; the arrows between methods carry the calling-convention cost)

Slide 34

Slide 34 text

Where is time spent?
Methods are impacted by VM speed / method lookup (inline caches)
(Diagram: the same two call chains; the method boxes carry the execution and lookup cost)

Slide 35

Slide 35 text

Inline Caches
Making method lookups faster

Slide 36

Slide 36 text

Where is “baz”?
Methods must be located, and we can cache the location

class Foo
  def bar
    self.baz
  end

  def baz
  end
end

Where is the baz method?

Slide 37

Slide 37 text

Where is “baz”?
Method lookup routine (simplified: modules in the ancestor chain are ignored)

def find_method(klass, method_name)
  # Loop through ancestors until we find the class that defines the method
  while !klass.method_defined?(method_name, false)
    klass = klass.superclass
  end
  # Return the method
  klass.instance_method(method_name)
end

def call_method(receiver, method_name)
  # Start the lookup at the receiver's class, then call the method on the receiver
  find_method(receiver.class, method_name).bind_call(receiver)
end

Slide 38

Slide 38 text

Method lookup can be linear
We have to scan ancestors looking for the method

class A; def baz; end end
class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end
# ....
class Foo < Z
  def bar
    self.baz
  end
end

😭
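To put a number on “linear”, this sketch builds a 25-level chain of anonymous subclasses under A (standing in for the long chain on the slide) and counts how many classes an uncached lookup checks before it finds baz. `lookup_steps` is a hypothetical helper, and like the simplified routine above it walks `superclass` only, so modules in the ancestors are ignored.

```ruby
# A defines baz; 25 empty subclasses are chained below it.
A = Class.new { def baz; end }
leaf = (1..25).reduce(A) { |parent, _| Class.new(parent) }

# Walk superclasses the way a naive lookup would, counting the classes checked.
def lookup_steps(klass, name)
  steps = 0
  while klass
    steps += 1
    return [klass, steps] if klass.method_defined?(name, false)
    klass = klass.superclass
  end
  [nil, steps]
end

owner, steps = lookup_steps(leaf, :baz)
p owner.equal?(A) # => true
p steps           # => 26
```

Every call without a cache would pay those 26 checks again, which is the motivation for caching the result.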

Slide 39

Slide 39 text

Cache the Method!

Slide 40

Slide 40 text

Receiver type is our cache key
Create a cache entry where the key is the class and the value is the method

class A; def baz; end end
class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end
# ....
class Foo < Z
  def bar
    self.baz
  end
end

Cache key is the class of self: Foo
Cache value is the method entry (A#baz)

Slide 41

Slide 41 text

Cache: Stored inline with bytecode
That’s why it’s called an “inline” cache

== disasm: #
0000 putself ( 12)[LiCa]
0001 opt_send_without_block
0003 leave ( 13)[Re]

Slide 42

Slide 42 text

Cache: Stored inline with bytecode
(same disassembly as Slide 41)

Slide 43

Slide 43 text

Inline Cache Object Graph
bar method points at cache, cache points at method entry

class A; def baz; end end
class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end
# ....
class Foo < Z
  def bar
    self.baz
  end
end

(Diagram: bar method -> A#baz inline cache -> A#baz method, owned by class A; labeled “Weak Reference” and “Ruby Objects”)

Slide 44

Slide 44 text

Measure Inline Cache Allocations
Objects can be allocated on the first call, but not on subsequent calls

def baz; end

def bar
  # First time this is called, an object gets allocated
  self.baz
end

def measure
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

measure { }       # heat
p measure { bar } # => 2 (first call allocates)
p measure { bar } # => 0

Slide 45

Slide 45 text

Warmups are important 🏋

Slide 46

Slide 46 text

Inline Cache: Only holds one object
Inline caches are “monomorphic”: they only cache one item

class A; def baz; end end
class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end
# ....
class Foo < Z
  def bar
    self.baz
  end
end

(Diagram: bar method -> A#baz inline cache -> A#baz method of class A, with a weak reference)

Slide 47

Slide 47 text

Inline Cache: Only holds one object
Inline caches are “monomorphic”: they only cache one item

class A
  def bar; end
end

class B
  def bar; end
end

def run_it(obj); obj.bar; end

run_it(A.new)
run_it(B.new)

(Diagram: run_it’s single inline cache points at A#bar after the first call, and is replaced by an entry for B#bar after the second)

Slide 48

Slide 48 text

Cache hit / miss examples
Cache size is one, so we only hit when repeating types

def run_it(obj); obj.bar; end

run_it(A.new) # cache miss
run_it(A.new) # cache hit
run_it(A.new) # cache hit
run_it(A.new) # cache hit
run_it(B.new) # cache miss
run_it(B.new) # cache hit
run_it(B.new) # cache hit
run_it(B.new) # cache hit
run_it(A.new) # cache miss
run_it(B.new) # cache miss
run_it(A.new) # cache miss
run_it(B.new) # cache miss
run_it(A.new) # cache miss
run_it(B.new) # cache miss
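The hit/miss pattern above can be replayed with a toy model of a monomorphic cache: one slot per call site, keyed by the receiver’s class. `CallSite` is a hypothetical class, and unlike CRuby’s real inline caches it does not check whether the cached method has been invalidated by a redefinition.

```ruby
# Toy model of a monomorphic inline cache: one slot per call site.
class CallSite
  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil  # cache key: the receiver's class
    @cached_method = nil # cache value: the method that was found
    @hits = 0
    @misses = 0
  end

  def call(receiver)
    klass = receiver.class
    if klass.equal?(@cached_class)
      @hits += 1
    else
      @misses += 1
      @cached_class = klass                                # replace the only entry
      @cached_method = klass.instance_method(@method_name) # full lookup on a miss
    end
    @cached_method.bind_call(receiver)
  end
end

class A; def bar; :a; end; end
class B; def bar; :b; end; end

site = CallSite.new(:bar) # plays the role of obj.bar inside run_it
a = A.new
b = B.new
[a, a, a, a, b, b, b, b, a, b, a, b, a, b].each { |obj| site.call(obj) }
p [site.hits, site.misses] # => [6, 8]
```

Alternating types evicts the single entry every time, which is why the last six calls all miss.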

Slide 49

Slide 49 text

Cache hit / miss comparison
Compare always hitting to never hitting

require "benchmark/ips"

class A; def bar; end; end
class B; def bar; end; end

def run_it(obj); obj.bar; end

a = A.new
b = B.new

Benchmark.ips { |x|
  # always call with a
  x.report("always hit") {
    run_it(a); run_it(a); run_it(a); run_it(a)
    run_it(a); run_it(a); run_it(a); run_it(a)
    run_it(a); run_it(a); run_it(a); run_it(a)
    run_it(a); run_it(a); run_it(a); run_it(a)
  }

  # alternate between a and b
  x.report("never hit") {
    run_it(a); run_it(b); run_it(a); run_it(b)
    run_it(a); run_it(b); run_it(a); run_it(b)
    run_it(a); run_it(b); run_it(a); run_it(b)
    run_it(a); run_it(b); run_it(a); run_it(b)
  }

  x.compare!
}

Slide 50

Slide 50 text

Benchmark Results
Never hitting is about 40% slower

$ ruby test.rb
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
  always hit   319.277k i/100ms
   never hit   224.288k i/100ms
Calculating -------------------------------------
  always hit   3.189M (± 2.4%) i/s (313.58 ns/i) - 15.964M in 5.008714s
   never hit   2.280M (± 1.8%) i/s (438.55 ns/i) - 11.439M in 5.018054s

Comparison:
  always hit:  3188993.5 i/s
   never hit:  2280264.0 i/s - 1.40x slower

Slide 51

Slide 51 text

Inline caches are important for performance

Slide 52

Slide 52 text

Class#new is written in C!

Slide 53

Slide 53 text

Class#new implementation in C
Initialize is called from C

VALUE
rb_class_new_instance_pass_kw(int argc, const VALUE *argv, VALUE klass)
{
    VALUE obj;

    obj = rb_class_alloc(klass);
    rb_obj_call_init_kw(obj, argc, argv, RB_PASS_CALLED_KEYWORDS); /* initialize is called from C */

    return obj;
}

class Foo
  def initialize(a, b)
    :hello
  end
end

Foo.new(1, 2) # => a new Foo instance

Slide 54

Slide 54 text

rb_obj_call_init_kw
Calls your initialize method

void
rb_obj_call_init_kw(VALUE obj, int argc, const VALUE *argv, int kw_splat)
{
    PASS_PASSED_BLOCK_HANDLER();
    rb_funcallv_kw(obj, idInitialize, argc, argv, kw_splat);
}

rb_funcallv_kw: call a method
obj: on this object
idInitialize: method name
argc, argv: parameters

Slide 55

Slide 55 text

rb_funcallv_kw
Calls any method

VALUE
rb_funcallv_kw(VALUE recv, ID mid, int argc, const VALUE *argv, int kw_splat)
{
    VM_ASSERT(ruby_thread_has_gvl_p());
    return rb_call(recv, mid, argc, argv, kw_splat ? CALL_FCALL_KW : CALL_FCALL);
}

Slide 56

Slide 56 text

rb_call
Just passes everything to rb_call0

static inline VALUE
rb_call(VALUE recv, ID mid, int argc, const VALUE *argv, call_type scope)
{
    rb_execution_context_t *ec = GET_EC();
    return rb_call0(ec, recv, mid, argc, argv, scope, ec->cfp->self);
}

Why is it named rb_call0????? 😂

Slide 57

Slide 57 text

rb_call0
Is actually complicated, and I’ve omitted some of it

static inline VALUE
rb_call0(rb_execution_context_t *ec,
         VALUE recv, ID mid, int argc, const VALUE *argv,
         call_type call_scope, VALUE self)
{
    /* Snip */
    struct rb_callinfo ci;
    scope_to_ci(scope, mid, argc, &ci);

    const struct rb_callcache *cc = gccct_method_search(ec, recv, mid, &ci);

    /* Snip */
    return vm_call0_cc(ec, recv, mid, argc, argv, cc, kw_splat);
}

Slide 58

Slide 58 text

If Ruby methods store caches in bytecode, where do C functions store them?

Slide 59

Slide 59 text

gccct_method_search
gccct: Global call cache cache table

static inline const struct rb_callcache *
gccct_method_search(rb_execution_context_t *ec, VALUE recv, ID mid, const struct rb_callinfo *ci)
{
    /* snip */

    // search global method cache
    // (Lookup cache in global table)
    unsigned int index = (unsigned int)(gccct_hash(klass, mid) % VM_GLOBAL_CC_CACHE_TABLE_SIZE);
    rb_vm_t *vm = rb_ec_vm_ptr(ec);
    const struct rb_callcache *cc = vm->global_cc_cache_table[index];

    // (Is the cache entry good?)
    if (LIKELY(cc)) {
        if (LIKELY(vm_cc_class_check(cc, klass))) {
            const rb_callable_method_entry_t *cme = vm_cc_cme(cc);
            if (LIKELY(!METHOD_ENTRY_INVALIDATED(cme) &&
                       cme->called_id == mid)) {

                VM_ASSERT(vm_cc_check_cme(cc, rb_callable_method_entry(klass, mid)));
                return cc;
            }
        }
    }

    // (Slow path)
    return gccct_method_search_slowpath(vm, klass, index, ci);
}

Slide 60

Slide 60 text

C methods calling Ruby methods
They have “method caches”
Stored in a global table
Cache entries limited by table size
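The global table idea can be modeled in a few lines of Ruby: hash the (class, method name) pair into a fixed number of slots, trust a slot only if it holds the same class and name, and otherwise take the slow path and overwrite the slot. This is a loose sketch of what gccct_method_search does; `GlobalCallCache`, the table sizes, and using `[klass, method_name].hash` as the hash function are all assumptions, and the real table stores call-cache structs and also checks for invalidated method entries.

```ruby
# Toy model of a fixed-size global call-cache table.
class GlobalCallCache
  attr_reader :hits, :misses

  def initialize(size)
    @table = Array.new(size) # each slot holds [klass, method_name, method] or nil
    @hits = 0
    @misses = 0
  end

  def lookup(klass, method_name)
    index = [klass, method_name].hash % @table.size # Integer#% is non-negative here
    entry = @table[index]
    if entry && entry[0].equal?(klass) && entry[1] == method_name
      @hits += 1 # the entry is good: same class, same method name
      return entry[2]
    end
    # Slow path: full lookup, then overwrite whatever was in the slot
    @misses += 1
    method = klass.instance_method(method_name)
    @table[index] = [klass, method_name, method]
    method
  end
end

big = GlobalCallCache.new(64)
2.times { big.lookup(String, :upcase) }
p [big.hits, big.misses] # => [1, 1]

tiny = GlobalCallCache.new(1) # every key shares one slot
3.times { tiny.lookup(String, :upcase); tiny.lookup(Array, :size) }
p [tiny.hits, tiny.misses] # => [0, 6]
```

The one-slot table is the extreme case of “cache entries limited by table size”: two call sites that hash to the same slot keep evicting each other.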

Slide 61

Slide 61 text

Calling Conventions
What they are, and their impact on method calls.

Slide 62

Slide 62 text

A “convention” for connecting methods
Calling convention defines where parameters and return values will be

def bar(x, y, z)
  x + y + z
end

def foo
  bar(1, 2, 3)
end

Arguments are stored in a certain place
Parameters are read from a certain place
Return value is stored in a certain place
Return value is read from a certain place

Slide 63

Slide 63 text

Decouple caller and callee

Slide 64

Slide 64 text

Languages can define their own calling conventions

Slide 65

Slide 65 text

Ruby calling convention

Slide 66

Slide 66 text

Ruby: Stack based VM

Slide 67

Slide 67 text

Calling Convention: Caller’s Side
Stack values become method parameters

def bar(a, b, c)
  a + b + c
end

def foo
  bar(5, 7, 9)
end

Instructions: Push 5, Push 7, Push 9, Call bar
Stack: 5, 7, 9

Slide 68

Slide 68 text

Calling Convention: Callee’s Side
Stack values become method parameters

def bar(a, b, c)
  a + b + c
end

def foo
  bar(5, 7, 9)
end

Instructions: Getlocal -3, Getlocal -2, Add, Getlocal -1, Add, Return
Stack: 5, 7, 9 (the arguments), then 5, 12, 21 as the sum builds up
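Both sides of this convention fit in a toy stack machine: the caller pushes 5, 7, 9, and the callee reads them at negative offsets from its frame base. This is a sketch of the slides’ model, not YARV’s real frame layout; `bar_body` is a hypothetical name, and who pops the arguments is simplified here.

```ruby
# Toy stack machine for bar(a, b, c) = a + b + c.
def bar_body(stack)
  base = stack.size # the 3 arguments sit just below this point
  getlocal = ->(offset) { stack.push(stack[base + offset]) } # -3 => first argument
  add = -> { y = stack.pop; x = stack.pop; stack.push(x + y) }

  getlocal.(-3) # push a (5)
  getlocal.(-2) # push b (7)
  add.()        # 5 + 7 => 12
  getlocal.(-1) # push c (9)
  add.()        # 12 + 9 => 21

  result = stack.pop # the return value
  stack.pop(3)       # drop the 3 arguments
  stack.push(result) # leave the return value where the caller reads it
end

# Caller's side: Push 5, Push 7, Push 9, Call bar
stack = []
stack.push(5)
stack.push(7)
stack.push(9)
bar_body(stack)
p stack # => [21]
```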

Slide 69

Slide 69 text

Calling Convention: Keyword Arguments
Stack values become method parameters

def bar(a:, b:, c:)
  a + b + c
end

def foo
  bar(a: 5, b: 7, c: 9)
end

Instructions: Getlocal -3, Getlocal -2, Add, Getlocal -1, Add, Return
Stack: 5, 7, 9 (the arguments), then 5, 12, 21 as the sum builds up

Slide 70

Slide 70 text

Calling Convention: Keyword Arguments
Stack values become method parameters

def bar(a:, b:, c:)
  a + b + c
end

def foo
  bar(c: 9, a: 5, b: 7)
end

Instructions: Push 9, Push 5, Push 7, Call bar
Stack: 9, 5, 7 (call-site order, not parameter order)
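Because the caller pushes keyword values in call-site order, the callee has to match names to its own parameter slots before the body can run. A minimal sketch of that matching step, assuming the call site records its keyword names in order; `setup_keywords` and `call_site_keywords` are hypothetical names.

```ruby
# Caller side for bar(c: 9, a: 5, b: 7): values pushed in call-site order,
# with the keyword names recorded alongside the call.
call_site_keywords = [:c, :a, :b]
stack = [9, 5, 7]

# Callee side: reorder the pushed values into parameter order (a, b, c).
def setup_keywords(param_names, call_site_keywords, values)
  param_names.map do |name|
    index = call_site_keywords.index(name) or raise ArgumentError, "missing keyword: #{name}"
    values[index]
  end
end

a, b, c = setup_keywords([:a, :b, :c], call_site_keywords, stack)
p [a, b, c] # => [5, 7, 9]
p a + b + c # => 21
```

The extra matching step is per call, which is one reason the next slide says keyword arguments can be slower.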

Slide 71

Slide 71 text

Keyword Args can be slower (Please DO NOT change your code)

Slide 72

Slide 72 text

C: No concept of KW args, splats, defaults, etc.

Slide 73

Slide 73 text

Friction Between Languages

Slide 74

Slide 74 text

Conversion takes time

Slide 75

Slide 75 text

Keyword Arguments
Normally don’t require allocations

def bar(a:, b:, c:)
  a + b + c
end

def foo
  bar(c: 9, a: 5, b: 7)
end

count_allocs { foo }   # heat
p count_allocs { foo } # => 0

Ruby -> Ruby
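The slides don’t show `count_allocs`. A plausible definition, mirroring the `measure` helper from the inline-cache slide, is sketched below; the helper itself is an assumption. Exact counts can vary by Ruby version, so the sketch prints them rather than promising a value.

```ruby
# Hypothetical helper: count objects allocated while the block runs.
def count_allocs
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def bar(a:, b:, c:)
  a + b + c
end

def foo
  bar(c: 9, a: 5, b: 7)
end

count_allocs { foo }          # heat: the first call may allocate (e.g. for caches)
p count_allocs { foo }        # the slide reports 0 for a Ruby -> Ruby keyword call
p count_allocs { Object.new } # at least 1: the new Object itself
```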

Slide 76

Slide 76 text

Keyword Arguments + Initialize
We expect one allocation (the Foo instance), but there are two

class Foo
  def initialize(a:, b:, c:)
  end
end

def foo
  Foo.new(c: 9, a: 5, b: 7)
end

count_allocs { foo }   # heat
p count_allocs { foo } # => 2

Ruby -> C -> Ruby

Slide 77

Slide 77 text

Hash is allocated across the language barrier
We have to convert kwargs to a hash, then back to stack positions

5 7 9  ->  { a: 5, b: 7, c: 9 }  ->  5 7 9
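The round trip can be modeled in plain Ruby: the C side of Class#new only receives (argc, argv), so the keyword values are packed into one Hash on the way in and unpacked into initialize’s parameter order on the way out. The names below (`call_site_keywords`, `unpack_keywords`) are hypothetical, and this is a model of the data flow, not the VM’s code.

```ruby
# Model of the Ruby -> C -> Ruby trip for Foo.new(c: 9, a: 5, b: 7).
call_site_keywords = [:c, :a, :b] # keyword names, in call-site order
stack_values = [9, 5, 7]          # values the caller pushed

# Ruby -> C: keywords travel as a single Hash (the extra allocation).
kw_hash = call_site_keywords.zip(stack_values).to_h

# C -> Ruby: initialize(a:, b:, c:) needs the values back in parameter order.
def unpack_keywords(param_names, hash)
  param_names.map { |name| hash.fetch(name) }
end

params = unpack_keywords([:a, :b, :c], kw_hash)
p params # => [5, 7, 9]
```

A Ruby -> Ruby call skips the Hash entirely and reorders the stack values directly, which is why only the C crossing shows the second allocation.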

Slide 78

Slide 78 text

Class#new implementation
Implement a new “new” in Ruby and performance-test it

Slide 79

Slide 79 text

Class#new
First implementation

class Class
  def new(...)
    instance = allocate
    instance.initialize(...)
    instance
  end
end

Initialize is private!

Slide 80

Slide 80 text

Trying to call initialize won’t work
Because it’s a private method 😭

class Foo
  def initialize
  end

  private def bar; end
end

Foo.allocate.initialize

$ ruby thing.rb
thing.rb:10:in '<main>': private method 'initialize' called for an instance of Foo (NoMethodError)

Foo.allocate.initialize
            ^^^^^^^^^^^

Slide 81

Slide 81 text

Class#new, second attempt
Also doesn’t work

class Class
  def new(...)
    instance = allocate
    instance.send(:initialize, ...)
    instance
  end
end

No “send” on BasicObject

Slide 82

Slide 82 text

BasicObject is very Basic
It doesn’t support send!

obj = BasicObject.new
obj.send :initialize

$ ruby thing.rb
thing.rb:2:in '<main>': undefined method 'send' for an instance of BasicObject (NoMethodError)

obj.send :initialize
   ^^^^^

Slide 83

Slide 83 text

We need to call a private method without using send.
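For contrast, plain Ruby does have one way to do this: `instance_method` also finds private methods, and `bind_call` invokes the result on an instance, even a BasicObject one. The sketch below is not what the talk does; it allocates an UnboundMethod on every call, which makes it a poor fit for a hot path like Class#new. `NoSendNew`, `build`, and `Point` are hypothetical names.

```ruby
# Hypothetical helper: run the class's private initialize without `send`.
module NoSendNew
  def build(*args, **kwargs, &block)
    obj = allocate
    # instance_method finds private methods too; bind_call runs it on obj
    instance_method(:initialize).bind_call(obj, *args, **kwargs, &block)
    obj
  end
end

class Point < BasicObject
  extend ::NoSendNew # BasicObject subclasses need :: to reach top-level constants

  def initialize(x, y:)
    @x = x
    @y = y
  end

  def sum
    @x + @y
  end
end

pt = Point.build(1, y: 2)
p pt.sum # => 3
```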

Slide 84

Slide 84 text

We cheat!

Slide 85

Slide 85 text

Special flags on send instruction
FCALL means “ignore method visibility”

Ruby code:

class Foo
  def initialize
    bar
  end

  private def bar; end
end

VM instructions:

== disasm: #
0000 putself ( 3)[LiCa]
0001 send , nil
0004 leave ( 4)[Re]

Slide 86

Slide 86 text

Class#new
First implementation

class Class
  def new(...)
    instance = allocate
    instance.initialize(...)
    instance
  end
end

Slide 87

Slide 87 text

Class#new Instructions
We need an FCALL flag

== disasm: #
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] "..."@0 [ 1] instance@1
0000 putself ( 3)[LiCa]
0001 send , nil
0004 setlocal instance@1, 0
0007 getlocal instance@1, 0 ( 4)[Li]
0010 getlocal "..."@0, 0
0013 sendforward , nil
0016 pop
0017 getlocal instance@1, 0 ( 5)[Li]
0020 leave ( 6)[Re]

Add FCALL here: 0013 sendforward, the call to initialize

Slide 88

Slide 88 text

New Primitive

Slide 89

Slide 89 text

Class#new take three
With a primitive

class Class
  def new(...)
    obj = allocate
    Primitive.send_delegate!(
      obj, :initialize, ...)
    obj
  end
end

Slide 90

Slide 90 text

Class#new take 3 Instructions

== disasm: #:2 (2,2)-(8,5)>
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] "..."@0 [ 1] obj@1
0000 putself ( 3)[LiCa]
0001 opt_send_without_block
0003 setlocal_WC_0 obj@1
0005 getlocal_WC_0 obj@1 ( 5)[Li]
0007 getlocal_WC_0 "..."@0 ( 4)
0009 sendforward , nil
0012 pop
0013 getlocal_WC_0 obj@1 ( 7)[Li]
0015 leave ( 8)[Re]

FCALL: now set on the sendforward that calls initialize

Slide 91

Slide 91 text

Initialize without send!

Slide 92

Slide 92 text

Class#new take three
Doesn’t pass tests

class Class
  def new(...)
    obj = allocate
    Primitive.send_delegate!(
      obj, :initialize, ...)
    obj
  end
end

People monkey patch allocate

Slide 93

Slide 93 text

Monkeypatch to allocate is ignored
This code works fine

class Class
  def allocate
    raise "hahahaha"
  end
end

class Foo
end

Foo.new # works fine
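A narrower version of the same experiment overrides `allocate` on a single class instead of on Class, so nothing else in the process is affected. `Widget` is a hypothetical class; the point is that calling `allocate` directly hits the override, while `new` allocates without dispatching to it.

```ruby
class Widget
  # Override allocate on this one class only
  def self.allocate
    raise "allocate was called"
  end
end

w = Widget.new # works: Class#new allocates directly instead of calling allocate
p w.class # => Widget

allocate_raised =
  begin
    Widget.allocate
    false
  rescue RuntimeError
    true
  end
p allocate_raised # => true
```

A Ruby-level Class#new that calls `allocate` would break this expectation, which is why the next slides need a primitive that allocates without a method call.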

Slide 94

Slide 94 text

New Primitive!

Slide 95

Slide 95 text

Allocate an object without calling “allocate”
Primitive is not impacted by monkey patches

class Class
  def new(...)
    obj = Primitive.rb_class_alloc2
    Primitive.send_delegate!(
      obj, :initialize, ...)
    obj
  end
end

Slide 96

Slide 96 text

Class#new instruction sequence
Allocating a new object: 8 instructions

Ruby code:

obj = Primitive.rb_class_alloc2
Primitive.send_delegate!(
  obj, :initialize, ...)
obj

Instructions: allocate, setlocal, getlocal, getlocal, send, pop, getlocal, leave
Stack: new instance, ...

Slide 97

Slide 97 text

Ruby without Ruby
Inlining Class#new

Slide 98

Slide 98 text

Ruby Code is YARV
Our Ruby code is converted to YARV instructions

class Class
  def new(...)
    obj = Primitive.rb_class_alloc2
    Primitive.send_delegate!(
      obj, :initialize, ...)
    obj
  end
end

Class#new as YARV instructions (simplified): allocate, dup, send, pop, leave
Call site Foo.new as YARV instructions: putobject Foo, send

Slide 99

Slide 99 text

YARV Instructions
Paste “Class#new” code at the call site

Foo.new becomes: putobject Foo, then the Class#new instructions (allocate, dup, send, pop) in place of the send

Slide 100

Slide 100 text

Implement Class#new in Ruby without Ruby

Slide 101

Slide 101 text

Compiler Modifications

-  PUSH_SEND_R(ret, location, method_id, INT2FIX(orig_argc), block_iseq, INT2FIX(flags), kw_arg);
+  LABEL *not_basic_new = NEW_LABEL(location.line);
+  LABEL *not_basic_new_finish = NEW_LABEL(location.line);
+
+  bool inline_new = ISEQ_COMPILE_DATA(iseq)->option->specialized_instruction &&
+      method_id == rb_intern("new") &&
+      call_node->block == NULL &&
+      !(flags & VM_CALL_ARGS_BLOCKARG);
+
+  if (inline_new) {
+      if (LAST_ELEMENT(ret) == opt_new_prelude) {
+          PUSH_INSN(ret, location, putnil);
+          PUSH_INSN(ret, location, swap);
+      }
+      else {
+          ELEM_INSERT_NEXT(opt_new_prelude, &new_insn_body(iseq, location.line, location.node_id, BIN(swap), 0)->link);
+          ELEM_INSERT_NEXT(opt_new_prelude, &new_insn_body(iseq, location.line, location.node_id, BIN(putnil), 0)->link);
+      }
+
+      // Jump unless the receiver uses the "basic" implementation of "new"
+      VALUE ci;
+      if (flags & VM_CALL_FORWARDING) {
+          ci = (VALUE)new_callinfo(iseq, method_id, orig_argc + 1, flags, kw_arg, 0);
+      }
+      else {
+          ci = (VALUE)new_callinfo(iseq, method_id, orig_argc, flags, kw_arg, 0);
+      }
+
+      PUSH_INSN2(ret, location, opt_new, ci, not_basic_new);
+      LABEL_REF(not_basic_new);
+      // optimized path
+      PUSH_SEND_R(ret, location, rb_intern("initialize"), INT2FIX(orig_argc), block_iseq, INT2FIX(flags | VM_CALL_FCALL), kw_arg);
+      PUSH_INSNL(ret, location, jump, not_basic_new_finish);
+

Is this a call to “new”? (the inline_new check)
Insert special instructions (putnil, swap)
If not new, jump to slow path (opt_new jumps to not_basic_new)
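The control flow that the compiler change emits at each `new` call site can be approximated in Ruby: check whether the receiver still uses the default Class#new; if so, allocate and call initialize right there, otherwise fall back to a normal call. `inlined_new` is a hypothetical name, the `method(:new).owner` test only stands in for the VM’s real opt_new check, and unlike the primitive this sketch still calls `allocate` as a method.

```ruby
# Hypothetical Ruby model of an inlined klass.new(*args) call site.
def inlined_new(klass, *args, **kwargs, &block)
  if klass.method(:new).owner.equal?(Class)
    # Fast path: klass uses the default Class#new
    obj = klass.allocate
    obj.__send__(:initialize, *args, **kwargs, &block)
    obj
  else
    # Slow path: klass (or an ancestor) defines its own new
    klass.new(*args, **kwargs, &block)
  end
end

class Plain
  attr_reader :v

  def initialize(v)
    @v = v
  end
end

class Custom
  attr_reader :v

  def self.new(v)
    super(v * 10) # a custom new forces the slow path
  end

  def initialize(v)
    @v = v
  end
end

p inlined_new(Plain, 1).v  # => 1
p inlined_new(Custom, 1).v # => 10
```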

Slide 102

Slide 102 text

“Object.new” Before and after inlining
Instructions for “new” are inserted at the call site

Before:

> ruby --dump=insns -e'Object.new'
== disasm: #@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path ( 1)[Li]
0002 opt_send_without_block
0004 leave

After:

> ./ruby --dump=insns -e'Object.new'
== disasm: #@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new , 11
0007 opt_send_without_block
0009 jump 14
0011 opt_send_without_block
0013 swap
0014 pop
0015 leave
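The same check can be made from inside Ruby instead of with `--dump=insns`: compile a snippet with `RubyVM::InstructionSequence` (CRuby only) and look for `opt_new` in the disassembly. A small sketch; whether `opt_new` appears depends on the Ruby version you run it on.

```ruby
# Compile a snippet and look for the opt_new instruction in its disassembly.
iseq = RubyVM::InstructionSequence.compile("Object.new")
listing = iseq.disasm
puts listing

inlined = listing.include?("opt_new")
puts(inlined ? "Class#new is inlined at this call site" : "Class#new is called normally")
```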

Slide 103

Slide 103 text

How does it perform?

Slide 104

Slide 104 text

Keyword Arguments
Only one object allocated

class Foo
  def initialize(a:, b:, c:)
  end
end

def foo
  Foo.new(c: 9, a: 5, b: 7)
end

count_allocs { foo }   # heat
p count_allocs { foo } # => 1

Only One Allocation!!!

Slide 105

Slide 105 text

Allocations per second

Slide 106

Slide 106 text

Increase the number of parameters
Foo.new
Foo.new(1)
Foo.new(1, 2)
Foo.new(1, 2, 3)

Slide 107

Slide 107 text

Change the type of parameters
Foo.new
Foo.new(a: 1)
Foo.new(a: 1, b: 2)
Foo.new(a: 1, b: 2, c: 3)

Slide 108

Slide 108 text

Vary the type of object
Foo.new
Bar.new
Foo.new
Bar.new
Foo.new
Bar.new
Foo.new
Bar.new

Slide 109

Slide 109 text

Positional Parameters
Allocations per second by Ruby version
(Chart: allocations per second, 0 to 38,000,000, vs. number of parameters, 0 to 10; series: Ruby 3.5 + inlining, Ruby 3.4)

Slide 110

Slide 110 text

~1.8x Faster

Slide 111

Slide 111 text

Keyword Parameters
Allocations per second by Ruby version
(Chart: allocations per second, 0 to 40,000,000, vs. number of parameters, 0 to 10; series: Ruby 3.5 + inlining, Ruby 3.4)

Slide 112

Slide 112 text

3 Keyword Params: 3.2x faster

Slide 113

Slide 113 text

10 Keyword Params: 6.2x faster

Slide 114

Slide 114 text

Positional Parameters + Varied Classes
Allocations per second by Ruby version (varying allocated class)
(Chart: allocations per second, 0 to 38,000,000, vs. number of parameters, 0 to 10; series: Ruby 3.5 + inlining, Ruby 3.4)

Slide 115

Slide 115 text

Keyword Parameters + Varied Classes
Allocations per second by Ruby version (varying allocated class)
(Chart: allocations per second, 0 to 40,000,000, vs. number of parameters, 0 to 10; series: Ruby 3.5 + inlining, Ruby 3.4)

Slide 116

Slide 116 text

Speedup depends on parameters

Slide 117

Slide 117 text

Minimum: 1.4x faster

Slide 118

Slide 118 text

Downsides

Slide 119

Slide 119 text

More memory

Slide 120

Slide 120 text

Measure ISeq size
How many bytes does the “alloc” method use?

require "objspace"

def alloc
  Object.new
end

m = method(:alloc)
insn = RubyVM::InstructionSequence.of(m)
puts ObjectSpace.memsize_of(insn)

Ruby 3.5 + inlining: 656 bytes
Ruby 3.4: 544 bytes
+112 bytes

Slide 121

Slide 121 text

Different Stack Trace

Slide 122

Slide 122 text

Stack Trace is Different
Class#new is missing

class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

hello

Ruby 3.4:
> ruby test.rb
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'

Ruby 3.5 + inlining:
> ./ruby test.rb
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
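Code that inspects backtraces can check which behavior it is running under. This sketch looks at the frame directly above initialize: with the C Class#new it is a frame whose `base_label` is "new"; with inlining it is the caller itself. `Probe` is a hypothetical class, and relying on `base_label` staying "new" across versions is an assumption.

```ruby
class Probe
  attr_reader :labels

  def initialize
    # base_label of every frame above initialize, nearest first
    @labels = caller_locations.map(&:base_label)
  end
end

probe = Probe.new
has_new_frame = probe.labels.first == "new"
puts(has_new_frame ? "Class#new frame present" : "Class#new frame absent (inlined)")
```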

Slide 123

Slide 123 text

Let’s wrap this up

Slide 124

Slide 124 text

Class#new
The C implementation

Slide 125

Slide 125 text

Inline caches
How they help speed up method calls

Slide 126

Slide 126 text

Calling conventions
What they are and their impact on our code

Slide 127

Slide 127 text

Class#new + Inlining
In Ruby, and its performance characteristics

Slide 128

Slide 128 text

Excited about Ruby internals

Slide 129

Slide 129 text

Excited about RUBY!

Slide 130

Slide 130 text

THANK YOU!!!