Slide 1

Slide 1 text

Feature #20425 Optimizing forwarding methods

Slide 2

Slide 2 text

Targeted methods

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

`...` receivers
`...` call sites
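As a minimal sketch of what the targeted pattern does at the language level, the example below compares `call(...)` with an explicit delegation that captures and re-splats everything. The name `call_explicit` is hypothetical and only illustrates the semantics; it is not a statement about how either build compiles `...`.

```ruby
def recv(a, b)
  a + b
end

# Forwarding method: passes along whatever it received.
def call(...)
  recv(...)
end

# Hypothetical explicit equivalent, for comparison only.
def call_explicit(*args, **kwargs, &block)
  recv(*args, **kwargs, &block)
end

puts call(1, 2)           # prints 3
puts call_explicit(1, 2)  # prints 3
```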

Slide 3

Slide 3 text

Implementation

Slide 4

Slide 4 text

Targeted methods
ISEQs and call sites are tagged

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    call(1, 2)

ISEQs tagged as forwardable
Call sites tagged as forwarding

Slide 5

Slide 5 text

Targeted methods
ISEQs and call sites are tagged

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    call(1, 2)

[Stack diagram: self, 1, 2, ME, BH, TYPE, self, 1, 2, CALL_INFO; memcpy( ); CALL_INFO]

ISEQs tagged as forwardable
Call sites tagged as forwarding

Slide 6

Slide 6 text

All Forwardable Callers Supported
All `...` forms are supported (callers and callees)

    def call(...)
      recv(...)
    end

    call(a: 1, b: 2)
    call(**foo)
    call(1, 2) { }
    # etc
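The caller forms listed above can be tried against a receiver that reports what it was given. This is only a sketch: the catch-all signature of `recv` and the contents of `foo` are assumptions made for illustration.

```ruby
# Hypothetical receiver that reports positional args, keyword args,
# and the block's return value (or nil when no block was given).
def recv(*args, **kwargs, &block)
  [args, kwargs, block ? block.call : nil]
end

def call(...)
  recv(...)
end

foo = { a: 1, b: 2 }

p call(1, 2)                   # positional arguments
p call(a: 1, b: 2)             # literal keyword arguments
p call(**foo)                  # keyword splat
p call(1, 2) { :from_block }   # positional arguments plus a block
```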

Slide 7

Slide 7 text

All Forwarding Callers Supported
Forwarding allows more parameters

    def call(...)
      recv("hello", ...)
    end

    def call2(...)
      x = [3, 4, 5]
      recv(*x, ...)
    end
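A short sketch of how leading arguments combine with forwarded ones. The variadic `recv` here is an assumption, chosen so the final argument list is visible.

```ruby
# Hypothetical receiver that returns the arguments it received.
def recv(*args)
  args
end

def call(...)
  recv("hello", ...)   # one extra leading argument
end

def call2(...)
  x = [3, 4, 5]
  recv(*x, ...)        # a splatted array ahead of the forwarded arguments
end

p call(1, 2)   # prints ["hello", 1, 2]
p call2(6)     # prints [3, 4, 5, 6]
```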

Slide 8

Slide 8 text

Stack Escape Works
No GC Modifications Required

    def recv(a, b)
      a + b
    end

    def call(...)
      lambda { |x| recv(x, ...) }
    end

    call(1).call(2)

[Stack diagram: self, 1, CALL_INFO]
Escapes with lambda
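The escape case can be exercised directly: the lambda outlives the frame of `call`, yet still forwards the original arguments when it is invoked later. This sketch only demonstrates the observable behavior, not the stack handling. The variable name `adder` is an illustrative choice.

```ruby
def recv(a, b)
  a + b
end

def call(...)
  # The lambda captures the forwarded arguments and escapes the method.
  lambda { |x| recv(x, ...) }
end

adder = call(1)      # `call` has returned by the time the lambda runs
puts adder.call(2)   # prints 3
```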

Slide 9

Slide 9 text

YJIT Implementation is Simple: Just SP Math

Slide 10

Slide 10 text

Benchmark
Test calling into a `...` method (positional parameters)

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    # def run
    #   call(1, 2)
    #   call(1, 2)
    #   call(1, 2)
    #   ...
    eval "def run; " + 200.times.map { "call(1, 2)" }.join("; ") + "; end"

    200000.times do
      run
    end
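The `eval` line generates a `run` method containing many separate call sites. A scaled-down sketch with three call sites makes the generated source visible. The helper name `unrolled_run_source` is hypothetical.

```ruby
def recv(a, b)
  a + b
end

def call(...)
  recv(...)
end

# Build the source of a `run` method with `count` copies of `call_expr`.
def unrolled_run_source(count, call_expr)
  "def run; " + count.times.map { call_expr }.join("; ") + "; end"
end

source = unrolled_run_source(3, "call(1, 2)")
puts source   # prints: def run; call(1, 2); call(1, 2); call(1, 2); end

eval(source)  # defines `run`, as the benchmark does with 200 call sites
puts run      # prints 3 (the value of the last call)
```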

Slide 11

Slide 11 text

Benchmark Results (~2x faster)

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      1.237 s ±  0.018 s    [User: 1.233 s, System: 0.002 s]
      Range (min … max):    1.223 s …  1.286 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      2.791 s ±  0.010 s    [User: 2.779 s, System: 0.008 s]
      Range (min … max):    2.770 s …  2.803 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        2.26 ± 0.03 times faster than master/miniruby -v test2.rb
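The speedup in the summary is the ratio of the two mean times. As a quick check using the rounded means shown above (the variable names are illustrative):

```ruby
fwd_mean    = 1.237  # seconds, fwd/miniruby
master_mean = 2.791  # seconds, master/miniruby

speedup = master_mean / fwd_mean
puts format("%.2f", speedup)  # prints 2.26
```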

Slide 12

Slide 12 text

Benchmark
Test calling into a `...` method (keyword parameters)

    def recv(a:, b:)
      a + b
    end

    def call(...)
      recv(...)
    end

    # def run
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   ...
    eval "def run; " + 200.times.map { "call(a: 1, b: 2)" }.join("; ") + "; end"

    200000.times do
      run
    end

Slide 13

Slide 13 text

Benchmark Results (~3x faster)
Keyword Arguments

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      1.531 s ±  0.021 s    [User: 1.527 s, System: 0.002 s]
      Range (min … max):    1.502 s …  1.577 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      4.863 s ±  0.021 s    [User: 4.845 s, System: 0.011 s]
      Range (min … max):    4.846 s …  4.909 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        3.18 ± 0.05 times faster than master/miniruby -v test2.rb

Slide 14

Slide 14 text

Impacted Code Paths
• Uncached calls
• `send` Instruction
• `invokesuper` Instruction

Slide 15

Slide 15 text

Benchmark
Inline cache misses

    class A
      def a; end
    end

    class B < A; end

    a = A.new
    b = B.new

    def call_method(obj)
      obj.a # never hits inline cache
    end

    # def run(a, b)
    #   call_method(a)
    #   call_method(b)
    #   call_method(a)
    #   call_method(b)
    #   ...
    eval "def run(a, b); " + 200.times.map { "call_method(a); call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(a, b)
    end

`opt_send_without_block`: never hits inline cache
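A sketch of why this benchmark defeats the cache, assuming the inline cache at a call site is keyed on the receiver's class: the two receivers resolve to the same method, but their classes differ, so alternating between them changes the key on every call.

```ruby
class A
  def a; end
end

class B < A; end

a = A.new
b = B.new

puts a.class == b.class   # prints false: the receiver class differs per call
puts a.method(:a).owner   # prints A
puts b.method(:a).owner   # prints A: the same method is found both times
```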

Slide 16

Slide 16 text

Benchmark results
Inline cache misses

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      1.694 s ±  0.020 s    [User: 1.690 s, System: 0.002 s]
      Range (min … max):    1.665 s …  1.719 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      1.703 s ±  0.015 s    [User: 1.698 s, System: 0.002 s]
      Range (min … max):    1.679 s …  1.723 s    10 runs

    Summary
      fwd/miniruby -v test.rb ran
        1.00 ± 0.02 times faster than master/miniruby -v test.rb

Slide 17

Slide 17 text

Benchmark
Inline cache misses with block

    class A
      def a; end
    end

    class B < A; end

    a = A.new
    b = B.new

    def call_method(obj)
      obj.a { } # Always send instruction
    end

    # def run(a, b)
    #   call_method(a)
    #   call_method(b)
    #   call_method(a)
    #   call_method(b)
    #   ...
    eval "def run(a, b); " + 200.times.map { "call_method(a); call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(a, b)
    end

`send`: never hits inline cache

Slide 18

Slide 18 text

Benchmark results
Inline cache misses with block

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      1.871 s ±  0.015 s    [User: 1.866 s, System: 0.002 s]
      Range (min … max):    1.852 s …  1.898 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      1.723 s ±  0.007 s    [User: 1.719 s, System: 0.002 s]
      Range (min … max):    1.710 s …  1.734 s    10 runs

    Summary
      master/miniruby -v test.rb ran
        1.09 ± 0.01 times faster than fwd/miniruby -v test.rb

Slide 19

Slide 19 text

Benchmark
Super calls (no cache)

    class A
      def a; end
    end

    class B < A; def a; super; end end

    b = B.new

    def call_method(obj)
      obj.a # Calls invokesuper
    end

    # def run(b)
    #   call_method(b)
    #   call_method(b)
    #   ...
    eval "def run(b); " + 400.times.map { "call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(b)
    end

`invokesuper`: never hits inline cache
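For reference, a small sketch of the dispatch this benchmark exercises: `B#a` consists only of a zero-argument `super`, which hands off to `A#a`. The `:from_a` return value is an assumption added so the hand-off is observable; the benchmark's `A#a` is empty.

```ruby
class A
  def a; :from_a; end   # return value added for illustration only
end

class B < A
  def a; super; end     # bare `super` re-dispatches to A#a
end

b = B.new
p b.a                               # prints :from_a
p b.method(:a).owner                # prints B
p b.method(:a).super_method.owner   # prints A
```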

Slide 20

Slide 20 text

Benchmark results
Super calls (no cache)

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      2.553 s ±  0.127 s    [User: 2.547 s, System: 0.002 s]
      Range (min … max):    2.397 s …  2.747 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      2.240 s ±  0.055 s    [User: 2.234 s, System: 0.002 s]
      Range (min … max):    2.170 s …  2.310 s    10 runs

    Summary
      master/miniruby -v test.rb ran
        1.14 ± 0.06 times faster than fwd/miniruby -v test.rb

Slide 21

Slide 21 text

“Forwarding” is known at compile time:
`forward_send` / `forward_super` insns?
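One way to see which instructions a forwarding call site compiles to on a given build is to disassemble it. This sketch assumes CRuby, since `RubyVM::InstructionSequence` is specific to that implementation. The instruction names printed depend on the Ruby version, so no expected output is shown.

```ruby
source = <<~RUBY
  def recv(a, b)
    a + b
  end

  def call(...)
    recv(...)
  end
RUBY

# Compile without running, then print the bytecode for the top level
# and the nested method bodies.
iseq = RubyVM::InstructionSequence.compile(source)
puts iseq.disasm
```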

Slide 22

Slide 22 text

CI Status
Ruby CI ✅
Shopify CI ✅
Shopify CI + YJIT ✅

Slide 23

Slide 23 text

Merge Now, Revert or Rewrite Later

Slide 24

Slide 24 text

Future

Slide 25

Slide 25 text

Class#new in Ruby

Slide 26

Slide 26 text

Initialize with Keyword Args
Benchmark

    class Foo
      def initialize(a:, b:)
        @a = a
        @b = b
      end
    end

    def call(a, b)
      Foo.new(a:, b:)
    end

    # def run
    #   call(1, 2)
    #   call(1, 2)
    #   ...
    eval "def run; " + 200.times.map { "call(1, 2)" }.join("; ") + "; end"

    200000.times do
      run
    end
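To illustrate the idea of a Ruby-level `Class#new` built on forwarding, here is a minimal sketch. The method name `ruby_new` and the `attr_reader` line are hypothetical additions; this is not the actual implementation, only the allocate-then-initialize shape that `...` makes cheap to express.

```ruby
class Foo
  attr_reader :a, :b   # added so the result can be inspected

  def initialize(a:, b:)
    @a = a
    @b = b
  end

  # Hypothetical stand-in for a `new` written in Ruby:
  # allocate the object, then forward every argument to #initialize.
  def self.ruby_new(...)
    obj = allocate
    obj.send(:initialize, ...)   # #initialize is private, hence `send`
    obj
  end
end

foo = Foo.ruby_new(a: 1, b: 2)
p [foo.a, foo.b]   # prints [1, 2]
```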

Slide 27

Slide 27 text

Benchmark results: 40% faster
Initialize can be faster in Ruby

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      3.737 s ±  0.070 s    [User: 3.724 s, System: 0.008 s]
      Range (min … max):    3.651 s …  3.816 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      5.276 s ±  0.028 s    [User: 5.257 s, System: 0.012 s]
      Range (min … max):    5.235 s …  5.314 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        1.41 ± 0.03 times faster than master/miniruby -v test2.rb