[Feature #20425] Speeding up delegate methods

Slides for the Ruby core dev meeting on speeding up delegates

Aaron Patterson

May 14, 2024

Transcript

  1. Targeted methods

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    … receivers
    … call sites
  2. Targeted methods

    ISEQs and call sites are tagged

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    call(1, 2)

    ISEQs tagged as forwardable
    Call sites tagged as forwarding
  3. Targeted methods

    ISEQs and call sites are tagged

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    call(1, 2)

    [Stack diagram with slots self, 1, 2, ME, BH, TYPE, self, 1, 2, CALL_INFO, and a memcpy( ) label]

    ISEQs tagged as forwardable
    Call sites tagged as forwarding
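
For readers less familiar with `...`: it forwards positional arguments, keyword arguments, and the block in one go. The sketch below compares a `...` delegate with one that spells the three out explicitly, to show that both give the same results. The names `target`, `via_dots`, and `via_splat` are hypothetical and do not appear in the slides.

```ruby
# Hypothetical target method, for illustration only.
def target(a, b = 0, k: 0, &blk)
  result = a + b + k
  blk ? blk.call(result) : result
end

# Delegate using argument forwarding.
def via_dots(...)
  target(...)
end

# Delegate spelling out positional, keyword, and block parameters.
def via_splat(*args, **kwargs, &blk)
  target(*args, **kwargs, &blk)
end

p via_dots(1, 2, k: 3)           # 6
p via_splat(1, 2, k: 3)          # 6
p via_dots(1) { |x| x * 10 }     # 10
p via_splat(1) { |x| x * 10 }    # 10
```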
  4. All Forwardable Callers Supported

    All … forms are supported (callers and callees)

    def call(...)
      recv(...)
    end

    call(a: 1, b: 2)
    call(**foo)
    call(1, 2) { }
    # etc
  5. All Forwarding Callers Supported

    Forwarding allows more parameters

    def call(...)
      recv("hello", ...)
    end

    def call2(...)
      x = [3, 4, 5]
      recv(*x, ...)
    end
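
The call forms on the two slides above can be tried out with a receiver that simply reports what it was given. This is a minimal sketch: the catch-all `recv` signature and the names `call_with_prefix` and `call_with_splat` are assumptions made for illustration, not code from the slides.

```ruby
# Hypothetical receiver: returns positional args, keyword args,
# and whether a block was passed.
def recv(*args, **kwargs, &blk)
  [args, kwargs, !blk.nil?]
end

def call(...)
  recv(...)
end

# A leading argument placed before the forwarded ones.
def call_with_prefix(...)
  recv("hello", ...)
end

# A splatted array placed before the forwarded ones.
def call_with_splat(...)
  x = [3, 4, 5]
  recv(*x, ...)
end

foo = { a: 1, b: 2 }

p call(a: 1, b: 2)           # keywords only
p call(**foo)                # keywords from a double splat
p call(1, 2) { }             # positionals plus a block
p call_with_prefix(1, k: 2)  # "hello" comes first, then 1, then k: 2
p call_with_splat(6)         # 3, 4, 5 come first, then 6
```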
  6. Stack Escape Works

    No GC Modifications Required

    def recv(a, b)
      a + b
    end

    def call(...)
      lambda { |x| recv(x, ...) }
    end

    call(1).call(2)

    [Stack diagram with slots self, 1, CALL_INFO]

    Escapes with lambda
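
To make the escape visible, the sketch below keeps the returned lambdas in variables and calls them after `call` has returned. It reuses the slide's `recv` and `call`; the names `add_one` and `add_five` are hypothetical. It only demonstrates the observable behavior and says nothing about how the VM stores the forwarded arguments.

```ruby
def recv(a, b)
  a + b
end

def call(...)
  # The lambda refers to the arguments forwarded to this invocation of `call`.
  lambda { |x| recv(x, ...) }
end

add_one = call(1)     # `call` has already returned here
add_five = call(5)    # a second, independent invocation

p add_one.call(2)     # 3
p add_five.call(2)    # 7
p add_one.call(10)    # 11, the first lambda still sees 1
```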
  7. Benchmark

    Test calling in to a … method (positional parameters)

    def recv(a, b)
      a + b
    end

    def call(...)
      recv(...)
    end

    # def run
    #   call(1, 2)
    #   call(1, 2)
    #   call(1, 2)
    #   ...
    eval "def run; " + 200.times.map { "call(1, 2)" }.join("; ") + "; end"

    200000.times do
      run
    end
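
The `eval` line builds an unrolled `run` method from a string, presumably so that each of the 200 calls is its own call site and loop overhead stays small compared with the calls being measured. A scaled-down sketch with 3 repetitions shows what the generated source looks like; the `source` variable is an addition for illustration.

```ruby
def recv(a, b)
  a + b
end

def call(...)
  recv(...)
end

# Same construction as the benchmark, with 3 repetitions instead of 200
# so the generated code is easy to read.
source = "def run; " + 3.times.map { "call(1, 2)" }.join("; ") + "; end"
puts source
# def run; call(1, 2); call(1, 2); call(1, 2); end

eval source   # defines `run`
p run         # 3, the value of the last call(1, 2)
```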
  8. Benchmark Results (~2x faster)

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      1.237 s ±  0.018 s    [User: 1.233 s, System: 0.002 s]
      Range (min … max):    1.223 s …  1.286 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      2.791 s ±  0.010 s    [User: 2.779 s, System: 0.008 s]
      Range (min … max):    2.770 s …  2.803 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        2.26 ± 0.03 times faster than master/miniruby -v test2.rb
  9. Benchmark

    Test calling in to a … method (keyword parameters)

    def recv(a:, b:)
      a + b
    end

    def call(...)
      recv(...)
    end

    # def run
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   call(a: 1, b: 2)
    #   ...
    eval "def run; " + 200.times.map { "call(a: 1, b: 2)" }.join("; ") + "; end"

    200000.times do
      run
    end
  10. Benchmark Results (~3x faster)

    Keyword Arguments

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      1.531 s ±  0.021 s    [User: 1.527 s, System: 0.002 s]
      Range (min … max):    1.502 s …  1.577 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      4.863 s ±  0.021 s    [User: 4.845 s, System: 0.011 s]
      Range (min … max):    4.846 s …  4.909 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        3.18 ± 0.05 times faster than master/miniruby -v test2.rb
  11. Benchmark

    Inline cache misses

    class A
      def a; end
    end

    class B < A; end

    a = A.new
    b = B.new

    def call_method(obj)
      obj.a # never hits inline cache
    end

    # def run(a, b)
    #   call_method(a)
    #   call_method(b)
    #   call_method(a)
    #   call_method(b)
    #   ...
    eval "def run(a, b); " + 200.times.map { "call_method(a); call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(a, b)
    end

    opt_send_without_block
    Never hits inline cache
  12. Benchmark results

    Inline cache misses

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      1.694 s ±  0.020 s    [User: 1.690 s, System: 0.002 s]
      Range (min … max):    1.665 s …  1.719 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      1.703 s ±  0.015 s    [User: 1.698 s, System: 0.002 s]
      Range (min … max):    1.679 s …  1.723 s    10 runs

    Summary
      fwd/miniruby -v test.rb ran
        1.00 ± 0.02 times faster than master/miniruby -v test.rb
  13. Benchmark

    Inline cache misses with block

    class A
      def a; end
    end

    class B < A; end

    a = A.new
    b = B.new

    def call_method(obj)
      obj.a { } # Always send instruction
    end

    # def run(a, b)
    #   call_method(a)
    #   call_method(b)
    #   call_method(a)
    #   call_method(b)
    #   ...
    eval "def run(a, b); " + 200.times.map { "call_method(a); call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(a, b)
    end

    send
    Never hits inline cache
  14. Benchmark results

    Inline cache misses with block

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      1.871 s ±  0.015 s    [User: 1.866 s, System: 0.002 s]
      Range (min … max):    1.852 s …  1.898 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      1.723 s ±  0.007 s    [User: 1.719 s, System: 0.002 s]
      Range (min … max):    1.710 s …  1.734 s    10 runs

    Summary
      master/miniruby -v test.rb ran
        1.09 ± 0.01 times faster than fwd/miniruby -v test.rb
  15. Benchmark

    Super calls (no cache)

    class A
      def a; end
    end

    class B < A
      def a; super; end
    end

    b = B.new

    def call_method(obj)
      obj.a # Calls invoke_super
    end

    # def run(b)
    #   call_method(b)
    #   call_method(b)
    #   ...
    eval "def run(b); " + 400.times.map { "call_method(b)" }.join("; ") + "; end"

    200000.times do
      run(b)
    end

    invokesuper
    Never hits inline cache
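
The labels `opt_send_without_block`, `send`, and `invokesuper` on the three benchmark slides above are VM instruction names. On CRuby they can be inspected with `RubyVM::InstructionSequence`. The following is a sketch: the helper `instruction_names` and the methods `plain_call` and `block_call` are hypothetical, and the exact disassembly text can differ between Ruby versions.

```ruby
# CRuby-specific: RubyVM::InstructionSequence is not available on
# other Ruby implementations.
class A
  def a; end
end

class B < A
  def a; super; end
end

def plain_call(obj)
  obj.a
end

def block_call(obj)
  obj.a { }
end

# Hypothetical helper: list the instruction names found in a method's
# disassembly (each instruction line starts with a numeric offset).
def instruction_names(callable)
  RubyVM::InstructionSequence.of(callable).disasm
    .scan(/^\d+ (\w+)/).flatten
end

p instruction_names(method(:plain_call))   # call without a block
p instruction_names(method(:block_call))   # call with a block
p instruction_names(B.new.method(:a))      # method body containing `super`
```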
  16. Benchmark results

    Super calls (no cache)

    $ hyperfine 'fwd/miniruby -v test.rb' 'master/miniruby -v test.rb'
    Benchmark 1: fwd/miniruby -v test.rb
      Time (mean ± σ):      2.553 s ±  0.127 s    [User: 2.547 s, System: 0.002 s]
      Range (min … max):    2.397 s …  2.747 s    10 runs

    Benchmark 2: master/miniruby -v test.rb
      Time (mean ± σ):      2.240 s ±  0.055 s    [User: 2.234 s, System: 0.002 s]
      Range (min … max):    2.170 s …  2.310 s    10 runs

    Summary
      master/miniruby -v test.rb ran
        1.14 ± 0.06 times faster than fwd/miniruby -v test.rb
  17. Initialize with Keyword Args Benchmark

    class Foo
      def initialize(a:, b:)
        @a = a
        @b = b
      end
    end

    def call(a, b)
      Foo.new(a:, b:)
    end

    # def run
    #   call(1, 2)
    #   call(1, 2)
    #   ...
    eval "def run; " + 200.times.map { "call(1, 2)" }.join("; ") + "; end"

    200000.times do
      run
    end
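
One way to read this benchmark is that `Foo.new` has to hand its arguments on to `initialize`, which is itself a delegation. The sketch below shows what a constructor written in Ruby with `...` could look like. It is an assumption made for illustration: `ruby_new` is a hypothetical name, and this is not how CRuby defines `Class#new`.

```ruby
class Foo
  attr_reader :a, :b

  # Hypothetical Ruby-level constructor that forwards everything
  # to initialize. Not CRuby's actual Class#new.
  def self.ruby_new(...)
    obj = allocate
    obj.send(:initialize, ...)   # initialize is private, hence send
    obj
  end

  def initialize(a:, b:)
    @a = a
    @b = b
  end
end

foo = Foo.ruby_new(a: 1, b: 2)
p [foo.a, foo.b]   # [1, 2]
```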
  18. Benchmark results: 40% faster

    Initialize can be faster in Ruby

    $ hyperfine 'fwd/miniruby -v test2.rb' 'master/miniruby -v test2.rb'
    Benchmark 1: fwd/miniruby -v test2.rb
      Time (mean ± σ):      3.737 s ±  0.070 s    [User: 3.724 s, System: 0.008 s]
      Range (min … max):    3.651 s …  3.816 s    10 runs

    Benchmark 2: master/miniruby -v test2.rb
      Time (mean ± σ):      5.276 s ±  0.028 s    [User: 5.257 s, System: 0.012 s]
      Range (min … max):    5.235 s …  5.314 s    10 runs

    Summary
      fwd/miniruby -v test2.rb ran
        1.41 ± 0.03 times faster than master/miniruby -v test2.rb
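
The speedup figures in the slide titles can be recomputed from the reported means. The sketch below divides the `master` mean by the `fwd` mean for each benchmark, so a value above 1 means the `fwd` build was faster and a value below 1 means `master` was faster. The labels and the `speedup` helper are additions for illustration. Because the means shown in the slides are already rounded, the last digit may differ slightly from hyperfine's own summary.

```ruby
# Mean wall-clock times in seconds, copied from the hyperfine output
# in the slides: [label, fwd mean, master mean]
RESULTS = [
  ["forwarding, positional",   1.237, 2.791],
  ["forwarding, keywords",     1.531, 4.863],
  ["inline cache miss",        1.694, 1.703],
  ["cache miss with block",    1.871, 1.723],
  ["super call",               2.553, 2.240],
  ["initialize with keywords", 3.737, 5.276],
].freeze

# Hypothetical helper: how many times faster fwd is than master.
def speedup(fwd, master)
  (master / fwd).round(2)
end

RESULTS.each do |label, fwd, master|
  puts format("%-26s %.2fx", label, speedup(fwd, master))
end
```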