A method whose body is just a call to bar compiles to this ISeq:

    putself
    send :bar, cache   # cache starts as nil
    leave

The Ruby VM's program counter walks these instructions. The C code for the send instruction is:

    send(cache) { search_method(cache); CALL_METHOD(cache); }

search_method has to answer: which type of method will be called? A Ruby method (push a method frame), a C method call, attr_reader, attr_writer, and so on. The answer is cached: the call cache stores a C function pointer together with a class timestamp, so a later class modification invalidates it. On subsequent calls the VM dispatches by calling that function pointer, which the C compiler can't optimize across. In JIT-ed code, we can inline this operation by checking the cache stored in the ISeq.
we can bypass method dispatch and inline the C function that pushes the Ruby method frame. Once it is inlined, the C compiler can apply various optimizations to the Ruby method call, which is known to be slow. Optcarrot: 53.84fps -> 57.52fps
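The call cache distinguishes, among other things, methods defined in Ruby (which push an ISeq frame) from methods defined in C. From Ruby code we can observe that distinction with MRI's introspection API; a hedged sketch (`RubyVM::InstructionSequence` is MRI-specific):

```ruby
# A Ruby-defined method has an instruction sequence (ISeq)...
def ruby_method; end
ruby_iseq = RubyVM::InstructionSequence.of(method(:ruby_method))

# ...while a C-defined method like Array#size has none, so `of` returns nil.
c_iseq = RubyVM::InstructionSequence.of([].method(:size))
```

This is the same Ruby-vs-C distinction the call cache has to record before the VM knows which dispatch routine to run.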
The specialized instruction for #[] is roughly:

    if (recv.is_a?(Array)) { fast_Array#[](recv, key); }
    else if (recv.is_a?(Hash)) { fast_Hash#[](recv, key); }
    else { dispatch(recv, #[], key); }

For example:

    def show
      params[:id]
    end

Here params[:id] resolves to ActionController::Parameters#[]. These checks are NOT needed for classes other than Array and Hash.
#[] is fast for Array/Hash, but it's suboptimal for other classes. The JIT removes the Array/Hash guard by looking at the call cache, and also inlines pushing a method frame. The same optimization can be applied to other methods later.
limited definitions of C functions in Ruby core are visible to the C compiler. I inlined a part of the Array#[] definition, and then the C compiler could optimize the code. Optcarrot: 54.93fps -> 58.41fps
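The specialized #[] instruction described above can be seen in MRI's disassembler output; a hedged sketch (MRI-only, and instruction names such as `opt_aref` can change between Ruby versions):

```ruby
# A #[] call site compiles to the specialized opt_aref instruction,
# which is where the Array/Hash fast-path guards live.
disasm = RubyVM::InstructionSequence.compile("hash = {}; hash[:id]").disasm
puts disasm
```

If the receiver turns out to be neither an Array nor a Hash, opt_aref falls back to ordinary method dispatch.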
    def three
      1 + 2
    end

compiles to the ISeq:

    putobject 1
    putobject 2
    opt_plus
    leave

The Ruby VM executes it by moving the program counter and pushing/popping values on the VM stack through the stack pointer, finally leaving 3 on the stack. How can we skip the stack pointer motion in JIT?
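The ISeq above can be inspected directly; a hedged sketch (`RubyVM` is MRI-only and the exact instruction spelling varies by version, e.g. operand-unified `putobject` forms):

```ruby
# Each instruction below moves the VM's stack pointer: two pushes for the
# operands of +, then opt_plus pops both and pushes the result.
disasm = RubyVM::InstructionSequence.compile("1 + 2").disasm
puts disasm
```

Every one of those stack-pointer movements is interpreter bookkeeping that JIT-ed code would like to avoid.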
    def err # JIT-ed
      raise 'error'
    end

    def three # JIT-ed
      1 + (err rescue 2)
    end

Call stack in C: main() -> ruby_vm() (setjmp is called here) -> jit_three(). Suppose jit_three() keeps the VM stack in a C local variable, stack[nil, nil], while the real VM stack stays empty. jit_three() pushes 1 into that local variable: stack[1, nil]. It then calls jit_err(), which calls rb_raise(), which calls longjmp back to ruby_vm(). Now the rescue needs to resume with the operands of + available, but the real VM stack doesn't have those 2 values: the 1 held in jit_three()'s C local variable has expired with its frame => SEGV.
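For reference, the rescue expression in these slides is ordinary Ruby and must keep evaluating correctly whether interpreted or JIT-ed; a runnable sketch:

```ruby
def err
  raise 'error'  # always raises
end

def three
  1 + (err rescue 2)  # err raises, rescue supplies 2, so the sum is 3
end

result = three
```

The JIT's job is to produce the same `3` even though the exception unwinds C frames with longjmp.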
"catch table" (rescue, ensure, etc.) does not exist, we don't need to resurrect stack values on exception So we can use just C local variables to reproduce the stack of Ruby VM only when catch table does not exist Stack pointer is not moved and compiler can inline values Optcarrot: 57.13fps -> 62.14fps
Because JIT-ed code is directly called from the VM (no extra C function frames have been created yet), we don't need to call setjmp again. Now yield is 1.3x faster than the non-JIT-ed case.
    def err
      raise 'error'
    end

    def three
      1 + (err rescue 2)
    end

Ruby call stack: #three -> #err -> #raise, and each frame holds its own program counter. The program counter is used to resurrect the position after longjmp.
threshold on the size of functions to be inlined. Some of Ruby's instructions (+, -, *, /, ...) are too large to be inlined by default, so I applied an "always inline" attribute. In the future we should reduce the size of the code instead. Optcarrot: 60.19fps -> 64.92fps
asserting the size of the stack when returning from a method, and it's slow. We can skip it in JIT-ed code because it's already checked by the VM. Optcarrot: 67.43fps -> 69.92fps
the same way as a Ruby method, we can fully inline attr_reader without large compilation time. The cost becomes the same as a reference to a normal instance variable. Calling attr_reader is made 4x faster.
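attr_reader defines a method whose entire job is one instance-variable read, which is what makes it so cheap to inline; a sketch of the equivalence:

```ruby
class Point
  attr_reader :x  # defined by the VM as an optimized ivar-read method

  # A handwritten equivalent: attr_reader behaves like this method, and
  # after JIT inlining its cost approaches the bare @x reference itself.
  def manual_x
    @x
  end

  def initialize(x)
    @x = x
  end
end

point = Point.new(42)
```

Both `point.x` and `point.manual_x` return the same value; the optimization only removes the dispatch overhead around the read.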
it's useless if it's slow. The generated code is much simplified and made fast by removing program counter and stack pointer motions. But it still has some complexity, and that blocks a significant performance improvement: Ruby method inlining.
cancel all JIT-ed calls by longjmp when something unexpected happens. I'm going to remove the guard for TracePoint and cancel JIT-ed code later instead. It should also be used when all method caches are purged.
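Why TracePoint needs a guard (or cancellation) at all: enabling one changes what every plain method call must do, because the VM now has to fire events that optimized code would otherwise skip. A runnable sketch of the observable effect:

```ruby
def traced_method
  :ok
end

# Once this TracePoint is enabled, calling traced_method must emit a
# :call event; JIT-ed code that skipped event hooks would be incorrect.
seen = []
tp = TracePoint.new(:call) { |t| seen << t.method_id }
tp.enable { traced_method }
```

Deoptimizing (canceling JIT-ed code) when tracing turns on lets the hot path omit the per-call check entirely.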
generated for both Hash#[] and Array#[]. We need some instrumentation to detect the type that is passed to an optimized instruction. Vladimir's RTL instructions achieve this by dynamic modification of instructions.
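The problem case is a single call site whose receiver class varies; a sketch of one #[] site that sees both Array and Hash, so a JIT needs runtime type information before it can safely specialize:

```ruby
# The one collection[key] site inside element_at is executed with an
# Array receiver on some calls and a Hash receiver on others.
def element_at(collection, key)
  collection[key]
end

from_array = element_at([10, 20, 30], 1)  # dispatches to Array#[]
from_hash  = element_at({ 1 => :one }, 1) # dispatches to Hash#[]
```

Without profiling, the compiler can't drop either guard here, unlike a site that only ever sees one class.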
JIT. Depending on how frequently a method is called, it may be better to balance compilation time against optimization level. Vladimir is working on light JIT compilation. Sometimes people deploy an application every 10 minutes.
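The usual way to balance this is call-count-based tiering; a toy sketch (all names here are made up for illustration, this is not MJIT's real implementation):

```ruby
# Interpret a method until it has been called enough times to look hot,
# then mark it "compiled" (a real JIT would generate native code here).
call_threshold = 5
call_counts = Hash.new(0)
compiled = {}

record_call = lambda do |name|
  call_counts[name] += 1
  compiled[name] = true if call_counts[name] >= call_threshold
end

10.times { record_call.call(:hot_method) }
record_call.call(:cold_method)
```

Methods below the threshold never pay compilation cost, which matters when an application is redeployed before its code ever gets hot.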
compiled code and generate faster code using the profiling result. Using a multi-tier JIT, we may be able to profile code in the first tier and generate faster code in the second tier.
becomes slower only while JIT compilation is happening. A possible cause might be the number of methods being JIT-ed, compared to some other benchmarks. Possibly we should reduce the number of methods to be JIT-ed, or reduce the frequency of JIT compilation.
implement Ruby method inlining, but it increases compilation time. I have ideas for implementing C method inlining as well, but which methods should be inlined needs to be solved first.