sequence we focus on in this talk is a pattern that appears very frequently in Ruby programs. • We propose automatic detection of such patterns, and let the interpreter optimise that part. • This optimisation improves some benchmark results, including Rails.
its return value. • Note that every method in Ruby has return value(s). • The value(s) returned, however, do not have to be used. • Even when a method does not expect its caller to use the return value, it has to return something “just in case” that expectation breaks. • This is a waste of both time and memory.
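For illustration, a minimal (hypothetical) snippet where a return value is built and then thrown away:

```ruby
def greet(name)
  "Hello, #{name}!"      # this String is the return value
end

greet("Alice")           # the String is still allocated and returned, then discarded
message = greet("Bob")   # only here is the return value actually used
```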
eliminate return values. • In the wild, there already are methods written in C. • They cannot be modified, and they already return something. • The best we can do is to allow methods to return arbitrary values when those values are not used by their callers. • Let each method decide what to return. We can auto-optimise pure-Ruby methods later.
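For example (illustrative only), many existing C methods always hand something back whether or not anybody wants it:

```ruby
a = []
a.push(1)         # Array#push is written in C and always returns the receiver
a.concat([2, 3])  # so does Array#concat; callers very often ignore that value
```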
pure-Ruby methods check that flag automatically, so that they can skip their rearmost instructions. • For instance, when we have:

```ruby
def foo(x)
  y = bar(x)
  return y
end
```
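One way to peek at those rearmost instructions is to dump the compiled sequence; a rough sketch follows (the exact instruction names and offsets vary between Ruby versions, and the output shown in the comment is only approximate):

```ruby
iseq = RubyVM::InstructionSequence.compile(<<~SRC)
  def foo(x)
    y = bar(x)
    return y
  end
SRC
puts iseq.disasm   # the iseq for foo ends with something like
                   #   getlocal y / leave
```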
```diff
called
+ * with VM_FRAME_FLAG_POPPED.  With that flag on, it immediately
+ * leaves the current stack frame with scratching the topmost n stack
+ * values.  The return value of the iseq for that case is always
+ * nil. */
+DEFINE_INSN
+opt_bailout
+(rb_num_t n)
+()
+()
+{
+#ifdef MJIT_HEADER
+    /* :FIXME: don't know how to make it work with JIT... */
+#else
+    if (VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPPED) &&
+        CURRENT_INSN_IS(opt_bailout) /* <- rule out trace instruction */ ) {
+        POPN(n);
+        PUSH(Qnil);
+        DISPATCH_ORIGINAL_INSN(leave);
+    }
+#endif
+}
+
 /**********************************************************/
 /* deal with control flow 3: exception */
 /**********************************************************/
```
be skipped when a return value is not used? • Obviously not everything can be. • The concept should be identical to what we call “pure” operations, proposed at RubyKaigi 2016.
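A minimal sketch of the distinction (assuming the operators involved are not redefined):

```ruby
def f(x)
  x + 1      # pure: no observable side effect, skippable if the result is unused
end

def g(x)
  puts x     # impure: writes to stdout, must run regardless
  x + 1
end
```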
thinking of a method as being entirely pure or not, we are going to focus on each method’s rearmost part that is pure. • Such a part, if any, is pointless work when the return value of the method is discarded.
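A hypothetical example of such a method, whose tail is pure while its head is not:

```ruby
def record(event, sink)
  sink << event                   # side effect on sink: must always run
  { status: :ok, event: event }   # rearmost, pure part: builds a Hash
end

record("boot", [])                # return value discarded -> that Hash was wasted work
```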
a part of a C method. • But the `VM_FRAME_FLAG_POPPED` flag is always set, no matter whether the called method is written in Ruby or not. • Why not make it visible from C, so that future methods can look at it?
pointer, then returns the matched string. The “matched string” can be omitted by leveraging the flag. • The exact same discussion applies to `String#slice!`.
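To make the `String#slice!` case concrete (a plain illustration, not part of the patch):

```ruby
str = "hello world"
str.slice!(0, 6)   # mutates str and returns the removed substring ("hello ")
                   # ... which this caller never looks at
p str              # => "world"
```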
```
          <callinfo!mid:something, argc:0,
0005 pop
0006 putself
0007 send           <callinfo!mid:something_another,
0011 pop
0012 putself
0013 send           <callinfo!mid:something_else, ar
0017 leave
```
Would like to eliminate those `pop`s
```
          <callinfo!mid:something, argc:0,
0005 putself
0006 send           <callinfo!mid:something_another,
0010 putself
0011 send           <callinfo!mid:something_else, ar
0015 leave
```
The `pop`s are now eliminated
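For reference, a hypothetical reconstruction of Ruby source that compiles to a sequence like the one above (the method name `run_all` is made up; the callee names are taken from the callinfo entries):

```ruby
def run_all
  something          # value popped (or, with the optimisation, never pushed)
  something_another  # likewise
  something_else     # the last value is the method's return value, so no pop follows it
end
```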
This flag denotes that the pop instruction in the caller was optimised out from the sequence. • Hence when the flag is set, it is the callee’s duty to properly skip pushing return value(s), not its caller’s.
consider 3 (!) distinct situations. • Returning from a method written in C. • Returning from a method written in Ruby. • Returning from inside of a block.
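For illustration, the three situations in plain Ruby (hypothetical snippet):

```ruby
def pure_ruby_method
  :done
end

"abc".upcase               # (1) returning from a method written in C, value unused
pure_ruby_method           # (2) returning from a method written in Ruby, value unused
[1, 2, 3].each { |i| i }   # (3) returning from inside a block; each discards the block's value
```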
```diff
b/insns.def
@@ -927,7 +911,7 @@ DEFINE_INSN
 leave
 ()
 (VALUE val)
-(VALUE val)
+(...)
 /* This is super surprising but when leaving from a frame, we check
  * for interrupts.  If any, that should be executed on top of the
  * current execution context.  This is a method call. */
@@ -939,7 +923,10 @@ leave
 // attr enum rb_insn_purity purity = rb_insn_is_pure;
 /* And this instruction handles SP by nature. */
 // attr bool handles_sp = true;
+// attr rb_snum_t sp_inc = 0;
 {
+    bool popit = VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPIT);
+
     if (OPT_CHECKED_RUN) {
         const VALUE *const bp = vm_base_ptr(reg_cfp);
         if (reg_cfp->sp != bp) {
@@ -959,6 +946,9 @@ leave
     }
     else {
         RESTORE_REGS();
+        if (! popit) {
+            PUSH(val);
+        }
     }
 }
```
the execution to proceed.
2. Rewind the stack.
3. Push the return value onto the stack.
4. Continue executing.

This has to be cancelled, however: the flag has been squashed already
are done on this exact machine I am projecting this presentation from: a 6th-gen ThinkPad X1 Carbon. • They all compare trunk (2.7.0, revision 67168) versus ours (the proposed patch applied on top of trunk).
almost the same. Whether slower or faster, they differ only very slightly. • There are a few notable benchmark instances where our proposal clearly outperforms trunk. • On the other hand, no instance seems to show a clear slowdown for our proposal. • This tendency is roughly the same as what we saw in 2016.
historically been considered a benchmark that reflects real-world use cases. • We also tested mame/optcarrot, which was made for benchmarking various Ruby implementations.
the same. • RDoc got slower, optcarrot got faster. We see these results consistently. • There might be reasons behind them, but… well, isn’t it enough to say that we see no significant changes?
we tested our changeset against it. • The benchmark is basically a series of `ab(1)` runs. • Discourse is a field-proven, real-world Rails application. The benchmark shows how the proposed changeset behaves in the wild. • OTOH this has the largest LOC among our benchmarks.
method whether its return value is used or not. Unused return values are then optimised out of the VM’s value stack. • Our proposal sacrifices process bootup time to yield better runtime performance. • Not only small benchmarks but also Rails applications can benefit from it.
scenario, but it turned out to slow things down. • The overhead added by asking “is this a tail call?” every time a method is called turned out to be too heavy. • Some static analysis could be possible; that might move the overhead out of the runtime.
applied to our core classes, to gain bonus speed-ups. • Return values of blocks called from C (for which we still cannot tell whether they are used or not) should also be considered.