we redefine things as quickly as possible? • Which is better: everything runs slowly, or 99% of code runs fast and redefinition takes, say, 1,000x more time? • → Introducing deoptimization.
as you can. If things get changed, discard the optimized bit and fall back to the vanilla interpreter.” • A technique originally introduced in SELF (a Smalltalk dialect), later applied to many other languages, notably on the JVM. • JRuby and Rubinius both have their own deoptimization engines, hence both run faster than MRI.
• Just transform VM instruction sequences and let the VM execute them. • Furthermore, we restrict ourselves to “patching” a sequence; we neither shrink nor grow it. • Fill with nops where needed. The nop instruction is expected to run adequately fast.
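A minimal sketch of the patching rule, using a hypothetical toy instruction set (OP_GETCONST, OP_PUTOBJECT and the others are invented here, not MRI's instructions): a three-word constant lookup is overwritten by a two-word `putobject`, and the leftover word becomes a nop, so the sequence keeps its length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes (not MRI's real instruction set). */
enum { OP_NOP, OP_PUTOBJECT, OP_GETCONST, OP_ADD };

/* Toy constant table indexed by constant id (an assumption of this sketch). */
static long consts[4] = {10, 20, 30, 40};

/* Overwrite code[at .. at+span) with repl[0 .. repl_len), padding with nops.
 * The sequence never shrinks or grows, so all other offsets stay valid. */
static void patch_with_nops(long *code, size_t at, size_t span,
                            const long *repl, size_t repl_len)
{
    assert(repl_len <= span); /* a patch may not grow the sequence */
    for (size_t i = 0; i < span; i++)
        code[at + i] = (i < repl_len) ? repl[i] : OP_NOP;
}

/* Tiny stack interpreter. OP_GETCONST takes two operands (scope, id). */
static long run(const long *code, size_t len)
{
    long stack[16];
    size_t sp = 0;
    size_t pc = 0;

    while (pc < len) {
        switch (code[pc]) {
        case OP_NOP:       pc += 1; break; /* nops are simply skipped */
        case OP_PUTOBJECT: stack[sp++] = code[pc + 1]; pc += 2; break;
        case OP_GETCONST:  stack[sp++] = consts[code[pc + 2]]; pc += 3; break;
        case OP_ADD:       sp--; stack[sp - 1] += stack[sp]; pc += 1; break;
        default:           assert(!"unknown opcode"); return 0;
        }
    }
    return stack[sp - 1];
}
```

For example, with `long code[] = {OP_GETCONST, 0, 2, OP_PUTOBJECT, 5, OP_ADD};`, `run(code, 6)` yields 35 both before and after `patch_with_nops(code, 0, 3, (long[]){OP_PUTOBJECT, 30}, 2)`.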
issues. • The program counter is not affected by these operations. • Hence there is no need to scan the VM stack. • The saved vanilla sequence can be reused multiple times; the preparation is needed only once.
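In the same toy style (the `toy_iseq` struct and both functions are hypothetical), the vanilla words are copied aside once, and deoptimizing is a single copy back over the live sequence. Because no patch changes the length, instruction offsets outside patched spans mean the same thing before and after; this sketch assumes no frame is suspended inside a patched span.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iseq wrapper: the live (possibly optimized) words plus a
 * pristine copy saved before the first optimization. */
struct toy_iseq {
    long  *live;
    long  *vanilla;   /* NULL until the first optimization */
    size_t len;
};

/* Save the vanilla sequence; done only once, however often we re-optimize. */
static void ensure_vanilla_saved(struct toy_iseq *iseq)
{
    if (iseq->vanilla != NULL)
        return;
    iseq->vanilla = malloc(iseq->len * sizeof(long));
    assert(iseq->vanilla != NULL);
    memcpy(iseq->vanilla, iseq->live, iseq->len * sizeof(long));
}

/* Deoptimize: copy the vanilla words back over the live ones.  The length
 * is unchanged, so no saved program counter needs rewriting. */
static void deoptimize(struct toy_iseq *iseq)
{
    if (iseq->vanilla != NULL)
        memcpy(iseq->live, iseq->vanilla, iseq->len * sizeof(long));
}
```

The same saved copy serves every later deoptimization, which is where the "preparation is needed only once" property comes from.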
like method redefinitions, a per-VM global timestamp counter is introduced. • This counter is an unsigned integer that is atomically incremented whenever any of the following happens: • Assignment to a constant, • (Re-)definition of a method, • Inclusion of a module.
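A sketch of the timestamp check, assuming C11 `<stdatomic.h>` as the atomic primitive; `bump_timestamp`, `optimize` and `still_valid` are hypothetical names. Optimized code records the counter value it was built against, and any later bump means it must fall back to the vanilla sequence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM global timestamp (static, so it starts at zero). */
static atomic_uint global_timestamp;

/* Called on every event that may invalidate optimized code:
 * constant assignment, method (re-)definition, module inclusion. */
static void bump_timestamp(void)
{
    atomic_fetch_add(&global_timestamp, 1u);
}

/* Hypothetical optimized sequence that remembers when it was built. */
struct toy_iseq {
    unsigned created_at;
    bool     optimized;
};

static void optimize(struct toy_iseq *iseq)
{
    iseq->created_at = atomic_load(&global_timestamp);
    iseq->optimized  = true;
}

/* Before running optimized code, compare timestamps; any change means
 * something might have been redefined, so fall back to vanilla code. */
static bool still_valid(const struct toy_iseq *iseq)
{
    return iseq->optimized
        && iseq->created_at == atomic_load(&global_timestamp);
}
```

A single global counter is coarse: redefining anything invalidates all optimized code. That trade-off fits the premise above, where redefinition is rare and allowed to be slow.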
n, long m, VALUE konst)
{
    VALUE *buf = (VALUE *)&pc[-n];   /* patch starts n words before pc */
    int len = n + m;

    memcpy(buf, wipeout_pattern, len * sizeof(VALUE)); /* fill with nops */
    buf[0] = putobject;              /* then push the constant directly */
    buf[1] = konst;
}
“nop nop nop …”
considered “pure”. • A method is marked as not pure if … • It writes to variables other than local ones. • It yields. • It is not written in Ruby. • It calls other methods that are not pure.
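The four rules above can be sketched as a recursive check. The `method_info` summary and its fields are hypothetical, and the recursion assumes an acyclic call graph, since mutual recursion would never terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a method body does. */
struct method_info {
    bool writes_nonlocal;   /* writes ivars, globals, constants, ... */
    bool yields;            /* contains a yield */
    bool written_in_ruby;   /* false for C-implemented methods */
    size_t ncallees;
    const struct method_info *const *callees;
};

/* Apply the rules directly: any local violation, or any impure callee,
 * makes the method impure. */
static bool is_pure(const struct method_info *m)
{
    if (m->writes_nonlocal || m->yields || !m->written_in_ruby)
        return false;
    for (size_t i = 0; i < m->ncallees; i++)
        if (!is_pure(m->callees[i]))
            return false;
    return true;
}
```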
  0
  while i < x
    z = i % 2 == 0 ? 1 : -1
    y += z / (2 * i + 1.0)
    i += 1
  end
  return 4 * y
end

def m(x, y, z = ' ')
  n = y - x.length
  while n > 0 do
    n -= z.length
    x = z + x
  end
  return x
end
not” is, in fact, an oversimplification. • There is a third state: indeterminate. • For instance, one cannot say whether a method is pure when it calls something that is not defined, resulting in a call to method_missing.
• Each method starts with its purity not yet predicted. • While running the method, we collect information on its usage to detect its purity. • When a method’s purity is determined, that information propagates to its callers.
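A sketch of three-valued purity with propagation to callers. The state names, the `method` record and the `UNDEFINED` marker (standing for a call that would hit method_missing) are assumptions made for illustration. A method stays UNPREDICTABLE until all its callees are known; once its state settles, its callers are re-evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum purity { UNPREDICTABLE, PURE, NOT_PURE };

#define MAX_CALLEES 4
#define UNDEFINED (-1)  /* callee not defined: the call would hit method_missing */

/* Hypothetical per-method record. */
struct method {
    bool        body_has_side_effects; /* writes non-locals, yields, or is native */
    int         ncallees;
    int         callees[MAX_CALLEES];  /* indices into the table, or UNDEFINED */
    enum purity purity;                /* starts as UNPREDICTABLE */
};

/* Purity of a method given what is currently known about its callees. */
static enum purity evaluate(const struct method *tbl, const struct method *m)
{
    if (m->body_has_side_effects)
        return NOT_PURE;
    enum purity result = PURE;
    for (int i = 0; i < m->ncallees; i++) {
        int c = m->callees[i];
        enum purity p = (c == UNDEFINED) ? UNPREDICTABLE : tbl[c].purity;
        if (p == NOT_PURE)
            return NOT_PURE;
        if (p == UNPREDICTABLE)
            result = UNPREDICTABLE;
    }
    return result;
}

/* Re-evaluate method idx; if its purity changed, re-evaluate its callers.
 * States only move away from UNPREDICTABLE, so this terminates. */
static void update(struct method *tbl, int n, int idx)
{
    enum purity p = evaluate(tbl, &tbl[idx]);
    if (p == tbl[idx].purity)
        return;
    tbl[idx].purity = p;
    for (int caller = 0; caller < n; caller++)
        for (int i = 0; i < tbl[caller].ncallees; i++)
            if (tbl[caller].callees[i] == idx) {
                update(tbl, n, caller);
                break;
            }
}
```

Mutually recursive pure methods stay UNPREDICTABLE here, which is conservative: they are simply never optimized.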
    if (! cc->me) {
        return insn_is_unpredictable; /* method missing */
    }
    else if (! (i = iseq_of_me(cc->me))) {
        return insn_is_not_pure; /* not written in ruby. */
    }
    else if (! i->body->attributes) {
        /* Note, we do not recursively analyze. That can lead to infinite
         * recursion on mutually recursive calls and detecting that is too
         * expensive in this hot path.*/
        return insn_is_unpredictable;
    }
    else {
        return purity_of_VALUE(RB_ISEQ_ANNOTATED_P(i, core::purity));
    }
}
discarded” are subject to elimination. • A method call just checks the called method’s purity; if the immediately following instruction then discards its return value, that preceding method call can be eliminated. • The actual elimination happens in pop, not in send.
b/insns.def
@@ -711,9 +722,17 @@ DEFINE_INSN
 adjuststack
 (rb_num_t n)
 (...)
 (...)
 // inc -= n
 {
     DEC_SP(n);
+    /* If the immediately precedent instruction was send (or its
+     * variant), and here we are in adjuststack instruction, this
+     * means the return value of the method call is silently
+     * discarded. Then why not just avoid the whole method calling?
+     * This is possible when the callee method was marked pure. Note
+     * however that even on such case, evaluation of method arguments
+     * cannot be skipped, because they can have their own side
+     * effects.
+     */
+    vm_eliminate_insn(GET_CFP(), GET_PC(), OPN_OF_CURRENT_INSN + 1, n);
 }
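int n, rb_num_t m)
{
    VALUE *buf = (VALUE *)&i->body->iseq_encoded[p->pc];
    int len = p->len + n;
    int argc = p->argc + m;

    /* Blank out the send and the instruction that discarded its value. */
    memcpy(buf, wipeout_pattern, len * sizeof(VALUE));
    if (argc != 0) {
        /* The arguments are still evaluated; just drop their values. */
        buf[0] = adjuststack;
        buf[1] = argc;
    }
    ISEQ_RESET_ORIGINAL_ISEQ(i);
    FL_SET(i, ISEQ_NEEDS_ANALYZE);
}
“nop nop nop …”, with adjuststack kept in front in case the arguments have side effects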
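A toy model of eliminating a discarded pure call, with its own hypothetical five-instruction VM (`OP_SEND` takes argc and a callee index; the `last_send` record loosely mirrors `p` in the code above). As on the slides, the rewrite happens when pop runs: the send and the pop become `adjuststack argc` plus nops, so the arguments are still evaluated and then dropped. The `add_calls` counter exists only to observe the effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set; word counts in comments. */
enum { OP_NOP,          /* 1 word                              */
       OP_PUTOBJECT,    /* 2 words: value                      */
       OP_SEND,         /* 3 words: argc, callee index         */
       OP_POP,          /* 1 word: discard top of stack        */
       OP_ADJUSTSTACK   /* 2 words: n, discard n values        */
};

struct callee { bool pure; long (*fn)(const long *args, long argc); };

/* The most recent send, recorded so that pop can inspect it. */
struct last_send { size_t pc, len; long argc; bool pure; bool valid; };

/* Example pure callee; the counter only instruments the sketch. */
static int add_calls;
static long add(const long *args, long argc)
{
    long s = 0;
    add_calls++;
    for (long i = 0; i < argc; i++)
        s += args[i];
    return s;
}

static long run(long *code, size_t len, const struct callee *callees)
{
    long stack[32];
    size_t sp = 0, pc = 0;
    struct last_send ls = {0};

    while (pc < len) {
        switch (code[pc]) {
        case OP_NOP:
            pc += 1;
            break;
        case OP_PUTOBJECT:
            stack[sp++] = code[pc + 1];
            pc += 2;
            break;
        case OP_SEND: {
            long argc = code[pc + 1];
            const struct callee *c = &callees[code[pc + 2]];
            sp -= (size_t)argc;                  /* args live at stack[sp..] */
            stack[sp] = c->fn(&stack[sp], argc); /* result replaces them */
            sp++;
            ls = (struct last_send){pc, 3, argc, c->pure, true};
            pc += 3;
            break;
        }
        case OP_POP:
            sp--;
            /* Return value of the send right before us was discarded:
             * if the callee is pure, rewrite send+pop for next time. */
            if (ls.valid && ls.pure && ls.pc + ls.len == pc) {
                for (size_t i = 0; i < ls.len + 1; i++)
                    code[ls.pc + i] = OP_NOP;
                if (ls.argc != 0) {
                    code[ls.pc] = OP_ADJUSTSTACK;
                    code[ls.pc + 1] = ls.argc;
                }
            }
            pc += 1;
            break;
        case OP_ADJUSTSTACK:
            sp -= (size_t)code[pc + 1];
            pc += 2;
            break;
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
    return sp ? stack[sp - 1] : 0;
}
```

Running `{OP_PUTOBJECT, 1, OP_PUTOBJECT, 2, OP_SEND, 2, 0, OP_POP, OP_PUTOBJECT, 42}` twice with a pure `add` calls it only on the first run; the second run executes `adjuststack 2; nop; nop` instead.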
but never used later (write-only). • Only methods that are pure can be considered. • Methods with side effects might access bindings. • Blocks might share local variables, so write-only-ness must take all reachable blocks into account.
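A sketch of the write-only variable idea in the same toy style: a store to a local that no reachable code reads is turned into a pop, which keeps the stack balanced and the sequence length unchanged. The opcodes, the shared local table and `eliminate_writeonly` are assumptions; the purity flag stands in for the rule that only pure methods qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy opcodes. */
enum { OP_NOP, OP_PUTOBJECT, OP_SETLOCAL, OP_GETLOCAL, OP_POP };

#define MAX_LOCALS 8

static size_t insn_len(long op)
{
    return (op == OP_PUTOBJECT || op == OP_SETLOCAL || op == OP_GETLOCAL) ? 2 : 1;
}

/* seqs[] holds the method body and every block reachable from it, all
 * sharing one local table (a simplifying assumption).  Returns the number
 * of stores rewritten. */
static int eliminate_writeonly(long *const seqs[], const size_t lens[],
                               size_t nseqs, bool method_is_pure)
{
    bool is_read[MAX_LOCALS] = {false};
    int rewritten = 0;

    if (!method_is_pure)  /* side effects might reach locals via a binding */
        return 0;

    /* Pass 1: a local counts as read if ANY reachable sequence reads it. */
    for (size_t s = 0; s < nseqs; s++)
        for (size_t pc = 0; pc < lens[s]; pc += insn_len(seqs[s][pc]))
            if (seqs[s][pc] == OP_GETLOCAL)
                is_read[seqs[s][pc + 1]] = true;

    /* Pass 2: replace `setlocal idx` (2 words) with `pop; nop` (2 words). */
    for (size_t s = 0; s < nseqs; s++)
        for (size_t pc = 0; pc < lens[s]; pc += insn_len(seqs[s][pc]))
            if (seqs[s][pc] == OP_SETLOCAL && !is_read[seqs[s][pc + 1]]) {
                seqs[s][pc] = OP_POP;
                seqs[s][pc + 1] = OP_NOP;
                rewritten++;
            }
    return rewritten;
}
```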
as: • Subexpression elimination; • Variable liveness and escape analysis; • and more. • Allowing the program counter to be modified would make room for further optimizations.
• Q: Does this speed up Rails? • A: Not really. • Q: Does this make Ruby 3x3 work out? • A: It depends (the 3x3 goal is vague), but I believe I’m on the right path.