@shyouhei • Long-time ruby-core committer since the 1.8 era. • Maintained Ruby 1.8.5-7 (EOL-ed). • Made the ruby/ruby repo mirror @ GitHub. • Now a full-time Ruby dev @ Money Forward, Inc.
Not optimized
    == disasm: #@>================================
    0000 putobject_OP_INT2FIX_O_1_C_                                      (   1)
    0001 putobject        2
    0003 opt_plus         ,
    0006 leave
This is how `1 + 2` is evaluated
But redefinition rarely happens • Redefinitions must work, but should we redefine things as quickly as possible? • Which one is better: everything runs slowly, or 99% of code runs fast and a redefinition takes, say, 1,000x more time? • → Introducing deoptimization.
Deoptimization • “Just forget about redefinitions and go as far as you can. If things get changed, discard the optimized bit and fall back to the vanilla interpreter.” • A technique originally introduced in SELF (a Smalltalk variant), later applied to many other languages, notably on the JVM. • JRuby and Rubinius both have their own deoptimization engines, hence both run faster than MRI.
Our strategy • No JIT compilation to native machine code. • Just transform VM instruction sequences and let the VM execute them. • Furthermore, we restrict ourselves to “patching” a sequence; we neither shrink nor grow it. • Fill with nops when needed. The nop instruction is expected to run adequately fast.
What is good • Done in pure C. No portability issues. • The program counter is not affected by the operations. • Hence no need to scan the VM stack. • The saved vanilla sequence can be reused multiple times; the preparation is needed only once.
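The fixed-length patching idea can be modeled in a few lines of Ruby. This is only a toy sketch, not the C implementation from the patch: the `ToySeq` class, its array-of-symbols encoding, and the method names `patch` and `deoptimize` are all hypothetical.

```ruby
# Toy model of an instruction sequence that is patched in place.
# Hypothetical illustration; the real engine works on encoded VM
# instructions in C.
class ToySeq
  attr_reader :insns

  def initialize(insns)
    @insns   = insns.dup
    @vanilla = nil            # saved copy, prepared only once
  end

  # Overwrite insns[pos, len] with `replacement`, padding with :nop
  # so that the sequence neither shrinks nor grows.
  def patch(pos, len, replacement)
    raise ArgumentError, "replacement too long" if replacement.size > len
    raise ArgumentError, "patch out of range" if pos < 0 || pos + len > @insns.size
    @vanilla ||= @insns.dup
    @insns[pos, len] = replacement + [:nop] * (len - replacement.size)
    self
  end

  # Deoptimize: copy the vanilla instructions back. Offsets are
  # unchanged, so a program counter into the sequence stays valid.
  def deoptimize
    @insns.replace(@vanilla) if @vanilla
    self
  end
end

seq = ToySeq.new([:putobject, 1, :putobject, 2, :opt_plus, :leave])
seq.patch(0, 5, [:putobject, 3])
p seq.insns   # [:putobject, 3, :nop, :nop, :nop, :leave]
seq.deoptimize
p seq.insns   # [:putobject, 1, :putobject, 2, :opt_plus, :leave]
```

Because every offset keeps its meaning before and after the patch, restoring the saved copy is a plain overwrite and nothing that points into the sequence has to be adjusted.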
The VM timestamp • In order to detect evil activities like method redefinitions, a per-VM global timestamp counter is introduced. • This counter is an unsigned integer that is atomically incremented when any of the following activities happens: • Assignments to constants, • (Re-)definition of methods, • Inclusion of modules.
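A timestamp guard of this kind might look roughly as follows. This is a sketch under assumptions: `ToyVM`, `bump!`, and `GuardedCode` are made-up names, and a `Mutex` stands in for the atomic increment that the real counter uses.

```ruby
# Hypothetical model of a per-VM timestamp guarding optimized code.
module ToyVM
  @timestamp = 0
  @lock = Mutex.new

  class << self
    attr_reader :timestamp

    # In this model, called on constant assignment, method
    # (re)definition and module inclusion.
    def bump!
      @lock.synchronize { @timestamp += 1 }
    end
  end
end

class GuardedCode
  def initialize(vanilla, optimized)
    @vanilla   = vanilla
    @optimized = optimized
    @stamp     = ToyVM.timestamp   # when the optimization was made
  end

  # Run the optimized body while the world is unchanged; otherwise
  # discard it and fall back to the vanilla body.
  def call
    @optimized = nil if @optimized && @stamp != ToyVM.timestamp
    (@optimized || @vanilla).call
  end

  def optimized?
    !@optimized.nil?
  end
end

code = GuardedCode.new(-> { 1 + 2 }, -> { 3 })
code.call          # runs the optimized lambda
ToyVM.bump!        # e.g. someone redefined Integer#+
code.call          # stamp mismatch: falls back to the vanilla lambda
```

The check is a single integer comparison, which is why a coarse global counter can be cheap on the fast path even though it invalidates more code than strictly necessary.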
Almost no overheads
    class C
      def method_missing mid
      end
    end

    obj = C.new

    i = 0
    while i<6_000_000 # benchmark loop 2
      i += 1
      obj.m; obj.m; obj.m; obj.m; obj.m; obj.m; obj.m; obj.m;
    end
[Bar chart: time for vm2_method_missing*, trunk vs. ours; plotted values 2.441 and 2.412]
Deoptimization • We made a deoptimization engine for Ruby. • Its main characteristic is that it keeps VM states, such as the program counter, consistent. • Very lightweight.
Folding constants • Constants are already inline-cached. • Just replace the getinlinecache in question with putobject, and fill the rest of the sequence with nops.
    void
    iseq_const_fold(
        const rb_iseq_t *restrict i,
        const VALUE *pc,
        int n,
        long m,
        VALUE konst)
    {
        VALUE *buf = (VALUE *)&pc[-n];
        int len = n + m;

        memcpy(buf, wipeout_pattern, len * sizeof(VALUE)); /* wipeout_pattern: "nop nop nop …" */
        buf[0] = putobject;
        buf[1] = konst;
    }
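The same wipe-then-overwrite step can be tried out on a Ruby array. This is a hedged sketch: `const_fold` here is a hypothetical Ruby analogue of the C function, and the instruction layout in the example (operand counts, the meaning of `pc`, `n` and `m`) is an assumption made for illustration.

```ruby
# Hypothetical Ruby analogue of the C routine: wipe n slots before
# pc and m slots from pc onward with nops, then write
# "putobject konst" at the start of the wiped span.
def const_fold(insns, pc, n, m, konst)
  start = pc - n
  len   = n + m
  raise ArgumentError, "span out of range" if start < 0 || start + len > insns.size
  raise ArgumentError, "span too short"    if len < 2

  insns.fill(:nop, start, len)   # "nop nop nop ..."
  insns[start]     = :putobject
  insns[start + 1] = konst
  insns
end

# Assumed toy layout: getinlinecache (3 slots), getconstant (2 slots),
# setinlinecache (2 slots), then leave.
seq = [:getinlinecache, 7, :ic, :getconstant, :FOO, :setinlinecache, :ic, :leave]
p const_fold(seq, 3, 3, 4, 42)
# [:putobject, 42, :nop, :nop, :nop, :nop, :nop, :leave]
```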
Method purity • A method eligible to be skipped is considered “pure”. • A method is marked as not pure if … • It writes to variables other than local ones. • It yields. • It is not written in Ruby. • It calls other methods that are not pure.
Methods that are not pure
    def m
      Time.now
    end

    def m
      @foo = self
    end

    def m
      yield
    end

    def m
      { foo: :bar }
    end

    rb_define_method(rb_cTCPServer, "sysaccept", tcp_sysaccept, 0);
Methods that are pure
    def m(x)
      y = i = 0
      while i < x
        z = i % 2 == 0 ? 1 : -1
        y += z / (2 * i + 1.0)
        i += 1
      end
      return 4 * y
    end

    def m(x, y, z = ' ')
      n = y - x.length
      while n > 0 do
        n -= z.length
        x = z + x
      end
      return x
    end
Method purity • “A method is either pure (optimizable) or not” is, in fact, an oversimplification. • There is a third state: indeterminate. • For instance, one cannot say whether a method is pure when it calls something that is not defined, resulting in a call to method_missing.
Method purity • So a method’s purity is determined on the fly. • Each method starts with its purity not yet predicted. • While running the method, we collect information about what it does in order to detect its purity. • When a method’s purity is determined, that information propagates to its callers.
    enum insn_purity
    purity_of_cc(const struct rb_call_cache *cc)
    {
        const rb_iseq_t *i;

        if (! cc->me) {
            return insn_is_unpredictable; /* method missing */
        }
        else if (! (i = iseq_of_me(cc->me))) {
            return insn_is_not_pure; /* not written in ruby. */
        }
        else if (! i->body->attributes) {
            /* Note, we do not recursively analyze.  That can lead to infinite
             * recursion on mutually recursive calls and detecting that is too
             * expensive in this hot path.*/
            return insn_is_unpredictable;
        }
        else {
            return purity_of_VALUE(RB_ISEQ_ANNOTATED_P(i, core::purity));
        }
    }
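The three-state purity and its propagation to callers can be sketched in Ruby. Everything named here (`PURITY_ORDER`, `merge_purity`, `purity_of_call`, `purity_of_method`, and the `:native` / `:unknown` markers) is hypothetical, and the "worst state wins" merge rule is an assumption rather than something the patch is known to do in exactly this form.

```ruby
# Hypothetical tri-state purity. Larger number = worse.
PURITY_ORDER = { pure: 0, unpredictable: 1, not_pure: 2 }.freeze

# Combine two purities: the worse one wins (assumed rule).
def merge_purity(a, b)
  PURITY_ORDER[a] >= PURITY_ORDER[b] ? a : b
end

# `known` maps a method name to :pure, :not_pure, :unpredictable,
# :native (not written in Ruby) or :unknown (not analyzed yet).
# A missing name models a call that ends up in method_missing.
def purity_of_call(known, name)
  case known[name]
  when nil      then :unpredictable   # undefined: method_missing
  when :native  then :not_pure        # not written in Ruby
  when :unknown then :unpredictable   # not analyzed yet; no recursion
  else known[name]
  end
end

# A method's purity from its own behaviour plus that of its callees.
def purity_of_method(known, callees, writes_nonlocal: false, yields: false)
  return :not_pure if writes_nonlocal || yields
  callees.map { |c| purity_of_call(known, c) }
         .reduce(:pure) { |acc, p| merge_purity(acc, p) }
end

known = { add: :pure, puts: :native, later: :unknown }
p purity_of_method(known, [:add])                    # :pure
p purity_of_method(known, [:add, :later])            # :unpredictable
p purity_of_method(known, [:add, :puts, :missing])   # :not_pure
p purity_of_method(known, [], yields: true)          # :not_pure
```

As in the C code, a callee that has not been analyzed yet is reported as unpredictable instead of being analyzed recursively, so mutually recursive methods cannot send the analysis into a loop.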
Eliminating send-ish instructions • “Method calls whose return values are discarded” are subject to elimination. • Method calls just check the callee’s purity; later, if the immediately following instruction discards the return value, that preceding method call can be eliminated. • Actual elimination happens in pop, not in send.
diff --git a/insns.def b/insns.def
index c9d7204..2b877ff 100644
--- a/insns.def
+++ b/insns.def
@@ -711,9 +722,17 @@ DEFINE_INSN
 adjuststack
 (rb_num_t n)
 (...)
 (...) // inc -= n
 {
     DEC_SP(n);
+    /* If the immediately precedent instruction was send (or its
+     * variant), and here we are in adjuststack instruction, this
+     * means the return value of the method call is silently
+     * discarded.  Then why not just avoid the whole method calling?
+     * This is possible when the callee method was marked pure.  Note
+     * however that even on such case, evaluation of method arguments
+     * cannot be skipped, because they can have their own side
+     * effects.
+     */
+    vm_eliminate_insn(GET_CFP(), GET_PC(), OPN_OF_CURRENT_INSN + 1, n);
 }
    void
    iseq_eliminate_insn(
        const rb_iseq_t *restrict i,
        struct cfp_last_insn *restrict p,
        int n,
        rb_num_t m)
    {
        VALUE *buf = (VALUE *)&i->body->iseq_encoded[p->pc];
        int len = p->len + n;
        int argc = p->argc + m;

        memcpy(buf, wipeout_pattern, len * sizeof(VALUE)); /* "nop nop nop …" */
        if (argc != 0) {
            /* in case arguments have side effects */
            buf[0] = adjuststack;
            buf[1] = argc;
        }
        ISEQ_RESET_ORIGINAL_ISEQ(i);
        FL_SET(i, ISEQ_NEEDS_ANALYZE);
    }
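A toy version of this elimination on a Ruby array shows why the arguments survive. The helper `eliminate_call`, its parameters, and the slot layout of the example sequence are assumptions for illustration only; in particular, counting the receiver in `argc` is a guess about what the C code's `argc` covers.

```ruby
# Hypothetical analogue of the C routine: wipe the send instruction
# and the instruction that discards its result, then, if anything
# was pushed for the call, drop it with adjuststack.
def eliminate_call(insns, send_pos, send_len, discard_len, argc)
  len = send_len + discard_len
  raise ArgumentError, "span out of range" if send_pos < 0 || send_pos + len > insns.size

  insns.fill(:nop, send_pos, len)      # "nop nop nop ..."
  if argc != 0
    # Arguments were already evaluated (they may have side effects),
    # so they only need to be removed from the stack.
    insns[send_pos]     = :adjuststack
    insns[send_pos + 1] = argc
  end
  insns
end

# Assumed toy layout: push receiver, push one argument, send (3 slots),
# pop (1 slot), leave. argc = 2 counts receiver plus argument here.
seq = [:putself, :putobject, 1, :send, :m, :cc, :pop, :leave]
p eliminate_call(seq, 3, 3, 1, 2)
# [:putself, :putobject, 1, :adjuststack, 2, :nop, :nop, :leave]
```

The instructions that push the receiver and the argument are left untouched; only the call itself and the discard of its result are replaced.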
Elimination of variables • We eliminate variables that are assigned but never used later (write-only). • Only methods that are pure can be considered. • Methods with side effects might access bindings. • Blocks might share local variables, so the check for write-only variables must consider all reachable blocks.
Elimination of variables • There might also be other kinds of variables that are safe to be eliminated, but detection of such variables is very difficult to do precisely on-the-fly.
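Detecting write-only locals can be illustrated on a toy instruction list. This is a simplified sketch, not the analysis from the patch: `write_only_locals` is a hypothetical helper, instructions are modeled as `[opcode, operand]` pairs, and a single flat list stands in for a method together with all of its reachable blocks.

```ruby
# Return the locals that are written somewhere but never read.
# `insns` is a flat list of [opcode, operand] pairs covering the
# method and every block reachable from it (toy assumption).
def write_only_locals(insns)
  written = insns.select { |op, _| op == :setlocal }.map { |_, var| var }
  read    = insns.select { |op, _| op == :getlocal }.map { |_, var| var }
  (written - read).uniq
end

insns = [
  [:putobject, 1], [:setlocal, :a],
  [:putobject, 2], [:setlocal, :b],   # b is never read
  [:getlocal, :a],
  [:leave, nil]
]
p write_only_locals(insns)   # [:b]
```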
Optimizations • Fairly basic optimizations are implemented. • All optimizations run on the fly and preserve VM states such as exception tables. • There is room for other optimization techniques, like subexpression elimination.
Benchmarks • CAUTION: YMMV • `make benchmark` results on my machine. • Not a brand-new box; its /proc/cpuinfo says “Intel(R) Core(TM)2 Duo CPU T7700”. • The following results show the average of 7 executions.
Benchmarks • Most benchmarks show the same performance. • The optimizations work drastically for several benchmarks. • There do exist cases of slowdowns, but IMHO the overhead is marginal.
Future work • Other optimizations can be thought of, such as: • Subexpression elimination; • Variable liveness & escape analysis; • and more. • Allowing modification of the program counter would make more room for further optimizations.
FAQs • Q: where is the patch? • A: https://github.com/ruby/ruby/pull/1419 • Q: does this speed up Rails? • A: not really. • Q: does this make Ruby 3x3 happen? • A: it depends (the 3x3 goal is vague), but I believe I’m on the right path.