
The send-pop optimisation

Urabe Shyouhei

April 20, 2019

  1. In a nutshell, this talk is about…
     • The “send-pop” sequence we focus on in this talk is a pattern that appears very frequently in Ruby programs.
     • We propose automatic detection of such sequences, letting the interpreter optimise that part.
     • This optimisation improves some benchmark results, including Rails.
  2. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
     0000 putself
     0001 send <callinfo!mid:something, argc:0,
     0005 pop
     0006 putself
     0007 send <callinfo!mid:something_another,
     0011 pop
     0012 putself
     0013 send <callinfo!mid:something_else, ar
     0017 leave
  3. The “send-pop” sequence
     • Calling a method, then immediately discarding its return value.
     • Note that every method in Ruby has return value(s).
     • The value(s) returned, however, do not have to be used.
     • Even when a method does not expect its caller to take the return value, it has to return something “just in case” that expectation breaks.
     • This wastes both time and memory.
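The pattern is easy to see for yourself with `RubyVM::InstructionSequence` (the method names below are made up for illustration): every statement whose value is discarded compiles to a send followed by a `pop`.

```ruby
# Minimal sketch (hypothetical method names): two statements in a row
# compile to the "send-pop" pattern -- each call's return value is pushed
# onto the VM stack, then immediately discarded by a `pop` instruction.
iseq = RubyVM::InstructionSequence.compile(<<~RUBY)
  def demo
    something
    something_else
  end
RUBY
puts iseq.disasm   # the body of `demo` contains a `pop` after the first call
```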
  4. % LANG=C sort 2gram.txt | uniq -c | sort -nr | head -n 10
     69065813 getinstancevariable -> getinstancevariable
     65600442 putself -> getinstancevariable
     59624140 getinstancevariable -> branchunless
     59116388 branchunless -> getinstancevariable
     52828407 leave -> pop
     50434175 getinstancevariable -> putobject
     30368815 pop -> putself
     27717161 setinstancevariable -> getinstancevariable
     25661090 branchunless -> putself
     25165032 getinstancevariable -> branchif
  5. But how often?
     • Taking 2-grams of a mame/optcarrot execution, the sequence in question is the #5 most frequent.
     • This is definitely worth consideration.
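The counting step behind the previous slide is straightforward once a per-instruction trace has been dumped. A sketch, assuming one executed instruction name per line (the trace below is made up):

```ruby
# Sketch of the 2-gram counting step, assuming we already dumped the name
# of every executed VM instruction.  The trace here is made up.
trace = %w[putself send pop putself send pop putself getinstancevariable leave]

counts = Hash.new(0)
trace.each_cons(2) { |a, b| counts[[a, b]] += 1 }   # adjacent pairs -> frequencies

counts.sort_by { |_, n| -n }.each do |(a, b), n|
  puts format("%8d %s -> %s", n, a, b)              # same shape as the slide's output
end
```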
  6. First step: allow arbitrary return values
     • We cannot entirely eliminate return values.
     • In the wild, there already are methods written in C.
     • They cannot be modified, and they already return something.
     • The best we can do is to allow methods to return arbitrary values when those values are not used by their callers.
     • Let each method decide what to return. We can auto-optimise pure-Ruby methods later.
  7. Pass a 1-bit flag to each method
     • Every time a method is called, some flags are already passed to it.
     • Why not add another one that describes whether its return value(s) are used?
  8. diff --git a/vm_core.h b/vm_core.h
     index 574837dea0..513b8b85c1 100644
     --- a/vm_core.h
     +++ b/vm_core.h
     @@ -1132,11 +1133,11 @@ typedef rb_control_frame_t *
      enum {
          /* Frame/Environment flag bits:
     -     *   MMMM MMMM MMMM MMMM ____ __FF FFFF EEEX (LSB)
     +     *   MMMM MMMM MMMM MMMM ____ _FFF FFFF EEEX (LSB)
           *
           * X   : tag for GC marking (It seems as Fixnum)
           * EEE : 3 bits Env flags
     -     * FF..: 6 bits Frame flags
     +     * FF..: 7 bits Frame flags
           * MM..: 15 bits frame magic (to check frame corruption)
           */
     @@ -1160,6 +1161,7 @@ enum {
          VM_FRAME_FLAG_CFRAME = 0x0080,
          VM_FRAME_FLAG_LAMBDA = 0x0100,
          VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM = 0x0200,
     +    VM_FRAME_FLAG_POPPED = 0x0400,
          /* env flag */
          VM_ENV_FLAG_LOCAL = 0x0002,
  9. diff --git a/vm_insnhelper.c b/vm_insnhelper.c
     index a2f7433029..b024b29fc6 100644
     --- a/vm_insnhelper.c
     +++ b/vm_insnhelper.c
     @@ -1767,12 +1767,13 @@ static inline VALUE
      vm_call_iseq_setup_normal(rb_execution_context_t *ec, rb_control_frame_t *cfp, struct rb_calling_in
                                int opt_pc, int param_size, int local_size)
      {
     +    int popped = calling->popped;
          const rb_iseq_t *iseq = def_iseq_ptr(me->def);
          VALUE *argv = cfp->sp - calling->argc;
          VALUE *sp = argv + param_size;
          cfp->sp = argv - 1 /* recv */;
     -    vm_push_frame(ec, iseq, VM_FRAME_MAGIC_METHOD | VM_ENV_FLAG_LOCAL, calling->recv,
     +    vm_push_frame(ec, iseq, VM_FRAME_MAGIC_METHOD | VM_ENV_FLAG_LOCAL | popped, calling->recv,
                        calling->block_handler, (VALUE)me,
                        iseq->body->iseq_encoded + opt_pc, sp,
                        local_size - param_size,
     @@ -1791,6 +1792,7 @@ vm_call_iseq_setup_tailcall(rb_execution_context_t *ec, rb_control_frame_t *cf
          VALUE *src_argv = argv;
          VALUE *sp_orig, *sp;
          VALUE finish_flag = VM_FRAME_FINISHED_P(cfp) ? VM_FRAME_FLAG_FINISH : 0;
     +    unsigned long popped = VM_ENV_FLAGS(cfp->ep, VM_FRAME_FLAG_POPPED);

          if (VM_BH_FROM_CFP_P(calling->block_handler, cfp)) {
              struct rb_captured_block *dst_captured = VM_CFP_TO_CAPTURED_BLOCK(RUBY_VM_PREVIOUS_CONTROL_
     @@ -1818,7 +1820,7 @@ vm_call_iseq_setup_tailcall(rb_execution_context_t *ec, rb_control_frame_t *cf
              *sp++ = src_argv[i];
  10. Let pure-Ruby methods check that flag
      • We can make pure-Ruby methods check that flag automatically, so that they can skip their rearmost instructions.
      • For instance, when we have:

          def foo(x)
            y = bar(x)
            return y
          end
  11. == disasm: #<ISeq:foo@<compiled>:1 (1,2)-(4,5)> (catch: FALSE)
      local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
      [ 2] x@0<Arg>  [ 1] y@1
      0000 putself
      0001 getlocal x@0, 0
      0004 send <callinfo!mid:bar, argc:1, FCALL
      0008 setlocal y@1, 0
      0011 getlocal y@1, 0
      0014 leave

      Waste of time if the value returned is not used
  12. == disasm: #<ISeq:foo@<compiled>:1 (1,2)-(4,5)> (catch: FALSE)
      local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
      [ 2] x@0<Arg>  [ 1] y@1
      0000 putself
      0001 getlocal x@0, 0
      0004 send <callinfo!mid:bar, argc:1, FCALL
      0008 opt_bailout 1
      0010 setlocal y@1, 0
      0013 getlocal y@1, 0                ( 3)
      0016 leave
  13. +/* This instruction is no-op unless the instruction sequence is called
      + * with VM_FRAME_FLAG_POPPED.  With that flag on, it immediately
      + * leaves the current stack frame with scratching the topmost n stack
      + * values.  The return value of the iseq for that case is always
      + * nil. */
      +DEFINE_INSN
      +opt_bailout
      +(rb_num_t n)
      +()
      +()
      +{
      +#ifdef MJIT_HEADER
      +    /* :FIXME: don't know how to make it work with JIT... */
      +#else
      +    if (VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPPED) &&
      +        CURRENT_INSN_IS(opt_bailout) /* <- rule out trace instruction */ ) {
      +        POPN(n);
      +        PUSH(Qnil);
      +        DISPATCH_ORIGINAL_INSN(leave);
      +    }
      +#endif
      +}
      +
       /**********************************************************/
       /* deal with control flow 3: exception                    */
       /**********************************************************/
  14. Make the insertion automatic
      • Which operations are safe to skip when a return value is not used?
      • Obviously, not all of them are.
      • That concept should be identical to what we call “pure” operations, proposed at RubyKaigi 2016.
  15. Automatic bail out of a method
      • Instead of thinking of a method as entirely pure or not, we are going to focus on each method’s rearmost part that is pure.
      • Such a part, if any, makes no sense when the return value of the method is discarded.
  16. == disasm: #<ISeq:foo@<compiled>:1 (1,2)-(4,5)> (catch: FALSE)
      local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
      [ 2] x@0<Arg>  [ 1] y@1
      0000 putself                                  pure
      0001 getlocal x@0, 0                          pure
      0004 send <callinfo!mid:bar, argc:1, FCALL    not pure
      0008 setlocal y@1, 0                          pure
      0011 getlocal y@1, 0                          pure
      0014 leave                                    pure
  17. == disasm: #<ISeq:foo@<compiled>:1 (1,2)-(4,5)> (catch: FALSE)
      local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
      [ 2] x@0<Arg>  [ 1] y@1
      0000 putself
      0001 getlocal x@0, 0
      0004 send <callinfo!mid:bar, argc:1, FCALL
      0008 opt_bailout 1
      0010 setlocal y@1, 0
      0013 getlocal y@1, 0                ( 3)
      0016 leave
  18. Can we also optimise C methods?
      • We cannot auto-skip a part of a C method.
      • But the `VM_FRAME_FLAG_POPPED` flag is always set, whether or not the called method is written in Ruby.
      • Why not make it visible from C, so that future methods can look at it?
  19. diff --git a/vm.c b/vm.c
      index c5beed64c0..d33ff98619 100644
      --- a/vm.c
      +++ b/vm.c
      @@ -3544,4 +3544,14 @@ vm_collect_usage_register(int reg, int isset)
       #endif /* #ifndef MJIT_HEADER */

      +int
      +rb_whether_the_return_value_is_used_p(void)
      +{
      +    const struct rb_execution_context_struct *ec = GET_EC();
      +    const struct rb_control_frame_struct *reg_cfp = ec->cfp;
      +    const VALUE *ep = GET_EP();
      +
      +    return ! VM_ENV_FLAGS(ep, VM_FRAME_FLAG_POPPED);
      +}
      +
       #include "vm_call_iseq_optimized.inc" /* required from vm_insnhelper.c */
  20. Practical applications
      • `StringScanner#scan` scans the receiver, advances its internal pointer, then returns the matched string. Building the “matched string” can be skipped by leveraging the flag.
      • The exact same discussion applies to `String#slice!`.
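For reference, this is the calling pattern the slide targets: the first two `scan` calls below only matter for their pointer advance, so the matched strings they build are wasted work today.

```ruby
require "strscan"

# Calling pattern the slide targets: the first two #scan results are
# discarded, so under the proposal the scanner could skip building them.
s = StringScanner.new("key = value")
s.scan(/\w+/)        # matched string unused -- only the pointer advance matters
s.scan(/\s*=\s*/)    # ditto
v = s.scan(/\w+/)    # this return value IS used
puts v               # => value
```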
  21. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
      0000 putself
      0001 send <callinfo!mid:something, argc:0,
      0005 pop
      0006 putself
      0007 send <callinfo!mid:something_another,
      0011 pop
      0012 putself
      0013 send <callinfo!mid:something_else, ar
      0017 leave

      Would like to eliminate those `pop`s
  22. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
      0000 putself
      0001 send <callinfo!mid:something, argc:0,
      0005 putself
      0006 send <callinfo!mid:something_another,
      0010 putself
      0011 send <callinfo!mid:something_else, ar
      0015 leave

      Would like to eliminate those `pop`s
  23. Note, however, that:
      • The elimination is not always possible.
      • That `pop` can be a jump destination.
      • For an (illustrative) example:

          def foo
            self&.x
            nil
          end
  24. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(4,3)> (catch: FALSE)
      0000 putself
      0001 dup
      0002 branchnil 7
      0004 opt_send_without_block <callinfo!mid:x, argc:0, ARG
      0007 pop
      0008 putnil
      0009 leave

      This `pop` is not optimisable.
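This case is easy to reproduce: compiling a safe-navigation call whose value is discarded yields a `branchnil` whose jump target is the `pop`, which is why that `pop` cannot simply be deleted (exact opcodes vary by Ruby version).

```ruby
# Reproducing the slide's example: `&.` compiles to a `branchnil` that
# jumps over the send and lands on the `pop`, making the `pop` a jump
# destination and hence not removable.
iseq = RubyVM::InstructionSequence.compile("self&.x; nil")
puts iseq.disasm
```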
  25. Let us add another frame flag
      • Called `VM_FRAME_FLAG_POPIT`.
      • This flag denotes that the `pop` instruction in the caller was optimised out of the sequence.
      • Hence, when the flag is set, it is the callee’s duty to properly skip pushing return value(s), not the caller’s.
  26. diff --git a/vm_core.h b/vm_core.h
      index 0b3f3e06ba..932c70a734 100644
      --- a/vm_core.h
      +++ b/vm_core.h
      @@ -1134,11 +1136,11 @@ typedef rb_control_frame_t *
       enum {
           /* Frame/Environment flag bits:
      -     *   MMMM MMMM MMMM MMMM ____ _FFF FFFF EEEX (LSB)
      +     *   MMMM MMMM MMMM MMMM ____ FFFF FFFF EEEX (LSB)
            *
            * X   : tag for GC marking (It seems as Fixnum)
            * EEE : 3 bits Env flags
      -     * FF..: 7 bits Frame flags
      +     * FF..: 8 bits Frame flags
            * MM..: 15 bits frame magic (to check frame corruption)
            */
      @@ -1163,6 +1165,7 @@ enum {
           VM_FRAME_FLAG_LAMBDA = 0x0100,
           VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM = 0x0200,
           VM_FRAME_FLAG_POPPED = 0x0400,
      +    VM_FRAME_FLAG_POPIT = 0x0800,
           /* env flag */
           VM_ENV_FLAG_LOCAL = 0x0002,
  27. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
      0000 putself
      0001 send <callinfo!mid:something, argc:0,
      0005 pop
      0006 putself
      0007 send <callinfo!mid:something_another,
      0011 pop
      0012 putself
      0013 send <callinfo!mid:something_else, ar
      0017 leave
  28. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
      0000 putself
      0001 send <callinfo!mid:something, argc:0,
      0005 putself
      0006 send <callinfo!mid:something_another,
      0010 putself
      0011 send <callinfo!mid:something_else, ar
      0015 leave
  29. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(5,3)> (catch: FALSE)
      0000 putself                                                    ( 2)[LiCa]
      0001 send <callinfo!mid:something, argc:0, FCALL|VCALL|ARGS_SIMPLE, [POPIT]>, <callcache>, nil
      0005 putself                                                    ( 3)[Li]
      0006 send <callinfo!mid:something_another, argc:0, FCALL|VCALL|ARGS_SIMPLE, [POPIT]>, <callcache>, nil
      0010 putself                                                    ( 4)[Li]
      0011 send <callinfo!mid:something_else, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>, nil
      0015 leave                                                      ( 5)[Re]
  30. In order to properly avoid pushing…
      • We have to consider 3 (!) distinct situations:
      • Returning from a method written in C.
      • Returning from a method written in Ruby.
      • Returning from inside of a block.
  31. C method return values
      • C methods return values using C’s return semantics. Just discarding them should suffice.

          VALUE
          foo(VALUE x)
          {
              VALUE y = complex_calculation(x);
              return y;
          }
  32. diff --git a/tool/ruby_vm/views/_insn_entry.erb b/tool/ruby_vm/views/_insn_entry.erb
      index cdadd93abc..bbfe539fd2 100644
      --- a/tool/ruby_vm/views/_insn_entry.erb
      +++ b/tool/ruby_vm/views/_insn_entry.erb
      @@ -56,7 +58,18 @@ INSN_ENTRY(<%= insn.name %>)
           /* ### Instruction trailers. ### */
           CHECK_VM_STACK_OVERFLOW_FOR_INSN(VM_REG_CFP, INSN_ATTR(retn));
           <%= insn.handle_canary "CHECK_CANARY()" -%>
      -% if insn.handles_sp?
      +% if insn.sendish? # Then we can safely assume there is only one return value.
      +%   if insn.handles_sp?
      +    if (! (ci->compiled_frame_bits & VM_FRAME_FLAG_POPIT)) {
      +        PUSH(<%= insn.cast_to_VALUE insn.rets.first %>);
      +    }
      +%   else
      +    INC_SP(INSN_ATTR(sp_inc));
      +    if (! (ci->compiled_frame_bits & VM_FRAME_FLAG_POPIT)) {
      +        TOPN(0) = <%= insn.cast_to_VALUE insn.rets.first %>;
      +    }
      +%   end
      +% elsif insn.handles_sp?
      %   insn.rets.reverse_each do |ret|
          PUSH(<%= insn.cast_to_VALUE ret %>);
      %   end
  33. Ruby method return values
      • Ruby methods (normally) return values using the `leave` instruction.

          def foo(x)
            return x + 1
          end
  34. == disasm: #<ISeq:foo@<compiled>:1 (1,0)-(3,3)> (catch: FALSE)
      local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block:
      [ 1] x@0<Arg>
      0000 getlocal x@0, 0
      0003 putobject 1
      0005 send <callinfo!mid:+, argc:1, ARG
      0009 leave
  35. diff --git a/insns.def b/insns.def
      index a38dc30168..68e7eabfae 100644
      --- a/insns.def
      +++ b/insns.def
      @@ -927,7 +911,7 @@ DEFINE_INSN
       leave
       ()
       (VALUE val)
      -(VALUE val)
      +(...)
       /* This is super surprising but when leaving from a frame, we check
        * for interrupts.  If any, that should be executed on top of the
        * current execution context.  This is a method call. */
      @@ -939,7 +923,10 @@ leave
       // attr enum rb_insn_purity purity = rb_insn_is_pure;
       /* And this instruction handles SP by nature. */
       // attr bool handles_sp = true;
      +// attr rb_snum_t sp_inc = 0;
       {
      +    bool popit = VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPIT);
      +
           if (OPT_CHECKED_RUN) {
               const VALUE *const bp = vm_base_ptr(reg_cfp);
               if (reg_cfp->sp != bp) {
      @@ -959,6 +946,9 @@ leave
           }
           else {
               RESTORE_REGS();
      +        if (! popit) {
      +            PUSH(val);
      +        }
           }
       }
  36. So far so good… but,
      • It immediately gets complicated when a block has a return statement.

          def foo(x)
            x.times do |i|
              return i
            end
          end

          p foo(42) # => 0
  37. So far so good… but,
      • It immediately gets complicated when a block has a return statement.

          def foo(x)
            x.times &->(i) do
              return i
            end
          end

          p foo(42) # => 42
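The contrast on these two slides is runnable as-is: `return` inside a plain block returns from the enclosing method, while `return` inside a lambda only returns from the lambda itself.

```ruby
# `return` inside a plain block leaves the enclosing method on the first
# iteration, so the method's value is 0.
def from_block(x)
  x.times { |i| return i }
end

# `return` inside a lambda only leaves the lambda; #times runs to the end
# and returns its receiver, so the method's value is 42.
def from_lambda(x)
  x.times(&->(i) { return i })
end

p from_block(42)   # => 0
p from_lambda(42)  # => 42
```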
  38. What “return-inside-of-a-block” does:
      1. Look for the exact place where the execution is to proceed.
      2. Rewind the stack.
      3. Push the return value onto the stack.
      4. Continue executing.
      Step 3 has to be cancelled, however: the flag has been squashed already.
  39. diff --git a/vm.c b/vm.c
      index 807a20ee5a..057863e5e3 100644
      --- a/vm.c
      +++ b/vm.c
      @@ -1926,6 +1926,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
                                VALUE errinfo, VALUE *initial)
       {
           struct vm_throw_data *err = (struct vm_throw_data *)errinfo;
      +    bool popit = false;

           for (;;) {
               unsigned int i;
      @@ -1950,6 +1951,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
                                               rb_vm_frame_method_entry(ec->cfp)->owner,
                                               rb_vm_frame_method_entry(ec->cfp)->def->original_id);
                   }
      +            popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
                   rb_vm_pop_frame(ec);
               }
      @@ -1983,6 +1985,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
                       ec->errinfo = Qnil;
                       THROW_DATA_CATCH_FRAME_SET(err, cfp + 1);
                       hook_before_rewind(ec, ec->cfp, TRUE, state, err);
      +                popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
                       rb_vm_pop_frame(ec);
                       return THROW_DATA_VAL(err);
                   }
      @@ -1994,7 +1997,9 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
       #if OPT_STACK_CACHING
                       *initial = THROW_DATA_VAL(err);
       #else
      -                *ec->cfp->sp++ = THROW_DATA_VAL(err);
      +                if (! popit) {
      +                    *ec->cfp->sp++ = THROW_DATA_VAL(err);
      +                }
       #endif
                       ec->errinfo = Qnil;
                       return Qundef;
      @@ -2128,12 +2133,14 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
                   hook_before_rewind(ec, ec->cfp, FALSE, state, err);

                   if (VM_FRAME_FINISHED_P(ec->cfp)) {
      +                popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
                       rb_vm_pop_frame(ec);
                       ec->errinfo = (VALUE)err;
                       ec->tag = ec->tag->prev;
                       EC_JUMP_TAG(ec, state);
                   }
                   else {
      +                popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
                       rb_vm_pop_frame(ec);
                   }
               }
  40. Several benchmarks were exercised.
      • Caution: YMMV.
      • All benchmarks were run on the exact machine I am projecting this presentation from: a 6th-gen. ThinkPad X1 Carbon.
      • They all compare trunk (2.7.0 revision 67168) versus ours (the proposed patch applied against trunk).
  41. The `make benchmark` results
      • This set of benchmarks is considered micro: it consists of many small Ruby scripts. They tend to shed some light on specific parts of the VM.
  42. [Chart: speedup ratio versus trunk (greater = faster) for the micro-benchmarks so_ackermann, so_array, so_binary_trees, so_concatenate, so_exception, so_fannkuch, so_fasta, so_lists, so_mandelbrot, so_matrix, so_meteor_contest, so_nbody, so_nested_loop, so_nsieve, so_nsieve_bits, so_object, so_partial_sums, so_pidigits, so_random, so_sieve, so_spectralnorm]
  43. The `make benchmark` results
      • The majority of the results are almost the same. Whether slower or faster, they differ only faintly.
      • There are a few notable benchmark instances where our proposal clearly outperforms trunk.
      • On the other hand, no instance seems to show a clear slowdown for our proposal.
      • This tendency is roughly the same as what we saw in 2016.
  44. Mid-sized benchmarks
      • We tested `time make rdoc`, which has historically been considered a benchmark that reflects real-world use-cases.
      • We also tested mame/optcarrot, which was made for benchmarking various Ruby implementations.
  45. [Chart: `time make rdoc` [sec] (greater = slower), trunk: 23.13, ours: 23.58]
  46. [Chart: Optcarrot Lan_Master.nes [fps] (greater = faster), trunk: 42.554, ours: 43.276]
  47. Mid-sized benchmarks
      • We have to say they are almost the same.
      • Rdoc got slower; optcarrot got faster. We see these results consistently.
      • There might be reasons behind them but, … well, isn’t it enough to say that we see no significant changes?
  48. Rails application
      • discourse/discourse comes with a benchmark script, so we tested our changeset against it.
      • The benchmark is basically a series of `ab(1)` runs.
      • Discourse is a field-proven, real-world Rails application. The benchmark shows how the proposed changeset behaves in the wild.
      • OTOH, this has the greatest LOC among our benchmarks.
  49. [Chart: Discourse benchmark results [msec] (greater = slower) at the 50/75/90/99 percentiles, comparing ours versus trunk for the categories, home, topic, categories_admin, home_admin and topic_admin scenarios]
  50. [Chart: Discourse home [msec] (greater = slower) at the 50/75/90/99 percentiles, trunk: 50, 56, 69, 115; ours: 51, 62, 69, 119]
  51. [Chart: Discourse categories_admin [msec] (greater = slower) at the 50/75/90/99 percentiles, trunk: 80, 87, 100, 157; ours: 83, 96, 102, 169]
  52. [Chart: Discourse timing loading rails [msec] (greater = slower), trunk: 3707, ours: 3963]
  53. Conclusions
      • Additional method-calling ABIs are introduced to tell each method whether its return value is used or not. Unused return values are then optimised out of the VM’s value stack.
      • Our proposal sacrifices process boot-up time to yield better runtime performance.
      • Not only small benchmarks, but also Rails applications can benefit from it.
  54. (more) Aggressive compilation
      • The automatic insertion of `opt_bailout` proposed in this presentation works, but we can think of more.
      • For instance, let us consider:

          1.times {|i| x, y = self, i }
  55. == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,29)> (catch: FALSE)
      == catch table
      | catch type: break  st: 0000 ed: 0005 sp: 0000 cont: 0005
      | == disasm: #<ISeq:block in <main>@-e:1 (1,8)-(1,29)> (catch: FALSE)
      | == catch table
      | | catch type: redo   st: 0001 ed: 0014 sp: 0000 cont: 0001
      | | catch type: next   st: 0001 ed: 0014 sp: 0000 cont: 0014
      | |------------------------------------------------------------------------
      | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
      | [ 3] i@0<Arg>  [ 2] x@1  [ 1] y@2
      | 0000 nop                          ( 1)[Bc]
      | 0001 putself                      [Li]
      | 0002 getlocal_WC_0 i@0
      | 0004 newarray 2
      | 0006 dup
      | 0007 expandarray 2, 0
      | 0010 setlocal_WC_0 x@1
      | 0012 setlocal_WC_0 y@2
      | 0014 nop
      | 0015 leave                        ( 1)[Br]
      |------------------------------------------------------------------------
      0000 putobject_INT2FIX_1_           ( 1)[Li]
      0001 send <callinfo!mid:times, argc:0>, <callcache>, block in <main>
      0005 nop
      0006 leave                          ( 1)
  56. | | catch type: next   st: 0001 ed: 0014 sp: 0000 cont: 0014
      | |---------------------------------------------------------------
      | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block
      | [ 3] i@0<Arg>  [ 2] x@1  [ 1] y@2
      | 0000 nop
      | 0001 putself [Li]
      | 0002 getlocal_WC_0 i@0
      | 0004 newarray 2
      | 0006 dup
      | 0007 expandarray 2, 0
      | 0010 setlocal_WC_0 x@1
      | 0012 setlocal_WC_0 y@2
      | 0014 nop
      | 0015 leave
      |-----------------------------------------------------------------
      0000 putobject_INT2FIX_1_
      0001 send <callinfo!mid:times, argc:0>, <c
      0005 nop
      0006 leave
  57. | | catch type: next   st: 0001 ed: 0016 sp: 0000 cont: 0016
      | |---------------------------------------------------------------
      | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block
      | [ 3] i@0<Arg>  [ 2] x@1  [ 1] y@2
      | 0000 nop
      | 0001 putself [Li]
      | 0002 getlocal_WC_0 i@0
      | 0004 newarray 2
      | 0006 dup
      | 0007 expandarray 2, 0
      | 0010 opt_bailout 3
      | 0012 setlocal_WC_0 x@1
      | 0014 setlocal_WC_0 y@2
      | 0016 nop
      | 0017 leave
      |-----------------------------------------------------------------
      0000 putobject_INT2FIX_1_
      0001 send <callinfo!mid:times, argc:0>, <c
      0005 nop
  58. (more) Aggressive compilation
      • We can think of better compilation so that the `newarray` instruction can be eliminated.
      • That should decrease GC pressure.
  59. Tail call flag propagation
      • Think of a method foo, which calls another method bar:

          def bar
            something
            return something_else
          end

          def foo
            return bar
          end

          foo; nil

      This must be optimisable.
  60. Tail call flag propagation
      • We tried optimising this scenario, but it turned out to slow things down.
      • The overhead of asking “is this a tail call?” every time a method is called turned out to be too heavy.
      • Some static analysis could be possible. That might route around the runtime overhead.
  61. Other future works
      • The introduced C API could be applied to our core classes, to gain bonus speed-ups.
      • Values of blocks called from C (for which we still cannot say whether they are used or not) should also be considered.