The send-pop optimisation

Urabe Shyouhei

April 20, 2019

Transcript

  1. The send-pop optimisation
    Urabe, Shyouhei
    Photo by Tomoyuki Kengaku

  2. In a nutshell, this talk is about…
    • The “send-pop” sequence this talk focuses on is a pattern
    that appears very frequently in Ruby programs.
    • We propose detecting such sequences automatically and letting
    the interpreter optimise them.
    • This optimisation improves some benchmark results,
    including those of a Rails application.

  3. Motivations

  4. def foo
    something
    something_another
    return something_else
    end

  5. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself
    0001 send
    0005 pop
    0006 putself
    0007 send
    0011 pop
    0012 putself
    0013 send
    0017 leave

  6. The “send-pop” sequence
    • Calling a method, then immediately discarding its return value.
    • Note that every method in Ruby has return value(s).
    • The value(s) returned however do not have to be used.
    • Even when a method does not expect its caller to take return values,
    it has to return something “just in case” the expectation breaks.
    • Waste of both time and memory.
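
The pattern can be observed on a stock CRuby through `RubyVM::InstructionSequence`. The following is a minimal sketch, assuming a CRuby interpreter; the method names are the placeholders from the earlier slide, and the exact instruction names (for example `send` versus `opt_send_without_block`) differ between Ruby versions.

```ruby
# Compile (without running) a method whose first two calls discard
# their return values, then look at the generated instructions.
source = <<~RUBY
  def foo
    something
    something_another
    return something_else
  end
RUBY

disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm

# Collect every listing line that is a bare `pop` instruction.
pops = disasm.lines.grep(/^\s*\d{4} pop\b/)
puts "pop instructions: #{pops.size}"
```

Each `pop` directly follows a call instruction: the value is pushed by the callee and thrown away by the caller.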

  7. But how often?
    • By taking 2-grams of a mame/optcarrot execution…

  8. % LANG=C sort 2gram.txt | uniq -c | sort -nr | head -n 10
    69065813 getinstancevariable -> getinstancevariable
    65600442 putself -> getinstancevariable
    59624140 getinstancevariable -> branchunless
    59116388 branchunless -> getinstancevariable
    52828407 leave -> pop
    50434175 getinstancevariable -> putobject
    30368815 pop -> putself
    27717161 setinstancevariable -> getinstancevariable
    25661090 branchunless -> putself
    25165032 getinstancevariable -> branchif

  9. But how often?
    • By taking 2-grams of a mame/optcarrot execution, the
    sequence in question (`leave -> pop`) turns out to be the
    fifth most frequent.
    • This is definitely worth consideration.
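
Counting 2-grams amounts to pairing each executed instruction with its successor and tallying the pairs. The sketch below assumes a made-up, very short trace (the real measurement used a full optcarrot run) and needs Ruby 2.7 or later for `tally`.

```ruby
# Hypothetical trace: names of executed instructions, in execution order.
trace = %w[putself send leave pop putself send leave pop
           putself send leave leave]

# Pair every instruction with the one executed right after it, then count.
bigrams = trace.each_cons(2).map { |a, b| "#{a} -> #{b}" }.tally

# Most frequent pairs first, like `sort | uniq -c | sort -nr | head`.
bigrams.sort_by { |_, count| -count }.first(3).each do |pair, count|
  puts format("%8d %s", count, pair)
end
```

In such a trace a discarded return value shows up as `leave -> pop`: the callee's `leave` is immediately followed by the caller's `pop`.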

  10. Relax them

  11. First step: allow arbitrary return values
    • We cannot entirely eliminate return values.
    • In the wild, there already are methods written in C.
    • They cannot be modified, and they already return something.
    • The best we can do is to allow methods to return arbitrary
    values when they are not used by their callers.
    • Let each method decide what to return. We can auto-optimise
    pure-Ruby methods later.

  12. Pass 1-bit flag to each method
    • Every time a method is called, some flags are passed to it
    already.
    • Why not add another one that describes whether its
    return value(s) are used?

  13. diff --git a/vm_core.h b/vm_core.h
    index 574837dea0..513b8b85c1 100644
    --- a/vm_core.h
    +++ b/vm_core.h
    @@ -1132,11 +1133,11 @@ typedef rb_control_frame_t *
    enum {
    /* Frame/Environment flag bits:
    - * MMMM MMMM MMMM MMMM ____ __FF FFFF EEEX (LSB)
    + * MMMM MMMM MMMM MMMM ____ _FFF FFFF EEEX (LSB)
    *
    * X : tag for GC marking (It seems as Fixnum)
    * EEE : 3 bits Env flags
    - * FF..: 6 bits Frame flags
    + * FF..: 7 bits Frame flags
    * MM..: 15 bits frame magic (to check frame corruption)
    */
    @@ -1160,6 +1161,7 @@ enum {
    VM_FRAME_FLAG_CFRAME = 0x0080,
    VM_FRAME_FLAG_LAMBDA = 0x0100,
    VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM = 0x0200,
    + VM_FRAME_FLAG_POPPED = 0x0400,
    /* env flag */
    VM_ENV_FLAG_LOCAL = 0x0002,
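
The effect of the extra bit can be modelled in plain Ruby. In this sketch the two constant values are the ones visible in the diff above; `frame_flags` is a hypothetical helper invented for illustration and is not part of the VM.

```ruby
# Values copied from the vm_core.h diff above.
VM_ENV_FLAG_LOCAL    = 0x0002
VM_FRAME_FLAG_POPPED = 0x0400

# Hypothetical helper: the flag word a caller would hand to the callee.
def frame_flags(return_value_used:)
  flags = VM_ENV_FLAG_LOCAL
  flags |= VM_FRAME_FLAG_POPPED unless return_value_used
  flags
end

discarded = frame_flags(return_value_used: false)
used      = frame_flags(return_value_used: true)

# The callee tests the single bit to learn whether its value is wanted.
puts discarded.anybits?(VM_FRAME_FLAG_POPPED)
puts used.anybits?(VM_FRAME_FLAG_POPPED)
```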

  14. diff --git a/vm_insnhelper.c b/vm_insnhelper.c
    index a2f7433029..b024b29fc6 100644
    --- a/vm_insnhelper.c
    +++ b/vm_insnhelper.c
    @@ -1767,12 +1767,13 @@ static inline VALUE
    vm_call_iseq_setup_normal(rb_execution_context_t *ec, rb_control_frame_t *cfp, struct rb_calling_in
    int opt_pc, int param_size, int local_size)
    {
    + int popped = calling->popped;
    const rb_iseq_t *iseq = def_iseq_ptr(me->def);
    VALUE *argv = cfp->sp - calling->argc;
    VALUE *sp = argv + param_size;
    cfp->sp = argv - 1 /* recv */;
    - vm_push_frame(ec, iseq, VM_FRAME_MAGIC_METHOD | VM_ENV_FLAG_LOCAL, calling->recv,
    + vm_push_frame(ec, iseq, VM_FRAME_MAGIC_METHOD | VM_ENV_FLAG_LOCAL | popped, calling->recv,
    calling->block_handler, (VALUE)me,
    iseq->body->iseq_encoded + opt_pc, sp,
    local_size - param_size,
    @@ -1791,6 +1792,7 @@ vm_call_iseq_setup_tailcall(rb_execution_context_t *ec, rb_control_frame_t *cf
    VALUE *src_argv = argv;
    VALUE *sp_orig, *sp;
    VALUE finish_flag = VM_FRAME_FINISHED_P(cfp) ? VM_FRAME_FLAG_FINISH : 0;
    + unsigned long popped = VM_ENV_FLAGS(cfp->ep, VM_FRAME_FLAG_POPPED);
    if (VM_BH_FROM_CFP_P(calling->block_handler, cfp)) {
    struct rb_captured_block *dst_captured = VM_CFP_TO_CAPTURED_BLOCK(RUBY_VM_PREVIOUS_CONTROL_
    @@ -1818,7 +1820,7 @@ vm_call_iseq_setup_tailcall(rb_execution_context_t *ec, rb_control_frame_t *cf
    *sp++ = src_argv[i];

  15. Use the flag

  16. Let pure-Ruby methods check that flag
    • We can make pure-Ruby methods check that flag
    automatically, so that they can skip rearmost instructions.
    • For instance when we have:
    def foo(x)
    y = bar(x)
    return y
    end

  17. == disasm: #:1 (1,2)-(4,5)> (catch: FALSE)
    local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
    [ 2] x@0 [ 1] y@1
    0000 putself
    0001 getlocal x@0, 0
    0004 send
    0008 setlocal y@1, 0
    0011 getlocal y@1, 0
    0014 leave
    Waste of time if the value returned is not used

  18. == disasm: #:1 (1,2)-(4,5)> (catch: FALSE)
    local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
    [ 2] x@0 [ 1] y@1
    0000 putself
    0001 getlocal x@0, 0
    0004 send
    0008 opt_bailout 1
    0010 setlocal y@1, 0
    0013 getlocal y@1, 0 ( 3)
    0016 leave

  19. +/* This instruction is no-op unless the instruction sequence is called
    + * with VM_FRAME_FLAG_POPPED. With that flag on, it immediately
    + * leaves the current stack frame with scratching the topmost n stack
    + * values. The return value of the iseq for that case is always
    + * nil. */
    +DEFINE_INSN
    +opt_bailout
    +(rb_num_t n)
    +()
    +()
    +{
    +#ifdef MJIT_HEADER
    + /* :FIXME: don't know how to make it work with JIT... */
    +#else
    + if (VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPPED) &&
    + CURRENT_INSN_IS(opt_bailout) /* <- rule out trace instruction */ ) {
    + POPN(n);
    + PUSH(Qnil);
    + DISPATCH_ORIGINAL_INSN(leave);
    + }
    +
    +#endif
    +}
    +
    /**********************************************************/
    /* deal with control flow 3: exception */
    /**********************************************************/
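
The behaviour described in the comment above can be modelled with a toy stack machine. This is only a sketch: the instruction tuples, the `run` function and the stand-in for `bar` (doubling its argument) are all assumptions made for illustration, not real YARV bytecode.

```ruby
# Toy version of `def foo(x); y = bar(x); return y; end` after insertion.
CODE = [
  [:putself],
  [:getlocal, :x],
  [:send, :bar],       # pops receiver and one argument, pushes the result
  [:opt_bailout, 1],
  [:setlocal, :y],
  [:getlocal, :y],
  [:leave],
].freeze

# Runs `code`; returns [return_value, number_of_instructions_executed].
def run(code, x, popped:)
  stack = []
  locals = { x: x }
  executed = 0
  code.each do |name, operand|
    executed += 1
    case name
    when :putself  then stack.push(:self)
    when :getlocal then stack.push(locals.fetch(operand))
    when :setlocal then locals[operand] = stack.pop
    when :send
      arg = stack.pop
      stack.pop                  # receiver
      stack.push(arg * 2)        # stand-in for actually calling `bar`
    when :opt_bailout
      next unless popped         # no-op when the caller wants the value
      stack.pop(operand)         # scratch the topmost n stack values
      return [nil, executed]     # leave at once; the result is always nil
    when :leave
      return [stack.pop, executed]
    end
  end
end

p run(CODE, 21, popped: false)  # => [42, 7]
p run(CODE, 21, popped: true)   # => [nil, 4]
```

With the flag set, the call to `bar` still happens, but the three trailing instructions are skipped.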

  20. Automatic insertion of it

  21. Make the insertion automatic
    • What operations are safe to be skipped when a return value
    is not used?
    • Obviously, not every operation is.
    • That concept should be identical to what we call “pure”
    operations, proposed in RubyKaigi 2016.

  25. Recap

  27. Automatic bail out of a method
    • Instead of asking whether a method is entirely pure or not, we
    are going to focus on the rearmost part of each method that is
    pure.
    • Such a part, if any, serves no purpose when the return value of
    the method is discarded.

  28. == disasm: #:1 (1,2)-(4,5)> (catch: FALSE)
    local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
    [ 2] x@0 [ 1] y@1
    0000 putself            pure
    0001 getlocal x@0, 0    pure
    0004 send               not pure
    0008 setlocal y@1, 0    pure
    0011 getlocal y@1, 0    pure
    0014 leave              pure

  29. == disasm: #:1 (1,2)-(4,5)> (catch: FALSE)
    local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block:
    [ 2] x@0 [ 1] y@1
    0000 putself
    0001 getlocal x@0, 0
    0004 send
    0008 opt_bailout 1
    0010 setlocal y@1, 0
    0013 getlocal y@1, 0 ( 3)
    0016 leave
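
Finding the insertion point can be sketched as a search for the last impure instruction. The table below is hand-written for this one method: the purity marks follow the earlier slide, while the stack effects and the `insert_bailout` function are assumptions for illustration and not the actual compiler pass.

```ruby
# Hypothetical instruction table: [name, pure?, net stack effect].
INSNS = [
  [:putself,  true,  +1],
  [:getlocal, true,  +1],
  [:send,     false, -1],  # pops receiver and argument, pushes the result
  [:setlocal, true,  -1],
  [:getlocal, true,  +1],
  [:leave,    true,  -1],
].freeze

# Insert `opt_bailout n` right after the last impure instruction, where n
# is the stack depth at that point. The list is returned unchanged when
# the pure tail is empty or is `leave` alone (nothing worth skipping).
def insert_bailout(insns)
  last_impure = insns.rindex { |_, pure, _| !pure }
  cut = last_impure ? last_impure + 1 : 0
  return insns if insns.size - cut <= 1

  depth = insns.first(cut).sum { |_, _, effect| effect }
  insns.first(cut) + [[:opt_bailout, depth]] + insns.drop(cut)
end

result = insert_bailout(INSNS)
p result.map(&:first)
p result[3]  # => [:opt_bailout, 1]
```

The operand comes out as 1 here because only the result of `send` is on the stack at that point, which matches the listing above.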

  30. C API (nit-picky)

  31. Can we also optimize C methods?
    • We cannot auto-skip a part of a C method.
    • But the `VM_FRAME_FLAG_POPPED` flag is always passed,
    whether the called method is written in Ruby or not.
    • Why not make it visible from C, so that future methods can
    look at it?

  32. diff --git a/vm.c b/vm.c
    index c5beed64c0..d33ff98619 100644
    --- a/vm.c
    +++ b/vm.c
    @@ -3544,4 +3544,14 @@ vm_collect_usage_register(int reg, int isset)
    #endif
    /* #ifndef MJIT_HEADER */
    +int
    +rb_whether_the_return_value_is_used_p(void)
    +{
    + const struct rb_execution_context_struct *ec = GET_EC();
    + const struct rb_control_frame_struct *reg_cfp = ec->cfp;
    + const VALUE *ep = GET_EP();
    +
    + return ! VM_ENV_FLAGS(ep, VM_FRAME_FLAG_POPPED);
    +}
    +
    #include "vm_call_iseq_optimized.inc" /* required from vm_insnhelper.c */

  33. Practical applications
    • `StringScanner#scan` scans the receiver, advances its
    internal pointer, then returns the matched string. The
    “matched string” can be omitted by leveraging the flag.
    • The exact same argument applies to `String#slice!`.
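
Both methods are commonly called for their side effect alone. The snippet below uses only the standard `strscan` library and shows the two calling styles; the input string is an arbitrary example.

```ruby
require "strscan"

scanner = StringScanner.new("2019-04-20")

# Return value discarded: only the advanced scan pointer matters here,
# yet the matched String is still created and returned.
scanner.scan(/\d+/)
puts scanner.pos

# Return value used: this time the matched text is actually needed.
separator = scanner.scan(/-/)
puts separator.inspect

# String#slice! has the same split between side effect and result.
date = +"2019-04-20"
date.slice!(0, 5)   # removes "2019-" from the receiver; result ignored
puts date
```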

  34. Eliminating `pop`s

  35. Recap:
    def foo
    something
    something_another
    return something_else
    end

  36. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself
    0001 send
    0005 pop
    0006 putself
    0007 send
    0011 pop
    0012 putself
    0013 send
    0017 leave
    Would like to eliminate those `pop`s

  37. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself
    0001 send
    0005 putself
    0006 send
    0010 putself
    0011 send
    0015 leave
    Would like to eliminate those `pop`s

  38. Note however, that:
    • The elimination is not always possible.
    • That `pop` can be a jump destination.
    • For an (illustrative) example:
    def foo
    self &. x
    nil
    end

  39. == disasm: #:1 (1,0)-(4,3)> (catch: FALSE)
    0000 putself
    0001 dup
    0002 branchnil 7
    0004 opt_send_without_block
    0007 pop
    0008 putnil
    0009 leave
    This `pop` is not optimizable.
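
The jump can be reproduced on a stock CRuby. This is a minimal sketch, assuming a CRuby interpreter; offsets and the surrounding instructions may differ from the listing above depending on the Ruby version.

```ruby
source = <<~RUBY
  def foo
    self&.x
    nil
  end
RUBY

disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm

# Safe navigation compiles to a conditional jump that skips the call
# when the receiver is nil, so some instruction after the call must
# serve as the jump target.
puts disasm.include?("branchnil")
```

Because the nil path arrives at the instruction after the call with a value still on the stack, that instruction cannot simply be deleted.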

  40. Let us add another frame flag
    • Called `VM_FRAME_FLAG_POPIT`.
    • This flag denotes that the pop instruction in the caller was
    optimised out from the sequence.
    • Hence when the flag is set, it is the callee’s duty to properly
    skip pushing return value(s), not its caller’s.

  41. diff --git a/vm_core.h b/vm_core.h
    index 0b3f3e06ba..932c70a734 100644
    --- a/vm_core.h
    +++ b/vm_core.h
    @@ -1134,11 +1136,11 @@ typedef rb_control_frame_t *
    enum {
    /* Frame/Environment flag bits:
    - * MMMM MMMM MMMM MMMM ____ _FFF FFFF EEEX (LSB)
    + * MMMM MMMM MMMM MMMM ____ FFFF FFFF EEEX (LSB)
    *
    * X : tag for GC marking (It seems as Fixnum)
    * EEE : 3 bits Env flags
    - * FF..: 7 bits Frame flags
    + * FF..: 8 bits Frame flags
    * MM..: 15 bits frame magic (to check frame corruption)
    */
    @@ -1163,6 +1165,7 @@ enum {
    VM_FRAME_FLAG_LAMBDA = 0x0100,
    VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM = 0x0200,
    VM_FRAME_FLAG_POPPED = 0x0400,
    + VM_FRAME_FLAG_POPIT = 0x0800,
    /* env flag */
    VM_ENV_FLAG_LOCAL = 0x0002,

  42. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself
    0001 send
    0005 pop
    0006 putself
    0007 send
    0011 pop
    0012 putself
    0013 send
    0017 leave

  43. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself
    0001 send
    0005 putself
    0006 send
    0010 putself
    0011 send
    0015 leave

  44. == disasm: #:1 (1,0)-(5,3)> (catch: FALSE)
    0000 putself ( 2)[LiCa]
    0001 send , , nil
    0005 putself ( 3)[Li]
    0006 send , , nil
    0010 putself ( 4)[Li]
    0011 send , , nil
    0015 leave ( 5)[Re]

  45. Avoid pushing return values

  46. In order to properly avoid pushing…
    • We have to consider 3 (!) distinct situations.
    • Returning from a method written in C.
    • Returning from a method written in Ruby.
    • Returning from inside of a block.

  47. C method return values
    • C methods return values using C’s return semantics. Just
    discarding them should suffice.
    VALUE
    foo(VALUE x)
    {
    VALUE y = complex_calculation(x);
    return y;
    }

  48. diff --git a/tool/ruby_vm/views/_insn_entry.erb b/tool/ruby_vm/views/_insn_entry.erb
    index cdadd93abc..bbfe539fd2 100644
    --- a/tool/ruby_vm/views/_insn_entry.erb
    +++ b/tool/ruby_vm/views/_insn_entry.erb
    @@ -56,7 +58,18 @@ INSN_ENTRY(<%= insn.name %>)
    /* ### Instruction trailers. ### */
    CHECK_VM_STACK_OVERFLOW_FOR_INSN(VM_REG_CFP, INSN_ATTR(retn));
    <%= insn.handle_canary "CHECK_CANARY()" -%>
    -% if insn.handles_sp?
    +% if insn.sendish? # Then we can safely assume there is only one return value.
    +% if insn.handles_sp?
    + if (! (ci->compiled_frame_bits & VM_FRAME_FLAG_POPIT)) {
    + PUSH(<%= insn.cast_to_VALUE insn.rets.first %>);
    + }
    +% else
    + INC_SP(INSN_ATTR(sp_inc));
    + if (! (ci->compiled_frame_bits & VM_FRAME_FLAG_POPIT)) {
    + TOPN(0) = <%= insn.cast_to_VALUE insn.rets.first %>;
    + }
    +% end
    +% elsif insn.handles_sp?
    % insn.rets.reverse_each do |ret|
    PUSH(<%= insn.cast_to_VALUE ret %>);
    % end

  49. Ruby method return values
    • Ruby methods (normally) return values using the `leave`
    instruction.
    def foo(x)
    return x + 1
    end

  50. == disasm: #:1 (1,0)-(3,3)> (catch: FALSE)
    local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block:
    [ 1] x@0
    0000 getlocal x@0, 0
    0003 putobject 1
    0005 send
    0009 leave

  51. diff --git a/insns.def b/insns.def
    index a38dc30168..68e7eabfae 100644
    --- a/insns.def
    +++ b/insns.def
    @@ -927,7 +911,7 @@ DEFINE_INSN
    leave
    ()
    (VALUE val)
    -(VALUE val)
    +(...)
    /* This is super surprising but when leaving from a frame, we check
    * for interrupts. If any, that should be executed on top of the
    * current execution context. This is a method call. */
    @@ -939,7 +923,10 @@ leave
    // attr enum rb_insn_purity purity = rb_insn_is_pure;
    /* And this instruction handles SP by nature. */
    // attr bool handles_sp = true;
    +// attr rb_snum_t sp_inc = 0;
    {
    + bool popit = VM_ENV_FLAGS(GET_EP(), VM_FRAME_FLAG_POPIT);
    +
    if (OPT_CHECKED_RUN) {
    const VALUE *const bp = vm_base_ptr(reg_cfp);
    if (reg_cfp->sp != bp) {
    @@ -959,6 +946,9 @@ leave
    }
    else {
    RESTORE_REGS();
    + if (! popit) {
    + PUSH(val);
    + }
    }
    }

  52. So far so good… but,
    • It immediately gets complicated when a block has a return
    statement.
    def foo(x)
    x.times do | i |
    return i
    end
    end
    p foo(42) # => 0

  53. So far so good… but,
    • It immediately gets complicated when a block has a return
    statement.
    def foo(x)
    x.times &-> (i) do
    return i
    end
    end
    p foo(42) # => 42

  54. What “return-inside-of-a-block” does:
    1. Look for the exact place where execution is to proceed.
    2. Rewind the stack.
    3. Push the return value onto the stack.
    4. Continue executing.
    Step 3 has to be cancelled when the value is unused. However,
    by that point the flag has already been squashed.

  55. diff --git a/vm.c b/vm.c
    index 807a20ee5a..057863e5e3 100644
    --- a/vm.c
    +++ b/vm.c
    @@ -1926,6 +1926,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
    VALUE errinfo, VALUE *initial)
    {
    struct vm_throw_data *err = (struct vm_throw_data *)errinfo;
    + bool popit = false;
    for (;;) {
    unsigned int i;
    @@ -1950,6 +1951,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
    rb_vm_frame_method_entry(ec->cfp)->owner,
    rb_vm_frame_method_entry(ec->cfp)->def->original_id);
    }
    + popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
    rb_vm_pop_frame(ec);
    }
    @@ -1983,6 +1985,7 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
    ec->errinfo = Qnil;
    THROW_DATA_CATCH_FRAME_SET(err, cfp + 1);
    hook_before_rewind(ec, ec->cfp, TRUE, state, err);
    + popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
    rb_vm_pop_frame(ec);
    return THROW_DATA_VAL(err);
    }
    @@ -1994,7 +1997,9 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
    #if OPT_STACK_CACHING
    *initial = THROW_DATA_VAL(err);
    #else
    - *ec->cfp->sp++ = THROW_DATA_VAL(err);
    + if (! popit) {
    + *ec->cfp->sp++ = THROW_DATA_VAL(err);
    + }
    #endif
    ec->errinfo = Qnil;
    return Qundef;
    @@ -2128,12 +2133,14 @@ vm_exec_handle_exception(rb_execution_context_t *ec, enum ruby_tag_type state,
    hook_before_rewind(ec, ec->cfp, FALSE, state, err);
    if (VM_FRAME_FINISHED_P(ec->cfp)) {
    + popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
    rb_vm_pop_frame(ec);
    ec->errinfo = (VALUE)err;
    ec->tag = ec->tag->prev;
    EC_JUMP_TAG(ec, state);
    }
    else {
    + popit = VM_ENV_FLAGS(ec->cfp->ep, VM_FRAME_FLAG_POPIT);
    rb_vm_pop_frame(ec);
    }
    }
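
The idea behind the diff above is to remember the flag of each frame just before that frame is popped. A toy model of this, where `Frame`, `return_from_block` and `frames_for` are hypothetical names and the frame stack is a plain Array rather than the VM's control-frame stack:

```ruby
# Toy frame: `popit` records that the caller's `pop` was optimised away.
Frame = Struct.new(:name, :popit)

# Models a `return` thrown from inside a block. Frames are discarded up
# to and including the one named `target`; the flag of each discarded
# frame is kept, so the flag of the target frame survives the rewind.
def return_from_block(frames, caller_stack, target, value)
  popit = false
  until frames.empty?
    frame = frames.pop        # rewind the stack by one frame
    popit = frame.popit       # keep the flag that would otherwise be lost
    break if frame.name == target
  end
  caller_stack.push(value) unless popit  # push only if a `pop` awaits it
  caller_stack
end

# Frame stack for `foo` calling `times` with a block, innermost last.
def frames_for(popit)
  [Frame.new(:main, false), Frame.new(:foo, popit),
   Frame.new(:times, false), Frame.new(:block, false)]
end

p return_from_block(frames_for(false), [], :foo, 0)  # => [0]
p return_from_block(frames_for(true),  [], :foo, 0)  # => []
```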

  56. Benchmarks

  57. Several benchmarks were exercised.
    • Caution: YMMV
    • All benchmarks were run on the very machine I am projecting
    this presentation from: a 6th-gen. ThinkPad X1 Carbon.
    • They all compare trunk (2.7.0 revision 67168) against ours
    (the proposed patch applied on top of trunk).

  58. The `make benchmark` results
    • This set of benchmarks is considered micro: it consists of
    many small Ruby scripts. They tend to shed light on
    specific parts of the VM.

  60. [Chart] Speedup ratio versus trunk (greater = faster) for:
    so_ackermann, so_array, so_binary_trees, so_concatenate,
    so_exception, so_fannkuch, so_fasta, so_lists, so_mandelbrot,
    so_matrix, so_meteor_contest, so_nbody, so_nested_loop,
    so_nsieve, so_nsieve_bits, so_object, so_partial_sums,
    so_pidigits, so_random, so_sieve, so_spectralnorm

  61. [Chart] Speedup ratio versus trunk (greater = faster) for:
    vm1_attr_ivar_set, vm1_gc_wb_obj, vm2_method,
    vm2_method_with_block, vm2_send, vm2_struct_small_aref

  63. The `make benchmark` results
    • The majority of the results are almost the same. Whether slower
    or faster, they differ only marginally.
    • There are a few notable benchmarks where our
    proposal clearly outperforms trunk.
    • On the other hand, no benchmark seems to show a clear
    slowdown for our proposal.
    • This tendency is roughly the same as what we saw in 2016.

  64. Mid-sized benchmarks
    • We tested `time make rdoc`, which has historically been
    considered a benchmark that reflects a real-world use case.
    • We also tested mame/optcarrot, which was made for
    benchmarking various Ruby implementations.

  65. [Chart] `time make rdoc` [sec] (greater = slower)
    trunk: 23.13
    ours: 23.58

  66. [Chart] Optcarrot Lan_Master.nes [fps] (greater = faster)
    trunk: 42.554
    ours: 43.276

  67. Mid-sized benchmarks
    • We have to say they are almost the same.
    • Rdoc got slower, optcarrot got faster. We see these results
    consistently.
    • There might be reasons behind them but, … well, isn’t it
    enough to say that we see no significant changes?
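
To put "almost the same" into numbers, the relative change can be computed from the two charts. The sketch assumes that the first figure on each chart belongs to trunk and the second to ours, which is consistent with the remark that rdoc got slower and optcarrot got faster; `relative_change` is a helper written for this example.

```ruby
# Percentage change of `ours` relative to `trunk`.
def relative_change(trunk, ours)
  (ours - trunk) / trunk * 100
end

rdoc      = relative_change(23.13, 23.58)    # seconds, lower is better
optcarrot = relative_change(42.554, 43.276)  # fps, higher is better

puts format("make rdoc: %+.1f%% time", rdoc)
puts format("optcarrot: %+.1f%% fps", optcarrot)
```

Both differences are under two percent, in opposite directions.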

  68. Rails application
    • discourse/discourse comes with a benchmark script so we
    tested our changeset against it.
    • The benchmark is basically a series of `ab(1)`.
    • Discourse is a field-proven real-world Rails application. The
    benchmark shows how the proposed changeset behaves in
    the wild.
    • On the other hand, it has the most lines of code of all the benchmarks.

  69. [Chart] Discourse benchmark results [msec] (greater = slower)
    Series: ours and trunk for each of categories, home, topic,
    categories_admin, home_admin and topic_admin,
    at the 50th, 75th, 90th and 99th percentiles.

  70. [Chart] Discourse home [msec] (greater = slower), trunk versus ours
    (the transcript does not preserve which series is which)
    Percentile:      50   75   90   99
    First series:    50   56   69  115
    Second series:   51   62   69  119

  71. [Chart] Discourse categories_admin [msec] (greater = slower), trunk versus ours
    (the transcript does not preserve which series is which)
    Percentile:      50   75   90   99
    First series:    80   87  100  157
    Second series:   83   96  102  169

  72. [Chart] Discourse timing loading rails [msec] (greater = slower)
    trunk: 3707
    ours: 3963

  73. Conclusions

  74. Conclusions
    • Additional method-calling ABIs are introduced to tell each
    method if its return value is used or not. Unused return
    values are then optimised out from the VM’s value stack.
    • Our proposal sacrifices process bootup time to yield better
    runtime performance.
    • Not only small benchmarks, but also Rails applications can
    benefit from it.

  76. Future works

  77. (more) Aggressive compilation
    • The automatic insertion of `opt_bailout` proposed in this
    presentation works, but we can think of more.
    • For instance let us consider:
    1.times {|i| x, y = self, i }

  78. == disasm: #@-e:1 (1,0)-(1,29)> (catch: FALSE)
    == catch table
    | catch type: break st: 0000 ed: 0005 sp: 0000 cont: 0005
    | == disasm: #@-e:1 (1,8)-(1,29)> (catch: FALSE)
    | == catch table
    | | catch type: redo st: 0001 ed: 0014 sp: 0000 cont: 0001
    | | catch type: next st: 0001 ed: 0014 sp: 0000 cont: 0014
    | |------------------------------------------------------------------------
    | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
    | [ 3] i@0 | [ 2] x@1 | [ 1] y@2
    | 0000 nop ( 1)[Bc]
    | 0001 putself [Li]
    | 0002 getlocal_WC_0 i@0
    | 0004 newarray 2
    | 0006 dup
    | 0007 expandarray 2, 0
    | 0010 setlocal_WC_0 x@1
    | 0012 setlocal_WC_0 y@2
    | 0014 nop
    | 0015 leave ( 1)[Br]
    |------------------------------------------------------------------------
    0000 putobject_INT2FIX_1_ ( 1)[Li]
    0001 send , , block in
    0005 nop
    0006 leave ( 1)

  79. | | catch type: next st: 0001 ed: 0014 sp: 0000 cont: 0014
    | |---------------------------------------------------------------
    | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block
    | [ 3] i@0 | [ 2] x@1 | [ 1] y@2
    | 0000 nop
    | 0001 putself [Li]
    | 0002 getlocal_WC_0 i@0
    | 0004 newarray 2
    | 0006 dup
    | 0007 expandarray 2, 0
    | 0010 setlocal_WC_0 x@1
    | 0012 setlocal_WC_0 y@2
    | 0014 nop
    | 0015 leave
    |-----------------------------------------------------------------
    0000 putobject_INT2FIX_1_
    0001 send
    0005 nop
    0006 leave

  80. | | catch type: next st: 0001 ed: 0016 sp: 0000 cont: 0016
    | |---------------------------------------------------------------
    | local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block
    | [ 3] i@0 | [ 2] x@1 | [ 1] y@2
    | 0000 nop
    | 0001 putself [Li]
    | 0002 getlocal_WC_0 i@0
    | 0004 newarray 2
    | 0006 dup
    | 0007 expandarray 2, 0
    | 0010 opt_bailout 3
    | 0012 setlocal_WC_0 x@1
    | 0014 setlocal_WC_0 y@2
    | 0016 nop
    | 0017 leave
    |-----------------------------------------------------------------
    0000 putobject_INT2FIX_1_
    0001 send
    0005 nop

  81. (more) Aggressive compilation
    • We can think of better compilation so that the `newarray`
    instruction can be eliminated.
    • That should decrease GC pressure.
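
The array allocation in question is visible on a stock CRuby as well. This is a minimal sketch, assuming a CRuby interpreter; instruction names other than `newarray` may differ between versions.

```ruby
# The block's value is the result of the multiple assignment, so the
# compiler materialises the right-hand side as an Array.
disasm = RubyVM::InstructionSequence
         .compile("1.times {|i| x, y = self, i }")
         .disasm
puts disasm

puts disasm.include?("newarray")
```

`Integer#times` ignores what the block returns, so this array is garbage as soon as it is created.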

  82. Tail call flag propagation
    • Think of a method foo, which calls another method bar:
    def bar
    something
    return something_else
    end
    def foo
    return bar
    end
    foo; nil
    This must be optimisable

  83. Tail call flag propagation
    • We tried optimising this scenario, but it turned out to
    slow things down.
    • The overhead added by asking “is this a tail call?” every time a
    method is called turned out to be too heavy.
    • Some static analysis could be possible. That might take
    the check off the runtime path.

  84. Other future works
    • The introduced C API could be applied to our core classes,
    to gain bonus speed-ups.
    • Values of blocks called from C (which we still cannot say if
    they are used or not) should also be considered.
