Method JIT Compiler for MRI

RubyElixirConf 2018
https://2018.rubyconf.tw/

Takashi Kokubun

April 27, 2018

Transcript

  1. Method JIT Compiler for MRI
    RubyElixirConf Taiwan 2018
    Optimizations in Ruby 2.6.0-preview1 and 2.6.0-preview2
    @k0kubun / Treasure Data Inc.

  2. @k0kubun
    Treasure Data Inc.
    ERB maintainer, developing Ruby’s JIT

  3. The history of MRI JIT

  4. March 2017: RTL & MJIT

  5. October 2017: YARV-MJIT
    [Chart: Optcarrot benchmark (fps) for Ruby 2.0, Ruby 2.5, YARV-MJIT and RTL MJIT]
    https://github.com/k0kubun/yarv-mjit/tree/master-171211#optcarrot-benchmark

  6. February 2018:
    Merge MJIT infrastructure

  7. February 2018:
    Released in 2.6.0-preview1

  8. How does it work?

  9. Optionally enabled by "--jit"
    Tip: RUBYOPT="--jit" ruby … works too

  10. New runtime dependency:
    gcc / clang

  11. How Ruby’s method JIT works
    [Diagram: Methods → Interpret]

  12. How Ruby’s method JIT works
    [Diagram: Methods → Interpret; frequently called method marked (!)]

  13. How Ruby’s method JIT works
    [Diagram: frequently called method → Compile → Machine code; other methods → Interpret]

  14. How Ruby’s method JIT works
    [Diagram: Methods → Interpret, or Call → Machine code]

  15. How Ruby’s method JIT works
    [Diagram: Methods → Interpret / Compile / Call → Machine code]

  16. How Ruby’s method JIT works
    [Diagram: Methods → Compile / Call → Machine code]

  17. How Ruby’s method JIT works
    [Diagram: Call → Machine code]
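The flow in slides 11 to 17 can be sketched as a per-method call counter: a method is interpreted until it has been called often enough, then it is compiled, and later calls go straight to the machine code. This is a minimal sketch under stated assumptions: `JIT_THRESHOLD`, the `Method` struct and `compile_method` are hypothetical, and compilation is simulated by installing a precompiled C function instead of actually running gcc/clang.

```c
#include <stddef.h>

/* Hypothetical threshold: number of calls before a method is compiled. */
#define JIT_THRESHOLD 5

typedef long (*method_fn)(long);

typedef struct {
    method_fn interpret;   /* slow path: the VM interpreting the method's ISeq */
    method_fn jitted;      /* NULL until the method has been compiled */
    int call_count;        /* calls seen so far */
    int interp_calls;      /* how many calls were interpreted (for illustration) */
} Method;

/* Stand-in for interpreting `def twice(x); x * 2; end` instruction by instruction. */
static long interpret_twice(long x) { return x * 2; }

/* Stand-in for the machine code gcc/clang would produce for the same method. */
static long native_twice(long x) { return x + x; }

/* Stand-in for "generate C code, compile it, load it": here it only installs
   the precompiled native_twice. */
static void compile_method(Method *m) { m->jitted = native_twice; }

static long call_method(Method *m, long arg) {
    if (m->jitted != NULL)
        return m->jitted(arg);          /* hot method: call machine code */
    if (++m->call_count >= JIT_THRESHOLD)
        compile_method(m);              /* frequently called: compile it */
    m->interp_calls++;
    return m->interpret(arg);           /* this call is still interpreted */
}
```

With a threshold of 5, ten calls interpret the method five times and run the compiled version for the remaining five. In MJIT the compile step is done by an external C compiler, which is why gcc or clang becomes a runtime dependency.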

  18. Latest Ruby performance benchmarks

  19. Ruby 2.6.0-preview1
    https://benchmark-driver.github.io/benchmarks/optcarrot/releases.html

  20. Ruby trunk
    https://benchmark-driver.github.io/benchmarks/optcarrot/commits.html

  21. Ruby trunk
    https://benchmark-driver.github.io/benchmarks/optcarrot/commits.html
    [Chart markers: 2.6.0-preview1, 2.6.0-preview2 (?)]

  22. Micro benchmark: while
    5.7x faster
    [Chart markers: 2.6.0-preview1, 2.6.0-preview2 (?)]
    https://benchmark-driver.github.io/benchmarks/mjit/commits.html

  23. [Chart: Optcarrot benchmark (fps) for Ruby 2.0, trunk, trunk+JIT, RTL+JIT and the Ruby 3x3 target]
    But… we’re still far from Ruby 3x3: 39fps to go
    https://gist.github.com/k0kubun/7074ad434d0affd1bd98edaaa011ac1d

  24. How to get there?
    Just inlining methods doesn’t help if the generated code is too complex
    We need more work to exploit the C compiler's optimizations
    Let’s see what we’ve done so far

  25. 2.6.0-Preview1 Optimizations

  26. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code

  27. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    ISeq
    Compile
    putself
    send :bar, cache: nil
    leave

  28. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Interpret

  29. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    putself() {
    val = GET_SELF();
    }
    C code for instruction

  30. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Interpret

  31. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction

  32. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Ruby method push
    C method call
    attr_reader
    attr_writer
    ...
    Which type will be called?

  33. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: Ruby
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Store the C function pointer together with the class timestamp
    Ruby method push
    C method call
    attr_reader
    attr_writer
    ...

  34. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: Ruby
    leave
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Ruby method push
    Dispatch it by calling the function pointer
    (the C compiler can't optimize across it)

  35. 1. Basic inlining of Ruby method (r62197)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: Ruby
    leave
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Ruby method push
    In JIT, we can inline this operation
    by checking the cache in the ISeq
    ISeq

  36. 1. Basic inlining of Ruby method (r62197)
    Using the “method cache”, we can bypass method dispatch
    and inline the C function that pushes the Ruby method frame
    Once it's inlined, the C compiler can apply various optimizations
    to the Ruby method call, which is known to be slow
    Optcarrot: 53.84fps -> 57.52fps
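The contrast between slides 34 and 35 can be sketched in C: the interpreter always calls through the function pointer stored in the call cache, while JIT-ed code that saw a Ruby method in the cache can emit a cheap class-timestamp guard followed by a direct call that the C compiler is free to inline. A minimal sketch: `VALUE`, `CallCache`, `vm_push_ruby_frame` and `CACHED_SERIAL` are simplified, hypothetical stand-ins, not MRI's actual definitions.

```c
typedef unsigned long VALUE;        /* simplified stand-in for MRI's VALUE */

typedef struct CallCache CallCache;
typedef VALUE (*call_handler)(CallCache *cc, VALUE recv);

struct CallCache {
    unsigned long class_serial;     /* class "timestamp" the entry was cached for */
    call_handler call;              /* handler the interpreter calls through */
};

static int ruby_frames_pushed = 0;

/* Hypothetical stand-in for "push a Ruby method frame and run #bar". */
static inline VALUE vm_push_ruby_frame(CallCache *cc, VALUE recv) {
    (void)cc;
    ruby_frames_pushed++;
    return recv + 1;
}

/* Interpreter: dispatch through a function pointer. The C compiler cannot
   see which function is called, so it cannot inline or optimize across it. */
static VALUE vm_send(CallCache *cc, VALUE recv) {
    return cc->call(cc, recv);
}

/* Class timestamp observed in the cache when `foo` was compiled (hypothetical). */
#define CACHED_SERIAL 42UL

/* What JIT-ed code for `send :bar` could look like: the cache said "Ruby
   method", so it emits a cheap guard plus a direct, inlinable call. */
static VALUE jit_send_bar(CallCache *cc, VALUE recv, unsigned long recv_class_serial) {
    if (recv_class_serial == CACHED_SERIAL)
        return vm_push_ruby_frame(cc, recv);  /* direct call: can be inlined */
    return vm_send(cc, recv);                 /* guard failed: generic dispatch
                                                 (a real VM would search again) */
}
```

Both paths produce the same result; the difference is that only the guarded direct call gives the C compiler a known callee to inline and optimize.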

  37. 2. Bypass Array/Hash check for #[] (r62398)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key);
    }
    else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    }
    else {
    dispatch(recv, #[], key);
    }
    }

  38. 2. Bypass Array/Hash check for #[] (r62398)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key);
    }
    else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    }
    else {
    dispatch(recv, #[], key);
    }
    }
    array = [1,2,3]
    array[1]

  39. 2. Bypass Array/Hash check for #[] (r62398)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key);
    }
    else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    }
    else {
    dispatch(recv, #[], key);
    }
    }
    hash = { foo: 1}
    hash[:foo]

  40. 2. Bypass Array/Hash check for #[] (r62398)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key);
    }
    else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    }
    else {
    dispatch(recv, #[], key);
    }
    }
    def show
    params[:id]
    end
    ActionController::Parameters#[]

  41. 2. Bypass Array/Hash check for #[] (r62398)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key);
    }
    else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    }
    else {
    dispatch(recv, #[], key);
    }
    }
    def show
    params[:id]
    end
    ActionController::Parameters#[]
    These checks are NOT
    needed for classes
    other than Array, Hash

  42. 2. Bypass Array/Hash check for #[] (r62398)
    jit_#[](recv, key) {
    dispatch(recv, #[], key);
    }
    def show
    params[:id]
    end
    ActionController::Parameters#[]

  43. 2. Bypass Array/Hash check for #[] (r62398)
    Ruby always optimizes #[] for Array/Hash, but those checks are
    wasted work for other classes
    JIT removes the Array/Hash guards by looking at the call cache,
    and also inlines pushing the method frame
    The same optimization can be applied to other methods later
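The pseudocode in slides 37 to 42 can be turned into compilable C to show what is being removed. In this minimal sketch the interpreter's `opt_aref` always tests for Array and then Hash, while the JIT-ed variant, chosen because the call cache showed some other class such as `ActionController::Parameters`, goes straight to dispatch. All names and the toy `Obj` layout are hypothetical; dropping the guards is safe here only because `dispatch_aref` handles every receiver type, just more slowly.

```c
typedef long VALUE;                     /* simplified stand-in for MRI's VALUE */
typedef enum { T_ARRAY, T_HASH, T_OBJECT } Type;

typedef struct {
    Type type;
    VALUE items[4];                     /* toy storage shared by all types */
} Obj;

static int guard_checks = 0;            /* counts Array/Hash type checks */

static VALUE fast_array_aref(const Obj *recv, long key) { return recv->items[key]; }
static VALUE fast_hash_aref(const Obj *recv, long key)  { return recv->items[key % 4]; }

/* Stand-in for full method dispatch: correct for every receiver, but slow.
   For T_OBJECT it plays a user-defined #[] such as Parameters#[]. */
static VALUE dispatch_aref(const Obj *recv, long key) {
    switch (recv->type) {
    case T_ARRAY: return fast_array_aref(recv, key);
    case T_HASH:  return fast_hash_aref(recv, key);
    default:      return recv->items[0] + key;
    }
}

/* Interpreter's #[]: always tries the Array and Hash fast paths first. */
static VALUE opt_aref(const Obj *recv, long key) {
    guard_checks++;
    if (recv->type == T_ARRAY) return fast_array_aref(recv, key);
    guard_checks++;
    if (recv->type == T_HASH) return fast_hash_aref(recv, key);
    return dispatch_aref(recv, key);
}

/* JIT-ed #[] when the call cache showed neither Array#[] nor Hash#[]:
   the guards are dropped and the call goes straight to dispatch. */
static VALUE jit_aref_other(const Obj *recv, long key) {
    return dispatch_aref(recv, key);
}
```

For a non-Array, non-Hash receiver, `opt_aref` pays for two failed type checks on every call, while `jit_aref_other` pays for none and returns the same value.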

  44. 3. Inline Array#[] with Integer (r62388)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    fast_Array#[](recv, key); // extern
    } else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    } else {
    dispatch(recv, #[], key);
    }
    }
    It's an extern call, so the C compiler
    can't inline it or optimize it well

  45. 3. Inline Array#[] with Integer (r62388)
    optimized_#[](recv, key) {
    if recv.is_a?(Array) {
    if key.is_a?(Integer) {
    Array#[Integer](recv, key); // inline
    } else {
    fast_Array#[](recv, key); // extern
    }
    } else if recv.is_a?(Hash) {
    fast_Hash#[](recv, key);
    } else {
    dispatch(recv, #[], key);
    }
    }
    This special path is inlined
    and optimized well by the C compiler in JIT-ed code

  46. 3. Inline Array#[] with Integer (r62388)
    Currently the “JIT header” contains definitions of only a limited
    set of C functions in Ruby core
    I inlined part of the Array#[] definition, and then the C compiler
    could optimize the code
    Optcarrot: 54.93fps -> 58.41fps
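The idea of slides 44 to 46 can be sketched as a small `static inline` helper for the Integer-index case, visible to the JIT-ed code so the C compiler can inline it, while every other kind of key keeps going through the general Array#[] path. This is a sketch, not MRI's code: `RArrayToy`, `QNIL_STANDIN`, `KeyKind` and the function names are hypothetical, and the general path is reduced to what the example needs.

```c
typedef long VALUE;                   /* simplified: integers stand in for Ruby objects */
#define QNIL_STANDIN (-999L)          /* hypothetical stand-in for nil */

typedef struct {
    long len;
    const VALUE *ptr;
} RArrayToy;                          /* hypothetical, much simpler than MRI's RArray */

/* Small enough to live in a header visible to JIT-ed code, so the C compiler
   can inline it and optimize around it. */
static inline VALUE array_entry(const RArrayToy *ary, long offset) {
    if (offset < 0) offset += ary->len;       /* ary[-1] is the last element */
    if (offset < 0 || offset >= ary->len)
        return QNIL_STANDIN;                  /* out of range => nil */
    return ary->ptr[offset];
}

typedef enum { KEY_INTEGER, KEY_OTHER } KeyKind;

/* Stand-in for the general Array#[] (ranges, start/length, ...), which stays
   an out-of-line call. Here it only handles what the sketch needs. */
static VALUE generic_array_aref(const RArrayToy *ary, KeyKind kind, long key) {
    (void)kind;
    return array_entry(ary, key);
}

/* Shape of the Array branch after the change: Integer keys take the inlined
   path, everything else falls back to the general definition. */
static VALUE jit_array_aref(const RArrayToy *ary, KeyKind kind, long key) {
    if (kind == KEY_INTEGER)
        return array_entry(ary, key);          /* inlined fast path */
    return generic_array_aref(ary, kind, key); /* extern call in real code */
}
```

For `[10, 20, 30]`, index `1` yields 20, index `-1` yields 30 and index `3` yields the nil stand-in, matching Ruby's Array#[] semantics for Integer indices.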

  47. 2.6.0-Preview1 wrap up
    I mainly worked on portability, stability and maintainability
    Fixed SEGVs and deadlocks, removed broken optimizations…
    There were only 3 notable optimizations, so it wasn't fast yet

  48. 2.6.0-Preview2 Optimizations

  49. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code

  50. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave

  51. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    empty

  52. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    1

  53. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    1

  54. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    1
    2

  55. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    1
    2

  56. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    3

  57. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    3

  58. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    Ruby VM
    Program Counter
    Stack Pointer
    VM stack
    3
    How to skip the stack
    pointer motion in JIT?

  59. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    }
    JIT-ed code: before

  60. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    }
    JIT-ed code: before

  61. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    }
    JIT-ed code: before

  62. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    *(sp-2) = opt_plus(
    *(sp-2),*(sp-1));
    sp--;
    }
    JIT-ed code: before

  63. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    *(sp-2) = opt_plus(
    *(sp-2),*(sp-1));
    sp--;
    return *(sp-1);
    }
    JIT-ed code: before

  64. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    *(sp-2) = opt_plus(
    *(sp-2),*(sp-1));
    sp--;
    return *(sp-1);
    }
    JIT-ed code: before
    jit_three() {
    VALUE stack[2];
    stack[0] = 1;
    stack[1] = 2;
    stack[0] = opt_plus(
    stack[0], stack[1]);
    return stack[0];
    }
    JIT-ed code: after
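The "before" and "after" versions of `jit_three()` can be written as compilable C to check that they compute the same value. The only assumptions are that `VALUE` is a plain `long` and that `opt_plus` is plain addition; real MRI's `opt_plus` checks operand types and overflow first.

```c
typedef long VALUE;   /* simplified: plain integers stand in for Ruby objects */

/* Simplified opt_plus: real MRI checks for Fixnum/Float and overflow first. */
static VALUE opt_plus(VALUE a, VALUE b) { return a + b; }

/* "Before": every push and pop goes through the VM stack in memory via sp. */
static VALUE jit_three_before(VALUE *sp) {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    *(sp - 2) = opt_plus(*(sp - 2), *(sp - 1));
    sp--;
    return *(sp - 1);
}

/* "After": the stack lives in a C local array, so the compiler can keep the
   values in registers or fold them away entirely. */
static VALUE jit_three_after(void) {
    VALUE stack[2];
    stack[0] = 1;
    stack[1] = 2;
    stack[0] = opt_plus(stack[0], stack[1]);
    return stack[0];
}
```

Both return 3, but only the "before" version writes to the VM stack; with optimization enabled, a C compiler can typically reduce the "after" version to a constant because nothing outside the function can observe `stack`.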

  65. 1. Use C local variable for VM stack (r62655)
    def three
    1 + 2
    end
    Ruby code
    ISeq
    putobject 1
    putobject 2
    opt_plus
    leave
    jit_three() {
    *sp = 1;
    sp++;
    *sp = 2;
    sp++;
    *(sp-2) = opt_plus(
    *(sp-2),*(sp-1));
    sp--;
    return *(sp-1);
    }
    JIT-ed code: before
    jit_three() {
    VALUE stack[2];
    stack[0] = 1;
    stack[1] = 2;
    stack[0] = opt_plus(
    stack[0], stack[1]);
    return stack[0];
    }
    JIT-ed code: after
    Array local variable
    This seems okay for
    just "1 + 2", but...

  66. 1. Use C local variable for VM stack (r62655)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code

  67. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    VM stack
    empty

  68. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    VM stack
    empty

  69. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[nil, nil] in jit_three()
    VM stack
    empty

  70. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    Push 1 to array
    local variable
    VM stack
    empty

  71. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    jit_err()
    VM stack
    empty

  72. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    jit_err()
    rb_raise() (call longjmp)
    VM stack
    empty

  73. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    jit_err()
    rb_raise() (call longjmp)
    longjmp purges the JIT-ed frames
    VM stack
    empty

  74. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    jit_err()
    rb_raise() (call longjmp)
    VM stack
    empty
    2

  75. 1. Use C local variable for VM stack (r62655)
    def err # JIT-ed
    raise 'error'
    end
    def three # JIT-ed
    1 + (err rescue 2)
    end
    Ruby code
    main()
    Call stack in C
    ruby_vm() (setjmp called)
    jit_three()
    stack[1, nil] in jit_three()
    jit_err()
    rb_raise() (call longjmp)
    VM stack
    empty
    2
    The VM stack doesn't
    have 2 values
    => SEGV
    1 has expired (it lived in the purged jit_three() frame)

  76. 1. Use C local variable for VM stack (r62655)
    When there is no "catch table" (rescue, ensure, etc.), we
    don't need to restore stack values on an exception
    So we use plain C local variables to reproduce the Ruby VM stack
    only when the catch table does not exist
    The stack pointer is not moved, and the compiler can inline values
    Optcarrot: 57.13fps -> 62.14fps
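The failure in slides 66 to 75, and why the catch-table rule avoids it, can be reproduced with `setjmp`/`longjmp`. In this minimal sketch the VM loop sets a landing pad, `jit_three` pushes 1 either into a C local or onto the VM stack, and `jit_err` raises by jumping back to the VM, whose rescue clause pushes 2 and continues with `opt_plus` on the VM stack. The names are hypothetical, and returning `-1` stands in for the SEGV the real VM would hit.

```c
#include <setjmp.h>

typedef long VALUE;

static jmp_buf vm_jmpbuf;          /* landing pad set by the VM, like ruby_vm() */
static VALUE vm_stack[8];
static int vm_sp;                  /* number of values on the VM stack */

static void jit_err(void) {
    longjmp(vm_jmpbuf, 1);         /* raise 'error': unwinds all JIT-ed C frames */
}

/* `1 + (err rescue 2)`. Because `three` has a rescue (a catch table), only
   keep_values_in_c_locals == 0 is safe. */
static void jit_three(int keep_values_in_c_locals) {
    if (keep_values_in_c_locals) {
        VALUE stack[2];
        stack[0] = 1;              /* lives only in this C frame */
        (void)stack;
    } else {
        vm_stack[vm_sp++] = 1;     /* still visible to the VM after longjmp */
    }
    jit_err();
}

/* The VM: run jit_three; on exception, execute the rescue clause (push 2)
   and then opt_plus on the VM stack. */
static VALUE vm_run_three(int keep_values_in_c_locals) {
    vm_sp = 0;
    if (setjmp(vm_jmpbuf) == 0) {
        jit_three(keep_values_in_c_locals);
        return 0;                  /* not reached: jit_err always jumps */
    }
    vm_stack[vm_sp++] = 2;         /* rescue 2 */
    if (vm_sp < 2)
        return -1;                 /* operand missing: the real VM would SEGV */
    vm_sp -= 2;
    return vm_stack[vm_sp] + vm_stack[vm_sp + 1];
}
```

`vm_run_three(0)` returns 3, while `vm_run_three(1)` returns -1 because the value 1 disappeared with the purged C frame. Methods without a catch table never resume after such a jump, which is why they can safely keep their stack in C locals.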

  77. 2. Bypass setjmp for yield (r62643)
    setjmp is slow
    If JIT-ed code is called directly from the VM (no C function
    frames have been created yet), we don’t need to call setjmp again
    Now yield is 1.3x faster than in the non-JIT-ed case

  78. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code

  79. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    Program Counter

  80. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    Program Counter

  81. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    Program Counter

  82. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    #err
    Program Counter
    Program Counter

  83. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    #err
    Program Counter
    Program Counter

  84. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    #err
    Program Counter
    Program Counter
    #raise
    Program Counter
    longjmp

  85. 3. Skip moving program counter (r62678)
    def err
    raise 'error'
    end
    def three
    1 + (err rescue 2)
    end
    Ruby code Ruby call stack
    #three
    #err
    Program Counter
    Program Counter
    #raise
    Program Counter
    Program counter is used to
    resurrect the position after longjmp

  86. 3. Skip moving program counter (r62678)
    As with stack values, we skip moving the program counter
    only when no catch table (rescue, ensure, etc.) exists
    Optcarrot: 64.92fps -> 68.08fps
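On the code generation side, the rule amounts to emitting the program counter update only for methods that have a catch table. The sketch below appends the C text for one instruction to a buffer; it is a hypothetical generator, and `reg_cfp`, `original_pc` and the instruction bodies are illustrative names rather than MJIT's actual output.

```c
#include <stdio.h>
#include <string.h>

/* Appends the C code for one instruction to `buf` (which must already hold a
   NUL-terminated string). Hypothetical generator for illustration only. */
static void emit_insn(char *buf, size_t size, int has_catch_table,
                      int pc, const char *body)
{
    size_t used = strlen(buf);
    if (used + 1 >= size)
        return;                      /* buffer full: emit nothing */
    if (has_catch_table)             /* an exception may resume here: keep the PC current */
        snprintf(buf + used, size - used,
                 "reg_cfp->pc = original_pc + %d;\n%s\n", pc, body);
    else                             /* no rescue/ensure: nobody reads the PC, skip it */
        snprintf(buf + used, size - used, "%s\n", body);
}
```

Calling `emit_insn` for the same instruction with and without a catch table shows the difference: one extra store per instruction in the first case, none in the second.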

  87. 4. Force inlining arithmetic instructions (r62677)
    The C compiler only inlines functions below a certain size threshold
    Some of Ruby's instructions (+, -, *, /, ...) are too large to be
    inlined by default, so I applied an "always inline" attribute
    In the future, we should reduce the size of their code instead
    Optcarrot: 60.19fps -> 64.92fps
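Forcing inlining relies on the GCC/Clang function attribute `__attribute__((always_inline))`, which tells the compiler to inline a function regardless of its size heuristics. The attribute is real; the `FORCE_INLINE` macro, the simplified `vm_opt_plus` body and its slow path are hypothetical stand-ins for an arithmetic instruction.

```c
#include <limits.h>

#if defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline
#endif

typedef long VALUE;               /* simplified: plain integers stand in for Ruby objects */

static int slow_path_calls = 0;

/* Stand-in for the out-of-line path; a real VM would promote to Bignum. */
static VALUE opt_plus_slow(VALUE a, VALUE b) {
    (void)b;
    slow_path_calls++;
    return a > 0 ? LONG_MAX : LONG_MIN;   /* placeholder result */
}

/* Instruction body marked always_inline: GCC/Clang inline it into every
   caller even if it exceeds their default inlining threshold. */
FORCE_INLINE VALUE vm_opt_plus(VALUE a, VALUE b) {
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return opt_plus_slow(a, b);       /* overflow: leave the fast path */
    return a + b;                         /* fast path */
}

/* JIT-ed `def three; 1 + 2; end`: once vm_opt_plus is inlined, the compiler
   can evaluate the overflow check and the addition at compile time. */
static VALUE jit_three(void) {
    return vm_opt_plus(1, 2);
}
```

The trade-off noted on the slide applies: forcing inlining of a large body grows every caller, so shrinking the instruction code is the cleaner long-term fix.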

  88. 5. Force inlining ivar instructions (r62693)
    Not only arithmetic instructions but also the instance variable
    instructions are large, so I force-inlined them too
    Optcarrot: 67.04fps -> 68.20fps

  89. 6. Disable stack consistency check (r63092)
    The Ruby VM always asserts the stack size when returning
    from a method, and it's slow
    We can skip it in JIT-ed code because it's already checked by the VM
    Optcarrot: 67.43fps -> 69.92fps

  90. 7. Inline attr_reader method call (r63212)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: nil
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Ruby method push
    C method call
    attr_reader
    attr_writer
    ...

  91. 7. Inline attr_reader method call (r63212)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: attr
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    Ruby method push
    C method call
    attr_reader
    attr_writer
    ...

  92. 7. Inline attr_reader method call (r63212)
    def foo
    bar
    end
    Ruby code
    putself
    send :bar, cache: attr
    leave
    ISeq
    Ruby VM
    Program Counter
    Call
    send(cache) {
    search_method(cache);
    CALL_METHOD(cache);
    }
    C code for instruction
    get_instance_variable()
    attr_reader

  93. 7. Inline attr_reader method call (r63212)
    Using the call cache in the same way as for Ruby methods, we
    can fully inline attr_reader without a large compilation time
    The cost becomes the same as referencing an ordinary
    instance variable
    Calling attr_reader became 4x faster

  94. 2.6.0-Preview2 (trunk) wrap up
    I've mainly worked on performance, because a JIT is useless
    if it's slow
    The generated code is much simpler and faster now that
    program counter and stack pointer motions are removed
    But it still has some complexity, which blocks significant
    performance improvements from Ruby method inlining

  95. Future of Ruby's JIT

  96. 1. Deoptimization by longjmp
    We can generate aggressive code and cancel all JIT-ed calls
    with longjmp when something unexpected happens
    I’m going to remove the guard for TracePoint and cancel JIT-ed
    calls when a TracePoint is enabled
    It should also be used when all method caches are purged
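Deoptimization by longjmp can be sketched as follows: JIT-ed code runs without a per-iteration guard, and whoever invalidates its assumption (here, enabling tracing) jumps back to a landing pad set before entering JIT-ed code, after which the method runs in the interpreter. This toy is an assumption-laden sketch: all names are hypothetical, and it restarts the method from scratch, which is only valid because `jit_sum` has no side effects; a real implementation must resume at the right instruction.

```c
#include <setjmp.h>

typedef long VALUE;

static jmp_buf jit_cancel_point;   /* landing pad set before entering JIT-ed code */
static int in_jit_code = 0;
static int tracing_enabled = 0;    /* the assumption JIT-ed code relies on */
static int interpreted_runs = 0;

/* Something unexpected happens, e.g. a TracePoint gets enabled. */
static void enable_tracing(void) {
    tracing_enabled = 1;
    if (in_jit_code)
        longjmp(jit_cancel_point, 1);  /* cancel all JIT-ed frames at once */
}

/* Aggressive JIT-ed code: no per-iteration "is tracing on?" guard. */
static VALUE jit_sum(int n, int trace_at) {
    VALUE s = 0;
    for (int i = 0; i < n; i++) {
        if (i == trace_at)
            enable_tracing();
        s += i;
    }
    return s;
}

/* Slow but always-correct interpreted version of the same method. */
static VALUE interp_sum(int n) {
    VALUE s = 0;
    interpreted_runs++;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

static VALUE call_sum(int n, int trace_at) {
    if (!tracing_enabled) {
        if (setjmp(jit_cancel_point) == 0) {
            in_jit_code = 1;
            VALUE r = jit_sum(n, trace_at);
            in_jit_code = 0;
            return r;
        }
        in_jit_code = 0;           /* deoptimized: continue in the interpreter */
    }
    /* Toy simplification: restart the method from scratch. */
    return interp_sum(n);
}
```

Pass `-1` as `trace_at` to run purely in JIT-ed code; passing `3` cancels the JIT-ed call partway through, and every later call stays in the interpreter because tracing is now enabled.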

  97. 2. Instruction specialization for types
    Currently the same code is generated for both Hash#[]
    and Array#[]
    We need some instrumentation to detect which types are
    passed to an optimized instruction
    Vladimir's RTL instructions achieve this by modifying
    instructions dynamically
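One simple form of such instrumentation is a per-call-site type profile recorded by the interpreter and consulted by the JIT. Note that this swaps in a different technique from the RTL approach mentioned on the slide: it is a hypothetical sketch in which `ArefProfile`, `profile_aref` and `choose_aref_code` are illustrative names, and it only returns which kind of code a JIT could emit.

```c
typedef enum { SEEN_NONE, SEEN_ARRAY, SEEN_HASH, SEEN_MIXED } SeenTypes;
typedef enum { RECV_ARRAY, RECV_HASH } RecvType;
typedef enum { EMIT_ARRAY_ONLY, EMIT_HASH_ONLY, EMIT_GENERIC } ArefCode;

/* Per-call-site profile, updated by the interpreter each time #[] runs. */
typedef struct { SeenTypes seen; } ArefProfile;

static void profile_aref(ArefProfile *p, RecvType t) {
    SeenTypes now = (t == RECV_ARRAY) ? SEEN_ARRAY : SEEN_HASH;
    if (p->seen == SEEN_NONE)
        p->seen = now;                 /* first receiver type observed */
    else if (p->seen != now)
        p->seen = SEEN_MIXED;          /* polymorphic site, stays mixed */
}

/* Decision the JIT could make when compiling this call site. */
static ArefCode choose_aref_code(const ArefProfile *p) {
    switch (p->seen) {
    case SEEN_ARRAY: return EMIT_ARRAY_ONLY;   /* Array guard + Array#[] path */
    case SEEN_HASH:  return EMIT_HASH_ONLY;    /* Hash guard + Hash#[] path */
    default:         return EMIT_GENERIC;      /* keep both checks */
    }
}
```

A site that has only ever seen Arrays gets Array-only code with a single guard, while a site that has seen both types, or none yet, keeps the generic checks.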

  98. 3. Multi-tier JIT
    Some other languages have multiple JIT tiers
    Depending on how frequently a method is called, it may be better to
    trade off compilation time against optimization level
    Vladimir is working on lightweight JIT compilation
    Some people deploy an application every 10 minutes, so compilation time matters

  99. 4. Profile-guided JIT
    C compilers have a feature to profile compiled code and
    generate faster code using the profiling results
    With a multi-tier JIT, we may be able to profile code in
    the first tier and generate faster code in the second tier

  100. 5. Better JIT scheduler for Rails
    In Rails, an application becomes slower only while JIT
    compilation is happening
    A possible cause is the number of methods to be
    JIT-ed, which is large compared to some other benchmarks
    Possibly we should reduce the number of methods to be
    JIT-ed or the frequency of JIT compilation

  101. 6. Ruby / C method inlining
    I have already implemented Ruby method inlining,
    but it increases compilation time
    I have ideas for implementing C method inlining, but deciding
    which methods to inline has to be solved first

  102. 7. Exploit more C compiler optimizations
    Loop-invariant code motion
    Folding Ruby constants
    Type check removal by type inference
    Reducing unnecessary memory accesses to VM registers

  103. Conclusion
    2.6.0-preview2 will be much faster than 2.6.0-preview1
    (Still not ready for Rails)
    We still have many things to do for Ruby 3x3
