The Method JIT Compiler

RubyKaigi 2018
http://rubykaigi.org/2018

Takashi Kokubun

June 02, 2018
Transcript

  1. T R E A S U R E D A T A
    The Method JIT Compiler for Ruby 2.6
    Takashi Kokubun / @k0kubun
    RubyKaigi 2018

  2. T R E A S U R E D A T A
    Maintainer of ERB, Haml
    Developing JIT compiler for Ruby 2.6
    @k0kubun

  3. • 2017 Sep: LLVM JIT (EN)
    • 2017 Nov: YARV MJIT (EN)
    • 2017 Dec: YARV MJIT (JA)
    • 2018 Feb: ERB generation (JA)
    • 2018 Apr: Preview2 optimizations (EN)
    My past talks about Ruby's JIT
    https://speakerdeck.com/k0kubun

  4. 1. Current Status
    2. JIT on Rails
    3. Dive Into Native Code
    4. Method Inlining
    Today's talk

  5. 1. CURRENT STATUS

  6. • JIT in 2.6.0-preview2 is not production ready yet
    • Fixing bugs caused by a race condition
    • I'll introduce the current status of:
      • Implementation
      • Portability
      • Performance
    Current status

  7. IMPLEMENTATION

  8. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    MJIT worker
    Thread
    Method #1
    Bytecode
    Ruby VM
    Thread
    Interpret

  9. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    MJIT worker
    Thread
    Method #1
    Bytecode
    Ruby VM
    Thread
    Interpret
    Request
    JIT-ing #1

  10. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    Method #1
    Bytecode
    Ruby VM
    Thread
    Interpret
    Request
    JIT-ing #1
    Method #1
    C code
    MJIT worker
    Thread
    Generate

  11. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    Method #1
    Bytecode
    Ruby VM
    Thread
    Interpret
    Request
    JIT-ing #1
    Method #1
    C code
    MJIT worker
    Thread
    Generate
    Method #1
    SO file
    Run C compiler

  12. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    Method #1
    Bytecode
    Ruby VM
    Thread
    Interpret
    Request
    JIT-ing #1
    Method #1
    C code
    MJIT worker
    Thread
    Generate
    Method #1
    SO file
    Run C compiler
    Method #1
    Native code
    Load

  13. MJIT: Ruby's JIT architecture
    Ruby Process
    Disk Memory
    Method #1
    Bytecode
    Interpret
    Request
    JIT-ing #1
    Method #1
    C code
    MJIT worker
    Thread
    Generate
    Method #1
    SO file
    Run C compiler
    Method #1
    Native code
    Load
    Ruby VM
    Thread
    Call
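
The flow built up across these architecture slides (interpret, request JIT, generate, compile, load, call) can be modelled in plain Ruby. This is only a toy sketch: `ToyJit`, `MIN_CALLS` and the lambda standing in for native code are all hypothetical names and assumptions, not MJIT's actual implementation.

```ruby
# Toy model of the flow on the slides above. Every name here is hypothetical.
class ToyJit
  MIN_CALLS = 5 # assumed threshold for illustration

  def initialize
    @calls    = Hash.new(0)
    @compiled = {}
    @lock     = Mutex.new
    @queue    = Queue.new
    @worker   = Thread.new { work }
  end

  # The "Ruby VM thread": run compiled code if it is loaded, otherwise
  # interpret and request compilation once the method becomes hot.
  def call(name, interpreted)
    compiled, count = @lock.synchronize do
      [@compiled[name], @calls[name] += 1]
    end
    return compiled.call if compiled
    @queue << [name, interpreted] if count == MIN_CALLS
    interpreted.call
  end

  def compiled?(name)
    @lock.synchronize { @compiled.key?(name) }
  end

  # Let the worker finish the queued requests, then stop it.
  def shutdown
    @queue << nil
    @worker.join
  end

  private

  # The "MJIT worker thread". The real worker generates C code, runs a C
  # compiler and loads the resulting .so file; this stand-in reuses the lambda.
  def work
    while (request = @queue.pop)
      name, body = request
      @lock.synchronize { @compiled[name] = body }
    end
  end
end

jit = ToyJit.new
results = Array.new(10) { jit.call(:three, -> { 1 + 2 }) }
jit.shutdown
puts results.uniq.inspect
puts jit.compiled?(:three)
```

The point of the design is that the calling thread never waits for compilation: it keeps interpreting until the worker has published the compiled version.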

  14. How is this implemented?
    Ruby Process
    Disk Memory
    Method #1
    Bytecode
    Interpret
    Request
    JIT-ing #1
    Method #1
    C code
    MJIT worker
    Thread
    Generate
    Method #1
    SO file
    Run C compiler
    Method #1
    Native code
    Load
    Ruby VM
    Thread
    Call

  15. How is this implemented?
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
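
To look at this bytecode on your own machine, CRuby can disassemble a method. A small sketch, assuming CRuby (the `RubyVM` constant does not exist on other implementations); the exact instruction names printed vary between Ruby versions.

```ruby
def three
  1 + 2
end

# RubyVM::InstructionSequence is specific to CRuby.
iseq = RubyVM::InstructionSequence.of(method(:three))
puts iseq.disasm
```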

  16. How is this implemented?
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
    C code
    three() {
    VALUE stack[2];
    /* putobject 1 */
    stack[0] = 1;
    }

  17. How is this implemented?
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
    C code
    three() {
    VALUE stack[2];
    /* putobject 1 */
    stack[0] = 1;
    /* putobject 2 */
    stack[1] = 2;
    }

  18. How is this implemented?
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
    C code
    three() {
    VALUE stack[2];
    /* putobject 1 */
    stack[0] = 1;
    /* putobject 2 */
    stack[1] = 2;
    /* opt_plus */
    stack[0] = opt_plus(
    stack[0], stack[1]
    );
    }

  19. How is this implemented?
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
    C code
    three() {
    VALUE stack[2];
    /* putobject 1 */
    stack[0] = 1;
    /* putobject 2 */
    stack[1] = 2;
    /* opt_plus */
    stack[0] = opt_plus(
    stack[0], stack[1]
    );
    /* leave */
    return stack[0];
    }
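
The instruction-by-instruction translation shown on these slides can be sketched as a tiny translator. This is a hypothetical illustration covering only the three instructions used above; `compile_to_c` is an invented name, and like the slides it writes operands as plain numbers rather than tagged `VALUE`s.

```ruby
# Hypothetical mini-translator: turns a list of stack-VM instructions into
# C source text in the style shown on the slides.
def compile_to_c(name, insns)
  sp = 0      # stack pointer, tracked at compile time
  max_sp = 0  # deepest stack use, needed for the array size
  body = []
  insns.each do |insn, operand|
    body << "  /* #{[insn, operand].compact.join(' ')} */"
    case insn
    when :putobject
      body << "  stack[#{sp}] = #{operand};"
      sp += 1
    when :opt_plus
      sp -= 1
      body << "  stack[#{sp - 1}] = opt_plus(stack[#{sp - 1}], stack[#{sp}]);"
    when :leave
      body << "  return stack[#{sp - 1}];"
    else
      raise ArgumentError, "unsupported instruction: #{insn}"
    end
    max_sp = [max_sp, sp].max
  end
  ["VALUE #{name}(void) {", "  VALUE stack[#{max_sp}];", *body, "}"].join("\n")
end

c_source = compile_to_c("three", [[:putobject, 1], [:putobject, 2], [:opt_plus], [:leave]])
puts c_source
```

Because the stack pointer is known while generating code, every stack slot becomes a fixed array index, which is what later lets the C compiler keep the values in registers.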

  20. C code generator
    ERB template: mjit_compile.inc.erb

  21. C code generator
    ERB template: mjit_compile.inc.erb
    VM instructions: insns.def

  22. C code generator
    ERB template: mjit_compile.inc.erb
    VM instructions: insns.def
    C code generator: mjit_compile.inc
    Render Copy definition
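
The idea of rendering a code generator from instruction definitions with ERB can be sketched as follows. The instruction table and the template are hypothetical and heavily simplified; they are not the contents of insns.def or mjit_compile.inc.erb.

```ruby
require "erb"

# Hypothetical, heavily simplified stand-in for the instruction definitions.
insns = {
  "putobject" => "stack[sp++] = operand;",
  "opt_plus"  => "sp--; stack[sp - 1] = opt_plus(stack[sp - 1], stack[sp]);",
  "leave"     => "return stack[sp - 1];",
}

# One `case` per instruction, with the instruction body copied in.
template = ERB.new(<<~'TEMPLATE', trim_mode: "-")
  switch (insn) {
  <%- insns.each do |name, body| -%>
    case BIN(<%= name %>): {
      <%= body %>
      break;
    }
  <%- end -%>
  }
TEMPLATE

generated = template.result(binding)
puts generated
```

When an instruction definition changes, re-rendering the template picks up the change, which is how generation keeps a JIT in step with frequent VM changes.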

  23. • Based on Ruby 2.5’s Ruby VM
    • If JIT is disabled, everything must work in 2.6
    • JIT implementation is automatically generated
    • To keep up with frequent Ruby VM changes
    Ruby's JIT design

  24. C compiler supports
                   GCC    Clang   Visual C++   Intel C++ Compiler
    MJIT worker    ○      ○       ○            ○
    JIT header     ○      ○       ×            ○
    CLI support    ○      ○       ○            ×
    Support plan   Done   Done    Next         Later
    Now MJIT worker (native thread, dynamic loading) runs on Windows and UNIX

  25. Platform supports with GCC
    Linux MinGW Solaris NetBSD FreeBSD
    JIT header ○ ˚ ○ ○ ○
    test_jit.rb ○ ○ ○ ? ×
    MinGW header is not minimized and thus compilation speed is slow.
    I guess NetBSD works but we don’t have NetBSD RubyCI. GCC on FreeBSD is crashing.

  26. Platform supports with Clang
    Linux macOS OpenBSD
    JIT header ○ ○ ○
    test_jit.rb ○ ○ ?
    I guess OpenBSD works but we don’t have OpenBSD RubyCI

  27. RubyKaigi LT: LLVM JIT

  28. RubyKaigi LT: LLVM JIT

  29. RubyKaigi (This talk)

    https://benchmark-driver.github.io/benchmarks/mjit/commits.html

  30. RubyKaigi (This talk)

    https://benchmark-driver.github.io/benchmarks/mjit/commits.html
    2.6.0 Preview1 → 2.6.0 Preview2: 5.7x faster

  31. Optcarrot
    [Bar chart: fps for Ruby 2.0, trunk and trunk+JIT]
    1.49x → 2.03x
    https://gist.github.com/k0kubun/95c81358af6f34b4d0a71425da871178

  32. Rails (Discourse)

  33. Rails (Discourse)

  34. 2. JIT ON RAILS

  35. • Generated code should be faster in general
    • What's different from Optcarrot?
    Why does Rails become slow with JIT?

  36. 1. longjmp by exception is slow
    2. Profiling method calls has overhead
    3. JIT-ed call is canceled too often
    4. JIT compilation has overhead
    5. Calling JIT-ed code has overhead
    My hypothesis

  37. • When a method returns from inside its child block,
    it calls longjmp(3)
    • The VM implements this case with just a return statement
    and may be faster in that case
    longjmp by exception is slow

  38. Let's check if longjmp is called
    • Fortunately, longjmp was not used in this
    Discourse endpoint
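
For reference, "returning from a child block" looks like this in Ruby. This sketch only illustrates the language-level pattern under discussion; whether a given call actually goes through longjmp depends on the Ruby version and on how the code is executed.

```ruby
# `return` inside the block leaves first_even itself, not just the block,
# so control has to unwind through Array#each (a method written in C).
def first_even(numbers)
  numbers.each do |n|
    return n if n.even?
  end
  nil
end

p first_even([1, 3, 4, 5])
p first_even([1, 3])
```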

  39. • MJIT counts method calls to decide which
    method to compile with JIT enabled
    • This was suspected in [Bug #14490]
    Profiling method calls has overhead

  40. Let's count it even if JIT is disabled

  41. No big difference by profiling method calls
                   trunk         modified      trunk
                   No options    No options    --jit
    JIT            ×             ×             ○
    Profiling      ×             ○             ○
    GET / (percentile: ms)
    50             58.4ms        58.5ms        66.3ms
    75             65.4ms        64.6ms        72.3ms
    90             67.9ms        67.8ms        77.0ms
    99             131.1ms       127.3ms       133.3ms
    `ruby script/simple_bench.rb 1000` with:
    https://github.com/k0kubun/discourse/tree/20fc03558f16aff94c6c017347783374cf4a0ca8

  42. • MJIT has a kind of de-optimization that falls back to
    VM interpretation when any assumption is not met
      • e.g. method redefinition
    • Such fallback might be an overhead
    JIT-ed call is cancelled too often

  43. Let's log all JIT cancellation

  44. The ratio of JIT cancellation
                                 JIT-ed calls   Cancel by opt_xxx     Cancel by call cache
    Optcarrot                    49,171,765     786,842 (1.60%)       0 (0.00%)
    Discourse (1,000 requests)   168,925,050    19,394,792 (11.5%)    10,092,254 (5.97%)
    JIT cancel reasons:
    • opt_xxx: Non-core class is given to +, -, *, /, #[], etc.
    • call cache: Method redefinition, receiver class is changed

  45. Why does JIT cancel happen so often?
    • Current JIT doesn't discard any JIT-ed code
    whose assumption is not met
    • opt_xxx is performing badly when a receiver is not
    a core class like Integer, Float, String, Array, Hash

  46. Why does JIT cancel happen so often?
    • Current JIT doesn't discard any JIT-ed code
    whose assumption is not met
    • opt_xxx is performing badly when a receiver is not
    a core class like Integer, Float, String, Array, Hash
    There are many #[] for non Hash/Array classes in Rails
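
The effect can be illustrated with a toy model of a guarded fast path for `#[]` that counts how often the guard fails. `ToyAref` and `Params` are hypothetical names; the real `opt_aref` also checks whether the core method was redefined, which this sketch leaves out.

```ruby
# Toy model of an opt_aref-style fast path with a cancel counter.
class ToyAref
  FAST_CLASSES = [Array, Hash].freeze

  attr_reader :calls, :cancels

  def initialize
    @calls = 0
    @cancels = 0
  end

  def aref(receiver, key)
    @calls += 1
    if FAST_CLASSES.include?(receiver.class)
      receiver[key] # fast path: the assumption "core class" holds
    else
      @cancels += 1 # assumption broken: count a cancel and fall back
      receiver.public_send(:[], key)
    end
  end

  def cancel_ratio
    @calls.zero? ? 0.0 : @cancels.fdiv(@calls)
  end
end

# Hypothetical non-core class with its own #[], as is common in Rails code.
class Params
  def initialize(hash)
    @hash = hash
  end

  def [](key)
    @hash[key.to_s]
  end
end

toy = ToyAref.new
toy.aref([10, 20], 0)
toy.aref({ a: 1 }, :a)
toy.aref(Params.new("id" => 7), :id)
puts format("cancel ratio: %.1f%%", toy.cancel_ratio * 100)
```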

  47. I fixed this issue for [] (r…)

  48. opt_xxx cancel is decreased much
                          JIT-ed calls   Cancel by opt_xxx     Cancel by call cache
    Discourse (Before)    168,925,050    19,394,792 (11.5%)    10,092,254 (5.97%)
    Discourse (After)     75,150,482     2,849,825 (3.79%)     3,072,673 (4.09%)
    #[] has a major impact on Rails. Others are to be improved...

  49. • Appending a method to JIT-ed queue may have
    overhead
    • GCC or Clang may use the same CPU core, or it
    may cost to transfer data to another core
    JIT compilation has overhead

  50. Prepare RubyVM::MJIT.stop and
    Stop JIT compilation

  51. JIT enabled vs JIT stopped
    JIT enabled
    1000 requests
    JIT enabled
    1000 requests
    JIT stopped
    1000 requests
    RubyVM::MJIT.stop
    Measure

  52. JIT compilation had overhead
                       No options   --jit → Stop   --jit
    Code is JIT-ed     ×            ○              ○
    JIT is going on    ×            ×              ○
    GET / (percentile: ms)
    50                 60.4ms       65.1ms         68.4ms
    75                 66.9ms       72.4ms         74.8ms
    90                 69.6ms       75.8ms         80.0ms
    99                 125.4ms      145.6ms        137.2ms
    But this overhead is excluded from [Bug #14490] degradation…

  53. • JIT-ed code behaves slower only on an exception
    or JIT cancellation, but they weren't the culprit
    • JIT compilation does not dominate the slowness
    • Then, calling native code has overhead…?
    Calling JIT-ed code has overhead

  54. Let's check JIT-ed call overhead

  55. Let's check JIT-ed call overhead

  56. Calling JIT-ed code was slow
                JIT disabled   JIT enabled
    Duration    2.17s          2.45s

  57. Let's profile with perf

  58. Guard for --jit-wait takes time

  59. Skip --jit-wait check in main branch
    r63480

  60. Improved a little
                JIT disabled   JIT enabled
    Duration    2.17s          2.31s (-0.14s)

  61. Remaining …s was for
    Additional memory access here
    …But it wasn’t a big deal in Rails

  62. What if there are a lot of methods?

  63. Calling many different methods is slow
    Called methods 1 method 15 methods
    JIT disabled 3.69s 3.71s
    JIT enabled 3.79s 5.34s
    Duration with the same total calls
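
A benchmark of this shape can be generated with a short script. This is a sketch under assumptions: the method names, the call count and the use of `eval` are illustrative choices and not the script used for the numbers above. To compare against the JIT, the same file would be run with and without `--jit` on a Ruby 2.6 build.

```ruby
require "benchmark"

# Builds benchmark source: `method_count` trivial methods called in turn,
# with the total number of calls fixed at roughly `total_calls`.
def benchmark_source(method_count, total_calls)
  defs  = (0...method_count).map { |i| "def bench_m#{i}; #{i}; end" }.join("\n")
  calls = (0...method_count).map { |i| "bench_m#{i}" }.join("; ")
  iterations = total_calls / method_count
  <<~RUBY
    #{defs}
    bench_i = 0
    while bench_i < #{iterations}
      #{calls}
      bench_i += 1
    end
  RUBY
end

[1, 15].each do |count|
  source = benchmark_source(count, 300_000)
  seconds = Benchmark.realtime { eval(source, TOPLEVEL_BINDING) }
  puts format("%2d methods: %.3fs", count, seconds)
end
```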

  64. Calling many different methods is slow
    [Line chart comparing VM and JIT; x-axis 1 to 39, y-axis 0 to 6]

  65. Calling many different methods is slow
    [Line chart comparing VM and JIT; x-axis 1 to 39, y-axis 0 to 6; points marked at 6, 12 and 19]

  66. Hot methods of Optcarrot
    Top 6 methods
    dominate 50%

  67. Hot methods of Discourse
    Top 6 methods are only 18%
    They are not so hot.

  68. Why does this happen?
    [Line chart comparing VM and JIT; x-axis 1 to 39, y-axis 0 to 6; points marked at 6, 12 and 19]

  69. Let's see “perf stat”
    6 methods
    40 methods

  70. “insn per cycle” is very different
    6 methods
    40 methods

  71. …x cycles for almost the same insns
    6 methods
    40 methods

  72. Each method (so file)
    is using one page (… MB)

  73. 40 methods w/ the same so file PoC

  74. • Ongoing JIT compilation may have overhead
    • JIT cancel is happening frequently (to be fixed)
    • It stalls to load many different methods (to be fixed)
    Reasons for Rails slowdown on JIT

  75. 3. DIVE INTO NATIVE CODE

  76. Example
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =

  77. Example
    Ruby Code
    def three
    1 + 2
    end
    Bytecode
    putobject 1
    putobject 2
    opt_plus
    leave
    =
    C code
    three() {
    VALUE stack[2];
    /* putobject 1 */
    stack[0] = 1;
    /* putobject 2 */
    stack[1] = 2;
    /* opt_plus */
    stack[0] = opt_plus(
    stack[0], stack[1]
    );
    /* leave */
    return stack[0];
    }

  78. More detailed definition
    before inlining opt_plus

  79. opt_plus inlined

  80. Native code
    generated by GCC
    (output of perf)

  81. Integer#+
    redefinition check

  82. JIT cancel handler
    Let's ignore this
    Integer#+
    redefinition check

  83. Integer#+
    redefinition check

  84. SET_SP: VM's behavior
    which can be removed
    Integer#+
    redefinition check

  85. Check interrupts like
    SIGINT, another thread
    Interruption handler
    Integer#+
    redefinition check
    SET_SP: VM's behavior
    which can be removed

  86. Interruption handler
    Check interrupts like
    SIGINT, another thread
    Pop VM call frame
    Integer#+
    redefinition check
    SET_SP: VM's behavior
    which can be removed

  87. Pop VM call frame
    Interruption handler
    Check interrupts like
    SIGINT, another thread
    Return 3
    FIX2INT(0x7) == 3
    Integer#+
    redefinition check
    SET_SP: VM's behavior
    which can be removed

  88. Instruction dispatch
    Instruction dispatch
    Instruction dispatch
    Instruction dispatch
    1. Instruction dispatch
    cost is removed

  89. Program counter motion
    Program counter motion
    Program counter motion
    Program counter motion
    2. No program counter
    motion

  90. Stack pointer motion
    Stack pointer motion
    Stack pointer motion
    Forgot to remove this
    3. Stack pointer motion
    is reduced

  91. 4. This optimization is
    delegated to GCC

  92. What if it's more complex?
    def six
      4 + 8 - 3 * 4 / 2
    end

  93. Fixnum#+ redefinition check
    Fixnum#* redefinition check
    Fixnum#/ redefinition check
    Fixnum#- redefinition check
    Return 6
    FIX2INT(0xd) == 6
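
The two return values on these slides, `FIX2INT(0x7) == 3` and `FIX2INT(0xd) == 6`, follow from how small integers are tagged. A minimal sketch of the arithmetic, assuming the encoding "shift left by one bit and set the lowest bit"; `int2fix` and `fix2int` are illustrative helpers mirroring the C macro names, not Ruby APIs.

```ruby
# Tagged-integer encoding assumed here: the value shifted left by one bit
# with the lowest bit set, which matches the two values on the slides.
def int2fix(n)
  (n << 1) | 1
end

def fix2int(v)
  v >> 1
end

puts format("0x%x", int2fix(3))
puts fix2int(0x7)
puts fix2int(0xd)
```

So the native code for `six` can return the constant 0xd directly: the C compiler has folded `4 + 8 - 3 * 4 / 2` into the tagged form of 6.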

  94. Last Example: while loop
    def while_loop
      i = 0
      while i < 1000000
        i += 1
      end
    end
    i = 0
    while i < 2000
      while_loop
      i += 1
    end

  95. Last Example: while loop
    VM: 22.9s JIT: 2.8s
    Why does it become so much faster?
    8.18x faster

  96. Let's see the native code of while_loop
    i = 0
    while i < 1000000
      i += 1
    end

  97. They are JIT cancel handlers
    Let's ignore them

  98. They are interruption handlers
    Let's ignore them

  99. They are write-barrier-related slow paths
    Let's ignore them

  100. It is the Bignum promotion handler
    Let's ignore it

  101. Again, let's see the native code of
    i = 0
    while i < 1000000
      i += 1
    end

  102. i = 0
    check
    interrupts

  103. i = 0
    Fixnum?(i)
    for #<
    check
    interrupts

  104. i = 0
    check
    interrupts
    Fixnum#<
    redefined?
    Fixnum?(i)
    for #<

  105. i = 0
    check
    interrupts
    i < 1000000
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?

  106. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?

  107. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Fixnum#+
    redefined?
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?

  108. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?

  109. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Int overflow?
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?

  110. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Int overflow?
    can't optimize #+ ?
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?

  111. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Int overflow?
    can't optimize #+ ?
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?
    set i for VM
    + check WB

  112. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Int overflow?
    can't optimize #+ ?
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?
    set i for VM
    + check WB
    set i for JIT

  113. i = 0
    i < 1000000
    check
    interrupts
    check
    interrupts
    Int overflow?
    i + 1
    Fixnum?(i)
    for #<
    Fixnum#<
    redefined?
    Fixnum#+
    redefined?
    can't optimize #+ ?
    set i for VM
    + check WB
    set i for JIT

  114. • #+ and #< are performed on registers, not on the VM stack
    • #+ and #< share some instructions to check redefinition
    • Unnecessary type checks are omitted from the loop
    Why the while loop becomes faster

  115. 4. METHOD INLINING

  116. • Many optimizations are possible because C
    compiler can know definitions
    • If we could inline methods, C compiler would
    be able to optimize more
    Let C compiler work hard

  117. 1. x
    2. x
    3. x
    When is method inlining possible?

  118. 1. JIT compiler can know definitions
    2. JIT compiler can modify code to call a method
    3. Inlined code can be invalidated
    When is method inlining possible?

  119. 1. JIT compiler can know definitions
    2. JIT compiler can modify code to call a method
    3. Inlined code can be invalidated
    When is method inlining possible?

  120. 1. JIT compiler can know definitions
    2. JIT compiler can modify code to call a method
    3. Inlined code can be invalidated
    When is method inlining possible?

  121. • Ruby method
      • called by Ruby method
      • called by C method
    • Ruby block
      • yield-ed by Ruby method
      • called by C method
    • C method
      • called by Ruby method
      • called by C method
    Major inline targets

  122. • Ruby method
      • called by Ruby method => easy
      • called by C method
    • Ruby block
      • yield-ed by Ruby method
      • called by C method
    • C method
      • called by Ruby method
      • called by C method
    Major inline targets
    JIT compiler can deal with bytecode easily
    Method cache can be used for invalidation

  123. • Ruby method
      • called by Ruby method => easy
      • called by C method
    • Ruby block
      • yield-ed by Ruby method => medium
      • called by C method
    • C method
      • called by Ruby method => medium
      • called by C method
    Major inline targets
    yield doesn't have cache
    Sometimes it's hard to know definitions

  124. • Ruby method
      • called by Ruby method => easy
      • called by C method => hard
    • Ruby block
      • yield-ed by Ruby method => medium
      • called by C method => hard
    • C method
      • called by Ruby method => medium
      • called by C method => hard
    Major inline targets
    There is no cache key for invalidation
    How to modify C code?

  125. Ruby → C → Ruby inlining problem
    ret = 0
    1000000.times do |i|
      ret += i
    end
    ret

  126. ret = 0
    1000000.times do |i|
      ret += i
    end
    ret
    Ruby -> C method call: medium
    Integer#times is defined with C
    Ruby → C → Ruby inlining problem

  127. ret = 0
    1000000.times do |i|
      ret += i
    end
    ret
    Ruby -> C method call: medium
    Integer#times is defined with C
    C -> Ruby block call: hard
    Ruby → C → Ruby inlining problem

  128. What if...
    Ruby can be faster than C?

  129. What if...
    Ruby can be faster than C?

  130. Let's define Integer#times with Ruby
    https://github.com/rubinius/rubinius/blob/master/core/integer.rb
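
A pure-Ruby definition along these lines could look like the sketch below. It is written from the documented behaviour of `Integer#times`, not copied from the linked file, and it uses the hypothetical name `times_in_ruby` so that the built-in method stays untouched.

```ruby
class Integer
  # Hypothetical name, chosen so the built-in Integer#times stays untouched.
  def times_in_ruby
    return to_enum(:times_in_ruby) { self } unless block_given?

    i = 0
    while i < self
      yield i
      i += 1
    end
    self
  end
end

ret = 0
1000.times_in_ruby { |i| ret += i }
puts ret
```

Written this way, both the loop and the `yield` are Ruby bytecode, so a JIT that can inline Ruby-to-Ruby calls gets to see the whole loop body.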

  131. • Ruby method
      • called by Ruby method => easy
      • called by C method => hard
    • Ruby block
      • yield-ed by Ruby method => medium
      • called by C method => hard
    • C method
      • called by Ruby method => medium
      • called by C method => hard
    I implemented a prototype to inline this
    https://github.com/k0kubun/ruby/commits/mjit-inline-send-yield

  132. Time to benchmark

  133. Integer#times benchmark results
           Integer#times in C   Integer#times in Ruby
    VM     145.44s (1.00x)      156.38s (0.93x)
    JIT
    time ruby --disable-gems times_loop.rb

  134. Integer#times benchmark results
           Integer#times in C   Integer#times in Ruby
    VM     145.44s (1.00x)      156.38s (0.93x)
    JIT    104.80s (1.39x)
    time ruby --disable-gems times_loop.rb

  135. Integer#times benchmark results
           Integer#times in C   Integer#times in Ruby
    VM     145.44s (1.00x)      156.38s (0.93x)
    JIT    104.80s (1.39x)      56.46s (2.56x)
    time ruby --disable-gems times_loop.rb

  136. C language is dead

  137. • Rails performance is going to be improved
    • JIT can eliminate many instructions
    • C language will be useless in the future
    Conclusion