The Method JIT Compiler

RubyKaigi 2018
http://rubykaigi.org/2018

Takashi Kokubun

June 02, 2018

Transcript

  1.

    TREASURE DATA

    The Method JIT Compiler for Ruby 2.6
    Takashi Kokubun / @k0kubun
    RubyKaigi 2018
  2.

    @k0kubun
    • Maintainer of ERB, Haml
    • Developing JIT compiler for Ruby 2.6
  4.

    My past talks about Ruby's JIT
    • 2017 Sep: LLVM JIT (EN)
    • 2017 Nov: YARV MJIT (EN)
    • 2017 Dec: YARV MJIT (JA)
    • 2018 Feb: ERB generation (JA)
    • 2018 Apr: Preview2 optimizations (EN)
    https://speakerdeck.com/k0kubun
  5.

    Today's talk
    1. Current Status
    2. JIT on Rails
    3. Dive Into Native Code
    4. Method Inlining
  7.

    Current status
    • JIT in 2.6.0-preview2 is not production ready yet
    • Fixing bugs caused by a race condition
    • I'll introduce the current status of:
      • Implementation
      • Portability
      • Performance
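  14.

    MJIT: Ruby's JIT architecture
    A Ruby process runs two threads, the Ruby VM thread and the MJIT worker thread:
    1. The Ruby VM thread interprets the bytecode of Method #1 (in memory) and requests JIT-ing of Method #1.
    2. The MJIT worker thread generates C code for Method #1 (on disk).
    3. The MJIT worker thread runs the C compiler, which produces an SO file for Method #1 (on disk).
    4. The SO file is loaded, giving native code for Method #1 (in memory).
    5. The Ruby VM thread calls the native code of Method #1.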
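
The flow above can be sketched as a single-threaded C toy. Everything here is hypothetical: `struct method`, `vm_call` and `worker_step` are illustrative names, not MJIT's; the worker runs synchronously instead of on its own native thread; and the "load" step just installs a precompiled function pointer where MJIT would generate C code, run the C compiler and load the SO file. JIT-ing is requested on the first interpreted call to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the hand-off between the VM thread and the worker. */
typedef long (*native_func_t)(void);

struct method {
    native_func_t interpret;    /* stand-in for interpreting the bytecode */
    native_func_t precompiled;  /* stand-in for what the C compiler would produce */
    native_func_t jit_func;     /* NULL until the worker has "loaded" native code */
    bool requested;             /* JIT-ing already requested? */
};

#define QUEUE_CAP 8
static struct method *jit_queue[QUEUE_CAP];
static size_t jit_queue_len;

/* Ruby VM thread: call native code if it is ready, else request JIT-ing and interpret. */
long vm_call(struct method *m) {
    if (m->jit_func != NULL)
        return m->jit_func();                  /* "Call" */
    if (!m->requested && jit_queue_len < QUEUE_CAP) {
        m->requested = true;
        jit_queue[jit_queue_len++] = m;        /* "Request JIT-ing" */
    }
    return m->interpret();                     /* "Interpret" */
}

/* MJIT worker thread, run synchronously here. MJIT would generate C code, run the
   C compiler and load the SO file; this sketch only installs a precompiled function. */
void worker_step(void) {
    while (jit_queue_len > 0) {
        struct method *m = jit_queue[--jit_queue_len];
        m->jit_func = m->precompiled;          /* "Load" */
    }
}

/* Example method `three`: both stand-ins return 3. */
static long three_interpreted(void) { return 1 + 2; }
static long three_native(void) { return 3; }
```

A real implementation needs synchronization around the queue and the function pointer, because the VM thread and the worker thread access them concurrently.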
  20.

    How is this implemented?

    Ruby code:

        def three
          1 + 2
        end

    Bytecode:

        putobject 1
        putobject 2
        opt_plus
        leave

    C code:

        VALUE three(void) {
            VALUE stack[2];
            /* putobject 1 */
            stack[0] = INT2FIX(1);
            /* putobject 2 */
            stack[1] = INT2FIX(2);
            /* opt_plus */
            stack[0] = opt_plus(stack[0], stack[1]);
            /* leave */
            return stack[0];
        }
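
The translation above is mechanical, so it can be sketched as a tiny code generator. This is a hypothetical toy, not MJIT's compiler: `gen_c`, `struct insn` and the emitted `INT2FIX`/`opt_plus` calls are assumptions, and only `putobject`, `opt_plus` and `leave` are handled. What it shows is that the stack depth before each instruction is known at compile time, so every push and pop becomes a fixed `stack[N]` slot.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy: only three YARV instructions are modeled. */
enum opcode { PUTOBJECT, OPT_PLUS, LEAVE };

struct insn {
    enum opcode op;
    long operand;                 /* used by PUTOBJECT only */
};

/* Append formatted text to buf at *len; returns -1 if it does not fit. */
static int append(char *buf, size_t size, size_t *len, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (written < 0 || (size_t)written >= size - *len)
        return -1;
    *len += (size_t)written;
    return 0;
}

/* Emit a C function `name` for a well-formed insns[0..n). Returns 0 on success. */
int gen_c(const char *name, const struct insn *insns, size_t n, char *buf, size_t size) {
    size_t len = 0;
    int sp = 0, max_sp = 0;

    /* Pass 1: the stack depth before each instruction is known statically. */
    for (size_t i = 0; i < n; i++) {
        if (insns[i].op == PUTOBJECT) sp += 1;
        else if (insns[i].op == OPT_PLUS) sp -= 1;   /* pops two, pushes one */
        if (sp > max_sp) max_sp = sp;
    }

    /* Pass 2: one C statement per instruction, using fixed stack slots. */
    if (append(buf, size, &len, "VALUE %s(void) {\n  VALUE stack[%d];\n", name, max_sp) < 0)
        return -1;
    sp = 0;
    for (size_t i = 0; i < n; i++) {
        int rc = 0;
        switch (insns[i].op) {
        case PUTOBJECT:
            rc = append(buf, size, &len, "  /* putobject %ld */\n  stack[%d] = INT2FIX(%ld);\n",
                        insns[i].operand, sp, insns[i].operand);
            sp += 1;
            break;
        case OPT_PLUS:
            rc = append(buf, size, &len,
                        "  /* opt_plus */\n  stack[%d] = opt_plus(stack[%d], stack[%d]);\n",
                        sp - 2, sp - 2, sp - 1);
            sp -= 1;
            break;
        case LEAVE:
            rc = append(buf, size, &len, "  /* leave */\n  return stack[%d];\n", sp - 1);
            break;
        }
        if (rc < 0)
            return -1;
    }
    return append(buf, size, &len, "}\n");
}
```

Because every slot index is a constant, the C compiler is free to keep `stack[0]` and `stack[1]` in registers instead of memory.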
  24.

    Ruby's JIT design
    • Based on Ruby 2.5's Ruby VM
    • If JIT is disabled, everything must work in 2.6
    • JIT implementation is automatically generated
      • To keep up with frequent Ruby VM changes
  26.

    Compiler support

                        GCC    Clang   Visual C++   Intel C++ Compiler
      MJIT worker       ◦      ◦       ◦            ◦
      JIT header        ◦      ◦       ×            ◦
      CLI support       ◦      ◦       ◦            ×
      Support plan      Done   Done    Next         Later

    Now the MJIT worker (native thread, dynamic loading) runs on Windows and UNIX.
  27.

    Platform support with GCC

                       Linux   MinGW   Solaris   NetBSD   FreeBSD
      JIT header       ◦       ◦ (*)   ◦         ◦        ◦
      test_jit.rb      ◦       ◦       ◦         ?        ×

    (*) The MinGW header is not minimized, so compilation is slow.
    I guess NetBSD works, but we don't have a NetBSD RubyCI.
    GCC on FreeBSD is crashing.
  28.

    Platform support with Clang

                       Linux   macOS   OpenBSD
      JIT header       ◦       ◦       ◦
      test_jit.rb      ◦       ◦       ?

    I guess OpenBSD works, but we don't have an OpenBSD RubyCI.
  34.

    Optcarrot (fps)
    [Chart: Ruby 2.0, trunk, trunk+JIT]
    1.49x → 2.03x
    https://gist.github.com/k0kubun/95c81358af6f34b4d0a71425da871178
  38.

    Why does Rails become slow with JIT?
    • Generated code should be faster in general
    • What's different from Optcarrot?
  39.

    My hypotheses
    1. longjmp by exception is slow
    2. Profiling method calls has overhead
    3. JIT-ed call is canceled too often
    4. JIT compilation has overhead
    5. Calling JIT-ed code has overhead
  40.

    longjmp by exception is slow
    • When a method is returned from inside its child block, JIT-ed code calls longjmp(3)
    • The VM implements this with just a return statement and may be faster in that case
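
The two ways of leaving a method from inside a block can be contrasted in plain C. This is an assumed model, not Ruby's implementation: `each_upto` stands in for an iterator, the block "returns" `i * 10` from the enclosing `find_*` function when `i == 3`, and all names are hypothetical. Version (a) uses setjmp/longjmp like the JIT-ed path described above; version (b) passes a status up through ordinary returns, like the VM.

```c
#include <setjmp.h>

/* Hypothetical sketch: a block that returns from its enclosing method. */
static void each_upto(int n, void (*blk)(int)) {
    for (int i = 0; i < n; i++)
        blk(i);
}

/* (a) Non-local exit with setjmp/longjmp. */
static jmp_buf find_frame;      /* where find_longjmp resumes when the block returns */
static int block_result;

static void block_longjmp(int i) {
    if (i == 3) {
        block_result = i * 10;
        longjmp(find_frame, 1); /* unwinds the iterator's C frames in one jump */
    }
}

int find_longjmp(void) {
    if (setjmp(find_frame) != 0)
        return block_result;    /* arrived here via longjmp */
    each_upto(10, block_longjmp);
    return -1;
}

/* (b) The same control flow propagated with ordinary returns. */
enum status { CONTINUE, RETURN };

static enum status block_flag(int i, int *out) {
    if (i == 3) {
        *out = i * 10;
        return RETURN;
    }
    return CONTINUE;
}

static enum status each_upto_flag(int n, int *out) {
    for (int i = 0; i < n; i++)
        if (block_flag(i, out) == RETURN)
            return RETURN;      /* each frame checks the status and passes it up */
    return CONTINUE;
}

int find_return(void) {
    int out = -1;
    return each_upto_flag(10, &out) == RETURN ? out : -1;
}
```

In version (a), setjmp runs on every call whether or not the block exits early, and locals modified between setjmp and longjmp would have to be `volatile`, which restricts what the C compiler can optimize; version (b) is ordinary control flow.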
  42.

    Profiling method calls has overhead
    • When JIT is enabled, MJIT counts method calls to decide which methods to compile
    • This was suspected in [Bug #14490]
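
Call-count profiling can be sketched as a counter per method plus a threshold. `JIT_MIN_CALLS`, `struct method_body` and `profile_call` are hypothetical names, and the threshold value 5 is an arbitrary assumption for this sketch. In this model the per-call cost is one increment and one comparison, which is the overhead this slide suspects.

```c
#include <stdbool.h>
#include <stddef.h>

#define JIT_MIN_CALLS 5   /* hypothetical threshold, chosen arbitrarily for this sketch */
#define QUEUE_CAP 64

struct method_body {
    unsigned long total_calls;  /* bumped on every call: the profiling cost */
    bool queued;                /* already handed to the JIT worker? */
};

static struct method_body *jit_queue[QUEUE_CAP];
static size_t jit_queue_len;

/* Hypothetical hook on the VM's method-call path. */
void profile_call(struct method_body *body) {
    body->total_calls++;
    if (!body->queued && body->total_calls >= JIT_MIN_CALLS && jit_queue_len < QUEUE_CAP) {
        body->queued = true;
        jit_queue[jit_queue_len++] = body;  /* ask the worker to compile it */
    }
}
```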
  44.

    No big difference from profiling method calls

                               trunk        modified     trunk
                               No options   No options   --jit
      JIT                      ×            ×            ◦
      Profiling                ×            ◦            ◦
      GET / 50th pct (ms)      58.4         58.5         66.3
      GET / 75th pct (ms)      65.4         64.6         72.3
      GET / 90th pct (ms)      67.9         67.8         77.0
      GET / 99th pct (ms)      131.1        127.3        133.3

    `ruby script/simple_bench.rb 1000` with: https://github.com/k0kubun/discourse/tree/20fc03558f16aff94c6c017347783374cf4a0ca8
  45.

    JIT-ed call is canceled too often
    • MJIT has a kind of de-optimization that falls back to VM interpretation when any assumption is not met
      • e.g. method redefinition
    • Such fallback might be an overhead
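
A guarded fast path with a fallback can be sketched like this. `VALUE` and the Fixnum tagging (`(n << 1) | 1`) mirror CRuby's encoding; `jit_opt_plus`, `vm_generic_plus`, `integer_plus_redefined`, `cancel_count` and `UNSUPPORTED` are hypothetical. When a guard fails, the sketch calls a stand-in for resuming in the VM and counts it as a cancellation.

```c
#include <stdbool.h>
#include <stdint.h>

/* VALUE and Fixnum tagging mirror CRuby's encoding; the rest is hypothetical. */
typedef uintptr_t VALUE;
#define INT2FIX(n) ((((VALUE)(intptr_t)(n)) << 1) | 1)
#define FIX2LONG(v) ((long)(((intptr_t)(v)) >> 1))
#define FIXNUM_P(v) (((v) & 1) != 0)
#define UNSUPPORTED ((VALUE)0)   /* sketch-only sentinel for "not handled here" */

static bool integer_plus_redefined = false;  /* would be set when Integer#+ is redefined */
static unsigned long cancel_count = 0;

/* Stand-in for leaving JIT-ed code and resuming in the VM. */
static VALUE vm_generic_plus(VALUE a, VALUE b) {
    cancel_count++;
    if (FIXNUM_P(a) && FIXNUM_P(b))
        return INT2FIX(FIX2LONG(a) + FIX2LONG(b));  /* Bignum promotion omitted */
    return UNSUPPORTED;  /* the real VM would dispatch to the receiver's #+ */
}

/* What JIT-ed code for opt_plus might look like: guards, then a tagged add. */
VALUE jit_opt_plus(VALUE a, VALUE b) {
    if (FIXNUM_P(a) && FIXNUM_P(b) && !integer_plus_redefined) {
        intptr_t x = (intptr_t)a;
        intptr_t y = (intptr_t)b - 1;                /* b with its tag bit cleared */
        bool overflow = (y > 0 && x > INTPTR_MAX - y) || (y < 0 && x < INTPTR_MIN - y);
        if (!overflow)
            return (VALUE)(x + y);                   /* (2p+1) + 2q == 2(p+q)+1 */
    }
    return vm_generic_plus(a, b);                    /* an assumption failed: cancel */
}
```

Because nothing recompiles `jit_opt_plus` after the flag flips, every later call pays for the cancellation again.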
  47.

    The ratio of JIT cancellation

                                   JIT-ed calls   Cancel by opt_xxx    Cancel by call cache
      Optcarrot                    49,171,765     786,842 (1.60%)      0 (0.00%)
      Discourse (1,000 requests)   168,925,050    19,394,792 (11.5%)   10,092,254 (5.97%)

    JIT cancel reasons:
    • opt_xxx: a non-core class is given to +, -, *, /, #[], etc.
    • call cache: method redefinition, or the receiver class is changed
  49.

    Why does JIT cancel happen so often?
    • The current JIT doesn't discard any JIT-ed code whose assumption is not met
    • opt_xxx performs badly when the receiver is not a core class like Integer, Float, String, Array, Hash
      • There are many #[] calls on non-Hash/Array classes in Rails
  51.

    opt_xxx cancellation decreased a lot

                         JIT-ed calls   Cancel by opt_xxx    Cancel by call cache
      Discourse before   168,925,050    19,394,792 (11.5%)   10,092,254 (5.97%)
      Discourse after    75,150,482     2,849,825 (3.79%)    3,072,673 (4.09%)

    #[] has a major impact on Rails. Others are to be improved...
  52.

    JIT compilation has overhead
    • Appending a method to the JIT queue may have overhead
    • GCC or Clang may use the same CPU core, or transferring data to another core may cost time
  55.

    JIT compilation had overhead

                               No options   --jit → Stop   --jit
      Code is JIT-ed           ×            ◦              ◦
      JIT is going on          ×            ×              ◦
      GET / 50th pct (ms)      60.4         65.1           68.4
      GET / 75th pct (ms)      66.9         72.4           74.8
      GET / 90th pct (ms)      69.6         75.8           80.0
      GET / 99th pct (ms)      125.4        145.6          137.2

    But this overhead is excluded from the [Bug #14490] degradation…
  56.

    Calling JIT-ed code has overhead
    • JIT-ed code behaves slower only on an exception or JIT cancellation, but those weren't the culprit
    • JIT compilation does not dominate the slowness
    • Then, does calling native code have overhead...?
  66.

    Calling many different methods is slow

                      1 method   15 methods
      JIT disabled    3.69s      3.71s
      JIT enabled     3.79s      5.34s

    Duration with the same total number of calls
  68.

    Calling many different methods is slow
    [Chart: VM vs JIT as the number of called methods goes from 1 to 39 (y-axis 0 to 6); 6, 12 and 19 are marked]
  71.

    Why does this happen?
    [Chart: VM vs JIT as the number of called methods goes from 1 to 39 (y-axis 0 to 6); 6, 12 and 19 are marked]
  79.

    Reasons for the Rails slowdown with JIT
    • Ongoing JIT compilation may have overhead
    • JIT cancel is happening frequently (to be fixed)
    • It stalls when loading many different methods (to be fixed)
  82.

    Example: the `three` method again (the same Ruby code, bytecode, and C code as slide 20)
  92.

    Annotations on the native code generated for `three`:
    • Interruption handler: check interrupts like SIGINT or another thread
    • Pop VM call frame
    • Integer#+ redefinition check
    • SET_SP: VM's behavior which can be removed
    • Return 3: FIX2INT(0x7) == 3
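
The "Return 3: FIX2INT(0x7) == 3" annotation follows from CRuby's Fixnum tagging, where an integer n is stored as (n << 1) | 1. Below is a hypothetical C version of `three` once the C compiler can see everything: the interrupt and redefinition checks remain as guards, while the arithmetic is a constant expression equal to 0x7. `interrupt_pending`, `integer_plus_redefined` and `vm_slow_path` are illustrative stand-ins; popping the VM frame and SET_SP are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t VALUE;
#define INT2FIX(n) ((((VALUE)(intptr_t)(n)) << 1) | 1)   /* n is stored as (n << 1) | 1 */
#define FIX2INT(v) ((int)(((intptr_t)(v)) >> 1))

/* Hypothetical VM state that the generated code still has to consult. */
static volatile bool interrupt_pending = false;   /* e.g. SIGINT, or another thread */
static bool integer_plus_redefined = false;

/* Stand-in for handing control back to the VM. */
static VALUE vm_slow_path(void) {
    return INT2FIX(1 + 2);
}

VALUE three_native(void) {
    if (interrupt_pending || integer_plus_redefined)  /* interrupt + redefinition checks */
        return vm_slow_path();
    /* putobject 1, putobject 2, opt_plus as a tagged add: a constant expression
       the C compiler can fold into the single constant 0x7. */
    return INT2FIX(1) + (INT2FIX(2) - 1);
}
```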
  96.

    Annotations: "Stack pointer motion" (three places), "Forgot to remove this"
    3. Stack pointer motion is reduced
  102.

    Last Example: while loop

        def while_loop
          i = 0
          while i < 1000000
            i += 1
          end
        end

        i = 0
        while i < 2000
          while_loop
          i += 1
        end
  128.

    Annotations on the native code generated for while_loop:
    • i = 0
    • i < 1000000
    • check interrupts (two places)
    • Fixnum?(i) for #<
    • Fixnum#< redefined?
    • Fixnum#+ redefined?
    • i + 1
    • Int overflow?
    • can't optimize #+ ?
    • set i for VM + check WB (write barrier)
    • set i for JIT
  129.

    Why does the while loop become faster?
    • #+ and #< are performed not on the VM stack but in registers
    • #+ and #< share some instructions to check redefinition
    • Unnecessary type checks are omitted from the loop
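
The three points above can be illustrated with two C versions of `while_loop`. This is an assumed model, not MJIT output: `CANCEL` means "give up and let the VM continue", the flags are hypothetical stand-ins for VM state, `FIX_MAX` is an assumed largest Fixnum, and a `volatile` variable models `i` being stored back to the VM stack.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t VALUE;
#define INT2FIX(n) ((((VALUE)(intptr_t)(n)) << 1) | 1)
#define FIX2LONG(v) ((long)(((intptr_t)(v)) >> 1))
#define FIXNUM_P(v) (((v) & 1) != 0)
#define FIX_MAX (LONG_MAX >> 1)   /* assumed largest Fixnum */
#define CANCEL (-1L)              /* sketch: "give up and let the VM continue" */

/* Hypothetical VM state. */
static volatile bool interrupt_pending = false;
static bool integer_lt_redefined = false;
static bool integer_plus_redefined = false;

/* Per-iteration checks; `i` is written back as a tagged VALUE to memory that
   stands in for the VM stack (volatile, so it cannot live in a register). */
long while_loop_naive(long limit) {
    volatile VALUE vm_i = INT2FIX(0);
    for (;;) {
        if (interrupt_pending) return CANCEL;                          /* check interrupts */
        if (!FIXNUM_P(vm_i) || integer_lt_redefined) return CANCEL;    /* guards for #< */
        if (!(FIX2LONG(vm_i) < limit)) break;
        if (!FIXNUM_P(vm_i) || integer_plus_redefined) return CANCEL;  /* guards for #+ */
        if (FIX2LONG(vm_i) >= FIX_MAX) return CANCEL;                  /* Int overflow? */
        vm_i = INT2FIX(FIX2LONG(vm_i) + 1);                            /* set i for VM */
    }
    return FIX2LONG(vm_i);
}

/* Optimized shape: `i` is a plain local the C compiler can keep in a register,
   type checks are gone, and one shared check covers both methods (conservatively). */
long while_loop_optimized(long limit) {
    long i = 0;
    for (;;) {
        if (interrupt_pending) return CANCEL;
        if (integer_lt_redefined || integer_plus_redefined) return CANCEL;
        if (!(i < limit)) break;
        if (i >= FIX_MAX) return CANCEL;
        i++;
    }
    return i;
}
```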
  131.

    Let the C compiler work hard
    • Many optimizations are possible because the C compiler can know definitions
    • If we could inline methods, the C compiler would be able to optimize more
  133.

    When is method inlining possible?
    1. JIT compiler can know definitions
    2. JIT compiler can modify code to call a method
    3. Inlined code can be invalidated
  139.

    Major inline targets

      Target        Caller                     Difficulty
      Ruby method   called by Ruby method      easy
      Ruby method   called by C method         hard
      Ruby block    yield-ed by Ruby method    medium
      Ruby block    called by C method         hard
      C method      called by Ruby method      medium
      C method      called by C method         hard

    • easy: JIT compiler can deal with bytecode easily; method cache can be used for invalidation
    • medium: yield doesn't have a cache; sometimes it's hard to know definitions
    • hard: there is no cache key for invalidation; how to modify C code?
  142.

    Ruby → C → Ruby inlining problem

        ret = 0
        1000000.times do |i|
          ret += i
        end
        ret

    • Ruby -> C method call: medium (Integer#times is defined with C)
    • C -> Ruby block call: hard
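
The difficulty can be seen in a C model of `1000000.times { |i| ret += i }`. `struct block`, `integer_times` and the other names are hypothetical; the block is passed as a function pointer plus environment, roughly the shape of a C method calling back into Ruby. Within one C file an optimizing compiler may still see through the pointer, but under this sketch's assumption the C-defined Integer#times lives in the Ruby binary while JIT-ed code is compiled into a separate SO file, so the C compiler never sees both at once.

```c
#include <stdint.h>

/* Hypothetical model: a block as a C method sees it, a function plus its environment. */
struct block {
    void (*func)(void *env, long i);
    void *env;
};

/* Block body `ret += i`, with `ret` reached through the environment. */
static void add_to_ret(void *env, long i) {
    *(int64_t *)env += i;
}

/* Integer#times written in C: it only knows the block through a pointer. */
void integer_times(long n, const struct block *blk) {
    for (long i = 0; i < n; i++)
        blk->func(blk->env, i);          /* C -> Ruby block call */
}

int64_t sum_via_c_times(long n) {
    int64_t ret = 0;
    struct block blk = { add_to_ret, &ret };
    integer_times(n, &blk);              /* Ruby -> C method call */
    return ret;
}

/* The same program with both calls inlined: a single loop the C compiler
   fully understands, so `ret` can stay in a register. */
int64_t sum_inlined(long n) {
    int64_t ret = 0;
    for (long i = 0; i < n; i++)
        ret += i;
    return ret;
}
```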
  146.

    I implemented a prototype to inline this
    https://github.com/k0kubun/ruby/commits/mjit-inline-send-yield
  150.

    Integer#times benchmark results
    time ruby --disable-gems times_loop.rb

             Integer#times in C   Integer#times in Ruby
      VM     145.44s (1.00x)      156.38s (0.93x)
      JIT    104.80s (1.39x)      56.46s (2.56x)
  152.

    Conclusion
    • Rails performance is going to be improved
    • JIT can eliminate many instructions
    • C language will be useless in the future