The Method JIT Compiler

RubyKaigi 2018
http://rubykaigi.org/2018

Takashi Kokubun

June 02, 2018

Transcript

  1. The Method JIT Compiler for Ruby 2.6

    Takashi Kokubun / @k0kubun
    RubyKaigi 2018
    Treasure Data
  2. @k0kubun

    Maintainer of ERB, Haml
    Developing JIT compiler for Ruby 2.6
    Treasure Data
  3. My past talks about Ruby's JIT

    • 2017 Sep: LLVM JIT (EN)
    • 2017 Nov: YARV MJIT (EN)
    • 2017 Dec: YARV MJIT (JA)
    • 2018 Feb: ERB generation (JA)
    • 2018 Apr: Preview2 optimizations (EN)

    https://speakerdeck.com/k0kubun
  4. Today's talk

    1. Current Status
    2. JIT on Rails
    3. Dive Into Native Code
    4. Method Inlining
  5. Current status

    • JIT in 2.6.0-preview2 is not production ready yet
      • Fixing bugs caused by a race condition
    • I'll introduce the current status of:
      • Implementation
      • Portability
      • Performance
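  9. MJIT: Ruby's JIT architecture

    Diagram of the Ruby process (areas labeled Memory and Disk):
    1. The Ruby VM thread interprets the bytecode of Method #1 and requests JIT-ing #1
    2. The MJIT worker thread generates C code for Method #1
    3. The MJIT worker thread runs the C compiler, producing an SO file for Method #1
    4. The SO file is loaded as native code for Method #1
    5. The Ruby VM thread calls the native code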
  10. How is this implemented?

    (Architecture diagram repeated from slide 9.)
  15. How is this implemented?

    Ruby Code

        def three
          1 + 2
        end

    Bytecode

        putobject 1
        putobject 2
        opt_plus
        leave

    C code

        three() {
          VALUE stack[2];
          /* putobject 1 */
          stack[0] = 1;
          /* putobject 2 */
          stack[1] = 2;
          /* opt_plus */
          stack[0] = opt_plus(stack[0], stack[1]);
          /* leave */
          return stack[0];
        }
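
For readers who want to see real bytecode for the method above, here is a minimal sketch using `RubyVM::InstructionSequence`, which CRuby provides. Instruction names vary between Ruby versions (newer versions may print a specialized form instead of `putobject 1`), so the output is not expected to match the slide exactly.

```ruby
# Compile the slide's method and print its YARV bytecode (CRuby only).
source = <<~RUBY
  def three
    1 + 2
  end
RUBY

iseq = RubyVM::InstructionSequence.compile(source)
disasm = iseq.disasm # also contains the nested instruction sequence for `three`
puts disasm
```

The listing for `three` should contain the `opt_plus` and `leave` instructions that the C code on the slide mirrors.
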
  16. Ruby's JIT design

    • Based on Ruby 2.5’s Ruby VM
      • If JIT is disabled, everything must work in 2.6
    • JIT implementation is automatically generated
      • To keep up with frequent Ruby VM changes
  17. C compiler supports

                    GCC     Clang    Visual C++    Intel C++ Compiler
    MJIT worker     ◦       ◦        ◦             ◦
    JIT header      ◦       ◦        ×             ◦
    CLI support     ◦       ◦        ◦             ×
    Support plan    Done    Done     Next          Later

    Now MJIT worker (native thread, dynamic loading) runs on Windows and UNIX
  18. Platform supports with GCC

                   Linux    MinGW    Solaris  NetBSD   FreeBSD
    JIT header     ◦        ˚        ◦        ◦        ◦
    test_jit.rb    ◦        ◦        ◦        ?        ×

    • MinGW header is not minimized and thus compilation speed is slow.
    • I guess NetBSD works but we don’t have NetBSD RubyCI.
    • GCC on FreeBSD is crashing.
  19. Platform supports with Clang

                   Linux    macOS    OpenBSD
    JIT header     ◦        ◦        ◦
    test_jit.rb    ◦        ◦        ?

    I guess OpenBSD works but we don’t have OpenBSD RubyCI
  20. Optcarrot fps

    (Bar chart comparing Ruby 2.0, trunk and trunk+JIT; the fps values are not in the transcript.)
    1.49x → 2.03x
    https://gist.github.com/k0kubun/95c81358af6f34b4d0a71425da871178
  21. Why Rails becomes slow with JIT

    • Generated code should be faster in general
    • What's different from Optcarrot?
  22. My hypothesis

    1. longjmp by exception is slow
    2. Profiling method calls has overhead
    3. JIT-ed call is canceled too often
    4. JIT compilation has overhead
    5. Calling JIT-ed code has overhead
  23. longjmp by exception is slow

    • When a method returns from inside its child block, it calls longjmp(3)
    • In the VM this is implemented with just a return statement, so the VM may be faster in that case
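
The first bullet presumably refers to code like the following, where `return` inside a block exits the enclosing method and not just the block. The method name `first_even` is made up for illustration.

```ruby
# `return` inside the block leaves `first_even` itself, so control has to
# unwind through the `each` call that is running the block.
def first_even(numbers)
  numbers.each do |n|
    return n if n.even?
  end
  nil
end

result = first_even([1, 3, 4, 5])
puts result
```
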
  24. Profiling method calls has overhead

    • With JIT enabled, MJIT counts method calls to decide which methods to compile
    • This was suspected in [Bug #14490]
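
A minimal sketch of call-count profiling, to make the suspected overhead concrete. `CallProfiler`, `record_call` and the threshold value are hypothetical; this is not MJIT's actual code.

```ruby
# Hypothetical sketch: count calls per method and enqueue a method for
# compilation once it reaches a threshold.
class CallProfiler
  attr_reader :queue

  def initialize(min_calls)
    @min_calls = min_calls # hypothetical threshold, chosen by the caller
    @counts = Hash.new(0)
    @queue = []            # methods waiting to be compiled
  end

  # Runs on every method invocation; this bookkeeping is the kind of
  # overhead the slide is asking about.
  def record_call(method_name)
    @counts[method_name] += 1
    @queue << method_name if @counts[method_name] == @min_calls
  end
end

profiler = CallProfiler.new(5)
7.times { profiler.record_call(:hot) }
2.times { profiler.record_call(:cold) }
p profiler.queue
```

Only `:hot` reaches the threshold here, and it is enqueued once even though it keeps being called afterwards.
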
  25. No big difference by profiling method calls

                    trunk         modified      trunk
                    No options    No options    --jit
    JIT             ×             ×             ◦
    Profiling       ×             ◦             ◦
    GET /, percentile: ms
      50:           58.4ms        58.5ms        66.3ms
      75:           65.4ms        64.6ms        72.3ms
      90:           67.9ms        67.8ms        77.0ms
      99:           131.1ms       127.3ms       133.3ms

    `ruby script/simple_bench.rb 1000` with: https://github.com/k0kubun/discourse/tree/20fc03558f16aff94c6c017347783374cf4a0ca8
  26. JIT-ed call is cancelled too often

    • MJIT has a kind of de-optimization that falls back to VM interpretation when any assumption is not met
      • ex) Method redefinition, etc.
    • Such fallback might be an overhead
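
The fallback idea can be sketched in plain Ruby as a guarded fast path. `GuardedPlus` and the `plus_redefined` flag are hypothetical stand-ins for the generated code and the VM's redefinition check; the real mechanism lives in C.

```ruby
# Hypothetical sketch of a specialized path that is "cancelled" when its
# assumptions do not hold.
class GuardedPlus
  attr_reader :fast_hits, :cancels

  def initialize
    @fast_hits = 0
    @cancels = 0
  end

  def call(a, b, plus_redefined: false)
    if a.is_a?(Integer) && b.is_a?(Integer) && !plus_redefined
      @fast_hits += 1
      a + b                # assumptions hold: specialized path
    else
      @cancels += 1
      a.public_send(:+, b) # assumption broken: generic dispatch
    end
  end
end

plus = GuardedPlus.new
plus.call(1, 2)                       # fast path
plus.call("a", "b")                   # cancelled: non-Integer receiver
plus.call(1, 2, plus_redefined: true) # cancelled: redefinition flag set
puts "fast: #{plus.fast_hits}, cancelled: #{plus.cancels}"
```

Both paths return the same result; the cost question on the slide is how often the slow branch is taken.
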
  27. The ratio of JIT cancellation

                                JIT-ed calls   Cancel by opt_xxx    Cancel by call cache
    Optcarrot                   49,171,765     786,842 (1.60%)      0 (0.00%)
    Discourse 1,000 requests    168,925,050    19,394,792 (11.5%)   10,092,254 (5.97%)

    JIT cancel reasons:
    • opt_xxx: Non-core class is given to +, -, *, /, #[], etc.
    • call cache: Method redefinition, receiver class is changed
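
The percentages in the table are the cancel counts divided by the JIT-ed call counts. A short sketch that recomputes them from the figures on the slide (`cancel_ratio` is a helper name made up here):

```ruby
# Recompute the cancellation percentages from the counts in the table.
def cancel_ratio(cancels, jit_calls)
  (100.0 * cancels / jit_calls).round(2)
end

optcarrot_opt   = cancel_ratio(786_842, 49_171_765)
discourse_opt   = cancel_ratio(19_394_792, 168_925_050)
discourse_cache = cancel_ratio(10_092_254, 168_925_050)

puts "Optcarrot opt_xxx:    #{optcarrot_opt}%"
puts "Discourse opt_xxx:    #{discourse_opt}%"
puts "Discourse call cache: #{discourse_cache}%"
```
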
  29. Why JIT cancel happens so often

    • Current JIT doesn't discard any JIT-ed code whose assumption is not met
    • opt_xxx performs badly when a receiver is not a core class like Integer, Float, String, Array, Hash

    There are many #[] for non Hash/Array classes in Rails
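
To illustrate the `#[]` case: a call site such as `settings[:name]` is compiled to a specialized instruction before the receiver's class is known, so a non-core receiver has to take the slow path at run time. The `Settings` class below is hypothetical, and the disassembly requires CRuby.

```ruby
# A non-core class that defines #[], in the spirit of the non-Hash/Array
# receivers mentioned on the slide.
class Settings
  def initialize(values)
    @values = values
  end

  def [](key)
    @values.fetch(key)
  end
end

# The call site is compiled without knowing the receiver's class.
disasm = RubyVM::InstructionSequence.compile("settings[:name]").disasm
puts disasm

settings = Settings.new({ name: "discourse" })
puts settings[:name]
```
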
  30. opt_xxx cancel is decreased much

                        JIT-ed calls   Cancel by opt_xxx    Cancel by call cache
    Discourse Before    168,925,050    19,394,792 (11.5%)   10,092,254 (5.97%)
    Discourse After     75,150,482     2,849,825 (3.79%)    3,072,673 (4.09%)

    #[] has a major impact on Rails. Others are to be improved...
  31. JIT compilation has overhead

    • Appending a method to the JIT queue may have overhead
    • GCC or Clang may use the same CPU core, or transferring data to another core may be costly
  32. JIT compilation had overhead

                        No options      --jit → Stop    --jit
    Code is JIT-ed      ×               ◦               ◦
    JIT is going on     ×               ×               ◦
    GET /, percentile: ms
      50:               60.4ms          65.1ms          68.4ms
      75:               66.9ms          72.4ms          74.8ms
      90:               69.6ms          75.8ms          80.0ms
      99:               125.4ms         145.6ms         137.2ms

    But this overhead is excluded from [Bug #14490] degradation…
  33. Calling JIT-ed code has overhead

    • JIT-ed code behaves slower only on an exception or JIT cancellation, but they weren’t the culprit
    • JIT compilation does not dominate the slowness
    • Then, calling native code has overhead…?
  34. Calling many different methods is slow

    Called methods    1 method    15 methods
    JIT disabled      3.69s       3.71s
    JIT enabled       3.79s       5.34s

    Duration with the same total calls
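
A rough sketch of how such a measurement could be set up, assuming the benchmark simply spreads a fixed number of calls over N trivial methods. The helper names and the generated method names `m0`, `m1`, ... are made up here; the talk's actual benchmark script is not shown in the transcript.

```ruby
require "benchmark"

# Build source that defines `method_count` trivial methods and calls them
# in turn, keeping the total number of calls roughly constant.
def build_benchmark(method_count, total_calls)
  defs  = (0...method_count).map { |i| "def m#{i}; #{i}; end" }.join("\n")
  calls = (0...method_count).map { |i| "m#{i}" }.join("; ")
  loops = total_calls / method_count
  [defs, "i = 0", "while i < #{loops}", "  #{calls}", "  i += 1", "end"].join("\n")
end

# Evaluate the generated source on a fresh object and time it. The time
# also includes defining the methods, which is small for large call counts.
def time_calls(method_count, total_calls)
  source = build_benchmark(method_count, total_calls)
  Benchmark.realtime { Object.new.instance_eval(source) }
end

one     = time_calls(1, 150_000)
fifteen = time_calls(15, 150_000)
printf("1 method: %.3fs, 15 methods: %.3fs\n", one, fifteen)
```

To compare the rows of the table, the same script would be run once without any option and once with the JIT option of the Ruby build under test, with a much larger call count.
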
  36. Calling many different methods is slow

    (Line chart with series VM and JIT; x-axis ticks from 1 to 39, y-axis ticks from 0 to 6; points marked at 6, 12 and 19.)
  37. Why does this happen?

    (Same line chart: series VM and JIT; x-axis ticks from 1 to 39, y-axis ticks from 0 to 6; points marked at 6, 12 and 19.)
  38. Reason of Rails slowdown on JIT

    • Ongoing JIT compilation may have overhead
    • JIT cancel is happening frequently (to be fixed)
    • It stalls when loading many different methods (to be fixed)
  40. Example

    Ruby Code

        def three
          1 + 2
        end

    Bytecode

        putobject 1
        putobject 2
        opt_plus
        leave

    C code

        three() {
          VALUE stack[2];
          /* putobject 1 */
          stack[0] = 1;
          /* putobject 2 */
          stack[1] = 2;
          /* opt_plus */
          stack[0] = opt_plus(stack[0], stack[1]);
          /* leave */
          return stack[0];
        }
  42. Labels from an annotated figure (the figure is not in the transcript):

    • Pop VM call frame
    • Interruption handler: check interrupts like SIGINT, another thread
    • Return 3: FIX2INT(0x7) == 3
    • Integer#+ redefinition check
    • SET_SP: VM's behavior which can be removed
  43. 3. Stack pointer motion is reduced

    Figure labels: "Stack pointer motion" (three occurrences), "Forgot to remove this"
  44. Last Example: while loop

        def while_loop
          i = 0
          while i < 1000000
            i += 1
          end
        end

        i = 0
        while i < 2000
          while_loop
          i += 1
        end
  52. Flow chart labels (the figure is not in the transcript):

    • i = 0
    • i < 1000000
    • check interrupts (two occurrences)
    • Fixnum?(i) for #<
    • Fixnum#< redefined?
    • Fixnum#+ redefined?
    • i + 1
    • Int overflow?
    • can't optimize #+ ?
    • set i for VM + check WB
    • set i for JIT
  53. Why while loop becomes faster

    • #+ and #< are performed not on the VM stack but in registers
    • #+ and #< share some instructions to check redefinition
    • Unnecessary type checks are omitted from the loop
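
The `#+` and `#<` discussed here correspond to specialized bytecode instructions in the loop. A small sketch (CRuby only) that checks for them in the disassembly of the `while_loop` method from the example:

```ruby
# Disassemble the loop from the example and look for the specialized
# instructions behind `i < 1000000` and `i += 1`.
source = <<~RUBY
  def while_loop
    i = 0
    while i < 1000000
      i += 1
    end
  end
RUBY

disasm = RubyVM::InstructionSequence.compile(source).disasm
specialized = %w[opt_lt opt_plus].select { |insn| disasm.include?(insn) }
p specialized
```
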
  54. Let C compiler work hard

    • Many optimizations are possible because C compiler can know definitions
    • If we could inline methods, C compiler would be able to optimize more
  57. When is method inlining possible?

    1. JIT compiler can know definitions
    2. JIT compiler can modify code to call a method
    3. Inlined code can be invalidated
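
The third condition, invalidation, can be modeled in plain Ruby with a version counter that is bumped on every method definition. Everything below (`Greeter`, `InlinedCall`, the `version` counter) is a hypothetical model for illustration, not how MJIT tracks redefinition.

```ruby
# A class that bumps a version number whenever an instance method is
# defined or redefined.
class Greeter
  @version = 0

  class << self
    attr_reader :version

    def method_added(name)
      @version += 1
      super
    end
  end

  def greet
    "hello"
  end
end

# Stand-in for an inlined call site: it remembers the definition it was
# built from and rebuilds when the version no longer matches.
class InlinedCall
  attr_reader :compilations

  def initialize(klass, name)
    @klass = klass
    @name = name
    @body = nil
    @compiled_version = nil
    @compilations = 0
  end

  def call(receiver)
    if @compiled_version != @klass.version      # guard: still valid?
      @body = @klass.instance_method(@name)      # "re-inline" current definition
      @compiled_version = @klass.version
      @compilations += 1
    end
    @body.bind(receiver).call
  end
end

site = InlinedCall.new(Greeter, :greet)
greeter = Greeter.new
first = site.call(greeter)
site.call(greeter) # reuses the remembered definition

class Greeter
  def greet # redefinition bumps the version
    "hi"
  end
end

second = site.call(greeter)
puts first, second, site.compilations
```
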
  61. Major inline targets

    • Ruby method
      • called by Ruby method => easy
      • called by C method => hard
    • Ruby block
      • yield-ed by Ruby method => medium
      • called by C method => hard
    • C method
      • called by Ruby method => medium
      • called by C method => hard

    easy: JIT compiler can deal with bytecode easily. Method cache can be used for invalidation.
    medium: yield doesn't have cache. Sometimes it's hard to know definitions.
    hard: There is no cache key for invalidation. How to modify C code?
  63. Ruby → C → Ruby inlining problem

        ret = 0
        1000000.times do |i|
          ret += i
        end
        ret

    • Ruby -> C method call: medium (Integer#times is defined with C)
    • C -> Ruby block call: hard
  64. I implemented a prototype to inline this

    • Ruby method
      • called by Ruby method => easy
      • called by C method => hard
    • Ruby block
      • yield-ed by Ruby method => medium
      • called by C method => hard
    • C method
      • called by Ruby method => medium
      • called by C method => hard

    https://github.com/k0kubun/ruby/commits/mjit-inline-send-yield
  66. Integer#times benchmark results

           Integer#times in C    Integer#times in Ruby
    VM     145.44s  1.00x        156.38s  0.93x
    JIT    104.80s  1.39x        56.46s   2.56x

    time ruby --disable-gems times_loop.rb
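
The "Integer#times in Ruby" column presumably refers to an implementation along the following lines, where the loop and the `yield` are both Ruby bytecode. `ruby_times` is a hypothetical name; the prototype's actual definition is not shown in the transcript.

```ruby
# Hypothetical pure-Ruby counterpart of Integer#times, written with a
# while loop and yield so the whole call chain stays in Ruby.
def ruby_times(n)
  i = 0
  while i < n
    yield i
    i += 1
  end
  n
end

ret = 0
ruby_times(1000) { |i| ret += i }
puts ret
```
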
  67. Conclusion

    • Rails performance is going to be improved
    • JIT can eliminate many instructions
    • C language will be useless in the future