JIT compiler improvements in Ruby 2.7 / RubyRussia 2019

Takashi Kokubun

September 28, 2019

Transcript

  1. 3.
  2. 4.

    JIT

  3. 6.

    What's JIT?
    • Experimental optional feature since Ruby 2.6
    • Compile your Ruby code to faster C code automatically
    • Just-in-Time: Use runtime information for optimizations
    $ ruby --jit
  4. 8.

    Ruby 3x3 benchmark: Optcarrot
    [Bar chart: Frames Per Second (fps), Ruby 2.6.0 w/ Optcarrot. JIT off: 53.8.]
    Intel 4.0GHz i7-4790K 8 cores, memory 16GB, x86-64 Ubuntu
  5. 9.

    Ruby 3x3 benchmark: Optcarrot
    Speed 1.61x
    [Bar chart: Frames Per Second (fps), Ruby 2.6.0 w/ Optcarrot. JIT off: 53.8, JIT on: 86.6.]
    Intel 4.0GHz i7-4790K 8 cores, memory 16GB, x86-64 Ubuntu
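The "1.61x" figure follows directly from the two frame rates on the slide. A minimal sketch of the arithmetic (the method name `speedup` is only illustrative):

```ruby
# Frames per second reported on the slide for Ruby 2.6.0 w/ Optcarrot.
fps_jit_off = 53.8
fps_jit_on  = 86.6

# Speedup is throughput with JIT divided by throughput without JIT.
def speedup(with_jit, without_jit)
  with_jit / without_jit
end

puts format("%.2fx", speedup(fps_jit_on, fps_jit_off)) # prints "1.61x"
```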
  6. 10.

    What's JIT?
    $ ps aufx
    ruby --jit bin/optcarrot-bench
     \_ /usr/bin/gcc -w -Wfatal-errors -fPIC -shared -w -pipe -O3
         \_ /usr/lib/gcc/x86_64-linux-gnu/7/cc1 -quiet -imultiarch
         \_ as -W --64 -o /tmp/_ruby_mjit_p31673u20.o
  7. 11.

    How does it work?
    [Diagram. Build time: VM's C code → (Transform) → header → (Precompile) → precompiled header. Ruby process: VM Thread, queue, MJIT Worker Thread.]
  8. 12.

    How does it work?
    [Diagram. Build time: VM's C code → header → precompiled header. Ruby process: the VM Thread and the MJIT Worker Thread enqueue / dequeue "Bytecode to JIT" through the queue.]
  9. 13.

    How does it work?
    [Diagram. As before, plus: the MJIT Worker Thread generates C code from the dequeued bytecode, and the precompiled header is included in it.]
  10. 14.

    How does it work?
    [Diagram. As before, plus: C code → (CC) → .o file.]
  11. 15.

    How does it work?
    [Diagram. As before, plus: .o file → (Link) → .so file.]
  12. 16.

    How does it work?
    [Diagram. As before, plus: .so file → (Load) → function pointer of machine code, which is called by the VM Thread.]
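The enqueue / dequeue flow in these diagrams can be mimicked with a toy model in plain Ruby. This is only a sketch: the class `ToyJit`, its methods, and the string standing in for machine code are all hypothetical, and the real MJIT worker generates C, runs the C compiler, links a .so file and loads it.

```ruby
# Toy model of the VM thread / queue / worker thread split. Not MJIT itself.
class ToyJit
  attr_reader :compiled

  def initialize(min_calls:)
    @min_calls = min_calls
    @calls = Hash.new(0)   # method name => call count, kept by the "VM thread"
    @queue = Queue.new     # VM thread enqueues, worker thread dequeues
    @compiled = {}         # method name => stand-in for a function pointer
    @lock = Mutex.new
    @worker = Thread.new { work }
  end

  # Called on the "VM thread" for every method call.
  def call(name)
    @calls[name] += 1
    @queue << name if @calls[name] == @min_calls
    @lock.synchronize { @compiled.key?(name) } ? :jit : :vm
  end

  # Let the worker drain the queue, then stop it.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def work
    while (name = @queue.pop) != :stop
      code = "compiled(#{name})" # stands in for: generate C -> CC -> link -> load
      @lock.synchronize { @compiled[name] = code }
    end
  end
end

jit = ToyJit.new(min_calls: 3)
5.times { jit.call(:hot) }
jit.call(:cold)
jit.shutdown
p jit.compiled.keys # => [:hot]
```

Because compilation happens on a separate thread, the first calls after the threshold may still run in the "VM"; the model returns `:vm` or `:jit` from `call` to reflect that.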
  13. 18.

    How to use JIT
    • Just "--jit" is fine
    • You can also use RUBYOPT=--jit environment variable
    $ ruby --jit
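To confirm from inside a script that the flag took effect, one option is to ask the VM. A minimal sketch, assuming CRuby 2.6 to 3.2, where `RubyVM::MJIT.enabled?` exists; the helper name `mjit_enabled?` is made up for this example, and on other versions or implementations it simply reports false.

```ruby
# Returns true when this process runs with MJIT enabled.
# The constant is absent on Rubies without MJIT, so guard the lookup.
def mjit_enabled?
  return false unless defined?(RubyVM::MJIT)
  return false unless RubyVM::MJIT.respond_to?(:enabled?)
  RubyVM::MJIT.enabled? ? true : false
end

puts "MJIT enabled: #{mjit_enabled?}"
```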
  14. 19.

    How to use JIT
    $ ruby --help
    JIT options (experimental):
      --jit-warnings       Enable printing JIT warnings
      --jit-debug          Enable JIT debugging (very slow)
      --jit-wait           Wait until JIT compilation is finished everytime (for testing)
      --jit-save-temps     Save JIT temporary files in $TMP or /tmp (for testing)
      --jit-verbose=num    Print JIT logs of level num or less to stderr (default: 0)
      --jit-max-cache=num  Max number of methods to be JIT-ed in a cache (default: 100)
      --jit-min-calls=num  Number of calls to trigger JIT (for testing, default: 10000)
  16. 22.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
  17. 23.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
  18. 24.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
    [Callout: Optimization in Ruby 2.7]
  19. 25.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
    JIT recompile: present?@...
  20. 26.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
    JIT recompile: present?@...
    [Callout: Another optimization in Ruby 2.7]
  21. 27.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
    JIT recompile: present?@...
    ...
    JIT compaction (17.0ms): Compacted 100 methods -> ...
  22. 28.

    How to use JIT
    $ ruby --jit-verbose=1
    ...
    JIT success (35.1ms): block in symbolize_keys!@...
    JIT success (89.9ms): block in forwarded_scheme@...
    JIT inline: unwrapped_html_escape@...
    JIT success (106.9ms): unwrapped_html_escape@...
    JIT inline: present?@...
    JIT success (37.5ms): present?@...
    JIT recompile: present?@...
    ...
    JIT compaction (17.0ms): Compacted 100 methods -> ...
    [Callout: ?]
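The `--jit-verbose=1` lines are regular enough to summarize with a small script. The sketch below parses only the "JIT success" lines quoted on these slides; the helper name `jit_compile_times` is made up, and the pattern is an assumption based on this sample rather than a documented log format.

```ruby
# Extracts [method label, compile time in ms] pairs from "JIT success" lines.
def jit_compile_times(log)
  pattern = /\AJIT success \((\d+(?:\.\d+)?)ms\): (.+)\z/
  log.each_line.map do |line|
    m = pattern.match(line.strip)
    m && [m[2], m[1].to_f]
  end.compact
end

log = <<~LOG
  JIT success (35.1ms): block in symbolize_keys!@...
  JIT success (89.9ms): block in forwarded_scheme@...
  JIT inline: unwrapped_html_escape@...
  JIT success (106.9ms): unwrapped_html_escape@...
  JIT inline: present?@...
  JIT success (37.5ms): present?@...
  JIT recompile: present?@...
LOG

times = jit_compile_times(log)
puts times.length                      # number of successful compilations
puts times.sum { |_, ms| ms }.round(1) # total compile time in ms
```

Since the log goes to stderr, it could be captured with something like `ruby --jit-verbose=1 app.rb 2> jit.log` (where `app.rb` is a placeholder) and fed to the helper.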
  23. 29.

    "JIT compaction"
    [Diagram. The MJIT Worker Thread has produced three .o files; the VM Thread calls three separate function pointers of machine code.]
  24. 30.

    "JIT compaction"
    [Diagram. The three .o files are linked ("Link all") into one .so file; the VM Thread still calls three separate function pointers of machine code.]
  25. 31.

    "JIT compaction"
    [Diagram. The .so file is loaded ("Reload all") and the VM Thread now calls the function pointers of machine code from it.]
  26. 32.

    "JIT compaction"
    [Diagram. Final state: the VM Thread calls the function pointers of machine code; the three .o files remain with the MJIT Worker Thread.]
  27. 34.

    Ruby benchmark on Rails: Railsbench
    • Just rails scaffold #show: k0kubun/railsbench
    • headius/pgrailsbench, but on Rails 5.2 and w/ db:seed
    • Small but capturing some Rails characteristics
  28. 35.

    Railsbench: Speed
    [Bar chart: Request Per Second (#/s). Ruby 2.6: JIT off 924.9, JIT on 720.7.]
    k0kubun/railsbench: WARMUP=30000 BENCHMARK=10000 bin/bench
    Intel 4.0GHz i7-4790K 8 cores, memory 16GB, x86-64 Ubuntu, Ruby 2.6=2.6.2 Ruby2.7=r67600
  29. 36.

    Railsbench: Speed
    [Bar charts: Request Per Second (#/s). Ruby 2.6: JIT off 924.9, JIT on 720.7. Ruby 2.7: JIT off 932.0, JIT on 899.9.]
    k0kubun/railsbench: WARMUP=30000 BENCHMARK=10000 bin/bench
    Intel 4.0GHz i7-4790K 8 cores, memory 16GB, x86-64 Ubuntu, Ruby 2.6=2.6.2 Ruby2.7=r67600
  30. 37.

    Railsbench: Memory
    [Bar charts, axis labelled "Request Per Second (#/s)" on the slide. Ruby 2.6: JIT off 105.2, JIT on 107.2. Ruby 2.7: JIT off 106.5, JIT on 107.6.]
    k0kubun/railsbench: WARMUP=30000 BENCHMARK=10000 bin/bench
    Intel 4.0GHz i7-4790K 8 cores, memory 16GB, x86-64 Ubuntu
  31. 38.

    Why is it slow on Rails?
    • Too many methods => Cache inefficiency
    • Less CPU bound and fewer optimization chances
  32. 40.

    Ruby 2.7 JIT Performance Improvements
    1. Default Option Changes
    2. Deoptimized Recompilation
    3. Method Inlining
    4. Optimized Dispatch of JIT-ed Code (WIP)
    5. Stack-based Object Allocation (PoC)
  33. 42.

    1. Default Option Changes
    • Ruby 2.7 changes in default values of JIT options
    • --jit-min-calls: 5 → 10,000
    • --jit-max-cache: 1,000 → 100
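To get a feel for what the two options control, here is a toy calculation, not MJIT's real bookkeeping. It assumes a made-up table of call counts and a hypothetical helper `methods_to_compile`, and it simplifies by picking methods in table order instead of the order in which they cross the threshold.

```ruby
# Picks the methods a JIT would compile: those called at least min_calls
# times, capped at max_cache entries. call_counts maps name => call count.
def methods_to_compile(call_counts, min_calls:, max_cache:)
  hot = call_counts.select { |_, calls| calls >= min_calls }
  hot.keys.first(max_cache)
end

# Hypothetical call counts for illustration only.
call_counts = {
  "render"    => 50_000,
  "present?"  => 20_000,
  "boot_once" => 7,
  "rarely"    => 12,
}

# Old defaults listed above: almost everything gets compiled.
p methods_to_compile(call_counts, min_calls: 5, max_cache: 1_000)
# New defaults listed above: only the genuinely hot methods qualify.
p methods_to_compile(call_counts, min_calls: 10_000, max_cache: 100)
```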
  34. 43.
  35. 45.

    Problem 2: JIT calls may be cancelled frequently
    • The "Cancel JIT execution" had some overhead
    • How many cancels did we have?
  36. 47.
  37. 48.
  38. 50.

    Solution 2: Deoptimized Recompilation
    • Recompile a method when JIT's speculation is invalidated
    • It was in the original MJIT by Vladimir Makarov, but removed for simplicity in Ruby 2.6
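The idea can be illustrated with a toy model in plain Ruby. Everything here is hypothetical (the class `SpeculativeAdder`, the Integer-only speculation, the counters); it only shows why recompiling once without the failed speculation avoids paying the cancel overhead on every later call.

```ruby
# Toy model of speculation plus deoptimized recompilation. Not MJIT's code.
class SpeculativeAdder
  attr_reader :cancels, :recompiled

  def initialize
    @cancels = 0
    @recompiled = false
  end

  def add(a, b)
    # Deoptimized code: no guard is checked, so nothing can be cancelled.
    return generic_add(a, b) if @recompiled
    # Optimized code: valid while the speculation "both are Integers" holds.
    return a + b if a.is_a?(Integer) && b.is_a?(Integer)

    @cancels += 1      # speculation invalidated: cancel this JIT-ed call
    @recompiled = true # recompile once without the speculation
    generic_add(a, b)
  end

  private

  def generic_add(a, b)
    a + b
  end
end

adder = SpeculativeAdder.new
adder.add(1, 2)     # fast path
adder.add(1.5, 2)   # speculation fails: one cancel, then recompile
adder.add(2.5, 1)   # runs deoptimized code, no further cancel
p adder.cancels     # => 1
```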
  39. 54.

    Problem 3: Method call is slow
    • We're calling methods everywhere
    • Method call cost:
        VM → VM     10.28ns
        VM → JIT     9.12ns
        JIT → JIT    8.98ns
        JIT → VM    19.59ns
  40. 56.

    Solution 3: Method Inlining
    • Method inlining levels:
      • Level 1: Just call an inline function instead of JIT-ed code's function pointer
      • Level 2: Skip pushing a call frame by default, but lazily push it when something happens
    • For level 2, we need to know the "purity" of VM instructions
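One rough way to picture a "purity" check is to look at a method's bytecode and accept it only if every instruction is on an allowlist. The sketch below uses CRuby's `RubyVM::InstructionSequence.of` and `#to_a`; the allowlist and the helper name `pure_method?` are assumptions made for illustration, instruction names can differ between Ruby versions, and this is not how MJIT itself decides purity.

```ruby
# Returns true if every VM instruction of the method is on a small,
# hypothetical allowlist of side-effect-free instructions (CRuby only).
def pure_method?(meth)
  allowlist = %i[putnil putobject putself getlocal getlocal_WC_0 leave nop]
  iseq = RubyVM::InstructionSequence.of(meth)
  return false if iseq.nil? # e.g. methods implemented in C have no iseq

  body = iseq.to_a.last     # last element holds the instruction list
  names = body.select { |entry| entry.is_a?(Array) }.map(&:first)
  names.all? { |name| allowlist.include?(name) }
end

sample = Class.new do
  def answer
    42
  end

  def shout
    puts "hi"
  end
end.new

p pure_method?(sample.method(:answer)) # only pushes a constant and returns
p pure_method?(sample.method(:shout))  # contains a method call
```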
  41. 58.
  42. 63.

    Solution 3: Method Inlining
    • Method inlining is already on master!
    • It's working for limited things like #html_safe?, #present?
    • To make it really useful, we need to prepare a Ruby version of core class methods for JIT
  43. 65.

    Problem 4: Calling JIT-ed code seems slow
    • When benchmarking after-compile Rails performance, maximum number of methods should be compiled
    • Max: 1,000 in Ruby 2.6, 100 in Ruby 2.7
    • Note: only 30 methods are compiled on Optcarrot
  44. 66.

    Problem 4: Calling JIT-ed code seems slow
    [Line chart: time to call a method returning nil (ns, axis 0 to 32) against number of called methods (1 to 97); series: VM, JIT.]
    def foo1
      nil
    end
    def foo2
      nil
    end
    def foo3
      nil
    end
  45. 67.

    So we did this in Ruby 2.6
    [Diagram. The .o files produced by the MJIT Worker Thread are linked ("Link all") into one .so file, which is loaded ("Reload all") so that the VM Thread calls the function pointers of machine code from it.]
  46. 68.

    After "JIT compaction" in Ruby 2.6
    [Line chart: time to call a method returning nil (ns, axis 0 to 32) against number of called methods (1 to 97); series: VM, JIT.]
    def foo1
      nil
    end
    def foo2
      nil
    end
    def foo3
      nil
    end
  47. 72.

    Solution 4: Optimized Dispatch of JIT-ed Code
    • Calling JIT-ed code from VM is slow
    • Can we generate special code for dispatch from VM?
    • We can reduce # of virtual calls from two to one
    • Work in progress, but I can show you a graph
  48. 73.

    After optimized dispatch of JIT-ed code (WIP)
    [Line chart: time to call a method returning nil (ns, axis 0 to 32) against number of called methods (1 to 97); series: VM, JIT.]
    def foo1
      nil
    end
    def foo2
      nil
    end
    def foo3
      nil
    end
  49. 75.

    Problem 5: Object allocation is slow
    • A Rails app allocates objects (of course!), unlike Optcarrot
    • It takes time to allocate memory from the heap and GC it
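How many objects a piece of code allocates can be observed with `GC.stat(:total_allocated_objects)`. A small sketch; the helper name `allocated_objects` and the sample workload are illustrative only.

```ruby
# Counts objects allocated while the block runs, using the VM's own counter.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Building a string per "request" allocates at least one object each time.
count = allocated_objects { 1_000.times { |i| "request #{i}" } }
puts "objects allocated: #{count}"
```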
  50. 76.

    Problem 5: Object allocation is slow
    • Railsbench takes time for memory management in perf
    [perf profile excerpt: memory management, GC 9.3%]
  51. 77.

    Solution 5: Stack-based Object Allocation (PoC)
    • If an object does not "escape", we can allocate it on the stack
    • Implementing really clever escape analysis is hard, but a basic one can suffice for some real-world use cases
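A very basic escape analysis can be sketched over a made-up instruction list. The instruction format, the operation names and the helper `stack_allocatable` are all hypothetical, and the sketch makes the simplifying assumption that calling a method on an object never leaks it; it is not the PoC's implementation.

```ruby
# Toy escape analysis. Each instruction is [operation, variable, ...]:
#   [:new, :v]         allocate an object into local v
#   [:call, :v, :name] call a method on v (assumed not to leak v)
#   [:set_ivar, :v]    store v into an instance variable (escapes)
#   [:pass_arg, :v]    pass v to another method (escapes)
#   [:return, :v]      return v to the caller (escapes)
# Returns the locals whose objects never escape and could live on the stack.
def stack_allocatable(instructions)
  escaping_ops = %i[set_ivar pass_arg return]
  allocated = instructions.select { |op, _| op == :new }.map { |_, var| var }
  escaped = instructions.select { |op, _| escaping_ops.include?(op) }.map { |_, var| var }
  allocated - escaped
end

code = [
  [:new, :tmp],
  [:call, :tmp, :size],
  [:new, :result],
  [:return, :result],
]
p stack_allocatable(code) # => [:tmp]
```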
  52. 84.

    Conclusion
    • Optimizing JIT-ed code dispatch may offset the current JIT's bottleneck in JIT on Rails
    • Once the problem is solved, we'd be able to continuously improve performance
      • By allocating objects on stack, eliminating branches, ...