VM-Generated JIT Compiler for Ruby 2.6

PLAZMA OSS Day: TD Tech Talk 2018
Takashi Kokubun

Who?
• GitHub, Twitter: k0kubun
• Ruby Committer
• Maintainer of default template engine: ERB
• Developed some JIT compilers for Ruby
  • LLRB, YARV-MJIT

Ad: WEB+DB PRESS Vol.103
• Introducing optimized Ruby 2.5 features
• Real example of Ruby code optimization
• Profiling
• Bytecode-wise optimization

NEWS: Ruby 2.6 merged JIT compiler

How is the performance? Optcarrot benchmark
[Bar chart: fps (axis 0 to 60) for Ruby 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, and 2.6.0-dev r62403, with JIT off and JIT on. Bar labels: 59.22, 53.09, 48.33, 45.54, 38.92, 38.32, 38.76, 37.2]
Intel 4.0GHz i7-4790K (8 cores) with 16GB memory under x86-64 Ubuntu
https://github.com/mame/optcarrot

How is the performance? MJIT micro benchmarks w/ 2.6.0-dev r62403
[Bar chart: speedup ratio compared to JIT off (axis 0 to 3), JIT off vs. JIT on. Benchmarks: aread, aref, aset, awrite, call, const2, fannk, fib, ivread, ivwrite, mandelbrot, meteor, nbody, nest-ntimes, nest-write, norm, nsvb, sieve, trees, while. JIT on bar labels: 3.0, 1.1, 1.2, 1.1, 1.2, 1.2, 1.3, 1.1, 1.1, 2.1, 2.9, 1.5, 1.0, 2.3, 2.3, 1.5, 1.9, 2.1, 2.8, 2.1. JIT off is 1.0 for every benchmark]
Intel 4.0GHz i7-4790K (8 cores) with 16GB memory under x86-64 Ubuntu
https://github.com/benchmark-driver/mjit-benchmarks

How is the performance?
https://twitter.com/ChrisGSeaton/status/961035035385237509
"Running that it looks like MJIT is over 3x faster! Which is very impressive and it's already doing better than both JRuby and Rubinius. TruffleRuby is over 300x faster (I only mention it because it's my own implementation of a Ruby JIT), so there's still lots of rooms for optimizations, as the authors have already said themselves."

Agenda
1. Overview of Ruby's JIT compilation
2. JIT Infrastructure: The hard work for portability
3. JIT Compiler: Internals of VM-Generated JIT compiler
4. Future works

1. Overview of Ruby's JIT compilation

Options for JIT compilation
• What to JIT-compile
  • Method JIT
  • Tracing JIT
• How to JIT-compile
  • Generate assembly code and assemble
  • Use JIT library's interface like LLVM

How about constructing LLVM IR?
• It's popular in modern languages, and I created a PoC: LLRB
  • http://github.com/k0kubun/llrb
• But I learned that we can't efficiently use it for Ruby
  • Major optimization is done by inlining Ruby core's LLVM IR generated by clang
  • Just generating C code and using clang seemed enough

Ruby's way: "MJIT" infrastructure
• "MJIT" (MRI JIT) infrastructure
• It writes a C file generated from a method's bytecode to disk (method JIT)
• Then it lets cc(1) compile the C code to a .so file, and dynamically loads it
• This idea was proposed and implemented by Vladimir Makarov
  • https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch

Ruby's way: "MJIT" infrastructure
[Diagram, built up over several slides]
• Build time: VM's C code → Transform → header → CC → precompiled header
• Ruby process: the VM thread and the MJIT worker thread share a queue (Enqueue / Dequeue) of bytecode to JIT
• MJIT worker thread: generate C code from bytecode; the header is included by the C code; CC compiles the C code to a .so file
• Load: the function pointer of the machine code is loaded from the .so file and called by the VM thread

Ruby's way: "MJIT" infrastructure
• Upside
  • Build dependencies are almost unchanged
  • Maintenance cost of the JIT compiler is relatively low
• Downside
  • A C compiler becomes an optional runtime dependency
  • It's highly recommended to keep the C compiler used to build Ruby available on your server/container

What did Ruby 2.6 merge?
• Ruby 2.6 merged:
  • JIT Infrastructure: "MJIT"
  • JIT Compiler: "YARV-MJIT"
• MJIT had a built-in JIT compiler, but it required many VM changes and was risky
• So I built a conservative JIT compiler which runs on top of MJIT
• Let's talk about those 2 components

2. JIT Infrastructure: The hard work for portability

Command line construction for C compilers
• Spawn compiler with $(CC) and compiler-specific flags (improved by nobu, usa)
  • gcc: gcc -fPIC -shared -w -pipe ...
  • clang: clang -O2 -dynamic -w -bundle -include-pch ...
  • cl.exe: cl.exe -Fe ...
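
The per-compiler flag selection above can be pictured with a small lookup table. This is only a sketch in Ruby for brevity (MJIT itself has to do this in C on the worker thread). `COMPILER_FLAGS` and `build_command` are hypothetical names, the flag lists are limited to the ones shown on the slide (the elided parts and `-include-pch`, whose argument is not shown, are left out), and `-o` is the usual gcc/clang output option rather than something taken from the slide.

```ruby
# Hypothetical sketch: pick flags by compiler name and build an argv array.
# Flags are copied from the slide; anything elided there ("...") is omitted here too.
COMPILER_FLAGS = {
  "gcc"   => %w[-fPIC -shared -w -pipe],
  "clang" => %w[-O2 -dynamic -w -bundle],
}.freeze

# cc:      compiler name or path (what $(CC) expanded to at build time)
# c_file:  generated C source
# so_file: shared object to produce
def build_command(cc, c_file, so_file)
  flags = COMPILER_FLAGS.fetch(File.basename(cc)) do
    raise ArgumentError, "no flag set known for #{cc}"
  end
  [cc, *flags, c_file, "-o", so_file]
end

puts build_command("gcc", "/tmp/example.c", "/tmp/example.so").join(" ")
```

Returning an argv array instead of a single string keeps the command safe to pass to a spawn call without shell quoting.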

Command line construction for C compilers
Ruby committers want to use Ruby

Command line construction for C compilers
• We can't use Ruby runtime on MJIT worker thread
  • Ruby VM is process global, and Ruby runtime is not thread safe
  • Who wants to apply GVL between main thread and JIT thread?
• Using Ruby runtime on MJIT worker causes random SEGV...

Extra topic: Security on dynamic loading
• It creates and compiles files like: "/tmp/_ruby_mjit_p12789u161.c"
  • p12789 is the PID and u161 is a sequential number, so the name can be easily predicted
• The MJIT worker should prevent the file from being modified by others
• The initial implementation had a vulnerability
• nobu fixed it to use: "open(c_file, O_EXCL|O_CREAT, 0600)"
  • "O_EXCL|O_CREAT" is needed because an existing file may have unexpected permissions
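
The effect of "O_EXCL|O_CREAT" can be shown with Ruby's equivalent open flags. This is a sketch, not MJIT's code (which calls open(2) from C); `create_private_file` is a hypothetical helper name, and the file name reuses the example from the slide.

```ruby
require "tmpdir"

# Create the file only if it does not exist yet, readable and writable by the owner only.
# With EXCL, an already existing file (possibly planted by someone else with
# looser permissions) makes the open fail with Errno::EEXIST instead of being reused.
def create_private_file(path, content)
  File.open(path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
    f.write(content)
  end
end

Dir.mktmpdir do |dir|
  c_file = File.join(dir, "_ruby_mjit_p12789u161.c")
  create_private_file(c_file, "/* generated C code */\n")
  begin
    create_private_file(c_file, "/* someone else's content */\n")
    puts "second create succeeded"
  rescue Errno::EEXIST
    puts "second create rejected"
  end
end
```

Without EXCL, the second open would silently reuse the existing file with whatever permissions it already had.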

Windows support
• I could port MJIT's pthread usage to Windows native threads early
• The actual hard parts:
  • long is 32-bit; MinGW still seems to have some issues with it
  • cl.exe (Visual Studio) and Windows headers are not good for preprocessing

Transformation of C header for JIT
• Platform support: ICC, AIX, NetBSD, MinGW...
• JIT header generation depends on gcc/clang's "-E -dD", which preprocesses C code while leaving macro definitions in place
• But Visual Studio doesn't have such a feature...
  • Use a pure-Ruby C preprocessor for Windows (!?)
• Dynamic C code transformation by regexp (!!!)
  • Adding "static inline" for inlining and to reduce compilation time

Transformation of C header for JIT
He says it is not mature and not so serious for now

Find C function with regexp ↓ Transform with String#sub!
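
A minimal sketch of that regexp-plus-String#sub! idea, assuming a header where the return type sits on its own line above the function name. The pattern `FUNC_DEF`, the helper `add_static_inline`, and the C function `example_fast_path` are all made up for illustration; the real transformation script handles many more cases.

```ruby
# Hypothetical header fragment; example_fast_path is not a real VM function.
HEADER = <<~C
  VALUE
  example_fast_path(VALUE recv)
  {
      return recv;
  }
C

# Matches "type\nname(args)\n{" at the start of a line.
FUNC_DEF = /^(?<type>\w+)\n(?<name>\w+)\((?<args>[^)]*)\)\n\{/

# Returns a copy of the header with "static inline" prepended to the first
# function definition found. String#sub! returns nil when nothing matches,
# in which case the copy is returned unchanged.
def add_static_inline(header)
  code = header.dup
  code.sub!(FUNC_DEF) do
    m = Regexp.last_match
    "static inline #{m[:type]}\n#{m[:name]}(#{m[:args]})\n{"
  end
  code
end

puts add_static_inline(HEADER)
```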

Testing strategy
• ruby(1) introduced options for JIT testing:
  • --jit-wait - if JIT is triggered, wait until JIT compilation is finished
  • --jit-min-calls=N - change the threshold to trigger JIT
    • This is needed to control inlining by call cache (explained later)
• Now trunk has unit tests that spawn "ruby --jit-wait --jit-min-calls=1 --jit-verbose=1" and confirm that stderr has "JIT success" output
• When a big JIT change is made, we need to verify that "make test-all" passes with RUN_OPTS="--jit-wait --jit-min-calls=1" (and "--jit-min-calls=5" too for call cache)

Replaceable JIT compiler
• Ruby's JIT compiler is implemented as a single object file, mjit_compile.o, and its interface is only a single function, mjit_compile()
• I believe the current approach is the easiest to maintain and has no blocker for any JIT optimization
• But if we find a better strategy for the JIT compiler, we can easily replace it entirely
  • Vladimir Makarov is working on another approach that uses RTL as an intermediate representation between YARV instructions and JIT-ed code

3. JIT Compiler: Internals of VM-Generated JIT compiler

The design philosophy of my JIT compiler
• Make it very easy to maintain and debug
• Keep it simple at the first release to minimize risks

A commit for Ruby's initial JIT compiler

JIT compiler needed only 680 lines (2,584 in total with MJIT infrastructure)

Super meta code generator
[Diagram: ERB template (source) → ERB#compile → Ruby (build-time only) → Kernel#eval → C (MJIT worker source) → fprintf → C (JIT-ed temporary code) → gcc/clang → machine code]
"This is an ERB template that generates Ruby code that generates C code that generates JIT-ed C code."

Switch-case for each instruction ERB

Static macro expansion Main JIT implementation (Just printing VM source)
Dynamic macro expansion ERB

Generated C code (JIT compiler) fprintf for each instruction

Generated C code (JIT-ed code) Copy-paste of VM instruction code
(sometimes optimized)
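
The chain of generators can be sketched at toy scale: an ERB template expands into the C source of a compiler function, and that C source in turn uses fprintf to emit the JIT-ed C code. Everything here is illustrative: the two-instruction table `INSNS`, the template, and `generate_compiler` are hypothetical, and the instruction bodies are simplified stand-ins for real VM source.

```ruby
require "erb"

# Hypothetical, heavily simplified instruction table: name => C body of the instruction.
INSNS = {
  "putnil" => "*sp++ = Qnil;",
  "pop"    => "sp--;",
}.freeze

# Single-quoted heredoc so that \n stays a literal backslash-n in the generated C.
TEMPLATE = <<~'EOS'
  switch (insn) {
  <% insns.each do |name, body| -%>
    case BIN(<%= name %>):
      fprintf(f, "  <%= body %>\n");
      break;
  <% end -%>
  }
EOS

# Expands the template into C source: one switch-case per instruction,
# each of which prints that instruction's body into the JIT-ed C file.
def generate_compiler(insns)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(insns: insns)
end

puts generate_compiler(INSNS)
```

Because the cases are produced by iterating over the instruction table, adding or changing an instruction changes the generated compiler without touching the template, which is the property the talk relies on.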

Super meta code generator
• Even while I'm sleeping, JIT compiler's source code is updated automatically when VM implementation is changed
• JIT compiler actually worked before and after recent VM changes

Hacks to achieve this automation
• Replacing macros like EXEC_EC_CFP, THROW_EXCEPTION
• Special compilation of JUMP for opt_case_dispatch
• Keep moving program counter to meet catch table
• Properly ignore unhandled execution from exception handler
  • We may be able to support it later
tl;dr it was hard

Optimization 1: VM instruction inlining for JIT
• Keep as many C function definitions as possible in the MJIT header
  • Major optimization is done here, by inlining VM operations in the MJIT header
• Non-automated examples:
  • Carve out the fast path of the method search function and inline it
  • Inline a function used by an instruction optimized by the VM
    • I inlined Array#[] with an Integer argument, and it makes the VM faster too

Separate slow path as external function (which is slow to compile, so header doesn't have its definition)
Make sure fast path is inlined (kept in JIT header)

Change external function reference to inline function (for fast path)
Array#[] with Integer is optimized in both VM and JIT

Optimization 2: Inlining method call setup by call cache
• Method call setup: method search, prepare arguments, push frame
• The VM has a cache for method calls, and the JIT compiler utilizes it
• But invalidating the cache requires the receiver's class
  • The JIT compiler doesn't know the receiver at compilation time
  • I introduced an invalidator for obsoleted call caches to avoid random SEGV

[Diagram: class serial and call cache, built up over several slides]
• class Foo starts at serial 0
• Defining baz (def baz; 2; end) increments the class serial to 1
• Defining bar (def bar; 1 + baz; end) increments the class serial to 2
• Bytecode A (baz): putobject 2. Bytecode B (bar): putobject 1, opt_send :baz, opt_plus. On generating bytecode, it creates a call cache (cache nil)
• Once the method is called, the cache holds a pointer to the bytecode and the serial (cache :A, serial: 2)
• On method redefinition (def baz; 3; end, Bytecode C: putobject 3), it increments the serial. When the receiver object's class is Foo, it has a new serial, which invalidates the old cache
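
The serial-based invalidation can be modelled in a few lines of Ruby. This is a toy model and not MRI's actual data structures: `ToyClass` and `CallCache` are hypothetical names, and procs stand in for pointers to bytecode.

```ruby
# Toy class: every method definition or redefinition bumps the class serial.
class ToyClass
  attr_reader :serial

  def initialize
    @serial = 0
    @methods = {}
  end

  def define(name, &body)
    @methods[name] = body
    @serial += 1
  end

  # Stand-in for the full (slow) method search.
  def lookup(name)
    @methods.fetch(name)
  end
end

# Toy call cache: remembers the method body together with the serial it was found under.
class CallCache
  attr_reader :misses

  def initialize(name)
    @name = name
    @serial = nil # empty until the first call
    @body = nil
    @misses = 0
  end

  def call(klass)
    unless @serial == klass.serial # guard: a serial mismatch means the cache is stale
      @body = klass.lookup(@name)
      @serial = klass.serial
      @misses += 1
    end
    @body.call
  end
end

foo = ToyClass.new
foo.define(:baz) { 2 }
cache = CallCache.new(:baz)
p cache.call(foo)      # first call fills the cache
p cache.call(foo)      # served from the cache
foo.define(:baz) { 3 } # redefinition bumps the serial
p cache.call(foo)      # guard fails, cache is refilled
```

The guard is a single integer comparison, which is what makes it cheap enough to leave in front of inlined call setup.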

Optimization 2: Inlining method call setup by call cache
• Why don't you use this for method inlining?
  • Currently it's only used for inlining Ruby-specific method call setup
  • But working on it!

WIP Optimization 3: Ruby -> Ruby method inlining
• As we have a JIT compiler for bytecode, when a call cache has valid bytecode, we can inline it and invalidate it via the call cache
• The patch is almost complete but is not properly verified/measured yet

Inlined call
Redefinition guard

Optimization 4: Call cache based type guard removal
• Some instructions have a guard on the receiver class for optimization (like opt_aref, which has a guard for Array / Hash), and they dispatch a normal method call if the class is not the expected one
• But if a non-optimized method is called, we can eliminate the guard by using the call cache

Optimized case for Array / Hash (This is removed for others in JIT)
Only this is needed for other classes

WIP Optimization 5: Lazy stack pointer motion
• When longjmp is called, the JIT-ed function's call frame goes away
  • We must restore the VM's state so that it's the same as in the middle of the JIT-ed function
• I'm moving the stack pointer in JIT-ed code even though it's sometimes unnecessary
  • As we're moving the program counter, we can restore the stack pointer from it
  • But it's hard...

I want to change this to local variable. (currently it's VM's and needs sp)
Then this stack pointer motion is removed

[Diagram, built up over several slides: class Foo with a JIT-ed "def bar; 1 + baz; end" and "def baz; raise "err"; end", alongside the JIT local variable array, the VM stack, and the program counter]
• What we need to do: the value 1 has to be inserted into the VM stack
• Dynamic stack extension (difficult) to insert value
• This should be done before longjmp

4. Future works

Near future 1: TracePoint check removal
• Ruby 2.5 removed the "trace" instruction by default, and it dynamically alters all bytecodes to support tracing when TracePoint is enabled
• It means that we need to cancel a JIT-ed function call when that happens
• For now, I added guards for it after any method call
• If we can properly cancel a JIT-ed function call and fall back to VM execution from outside the frame by longjmp, we can remove the guards

Near future 1: TracePoint check removal
I want to remove this guard

Near future 2: Improve performance on Rails
• Unfortunately the workload of a NES emulator (optcarrot) is different from Rails, and currently Rails is not optimized by the JIT
  • There is no single perfect benchmark for Ruby
• I believe JIT can improve the performance of many pure-Ruby parts of Rails, but somehow that's not the case for now
• I need more time to investigate the reason

Near future 3: Full Windows support
• The JIT compiler is somewhat working on MinGW, but it still has some bugs to be addressed
• Visual Studio support
  • usa already did some great work
  • Installing VM sources or pure-Ruby C preprocessor?

A little far future 4: Ruby -> C core method inlining
• We can use the same strategy as Ruby -> Ruby method inlining
• If we successfully build a header that has both core method definitions and VM implementation, we may be able to do this
• Not tried yet, but identifying the function in call cache might be a blocker

Far future 5: C core -> Ruby method inlining
• Using "while" is faster than "Enumerable#each", but many Ruby developers don't want to write "while"
• Inlining block in JIT should solve it
• But such block invocation in Ruby core methods is out of control when generating JIT-ed code for now
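
For reference, the two styles being compared look like this. The method names are made up for illustration and no timing is claimed here. The point is only that the `each` version yields to a Ruby block from inside a core `each` implementation written in C (Array#each in this sketch), which is the call the JIT-ed code cannot currently see through, while the `while` version stays entirely in one method's bytecode.

```ruby
# Idiomatic version: the block is invoked from inside the core method Array#each.
def sum_with_each(numbers)
  total = 0
  numbers.each { |n| total += n }
  total
end

# Hand-written loop: no block call, everything happens in this method's own bytecode.
def sum_with_while(numbers)
  total = 0
  i = 0
  while i < numbers.size
    total += numbers[i]
    i += 1
  end
  total
end

numbers = [1, 2, 3, 4, 5]
p sum_with_each(numbers)
p sum_with_while(numbers)
```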

Conclusion
• We're working hard to improve portability and performance
• Not so fast yet, but many optimizations are now possible, and we have plenty of time to implement them before Ruby 2.6
• Ruby method inlining is almost there
