Ruby Implementations with JIT Compilers
CRuby (MRI)
JRuby
TruffleRuby
Overview
How CPUs run software
Crash course in computer architecture
What compilers are fundamentally doing
Say “hello” to assembly
How JIT compilers can make Ruby code run as fast as C
Tying it all together
Instructions
Tell the CPU what to do
Loaded into memory when a program is executed
Operate on data
Values coming from and going to the outside world
Instruction Set
Architecture (ISA)
x86_64 32-bit Registers
x86_64 Numeric Data Types
x86_64 MUL (Unsigned Multiply) Overview
x86_64 MUL (Unsigned Multiply) Forms
Operand Size   Source 1   Source 2   Destination
Byte           AL         r/m8       AX
Word           AX         r/m16      DX:AX
Doubleword     EAX        r/m32      EDX:EAX
Quadword       RAX        r/m64      RDX:RAX
Flags Affected
The OF and CF flags are set to 0 if the upper half of the result is 0; otherwise, they are set
to 1. The SF, ZF, AF, and PF flags are undefined.
x86_64 MUL (Unsigned Multiply)
IF (Byte operation)
    THEN
        AX := AL ∗ SRC;
    ELSE (* Word or doubleword operation *)
        IF OperandSize = 16
            THEN
                DX:AX := AX ∗ SRC;
            ELSE IF OperandSize = 32
                THEN
                    EDX:EAX := EAX ∗ SRC; FI;
            ELSE (* OperandSize = 64 *)
                RDX:RAX := RAX ∗ SRC;
        FI;
FI;
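The forms table and the operation pseudocode can be modelled in a few lines of Ruby. This is only a sketch: the `mul` helper and the `MUL_FORMS` table are hypothetical names invented here, and Ruby's arbitrary-precision integers stand in for fixed-width registers.

```ruby
# Operand size in bits => [source-1 register, destination register(s)].
MUL_FORMS = {
  8  => ["AL",  "AX"],
  16 => ["AX",  "DX:AX"],
  32 => ["EAX", "EDX:EAX"],
  64 => ["RAX", "RDX:RAX"],
}.freeze

# Hypothetical helper: models MUL for one operand size.
# Returns the high and low halves of the product plus the shared CF/OF value.
def mul(accumulator, src, bits)
  mask = (1 << bits) - 1
  product = (accumulator & mask) * (src & mask)
  high = (product >> bits) & mask
  low = product & mask
  # CF and OF are 0 when the upper half of the result is 0, otherwise 1.
  { high: high, low: low, carry_overflow: high.zero? ? 0 : 1 }
end

small = mul(0x10, 0x02, 8)       # product fits in the low half
large = mul(0xFFFF_FFFF, 2, 32)  # product spills into the high half (EDX)

puts "8-bit:  #{small}"
puts "32-bit: #{large}"
```

The unsigned product is always twice as wide as its operands, which is why each form needs a register pair (or AX for the byte form) as its destination.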
x86_64 MUL (Unsigned Multiply) Opcode
ISA Classification:
CISC vs RISC
CISC vs RISC
CISC: Complex Instruction Set Computer
Intel x86, AMD64
RISC: Reduced Instruction Set Computer
Think Apple Silicon, Graviton, RISC-V, ARM, and most mobile CPUs
Most new ISAs are RISC
Key differences
Register number and use
Scope of instructions
How we address data in instructions
Register, memory address, immediate (constant) value, etc.
Special Registers
PC - Program Counter
Sometimes called: IP - Instruction Pointer
Holds address of next instruction to execute
SP - Stack Pointer
Holds address of the top of the stack
Efficiently allows for storing and removing values in RAM
FP - Frame Pointer
Called the BP - Base Pointer on x86_64
Holds address for the start of the stack frame
Allows functions to quickly clean up after themselves
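A toy machine can show how the three registers cooperate. Everything here (the `ToyCPU` class, its three instructions, the 16-slot memory) is invented for illustration and does not correspond to any real ISA.

```ruby
# Toy machine: names, instruction set, and memory size are illustrative only.
class ToyCPU
  attr_reader :pc, :sp, :fp, :memory

  def initialize(program)
    @program = program
    @memory = Array.new(16, 0)
    @pc = 0              # index of the next instruction to execute
    @sp = @memory.size   # stack grows downward; an empty stack points one past the end
    @fp = @sp            # start of the current stack frame
  end

  def push(value)
    @sp -= 1
    @memory[@sp] = value
  end

  def pop
    value = @memory[@sp]
    @sp += 1
    value
  end

  def run
    while @pc < @program.size
      op, arg = @program[@pc]
      @pc += 1           # PC now holds the address of the next instruction
      case op
      when :push
        push(arg)
      when :enter        # open a new frame: save the old FP, start a new one
        push(@fp)
        @fp = @sp
      when :leave        # discard all of the frame's values in one step
        @sp = @fp
        @fp = pop
      end
    end
    self
  end
end

cpu = ToyCPU.new([[:enter], [:push, 1], [:push, 2], [:push, 3], [:leave]]).run
puts "pc=#{cpu.pc} sp=#{cpu.sp} fp=#{cpu.fp}"
```

The `:leave` step is the "quick clean up": resetting SP to FP drops every value the function pushed without popping them one at a time.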
2. What Compilers are
Fundamentally Doing
Machine Code
Sometimes called native code
Binary representation of instructions
Encoded using the ISA’s opcode table
That’s why applications are called binaries
You could hand-write this if you wanted
It’s really tedious
Sometimes necessary for microcontrollers
Assembly Language (ASM)
Low-level text-based programming language
Fairly simple by virtue of having a limited set of operations
Maps to ISA instructions
Assembler: turns ASM into machine code
Disassembler: decodes machine code back into ASM
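The assembler and disassembler round trip can be sketched with a made-up instruction set. The opcode table below is hypothetical (four one-byte opcodes, each followed by a one-byte operand); real opcode tables are far larger and use variable-length encodings.

```ruby
# Hypothetical one-byte opcodes for a made-up ISA.
OPCODES = { "NOP" => 0x00, "LOAD" => 0x01, "ADD" => 0x02, "RET" => 0x03 }.freeze
MNEMONICS = OPCODES.invert.freeze

# Assembler: each "MNEMONIC operand" line becomes two bytes (opcode, operand).
def assemble(source)
  bytes = source.lines.map(&:strip).reject(&:empty?).flat_map do |line|
    mnemonic, operand = line.split
    [OPCODES.fetch(mnemonic), Integer(operand || "0")]
  end
  bytes.pack("C*")
end

# Disassembler: decodes the bytes back into text.
def disassemble(machine_code)
  machine_code.bytes.each_slice(2).map do |opcode, operand|
    "#{MNEMONICS.fetch(opcode)} #{operand}"
  end.join("\n")
end

binary = assemble("LOAD 10\nADD 20\nRET")
puts binary.bytes.inspect
puts disassemble(binary)
```

Because the mapping between mnemonics and opcodes is a simple table lookup in both directions, the disassembler recovers the instructions but not comments or omitted operands.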
Compilers
Change code in one language to another
Sometimes split as:
Transpiler: language -> language
Compiler: language -> machine code
We’ll focus on machine code generation
Time for Code
Example: Addition
int add(int a, int b) {
    return a + b;
}
Application Binary Interface (ABI)
Platform-specific protocol for coordinating with a debugger
Keep track of stack frames
How to step through functions
How to read function arguments
Platform-specific protocol for laying out functions in ASM
Also called calling convention
How arguments are passed
Where return value ends up
Which registers can be used
caller-saved
callee-saved
scratch
Virtual Machine
Ruby code runs in a virtual machine (VM)
An abstract computer that hides details about
underlying system
Hides details about memory layout, IO access,
register sizes, etc.
We can run the same program on any platform that has the VM
Instead of executing machine code, we interpret VM code
We call that part of the VM the interpreter
Interpreter
Parser turns source code into a structure the interpreter
can process
Removes comments, white space, punctuation, etc.
Most common representations
Abstract Syntax Tree (AST)
AST interpreter
Byte Code (BC) (e.g., CRuby’s YARV)
BC interpreter
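The two representations can be compared side by side on the expression `1 + 2 * 3`. This is a minimal sketch: the nested-array AST shape and the `:push`, `:add`, `:mul` instruction names are invented here and are not YARV's.

```ruby
# AST interpreter: recursively walk nested arrays of the form [operator, left, right].
def eval_ast(node)
  return node if node.is_a?(Integer)

  op, left, right = node
  eval_ast(left).public_send(op, eval_ast(right))
end

# Byte code interpreter: run a flat instruction list against an operand stack.
def eval_bytecode(instructions)
  stack = []
  instructions.each do |name, operand|
    case name
    when :push
      stack.push(operand)
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :mul
      right = stack.pop
      left = stack.pop
      stack.push(left * right)
    end
  end
  stack.pop
end

ast = [:+, 1, [:*, 2, 3]]
bytecode = [[:push, 1], [:push, 2], [:push, 3], [:mul], [:add]]

puts eval_ast(ast)
puts eval_bytecode(bytecode)
```

The byte code form trades the tree's recursion for a linear loop over instructions, which is the shape a stack-based VM such as YARV executes.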
YARV Optimization
CRuby will apply some optimizations to the byte code
You can see the generated YARV byte code with:
ruby --dump=insns
Optimizations sometimes take the form of specialized instructions, like opt_plus
> ruby --dump=insns -e 'def add(a, b); a + b; end'
== disasm:
0000 definemethod :add, add ( 1)[Li]
0003 putobject :add
0005 leave

== disasm:
0000 getlocal_WC_0 a@0 ( 1)[LiCa]
0002 getlocal_WC_0 b@1
0004 opt_plus [CcCr]
0006 leave [Re]
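The same listing can be produced from inside a Ruby program. This sketch assumes CRuby: `RubyVM::InstructionSequence` is specific to that implementation, and the exact instruction names in the output can differ between Ruby versions.

```ruby
# CRuby-only: RubyVM::InstructionSequence is not available on other implementations.
source = "def add(a, b); a + b; end"
iseq = RubyVM::InstructionSequence.compile(source)

# disasm returns the same kind of listing as `ruby --dump=insns`.
listing = iseq.disasm
puts listing
```

Searching the listing for `opt_plus` is a quick way to confirm that the `+` call was compiled to the specialized instruction.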
VM Profiler
Monitors control and data flow
How execution proceeds in your application
How and what data moves through
Measures how frequently functions are called
Measures how frequently loops iterate
Uses heuristics to determine when code is hot and should
be compiled
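A call-counting heuristic is one simple way to decide when code is hot. The sketch below is hypothetical: `ToyProfiler` and its threshold of 3 are made up, and real VMs use much larger thresholds and track more than call counts.

```ruby
# Hypothetical profiler: the threshold and method names are made up for illustration.
class ToyProfiler
  HOT_THRESHOLD = 3

  def initialize
    @call_counts = Hash.new(0)
  end

  # Called by the interpreter each time a method is entered.
  def record_call(method_name)
    @call_counts[method_name] += 1
  end

  # Heuristic: a method is hot once it has been called often enough.
  def hot?(method_name)
    @call_counts[method_name] >= HOT_THRESHOLD
  end
end

profiler = ToyProfiler.new
5.times { profiler.record_call(:add) }
profiler.record_call(:rarely_used)

puts "add hot?         #{profiler.hot?(:add)}"
puts "rarely_used hot? #{profiler.hot?(:rarely_used)}"
```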
JIT Compiler
Compiles a fragment of code (rather than whole application)
Stores it in a region we call a code cache
Common scopes
Basic Block
Fancy way of saying straight-line code
Method
A Ruby-level method (composed of basic blocks)
Trace
A flow of execution through multiple methods
Once compiled, updates interpreter to jump to compiled code instead of
interpreting that fragment
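The hand-off from interpreter to compiled code can be sketched with a hash acting as the code cache. This is an illustration under loud assumptions: `ToyJIT` is invented, the "method body" is a tiny `[operator, arg_index, arg_index]` array, and a Ruby lambda stands in for machine code.

```ruby
# Toy JIT: interprets a method body until it has run COMPILE_AFTER times,
# then stores a lambda in the code cache and dispatches to it from then on.
class ToyJIT
  COMPILE_AFTER = 2 # hypothetical threshold

  attr_reader :code_cache

  def initialize
    @code_cache = {}
    @counts = Hash.new(0)
  end

  def call(name, body, *args)
    compiled = @code_cache[name]
    return compiled.call(*args) if compiled # jump to compiled code

    @counts[name] += 1
    @code_cache[name] = compile(body) if @counts[name] >= COMPILE_AFTER
    interpret(body, args)
  end

  private

  # Interpreter: decodes the body on every call.
  def interpret(body, args)
    op, i, j = body
    args[i].public_send(op, args[j])
  end

  # "Compiler": decodes the body once; the lambda stands in for machine code.
  def compile(body)
    op, i, j = body
    ->(*args) { args[i].public_send(op, args[j]) }
  end
end

jit = ToyJIT.new
add_body = [:+, 0, 1]
results = 3.times.map { jit.call(:add, add_body, 10, 20) }

puts results.inspect
puts "compiled: #{jit.code_cache.keys.inspect}"
```

Only the `:add` fragment ends up in the cache; anything that never gets hot keeps running through `interpret`.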
Speculative Optimization
What does this function do?
def add(a, b)
  a + b
end
Add integers?
add(10, 20)
Concatenate strings?
add('Hello ', 'friend')
Append arrays?
add([1, 2], [3, 4])
Speculative Optimization
Since the profiler knows the control and data flow, it can
guess how your program will continue to operate
The VM can rewrite its internal representation (IR) based
on those guesses
Called speculative optimization
Sets up a fail safe for when that guess is wrong
Critical Optimization:
Method Lookup
Method Lookup
Nearly everything in Ruby is a method call
To call a method, we need a reference to it
We need to look up the method in a method table
Methods can change at runtime
New methods added at runtime
Existing methods redefined or removed
Inheritance hierarchy changes
Cache miss requires full method lookup
Must be careful to invalidate entries when necessary
Practical considerations limit size
LRU cache eviction policy
May thrash if cache too small or many methods called
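A size-limited method cache with LRU eviction and invalidation might look like the sketch below. `ToyMethodCache` is a hypothetical stand-in, not how any particular Ruby VM implements its cache; it ignores inheritance, so a real invalidation would also have to cover subclasses.

```ruby
# Toy method cache keyed by [class, method name]; the capacity is illustrative.
class ToyMethodCache
  attr_reader :misses

  def initialize(capacity)
    @capacity = capacity
    @entries = {} # Ruby hashes keep insertion order, which doubles as LRU order
    @misses = 0
  end

  def lookup(klass, name)
    key = [klass, name]
    if @entries.key?(key)
      @entries[key] = @entries.delete(key) # re-insert to mark as most recently used
    else
      @misses += 1
      @entries[key] = klass.instance_method(name) # full (slow) method lookup
      @entries.delete(@entries.keys.first) if @entries.size > @capacity # evict LRU entry
    end
    @entries[key]
  end

  # Must be called whenever a method on klass is added, redefined, or removed.
  def invalidate(klass)
    @entries.delete_if { |(cached_class, _name), _method| cached_class == klass }
  end
end

cache = ToyMethodCache.new(2)
cache.lookup(Integer, :+)
cache.lookup(Integer, :+) # hit
cache.lookup(String, :+)
cache.lookup(Array, :+)   # evicts [Integer, :+], the least recently used entry
cache.lookup(Integer, :+) # miss again: the cache is too small for this workload

puts "misses: #{cache.misses}"
```

With room for only two entries and three call targets in rotation, nearly every lookup misses, which is the thrashing the slide warns about.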
When in Doubt,
Add Another Level
Inline Cache (IC)
VM modifies method body based on observed values
Cache is scoped to a call site
Registers a “cheap” predicate to check if cache can be used
AKA a guard function
If guard passes, use the cache
Otherwise, transition the cache state
Inline Cache States
Uninitialized
Monomorphic
One cache entry
Polymorphic
Multiple cache entries
Megamorphic
Remove cache because it’s not advantageous
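The state transitions can be sketched as a small state machine attached to one call site. `ToyInlineCache` and its limit of two entries are assumptions made for this example; the guard here only checks the receiver's class and does not model method redefinition.

```ruby
# Toy call-site cache; MAX_ENTRIES is a made-up limit for this sketch.
class ToyInlineCache
  MAX_ENTRIES = 2

  attr_reader :state

  def initialize(method_name)
    @method_name = method_name
    @entries = {} # receiver class => cached method
    @state = :uninitialized
  end

  def call(receiver, *args)
    # Megamorphic: no cache, fall back to ordinary dispatch.
    return receiver.public_send(@method_name, *args) if @state == :megamorphic

    method = @entries[receiver.class] # guard: has this class been seen here?
    unless method
      method = receiver.class.instance_method(@method_name)
      transition(receiver.class, method)
    end
    method.bind_call(receiver, *args)
  end

  private

  def transition(klass, method)
    if @entries.size >= MAX_ENTRIES
      @entries.clear # too many types: caching no longer pays off
      @state = :megamorphic
    else
      @entries[klass] = method
      @state = @entries.size == 1 ? :monomorphic : :polymorphic
    end
  end
end

site = ToyInlineCache.new(:+)
states = [[10, 20], ["a", "b"], [[1], [2]]].map do |a, b|
  site.call(a, b)
  site.state
end

puts states.inspect
```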
Monomorphic Inline Cache
def add(a, b)
  a + b
end

add(10, 20)
Monomorphic Inline Cache
def type_ok?(obj, klass)
  obj.class == klass && !VM.has_changed?(klass)
end

def add_monomorphic(a, b)
  if type_ok?(a, Integer) && type_ok?(b, Integer)
    m = Integer.instance_method(:+)
    m.bind_call(a, b)
  else
    handle_miss!
  end
end
Polymorphic Inline Cache (PIC)
def add(a, b)
  a + b
end

add(10, 20)
add('hello ', 'good people')
Polymorphic Inline Cache (PIC)
def add_polymorphic(a, b)
  if type_ok?(a, Integer) && type_ok?(b, Integer)
    m = Integer.instance_method(:+)
    m.bind_call(a, b)

  elsif type_ok?(a, String) && type_ok?(b, String)
    m = String.instance_method(:+)
    m.bind_call(a, b)

  else
    handle_miss!
  end
end
Megamorphic
def add(a, b)
  a + b
end

add(10, 20)
add('hello ', 'good people')
add([1, 2], [3, 4])
add(10, 20.0)
Megamorphic
def add_megamorphic(a, b)
  # Look up method the slow way.
  # The VM may update the Global Method Cache.
  m = VM.lookup_method([a.class, :+])

  m.bind_call(a, b)
end
JIT Compile Inline Cache
Take that internal VM state and turn it into machine code
Speculates that the types of a and b are stable and that the method isn’t redefined
The machine code can optimize for the specialized operation
add:
  cmp [rdi + 0x20], 0xfe826359 ; Check if a.class is Integer
  jne 0x12344321               ; Deoptimize if not an Integer

  cmp [rsi + 0x20], 0xfe826359 ; Check if b.class is Integer
  jne 0x12344321               ; Deoptimize if not an Integer

  mov rax, rdi                 ; Copy `a` into RAX for addition
  add rax, rsi                 ; Perform `a + b`

  jo 0x67899876                ; Handle potential overflow

  ret
Deoptimization
Recovers from bad guesses
Throws away compiled code fragment
Updates interpreter to resume interpreting that code
Resets the profiler to start profiling again
Optionally makes note of bad optimization decisions
to avoid repeating deopt loops
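The recovery path can be sketched with a guard that bails out of a specialized fragment. `ToyAddSite` and `compile_for_integers!` are invented names, and a lambda again stands in for compiled code; `throw`/`catch` models the jump out of the fragment back into the interpreter.

```ruby
# Toy model of deoptimization; class and method names are invented for this sketch.
class ToyAddSite
  attr_reader :deopt_count

  def initialize
    @compiled = nil
    @deopt_count = 0
  end

  def compiled?
    !@compiled.nil?
  end

  # Pretend the profiler saw only Integers and the JIT specialized for them.
  def compile_for_integers!
    @compiled = lambda do |a, b|
      throw :deoptimize unless a.is_a?(Integer) && b.is_a?(Integer) # guard
      a + b
    end
  end

  def call(a, b)
    if @compiled
      catch(:deoptimize) { return @compiled.call(a, b) }
      @compiled = nil   # the guess was wrong: throw away the compiled fragment
      @deopt_count += 1 # make a note of the bad guess
    end
    a + b               # resume in the "interpreter"
  end
end

site = ToyAddSite.new
site.compile_for_integers!
fast = site.call(10, 20)   # guard passes, runs the specialized fragment
slow = site.call("a", "b") # guard fails, deoptimizes, and is interpreted

puts fast
puts slow
puts "still compiled? #{site.compiled?}"
```

The caller still gets the right answer for the string case; the cost of the wrong guess is only the wasted compile and the slower fallback.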
Other Ruby JIT
Optimizations
Method Inlining
# Real implementation of empty? in TruffleRuby.
class Array
  def empty?
    size == 0
  end
end

# Our method before inlining.
def blank?(o)
  o.nil? || o.empty?
end

# Our method after inlining.
def blank_after_inlining?(o)
  o.nil? || o.size == 0
end
Escape Analysis
The array never escapes
It stays within min?
No references to it appear anywhere else
The JIT compiler could eliminate the array allocation
def min?(value)
  [value, 1000].min == value
end
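Written by hand, the allocation-free version might look like the following. This is only a sketch of the effect, assuming Integer arguments; `min_without_allocation?` is a hypothetical name, and a real JIT performs this rewrite on its internal representation rather than on Ruby source.

```ruby
def min?(value)
  [value, 1000].min == value
end

# What the compiled code could behave like for Integer arguments:
# the two array elements become plain local values and no array is allocated.
def min_without_allocation?(value)
  smallest = value <= 1000 ? value : 1000
  smallest == value
end

same = (-5..2000).all? { |v| min?(v) == min_without_allocation?(v) }
puts "equivalent on -5..2000: #{same}"
```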
Eliminate Metaprogramming Overhead
send
"abc".send(:size) is the same as "abc".size
method_missing
Implicitly call define_method so calls are fast
respond_to?
Can be made constant with inline cache
instance_variable_{get|set}
Turn into simple field accesses
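The equivalences a JIT exploits here can be checked in plain Ruby. The `Settings` class below is hypothetical; it does by hand what the `method_missing` bullet describes the JIT doing implicitly, defining a real method on the first miss so later calls take the normal path.

```ruby
class Settings
  def initialize(values)
    @values = values
  end

  # The first call to an unknown reader lands here. Define a real method so
  # later calls skip method_missing entirely.
  def method_missing(name, *args)
    return super unless @values.key?(name)

    self.class.send(:define_method, name) { @values[name] }
    public_send(name)
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.new(port: 8080)
defined_before = Settings.method_defined?(:port)
first = settings.port  # goes through method_missing
second = settings.port # calls the method defined on the first miss
defined_after = Settings.method_defined?(:port)

puts "abc".send(:size) == "abc".size
puts [defined_before, first, second, defined_after].inspect
puts settings.instance_variable_get(:@values).inspect # same data as a direct @values read
```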
Plenty of Room for Other
Optimizations!
JIT Compilers Recap
Conceptually simple
Take your Ruby code and transform it to optimized
machine code
Faster than interpreting
But incur a warm-up cost before hitting peak
performance
Optimize for the values flowing through your program
Speculative optimizations can be faster than ahead-of-time (AOT) compilation
Work best with idiomatic Ruby
Native extensions & clever hacks present barriers to JIT
optimization