Slide 1

Slide 1 text

Improving my own Ruby
monochrome
@s_isshiki1969
sisshiki1969

Slide 2

Slide 2 text

About me
- General Surgeon (hardware engineer for humans)
- @s_isshiki1969
- https://github.com/sisshiki1969
- Loves Ruby, Rust, and JIT compilers
- My grandmother was born in Matsuyama
- My father spent his childhood in Matsuyama

Slide 3

Slide 3 text

monoruby
- https://github.com/sisshiki1969/monoruby
- Yet another Ruby implementation with a JIT compiler
- Written in Rust from (almost) scratch
  - parser, garbage collector, interpreter
- Only x86-64 / Linux is supported
- new! Supports RubyGems.
- not yet! Struggling with Bundler.

Slide 4

Slide 4 text

Compatibility
- Supports
  - Bignum, Fiber, Binding
  - Redefining basic operator methods (like Integer#+)
- Does NOT support
  - Native C extensions (but has alternatives)
  - Native threads
  - Encoding: supports only UTF-8 and ASCII-8BIT
  - ObjectSpace, TracePoint, Refinements, call/cc, ...

Slide 5

Slide 5 text

Micro benchmark (yjit-bench)

Slide 6

Slide 6 text

Optcarrot benchmark (~3000 frame)

Slide 7

Slide 7 text

debug option “--profile”

deoptimization stats (top 20)
FuncId   func name                     [index]   count
------------------------------------------------------
( 1884)  CPU#fetch                     [:00003]   9483  %2 = %2.[%1] [Method][Integer]
( 2281)  block in PPU#setup_lut        [:00001]   1018  %2 = %1.object_id() POLY [Array] FuncId(11)
( 2370)  block in Parser#find_option   [:00001]    210  %2 = %1.to_s() POLY [Symbol] FuncId(14)
( 2013)  block in ##op                 [:00003]    191  %6 = %6.is_a?(%7) POLY [Array] FuncId(21)

global method cache stats (top 20)
func name   class            count
----------------------------------
_bne        Optcarrot::CPU   17914
_clc        Optcarrot::CPU    7898

full method exploration stats (top 20)
func name     class   count
---------------------------
attr_reader   #          30
key?          Hash       29

jit recompile stats (top 20)
FuncId   func name         class   count
----------------------------------------
( 2346)  block in Config   #          12
( 492)   Array#map         Array       7

Slide 8

Slide 8 text

debug option “--deopt”

<-- non-traced branch in FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][NilClass]
<-- non-traced branch in FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][Array]
<-- non-traced branch in FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][Gem::Requirement]
<-- deopt occurs in FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Array] FuncId(21) caused by #
<-- deopt occurs in FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Gem::Requirement] FuncId(21) caused by #
<-- deopt occurs in FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Gem::Requirement] FuncId(21) caused by "3.6.2"
<-- deopt occurs in FuncId(848). [:00003] %2 = %2.nil?() POLY [Array] FuncId(64) caused by #
<-- deopt occurs in FuncId(848). [:00003] %2 = %2.nil?() POLY [Gem::Requirement] FuncId(64) caused by #
<-- deopt occurs in FuncId(848). [:00003] %2 = %2.nil?() POLY [Gem::Requirement] FuncId(64) caused by "3.6.2"
<-- deopt occurs in FuncId(521). [:00004] %3 = %1.send(%3,*%4) [Symbol] FuncId(22) caused by :__version_guard
<-- deopt occurs in FuncId(1275). [:00001] %1 = %1.flatten() [Array] FuncId(307) caused by :__version_guard
<-- deopt occurs in FuncId(491). [:00001] %3 = %0.block_given?() [Array] FuncId(74) caused by :__version_guard
<-- deopt occurs in FuncId(491). [:00012] %2 = %0.size() [Array] FuncId(250) caused by :__version_guard
<-- non-traced branch in FuncId(1296). [:00003] %2 = %2.parse(%1) [#] FuncId(1274)
<-- deopt occurs in <##parse> FuncId(1274). [:00001] %2 = Gem::Version [Gem::Version]
<-- deopt occurs in <##new> FuncId(1313). [:00001] %2 = Gem::Version [Gem::Version]
<-- non-traced branch in FuncId(1315). [:00003] %2 = %2.correct?(%1) [#] FuncId(1311)
<-- deopt occurs in <##correct?> FuncId(1311). [:00001] %2 = %1.nil?() POLY [String] FuncId(64) caused by "0.3.3"
<-- deopt occurs in FuncId(492). [:00001] %4 = %0.block_given?() [Array] FuncId(74) caused by :__version_guard
<-- deopt occurs in FuncId(925). [:00001] %2 = %1.nil?() [Array] FuncId(64) caused by ["erb"]
<-- non-traced branch in FuncId(850). [:00009] %1 = %1.flatten() [Array] FuncId(307)
<-- deopt occurs in #register_default_spec> FuncId(665). [:00002] %2 = %1.start_with?(%2) [String] FuncId(195) caused by :__version_guard

Slide 9

Slide 9 text

Bytecode (Virtual Machine instruction)

              8 bytes (opcode, operand)     8 bytes (trace info)
ADD_RR        0xaa %dst %lhs %rhs           ClassId(lhs) ClassId(rhs)
METHOD_CALL   0x01 %dst CallsiteId          cached FuncId
              0x82 %rcv %args pos           cached ClassId, cached version
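The 8 + 8 byte layout can be tried out with Ruby's pack/unpack. This is only a sketch: the exact field order (three 16-bit operands, then the opcode byte, one padding byte, then two 32-bit class ids) is an assumption inferred from the offsets used in the interpreter listing on slide 11, and the helper names are hypothetical, not monoruby's.

```ruby
# Hypothetical 16-byte encoding of ADD_RR; the field order is an assumption.
ADD_RR = 0xaa
# 3 x u16 operands, u8 opcode, 1 padding byte, 2 x u32 trace info (little endian)
INSN_FORMAT = "S<S<S<CxL<L<"

def encode_add_rr(dst:, lhs:, rhs:, lhs_class: 0, rhs_class: 0)
  [rhs, lhs, dst, ADD_RR, lhs_class, rhs_class].pack(INSN_FORMAT)
end

def decode_insn(insn)
  rhs, lhs, dst, opcode, lhs_class, rhs_class = insn.unpack(INSN_FORMAT)
  { opcode: opcode, dst: dst, lhs: lhs, rhs: rhs,
    lhs_class: lhs_class, rhs_class: rhs_class }
end

# Models the interpreter recording the observed operand classes
# in the trace-info half of the instruction.
def record_trace(insn, lhs_class, rhs_class)
  insn.byteslice(0, 8) + [lhs_class, rhs_class].pack("L<L<")
end
```

For `%2 = %1 * %1`-style instructions the compiler later reads the recorded class ids back to decide which guards to emit. The class id 6 used when trying this out is taken from the constant written by the slide 11 listing.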

Slide 10

Slide 10 text

Stack frame
- return addr
- prev rbp
- Control frame: prev cfp, lfp
- Local frame: outer, meta, block, %0 (self), %1, %2, %3

Frames of method a, block b, and method c are linked by prev cfp and outer lfp.

Slide 11

Slide 11 text

interpreter --compile--> JIT code --deoptimize--> interpreter

interpreter (fetch & dispatch, execute)
    movsx rsi,WORD PTR [r13-0x10]
    movzx rdi,WORD PTR [r13-0xe]
    movzx r15,WORD PTR [r13-0xc]
    neg rdi
    mov rdi,QWORD PTR [r14+rdi*8-0x30]
    neg rsi
    mov rsi,QWORD PTR [r14+rsi*8-0x30]
    neg r15
    lea r15,[r14+r15*8-0x30]
    test rdi,0x1
    je slow_path
    test rsi,0x1
    je slow_path
    mov DWORD PTR [r13-0x8],0x6
    mov DWORD PTR [r13-0x4],0x6
    mov rax,rdi
    sub al,0x1
    add rax,rsi
    jo slow_path
    mov QWORD PTR [r15],rax
    movabs r15,0x561fe2169000
    movzx rax,BYTE PTR [r13+0x6]
    add r13,0x10
    jmp QWORD PTR [r15+rax*8]

JIT code
    mov rdi,QWORD PTR [r14-0x38]
    test rdi,0x1
    je deopt
    mov rsi,QWORD PTR [r14-0x40]
    test rsi,0x1
    je deopt
    sub rdi,0x1
    add rdi,rsi
    jo deopt
    mov r15,rdi

deoptimize
    deopt:
    mov r13, (pc)
    jmp interpreter
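The JIT listing above can be modelled in a few lines of Ruby. This is a sketch under assumptions: integers are taken to be stored as 64-bit words of the form `(n << 1) | 1` (which is what the `test ...,0x1` guards and the `sub 1` / `add` pair suggest), and `:deopt` stands in for the jump back to the interpreter. The method names are hypothetical.

```ruby
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Assumed tagging scheme: shift left by one and set the low bit.
def tag(n)
  (n << 1) | 1
end

def untag(word)
  word >> 1
end

# Mirrors the JIT code: guard both operands, then sub 1 / add / jo.
def jit_add(lhs_word, rhs_word)
  return :deopt if (lhs_word & 1).zero?   # test rdi,0x1 / je deopt
  return :deopt if (rhs_word & 1).zero?   # test rsi,0x1 / je deopt
  sum = (lhs_word - 1) + rhs_word         # sub rdi,0x1 / add rdi,rsi
  return :deopt unless sum.between?(INT64_MIN, INT64_MAX) # jo deopt
  sum                                     # still a tagged integer
end
```

Dropping one tag bit before the addition keeps the result tagged: `(2a + 1 - 1) + (2b + 1) = 2(a + b) + 1`. For example, `untag(jit_add(tag(3), tag(4)))` gives 7, while a word with a clear low bit, or a sum that no longer fits in 64 bits, yields `:deopt`.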

Slide 12

Slide 12 text

We must generate (nice) asm code. A JIT compiler is an extremely complex system.

Slide 13

Slide 13 text

Hardware resources
- Memory: stack slots in the local frame (%0, %1, %2, ...) holding self, local variables, temporaries
- CPU registers
  - GPRs: RAX, RDI, ..., R15
  - FPRs: xmm0, xmm1, xmm2, ... (float)

Slide 14

Slide 14 text

“State” of register (in interpreter)

Reg   Stack
%1    100
%2    10
%3    10.0
%4    3.14

Slide 15

Slide 15 text

“State” of register (in compiler)

%1: GP(R15)          INTEGER   (R15 = 100)
%2: Stack            VALUE     (stack slot = 10)
%3: FP(XMM3)         FLOAT     (xmm3 = 10.0)
%4: Concrete( 3.14 ) FLOAT

GPRs: R15
FPRs: xmm2, xmm3, xmm4
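The compiler-side table can be pictured as a small map from virtual register to location and type. The following Ruby model is purely illustrative: `SlotState`, `COMPILER_STATE`, and `write_back_plan` are hypothetical names, and the idea that every non-stack slot must be stored back to its stack slot before the interpreter takes over is an assumption, not something the slides state.

```ruby
# Hypothetical model of the "state" table shown on this slide.
SlotState = Struct.new(:location, :type, :detail)

COMPILER_STATE = {
  1 => SlotState.new(:gp,       :integer, "R15"),
  2 => SlotState.new(:stack,    :value,   nil),
  3 => SlotState.new(:fp,       :float,   "XMM3"),
  4 => SlotState.new(:concrete, :float,   3.14),
}.freeze

# Lists the stores needed so that every slot lives on the stack again
# (assumed to be what a deoptimization exit has to do).
def write_back_plan(state)
  state.reject { |_slot, s| s.location == :stack }
       .map { |slot, s| "store #{s.detail} to %#{slot}" }
end
```

With the state above, `write_back_plan(COMPILER_STATE)` lists stores for %1, %3, and %4 and skips %2, which already lives on the stack.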

Slide 16

Slide 16 text

def area(r)
  r * r * 3.14
end
area(10)

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

Stack: %1 = 10, %2, %3
Reg:   R15, xmm2, xmm3, xmm4

Slide 17

Slide 17 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

- class guard %1:Integer
- R15 := %1 * %1
- check overflow
- link %2 to R15

000023: mov rdi,QWORD PTR [r14-0x20]
000027: mov rsi,QWORD PTR [r14-0x20]
00002b: test rdi,0x1
000032: je 0xfffb572
000038: sar rsi,1
00003b: sub rdi,0x1
00003f: imul rdi,rsi
000043: jo 0xfffb5a2
000049: or rdi,0x1
00004d: mov r15,rdi

Stack: %1 = 10, %2, %3
Reg:   R15 = 100, xmm2, xmm3, xmm4
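The multiply sequence on this slide can be checked by hand with a short Ruby model. As before this is a sketch: the `(n << 1) | 1` integer tagging is assumed from the `test rdi,0x1` guard, `:deopt` stands in for the two jump targets, and `jit_mul` is a hypothetical name.

```ruby
MUL_INT64_MIN = -(2**63)
MUL_INT64_MAX = 2**63 - 1

# Mirrors 00002b..000049: one class guard (both operands are %1),
# untag one side, clear the tag bit of the other, multiply, re-tag.
def jit_mul(lhs_word, rhs_word)
  return :deopt if (lhs_word & 1).zero?         # test rdi,0x1 / je
  untagged_rhs = rhs_word >> 1                  # sar rsi,1
  product = (lhs_word - 1) * untagged_rhs       # sub rdi,0x1 / imul rdi,rsi
  return :deopt unless product.between?(MUL_INT64_MIN, MUL_INT64_MAX) # jo
  product | 1                                   # or rdi,0x1
end
```

For `area(10)` the operand word is 21: `(21 - 1) * (21 >> 1) = 200`, and `200 | 1 = 201`, which is the tagged form of 100, the value the slide shows in R15.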

Slide 18

Slide 18 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

- link %3 to constant 3.14

Stack: %1 = 10, %2, %3 (constant 3.14)
Reg:   R15 = 100, xmm2, xmm3, xmm4

Slide 19

Slide 19 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

- class guard %2:Integer
- class guard %3:Float
- xmm2 := (Value to f64) R15
- xmm3 := 3.14

000050: mov QWORD PTR [r14-0x28],r15
000054: sar r15,1
000057: cvtsi2sd xmm2,r15
00005c: movq xmm3,QWORD PTR [rip+0x20]

Stack: %1 = 10, %2 = 100, %3 (constant 3.14)
Reg:   R15 = 100, xmm2 = 100.0, xmm3 = 3.14, xmm4

Slide 20

Slide 20 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

- xmm4 := xmm2 * xmm3
- link %2 to xmm4

000064: movq xmm4,xmm2
000068: mulsd xmm4,xmm3

Stack: %1 = 10, %2 = 100, %3 (constant 3.14)
Reg:   R15 = 100, xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0

Slide 21

Slide 21 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

Stack: %1 = 10, %2 = 100, %3
Reg:   R15 = 100, xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0

Slide 22

Slide 22 text

:00000 init_method
:00001 %2 = %1 * %1 [INTEGER][INTEGER]
:00002 %3 = literal[3.14]
:00003 %2 = %2 * %3 [INTEGER][FLOAT]
:00004 ret %2

- %2 := (f64 to Value) xmm4
- return %2

00006c: movq xmm0,xmm4
000070: call 0xfffcfa57
000075: mov QWORD PTR [r14-0x28],rax
000079: leave
00007a: ret

Stack: %1 = 10, %2 = 314.0, %3
Reg:   R15 = 100, xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0

Slide 23

Slide 23 text

Specialization: “generic” Array#each

class Array
  def each
    ..
    yield
    ..
  end
end

Callers:

ary.each do |i|
  puts i
end

ary.each do |x,y|
  x + y
end

ary.each.flat_map

We don’t know until runtime:
1. which block is given (or not given)
2. the signature of the given block
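Both unknowns are easy to observe from plain Ruby. The helper below is a hypothetical illustration (`block_info` is not part of monoruby): it reports what a callee can learn about its block, which is only available once a call actually happens.

```ruby
# Returns :no_block when called without a block; otherwise the block's
# arity and parameter names as seen by the callee at call time.
def block_info(&blk)
  return :no_block if blk.nil?
  { arity: blk.arity, params: blk.parameters.map(&:last) }
end

one_param  = block_info { |i| i }
two_params = block_info { |x, y| x + y }
no_block   = block_info
```

The same method body sees an arity of 1 for the first call, an arity of 2 for the second, and no block at all for the third, so a generic compiled `each` has to be prepared for every case.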

Slide 24

Slide 24 text

Specialization: “specialized” Array#each

compile at one time:

1000.times do |i|
  puts i
end

class Array
  def each
    ..
    yield
    ..
  end
end

Slide 25

Slide 25 text

class Array
  def each
    return self.to_enum(:each) if !block_given?
    i = 0
    while i < self.size
      yield self[i]
      i += 1
    end
    self
  end
end

BB0
:00000 [02] init_method reg:3 arg:0 stack_offset:7
:00001 [04] %3 = %0.block_given?() [Array] FuncId(74)
:00003 [04] %2 = %3
:00004 [03] %2 = !%2
:00005 [02] condnotbr %2 => BB2
BB1
:00006 [03] %2 = :each
:00007 [03] %2 = %0.to_enum(%2) [] -
:00009 [02] ret %2
BB2
:00010 [02] %1 = 0: i32
BB3
:00011 [02] loop_start counter=0 jit-addr=0x0
:00012 [03] %2 = %0.size() [Array] FuncId(249)
:00014 [03] _%2 = %1 < %2 [Integer][Integer]
:00015 [02] condnotbr _%2 => BB5
BB4
:00016 [03] %2 = %0.[%1] [Array][Integer]
:00017 [03] _ = yield(%2)
:00019 [02] %1 = %1 + 1: i16 [Integer][Integer]
:00020 [02] br => BB3
BB5
:00021 [02] loop_end
BB6
:00022 [02] ret %0
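To see the two exits of this method (BB1 returning an enumerator, BB6 returning self) on any Ruby, the same body can be placed in a subclass. `SlideArray` is a hypothetical name chosen so that the built-in Array#each stays untouched; this is a sketch for experimentation, not how monoruby installs its core methods.

```ruby
# Same logic as the slide's Array#each, on a subclass for safe experimentation.
class SlideArray < Array
  def each
    return to_enum(:each) unless block_given?  # BB1: no block given
    i = 0
    while i < size                             # BB3: loop condition
      yield self[i]                            # BB4: call the block
      i += 1
    end
    self                                       # BB6: ret %0
  end
end

ary = SlideArray.new([1, 2, 3])
seen = []
result = ary.each { |x| seen << x }
enum = ary.each
```

Here `seen` collects the elements in order, `result` is `ary` itself, and `enum` is an Enumerator that replays the same loop when it is consumed.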

Slide 26

Slide 26 text

BB 0
:00000 init_method
:00001 %3 = %0.block_given?() [Array] FuncId(74)
:00003 %2 = %3
:00004 %2 = !%2
:00005 condnotbr %2 => BB2
BB 1
:00006 %2 = :each
:00007 %2 = %0.to_enum(%2)
:00009 ret %2
BB 2
:00010 %1 = 0: i32
BB 3
:00011 loop_start
:00012 %2 = %0.size() [Array] FuncId(249)
:00014 _%2 = %1 < %2 [Integer][Integer]
:00015 condnotbr _%2 => BB5
BB 4
:00016 %2 = %0.[%1] [Array][Integer]
:00017 _ = yield(%2)
:00019 %1 = %1 + 1: i16 [Integer][Integer]
:00020 br => BB3
BB 5
:00021 loop_end
BB 6
:00022 ret %0

Slide 27

Slide 27 text

BB 0
:00000 init_method
:00001 %3 = %0.block_given?() [Array] FuncId(74)
:00003 %2 = %3
:00004 %2 = !%2
:00005 condnotbr %2 => BB2
BB 1
:00006 %2 = :each
:00007 %2 = %0.to_enum(%2)
:00009 ret %2
BB 2
:00010 %1 = 0: i32

Slide 28

Slide 28 text

Kernel#block_given?
Inline (dynamic) assembly

fn kernel_block_given(...) -> bool {
    let dst = callsite.dst;
    if let Some(true) = jitctx.has_block() {
        if let Some(dst) = dst {
            bb.def_concrete_value(dst, Value::bool(true));
        }
    } else {
        ir.inline(|gen, _, _| {
            let exit = gen.jit.label();
            monoasm! { &mut gen.jit,
                movq rax, (FALSE_VALUE);
                movq rdi, [r14 - (LFP_BLOCK)];
                testq rdi, rdi;
                jz exit;
                cmpq rdi, (NIL_VALUE);
                jeq exit;
                movq rax, (TRUE_VALUE);
            exit:
            }
        });
        bb.rax2acc(ir, dst);
    }
    true
}

- If we know the block is given, no code is generated (state change only).
- If not, we can generate asm directly.

Slide 29

Slide 29 text

BB 0
:00000 init_method
:00001 %3 = %0.block_given?() [Array] FuncId(74)
:00003 %2 = %3
:00004 %2 = !%2
:00005 condnotbr %2 => BB2
BB 1
:00006 %2 = :each
:00007 %2 = %0.to_enum(%2)
:00009 ret %2
BB 2
:00010 %1 = 0: i32

- %0.block_given?(): in this specialized context, always true
- condnotbr %2 => BB2: we can remove this branch
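The reasoning on this slide is ordinary constant propagation over BB 0. The Ruby sketch below uses a made-up mini-IR (tuples of opcode and register numbers) rather than monoruby's real data structures, and `fold_bb0` is a hypothetical helper. It only shows how knowing `block_given?` at compile time turns the conditional branch into a fixed successor.

```ruby
# Hypothetical mini-IR for BB 0: [opcode, a, b]
BB0 = [
  [:block_given, 3],      # %3 = %0.block_given?()
  [:mov, 2, 3],           # %2 = %3
  [:not, 2, 2],           # %2 = !%2
  [:condnotbr, 2, :BB2],  # jump to BB2 when %2 is falsy, else fall into BB1
].freeze

# has_block: true/false when known at compile time, nil when unknown.
# Returns the only reachable successor, or nil if the branch must stay.
def fold_bb0(insns, has_block:)
  known = {} # register number => compile-time constant
  insns.each do |op, a, b|
    case op
    when :block_given
      known[a] = has_block unless has_block.nil?
    when :mov, :not
      if known.key?(b)
        known[a] = op == :not ? !known[b] : known[b]
      else
        known.delete(a)
      end
    when :condnotbr
      return nil unless known.key?(a)
      return known[a] ? :BB1 : b
    end
  end
  nil
end
```

With `has_block: true` the fold yields `:BB2`, so BB 1 (the `to_enum` path) is dead code in the specialized version. With `has_block: nil`, as in the generic version, the branch has to be kept.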

Slide 30

Slide 30 text

BB0
:00000 init_method
:00001 %3 = %0.block_given?() [Array] FuncId(74)
:00003 %2 = %3
:00004 %2 = !%2
:00005 condnotbr %2 => BB2
BB1
:00006 %2 = :each
:00007 %2 = %0.to_enum(%2)
:00009 ret %2
BB2
:00010 %1 = 0: i32
BB3
:00011 loop_start
:00012 %2 = %0.size() [Array] FuncId(249)
:00014 _%2 = %1 < %2 [Integer][Integer]
:00015 condnotbr _%2 => BB5
BB4
:00016 %2 = %0.[%1] [Array][Integer]
:00017 _ = yield(%2)
:00019 %1 = %1 + 1: i16 [Integer][Integer]
:00020 br => BB3
BB5
:00021 loop_end
BB6
:00022 ret %0

Annotations on the slide: asm inlined, asm inlined

Slide 31

Slide 31 text

Generally, yield is slow
- cannot know which block is given
- cannot know the signature of the callee
- must use an indirect branch

BB 4
:00016 %2 = %0.[%1]
:00017 _ = yield(%2)
:00019 %1 = %1 + 1: i16
:00020 br => BB3

In this specialized context, we know the signature of the callee block at compile time.

Slide 32

Slide 32 text

benchmark: specialization

Slide 33

Slide 33 text

Special thanks: @k0kubun, @yhara, @raviqqe, @ko1, @mametter
Welcome to RubyKaigi. Asm