Slide 1

Slide 1 text

Don't @ me! Faster Instance Variables with Object Shapes

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Aaron Patterson

Slide 4

Slide 4 text

15 min standup

Slide 5

Slide 5 text

Ruby Core Team

Slide 6

Slide 6 text

Rails Core Team

Slide 7

Slide 7 text

@tenderlove mastodon.social/@tenderlove GitHub Cohost Instagram

Slide 8

Slide 8 text

LinkedIn: tenderlove

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Ruby Infrastructure

Slide 11

Slide 11 text

Instance Variables

Slide 12

Slide 12 text

Instance Variables: TMI

Slide 13

Slide 13 text

Object Shapes

Slide 14

Slide 14 text

YJIT

Slide 15

Slide 15 text

All in 30 min! LOL

Slide 16

Slide 16 text

Thanks!! 🥰🥰🥰🥰🥰🥰🥰🥰🥰 Ruby Infrastructure YJIT team Jemma Issroff Maxime Chevalier-Boisvert John Hawthorn @ GitHub

Slide 17

Slide 17 text

How do IVARS work? Don't @ me!

Slide 18

Slide 18 text

Instance Variables = IVARs = IVs

Slide 19

Slide 19 text

Implementing Instance Variables

Slide 20

Slide 20 text

Instance Variables
How to store them?

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Hello.new

Instance of Hello: IV Hash Table
IV Name  IV Value
:@foo    1
:@bar    2

Slide 21

Slide 21 text

Hash based implementation
If it were written in Ruby

class Object
  def initialize
    @ivs = {} # Magic instance variable hash
  end

  def instance_variable_set name, value
    @ivs[name] = value
  end

  def instance_variable_get name
    @ivs[name]
  end

  def instance_variable_defined? name
    # ooohhh, why did he put this method in the example?
    # I bet it's foreshadowing!
    @ivs.key? name
  end
end
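
The slide's code reopens Object, so it is not meant to be run as-is. As a minimal runnable sketch of the same idea, the class below stores its "instance variables" in a plain Hash. HashBackedObject, iv_set, iv_get and iv_defined? are hypothetical names chosen for this example, not anything from Ruby itself.

```ruby
# Hypothetical stand-in for an object whose instance variables live in a Hash.
class HashBackedObject
  def initialize
    @ivs = {} # IV name => IV value
  end

  def iv_set(name, value)
    @ivs[name] = value
  end

  def iv_get(name)
    @ivs[name] # a missing key yields nil, like reading an unset IV
  end

  def iv_defined?(name)
    @ivs.key?(name)
  end
end

obj = HashBackedObject.new
obj.iv_set(:@foo, 1)
obj.iv_set(:@bar, nil)

p obj.iv_get(:@foo)      # 1
p obj.iv_get(:@baz)      # nil (never set)
p obj.iv_defined?(:@bar) # true, even though the stored value is nil
p obj.iv_defined?(:@baz) # false
```

With a Hash, "set to nil" and "never set" are easy to tell apart because the key is either present or absent. That distinction is what the foreshadowing comment is about.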

Slide 22

Slide 22 text

Ruby <= 1.8.X

Slide 23

Slide 23 text

Tree Walking Interpreter

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Tree for "@foo + @bar":
+ (result: 3)
  @foo (value: 1) Hash Lookup!
  @bar (value: 2) Hash Lookup!

Slide 24

Slide 24 text

Ruby 1.9: YARV

Slide 25

Slide 25 text

YARV Execution
Code is compiled to instructions before execution

Source Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

hi = Hello.new
hi.foo

Byte Code for "foo"
[:getivar, :@foo]
[:getivar, :@bar]
[:plus]

VM Stack: 1 2 3
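
To look at the real instructions for "foo", CRuby can disassemble a method. This is a sketch that assumes CRuby (RubyVM::InstructionSequence is not available on every Ruby implementation). The slide's getivar and plus are shorthand; the full instruction names, getinstancevariable and opt_plus, are the ones that appear in the machine code comments later in the deck. The exact text of the listing varies between Ruby versions.

```ruby
class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

# CRuby-specific: fetch the compiled instruction sequence for Hello#foo.
iseq = RubyVM::InstructionSequence.of(Hello.new.method(:foo))
disasm = iseq.disasm

# Print the human-readable listing; it should contain two
# getinstancevariable instructions followed by opt_plus.
puts disasm
```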

Slide 26

Slide 26 text

Instruction Implementation
Example implementation written in Ruby

getivar Implementation

def getivar name
  get_self.instance_variables[name]
end

[:getivar, :@foo]
[:getivar, :@bar]
[:plus]

get_self: Get self from current stack frame
instance_variables: Get Hash of IVs

Slide 27

Slide 27 text

Hashes are slow Compared to Arrays

Slide 28

Slide 28 text

Hashes Use Memory Compared to Arrays

Slide 29

Slide 29 text

Let's Use an Array! Instead of a Hash!

Slide 30

Slide 30 text

Instance Variables
How to store them?

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Hello.new

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Instance of Hello: IV Array
Index  Value
0      1
1      2

Slide 31

Slide 31 text

Instance Variables (second instance)
How to store them?

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Hello.new

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Instance of Hello: IV Array
Index  Value
0      1
1      2

Slide 32

Slide 32 text

Instance Variables (many instances)
Hash table size is amortized

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Hello.new

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Instance of Hello: IV Array
Index  Value
0      1
1      2

Instance of Hello: IV Array
Index  Value
0      1
1      2
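
The layout on the last three slides can be sketched in plain Ruby: one name-to-index table per class, and one small array of values per instance. IndexedClass and IndexedObject are hypothetical names for this model; CRuby does this in C, not like this.

```ruby
# Hypothetical model: the index table lives on the "class" and is shared.
class IndexedClass
  attr_reader :ivar_index

  def initialize
    @ivar_index = {} # IV name => slot number
  end

  # Return the slot for a name, assigning the next free slot on first use.
  def index_for(name)
    @ivar_index[name] ||= @ivar_index.size
  end
end

# Each instance only carries an array of values.
class IndexedObject
  attr_reader :slots

  def initialize(klass)
    @klass = klass
    @slots = []
  end

  def iv_set(name, value)
    @slots[@klass.index_for(name)] = value
  end

  def iv_get(name)
    index = @klass.ivar_index[name] # still one hash lookup per read
    index && @slots[index]
  end
end

hello_class = IndexedClass.new

a = IndexedObject.new(hello_class)
a.iv_set(:@foo, 1)
a.iv_set(:@bar, 2)

b = IndexedObject.new(hello_class)
b.iv_set(:@foo, 10)
b.iv_set(:@bar, 20)

p a.slots                      # [1, 2]
p b.slots                      # [10, 20]
p hello_class.ivar_index.size  # 2, no matter how many instances exist
```

The hash is paid for once per class, which is what "amortized" means on the slide. Reads still go through the hash to find the index, which the inline cache slides deal with.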

Slide 33

Slide 33 text

Instance Variables Storage Location
References are stored inside the object (it's in the computer)

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Hello.new

Instance of Hello: In-Memory Layout
Byte Index  Value
0           Flags (a 64 bit bitmap)
8           Pointer to Class
16          First IV: 1
24          Second IV: 2
32          Third IV: Qundef

Slide 34

Slide 34 text

Storage Location Depends on Type
Objects store instance variables "in line", others in an external table

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class PleaseDoNotDoThis < Array
  def initialize
    @foo = 1
    @bar = 2
    super
  end

  def foo
    @foo + @bar
  end
end

Slide 35

Slide 35 text

Instruction Implementation
"foo" method instructions

getivar Implementation

def getivar name
  # get the class
  klass = get_self.class

  # get the index of the ivar
  index = klass.ivar_index[name]

  if get_self.is_a?(Object)
    # get the ivar value
    get_self.instance_variables[index]
  else
    # do something different
  end
end

[:getivar, :@foo]
[:getivar, :@bar]
[:plus]

Still doing a hash lookup 😆

Slide 36

Slide 36 text

Inline Caches

Slide 37

Slide 37 text

Instruction Implementation
"foo" method instructions, with inline caches

getivar Implementation

def getivar name, cache
  # If there is no cached index
  unless cache.index
    # get the class
    klass = get_self.class
    # get the index of the ivar
    index = klass.ivar_index[name]
    # store the index
    cache.index = index
  end

  # get the cached index
  index = cache.index

  if get_self.is_a?(Object)
    # get the ivar value
    get_self.instance_variables[index]
  else
    # do something different
  end
end

[:getivar, :@foo, cache]
[:getivar, :@bar, cache]
[:plus]

"unless cache.index" branch: Find and cache the index
"index = cache.index": Use the cached index
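
Here is a small runnable model of the inline cache idea, counting how often the slow path (the hash lookup) actually runs. Cache, IVAR_INDEX, LOOKUPS and this getivar are hypothetical names for the sketch. Each instruction gets its own cache object, as in the byte code above.

```ruby
# Hypothetical model of a per-instruction inline cache.
Cache = Struct.new(:index)

IVAR_INDEX = { :@foo => 0, :@bar => 1 }.freeze # the class's name => index table
LOOKUPS = Hash.new(0)                          # counts slow-path hash lookups

def getivar(slots, name, cache)
  unless cache.index
    LOOKUPS[name] += 1             # slow path: one hash lookup
    cache.index = IVAR_INDEX[name] # remember the index for next time
  end
  slots[cache.index]               # fast path: plain array read
end

slots = [1, 2]        # IV array of one Hello instance
foo_cache = Cache.new # cache for [:getivar, :@foo, cache]
bar_cache = Cache.new # cache for [:getivar, :@bar, cache]

total = 0
1000.times do
  total += getivar(slots, :@foo, foo_cache) + getivar(slots, :@bar, bar_cache)
end

p total   # 3000
p LOOKUPS # one lookup per IV name, despite 1000 iterations
```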

Slide 38

Slide 38 text

Usually No Hash Lookups!

Slide 39

Slide 39 text

Cache Lookup Problem
Name to Index mapping is per class

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class World < Hello
  def initialize
    @oops = "yikes!!"
    super
  end
end

Hello.new.foo
World.new.foo

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Instance of Hello: IV Array
Index  Value
0      1
1      2

Cache: Index 0 and 1

Slide 40

Slide 40 text

Cache Lookup Problem
Name to Index mapping is per class

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class World < Hello
  def initialize
    @oops = "yikes!!"
    super
  end
end

Hello.new.foo
World.new.foo

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Cache: Index 0 and 1

World Class: IV Index Table
Name    Index
:@oops  0
:@foo   1
:@bar   2

Oops was set first!

Slide 41

Slide 41 text

Use Class as a Cache Key

Slide 42

Slide 42 text

Compare Class in Cache
Cache miss if no index or the class doesn't match

def getivar name, cache
  # get the class
  klass = get_self.class

  # If there is no cached index or the class doesn't match
  if !(cache.index && cache.klass == klass)
    # get the index of the ivar
    index = klass.ivar_index[name]
    # store the index
    cache.index = index
    # store the class
    cache.klass = klass
  end

  # get the cached index
  index = cache.index

  if get_self.is_a?(Object)
    # get the ivar value
    get_self.instance_variables[index]
  else
    # do something different
  end
end

Cache hit condition: Class must match and IV index set
"get_self.instance_variables[index]": Return value at index inside list

Slide 43

Slide 43 text

Subclasses Cause Cache Misses
Since the class is a cache key, subclasses can't share cache with superclass

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class World < Hello
end

hello = Hello.new
world = World.new

loop do
  hello.foo
  world.foo
end

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

World Class: IV Index Table
Name   Index
:@foo  0
:@bar  1
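
To see why the loop above is bad for a class-keyed cache, here is a hedged simulation. Objects are modeled as plain hashes with a :klass tag and a :slots array; INDEX_TABLES, Cache and this getivar are hypothetical names for the sketch.

```ruby
# Hypothetical model: one index table per class, identical contents.
INDEX_TABLES = {
  Hello: { :@foo => 0, :@bar => 1 },
  World: { :@foo => 0, :@bar => 1 } # same layout, but a separate table
}.freeze

Cache = Struct.new(:klass, :index, :misses)

def getivar(obj, name, cache)
  klass = obj[:klass]
  unless cache.index && cache.klass == klass
    cache.misses += 1                       # slow path
    cache.index = INDEX_TABLES[klass][name] # hash lookup
    cache.klass = klass
  end
  obj[:slots][cache.index]
end

hello = { klass: :Hello, slots: [1, 2] }
world = { klass: :World, slots: [1, 2] }

# One call site that only ever sees Hello instances.
mono_cache = Cache.new(nil, nil, 0)
100.times { getivar(hello, :@foo, mono_cache) }

# One call site that alternates between Hello and World, like the loop above.
poly_cache = Cache.new(nil, nil, 0)
100.times do
  getivar(hello, :@foo, poly_cache)
  getivar(world, :@foo, poly_cache)
end

p mono_cache.misses # 1
p poly_cache.misses # 200: every call evicts the other class's entry
```

Both classes map @foo to index 0, so the cached index would have been correct every time. The misses come only from comparing classes.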

Slide 44

Slide 44 text

🐵🔧

Slide 45

Slide 45 text

Handling "Undefined" Instance Variables
Undefined IVs return `nil`, but how do we know it's undefined?

class Hello
  def initialize(set_bar)
    @foo = 1
    @bar = 2 if set_bar
    @baz = 3
  end

  def foo
    if !instance_variable_defined?(:@bar)
      puts "oh!"
    end
    @foo + @bar.to_i
  end
end

p Hello.new(true).foo  # => 3
p Hello.new(false).foo # => 1

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1
:@baz  2

Hello Instance: In-Memory Layout
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    1
24    2
32    3

Slide 46

Slide 46 text

Handling "Undefined" Instance Variables
Undefined IVs return `nil`, but how do we know it's undefined?

class Hello
  def initialize(set_bar)
    @foo = 1
    @bar = 2 if set_bar
    @baz = 3
  end

  def foo
    if !instance_variable_defined?(:@bar)
      puts "oh!"
    end
    @foo + @bar.to_i
  end
end

p Hello.new(true).foo  # => 3
p Hello.new(false).foo # => 1

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1
:@baz  2

Hello Instance: In-Memory Layout
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    1
24    Qundef (0x24)
32    3

Cache: Index 0 and 1

Slide 47

Slide 47 text

Return `nil` for Undefined IVs
If the value stored in the array is Qundef, return nil, otherwise return the value

def getivar name, cache
  # get the class
  klass = get_self.class

  # If there is no cached index or the class doesn't match
  if !(cache.index && cache.klass == klass)
    # get the index of the ivar
    index = klass.ivar_index[name]
    # store the index
    cache.index = index
    # store the class
    cache.klass = klass
  end

  # get the cached index
  index = cache.index

  if get_self.is_a?(Object)
    # get the ivar value
    iv = get_self.instance_variables[index]
    if iv == Qundef
      nil
    else
      iv
    end
  else
    # do something different
  end
end

"if iv == Qundef": Return nil if Qundef
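
The Qundef trick can be modeled in Ruby with a private sentinel object. QUNDEF, read_slot and slot_defined? are hypothetical names for this sketch; in CRuby, Qundef is a special internal value that Ruby code never sees. The second half checks the behavior against real Ruby using the Hello class from the slide.

```ruby
# Stand-in sentinel meaning "this slot was never written".
QUNDEF = Object.new.freeze

def read_slot(slots, index)
  value = slots[index]
  value.equal?(QUNDEF) ? nil : value # undefined reads come back as nil
end

def slot_defined?(slots, index)
  !slots[index].equal?(QUNDEF)
end

# Model of Hello.new(false): @foo set, @bar skipped, @baz set.
slots = [1, QUNDEF, 3]

p read_slot(slots, 0)     # 1
p read_slot(slots, 1)     # nil
p slot_defined?(slots, 1) # false
p slot_defined?(slots, 2) # true

# The same distinction, observed through real Ruby.
class Hello
  def initialize(set_bar)
    @foo = 1
    @bar = 2 if set_bar
    @baz = 3
  end
end

hello = Hello.new(false)
p hello.instance_variable_get(:@bar)      # nil
p hello.instance_variable_defined?(:@bar) # false
p hello.instance_variable_defined?(:@baz) # true
```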

Slide 48

Slide 48 text

😫😫😫😫😫

Slide 49

Slide 49 text

Conditionals for Reading an IV
Just a Recap!
• Is an index set?
• Do the classes match?
• Is it an "Object" type?
• Is the IV value equal to Qundef?

Slide 50

Slide 50 text

JIT Compilation

Slide 51

Slide 51 text

JIT Compilation
JIT compiler translates byte code to machine code

Source Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

hi = Hello.new
hi.foo

Byte Code for "foo"
[:getivar, :@foo, cache]
[:getivar, :@bar, cache]
[:plus]

Machine Code

== BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ======================
# getinstancevariable
# guard not immediate
0x55a658d0a6dd: test qword ptr [r13 + 0x18], 7
0x55a658d0a6e5: jne 0x55a660d0a0e5
0x55a658d0a6eb: cmp qword ptr [r13 + 0x18], 8
0x55a658d0a6f0: jbe 0x55a660d0a0fe
0x55a658d0a6f6: mov rax, qword ptr [r13 + 0x18]
# guard known class
0x55a658d0a6fa: movabs rcx, 0x7fbb2af48f20
0x55a658d0a704: cmp qword ptr [rax + 8], rcx
0x55a658d0a708: jne 0x55a660d0a117
0x55a658d0a70e: mov rax, qword ptr [r13 + 0x18]
0x55a658d0a712: cmp qword ptr [rax + 0x10], 0
0x55a658d0a717: jbe 0x55a660d0a0cc
# guard embedded getivar
0x55a658d0a71d: test word ptr [rax], 0x2000
0x55a658d0a722: je 0x55a660d0a130
0x55a658d0a728: cmp qword ptr [rax + 0x18], 0x34
0x55a658d0a72d: mov ecx, 8
0x55a658d0a732: cmovne rcx, qword ptr [rax + 0x18]
0x55a658d0a737: mov qword ptr [rbx], rcx
== BLOCK 2/5, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/5, ISEQ RANGE [3,6), 69 bytes ======================
# getinstancevariable
# regenerate_branch
# getinstancevariable
# regenerate_branch
0x55a658d0a73a: mov rax, qword ptr [r13 + 0x18]
# guard known class
0x55a658d0a73e: movabs rcx, 0x7fbb2af48f20
0x55a658d0a748: cmp qword ptr [rax + 8], rcx
0x55a658d0a74c: jne 0x55a660d0a183
0x55a658d0a752: mov rax, qword ptr [r13 + 0x18]
0x55a658d0a756: cmp qword ptr [rax + 0x10], 1
0x55a658d0a75b: jbe 0x55a660d0a162
# guard embedded getivar
0x55a658d0a761: test word ptr [rax], 0x2000
0x55a658d0a766: je 0x55a660d0a19c
0x55a658d0a76c: cmp qword ptr [rax + 0x20], 0x34
0x55a658d0a771: mov ecx, 8
0x55a658d0a776: cmovne rcx, qword ptr [rax + 0x20]
0x55a658d0a77b: mov qword ptr [rbx + 8], rcx
== BLOCK 4/5, ISEQ RANGE [6,8), 0 bytes =======================
== BLOCK 5/5, ISEQ RANGE [6,9), 86 bytes ======================
# opt_plus
# regenerate_branch
# opt_plus
# guard arg0 fixnum
# regenerate_branch
0x55a658d0a77f: test byte ptr [rbx], 1
0x55a658d0a782: je 0x55a660d0a1ef
# guard arg1 fixnum
0x55a658d0a788: test byte ptr [rbx + 8], 1
0x55a658d0a78c: je 0x55a660d0a208
0x55a658d0a792: mov rax, qword ptr [rbx]
0x55a658d0a795: sub rax, 1
0x55a658d0a799: add rax, qword ptr [rbx + 8]
0x55a658d0a79d: jo 0x55a660d0a1ce
0x55a658d0a7a3: mov qword ptr [rbx], rax
# leave
# RUBY_VM_CHECK_INTS(ec)
0x55a658d0a7a6: mov eax, dword ptr [r12 + 0x24]
0x55a658d0a7ab: not eax
0x55a658d0a7ad: test dword ptr [r12 + 0x20], eax
0x55a658d0a7b2: jne 0x55a660d0a221
# pop stack frame
0x55a658d0a7b8: mov rax, r13
0x55a658d0a7bb: add rax, 0x40
0x55a658d0a7bf: mov r13, rax

Slide 52

Slide 52 text

Machine Code for Reading an IV

Instruction Implementation

def getivar name, cache
  # get the class
  klass = get_self.class

  # If there is no cached index or the class doesn't match
  if !(cache.index && cache.klass == klass)
    # get the index of the ivar
    index = klass.ivar_index[name]
    # store the index
    cache.index = index
    # store the class
    cache.klass = klass
  end

  # get the cached index
  index = cache.index

  if get_self.is_a?(Object)
    # get the ivar value
    iv = get_self.instance_variables[index]
    if iv == Qundef
      nil
    else
      iv
    end
  else
    # do something different
  end
end

Generated Machine Code

== BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ======================
# getinstancevariable
# guard not immediate
0x55a658d0a6dd: test qword ptr [r13 + 0x18], 7
0x55a658d0a6e5: jne 0x55a660d0a0e5
0x55a658d0a6eb: cmp qword ptr [r13 + 0x18], 8
0x55a658d0a6f0: jbe 0x55a660d0a0fe
0x55a658d0a6f6: mov rax, qword ptr [r13 + 0x18]
# guard known class
0x55a658d0a6fa: movabs rcx, 0x7fbb2af48f20
0x55a658d0a704: cmp qword ptr [rax + 8], rcx
0x55a658d0a708: jne 0x55a660d0a117
0x55a658d0a70e: mov rax, qword ptr [r13 + 0x18]
0x55a658d0a712: cmp qword ptr [rax + 0x10], 0
0x55a658d0a717: jbe 0x55a660d0a0cc
# guard embedded getivar
0x55a658d0a71d: test word ptr [rax], 0x2000
0x55a658d0a722: je 0x55a660d0a130
0x55a658d0a728: cmp qword ptr [rax + 0x18], 0x34
0x55a658d0a72d: mov ecx, 8
0x55a658d0a732: cmovne rcx, qword ptr [rax + 0x18]
0x55a658d0a737: mov qword ptr [rbx], rcx

Slide 53

Slide 53 text

93 Bytes for 1 IV

Slide 54

Slide 54 text

We Can Do Better!

Slide 55

Slide 55 text

Object Shapes

Slide 56

Slide 56 text

Not these shapes

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

Shape Transitions on Write
Shapes form a tree representing Object properties

Sample Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Shape Tree

Root (id: 0)
  write @foo (from: 0, to: 1, iv index: 0)
@foo (id: 1, index: 0)
  write @bar (from: 1, to: 2, iv index: 1)
@bar (id: 2, index: 1)

In each transition: "from" is the Cache Key, "to" is the Destination Shape, "iv index" is the IV Index

Slide 59

Slide 59 text

Shape Transitions on Write
Shape ID is used as the cache key

Sample Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

Shape Tree

Root (id: 0)
  write @foo (from: 0, to: 1, iv index: 0)
@foo (id: 1, index: 0)
  write @bar (from: 1, to: 2, iv index: 1)
@bar (id: 2, index: 1)
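
The shape tree can be sketched as a small Ruby data structure. This is a toy model under stated assumptions, not CRuby's implementation: Shape, ShapedObject, ROOT, transition, index_of and iv_set are all hypothetical names. Each shape records the IV that the edge leading to it adds, and the array slot for that IV.

```ruby
# Hypothetical model of a shape tree.
class Shape
  attr_reader :id, :parent, :name, :index

  @count = 0
  def self.next_id
    id = @count
    @count += 1
    id
  end

  def initialize(parent = nil, name = nil)
    @id = Shape.next_id
    @parent = parent
    @name = name                         # IV added by the edge into this shape
    @index = parent ? parent.size : nil  # slot used by that IV
    @edges = {}
  end

  # Number of IVs on the path from the root to this shape.
  def size
    parent ? parent.size + 1 : 0
  end

  # Follow (or create) the edge taken when `name` is written for the first time.
  def transition(name)
    @edges[name] ||= Shape.new(self, name)
  end

  # Walk toward the root to find the slot for `name`, or nil if it is not set.
  def index_of(name)
    shape = self
    while shape.parent
      return shape.index if shape.name == name
      shape = shape.parent
    end
    nil
  end
end

ROOT = Shape.new # id: 0

class ShapedObject
  attr_reader :shape, :slots

  def initialize
    @shape = ROOT # all objects start from the root shape
    @slots = []
  end

  def iv_set(name, value)
    index = @shape.index_of(name)
    unless index
      @shape = @shape.transition(name) # only writing a new IV changes shape
      index = @shape.index
    end
    @slots[index] = value
  end
end

a = ShapedObject.new
a.iv_set(:@foo, 1)
a.iv_set(:@bar, 2)

b = ShapedObject.new
b.iv_set(:@foo, 10)
b.iv_set(:@bar, 20)

p a.shape.id               # 2
p a.shape.equal?(b.shape)  # true: same writes in the same order, same shape
```

Shape ids come out as 0, 1, 2 here only because the shapes are created in that order, which matches the slide.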

Slide 60

Slide 60 text

Shapes Form a Graph

Slide 61

Slide 61 text

All Objects Start From a "Root Shape"

Slide 62

Slide 62 text

Objects Only Change Shape on Writes

Slide 63

Slide 63 text

Shape ID is the Cache Key

Slide 64

Slide 64 text

Class is not a cache key

Slide 65

Slide 65 text

Objects can share shapes
Hello and World can share caches

Sample Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end
end

class World < Hello
  def initialize
    super
    @baz = 3
  end
end

Shape Tree

Root (id: 0)
  write @foo (from: 0, to: 1, iv index: 0)
@foo (id: 1, index: 0)
  write @bar (from: 1, to: 2, iv index: 1)
@bar (id: 2, index: 1)
  write @baz (from: 2, to: 3, iv index: 2)
@baz (id: 3, index: 2)

Shapes 0 to 2: Shared between Hello and World instances

Slide 66

Slide 66 text

Cross Type Memory Amortization

Slide 67

Slide 67 text

Cross Type Cache Hits

Slide 68

Slide 68 text

Shape Tree is Shared
All objects use the shape tree, so more types can share info

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class World < Hello
end

hello = Hello.new
world = World.new

loop do
  hello.foo
  world.foo
end

Hello Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

World Class: IV Index Table
Name   Index
:@foo  0
:@bar  1

Shared Shape Tree

Root (id: 0)
@foo (id: 1, index: 0)
@bar (id: 2, index: 1)

Same shape on both instances
Cache: Shape 2 and 2
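
Here is a hedged simulation of the same loop with the shape ID as the cache key. Objects are plain hashes; SHAPE_INDEXES, Cache and this getivar are hypothetical names, and shape id 2 stands for "root, then @foo, then @bar" as in the tree above.

```ruby
# Hypothetical model: slot lookup keyed by shape id instead of class.
SHAPE_INDEXES = { 2 => { :@foo => 0, :@bar => 1 } }.freeze

Cache = Struct.new(:shape_id, :index, :misses)

def getivar(obj, name, cache)
  unless cache.shape_id == obj[:shape_id]
    cache.misses += 1                                  # slow path
    cache.index = SHAPE_INDEXES[obj[:shape_id]][name]
    cache.shape_id = obj[:shape_id]
  end
  obj[:slots][cache.index]
end

# Different classes, same shape.
hello = { klass: :Hello, shape_id: 2, slots: [1, 2] }
world = { klass: :World, shape_id: 2, slots: [1, 2] }

cache = Cache.new(nil, nil, 0)
100.times do
  getivar(hello, :@foo, cache)
  getivar(world, :@foo, cache)
end

p cache.misses # 1: only the very first call misses
```

The class is never consulted on the read path, so a call site that alternates between Hello and World instances keeps hitting.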

Slide 69

Slide 69 text

Cross Type Cache Hits

Microbenchmark

require 'harness'

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

class World < Hello
end

hello = Hello.new
world = World.new

run_benchmark(100) do
  i = 0
  while i < 90_000
    hello.foo
    world.foo
    i += 1
  end
end

Results

before: ruby 3.2.0dev (2022-09-28T14:51:38Z before-shapes a05b261464) [x86_64-linux]
after: ruby 3.2.0dev (2022-11-22T05:20:45Z master 20b9d7b9fd) [x86_64-linux]

bench                before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
getivar-polymorphic  12.1         1.4         4.4         2.1         2.76          2.82

Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

2.76x Faster

Slide 70

Slide 70 text

Memory Usage Improvements

Slide 71

Slide 71 text

Classes store their name as an IV

Slide 72

Slide 72 text

Class Name is an IV
Class names are stored as an instance variable on the class instance

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def foo
    @foo + @bar
  end
end

puts Hello.name # => IV read

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

Not All Properties Are Instance Variables

Slide 76

Slide 76 text

Freezing is a shape transition

Slide 77

Slide 77 text

Freezing Changes Shape
When we freeze an object, it changes shape

Sample Code

class Hello
  def initialize
    @foo = 1
    @bar = 2
  end

  def set
    @baz = 3
  end
end

hello = Hello.new  # Shape: 2
hello.set          # Shape: 3

hello = Hello.new  # Shape: 2
hello.freeze       # Shape: 4
hello.set

Shape Tree

Root (id: 0)
  write @foo (from: 0, to: 1, iv index: 0)
@foo (id: 1, index: 0)
  write @bar (from: 1, to: 2, iv index: 1)
@bar (id: 2, index: 1)
  write @baz (from: 2, to: 3, iv index: 2)
  @baz (id: 3, index: 2)
  freeze
  frozen (id: 4)

Slide 78

Slide 78 text

Set Instance Variable Instruction
Frozen check only on cache misses

Before Shapes

def setinstancevariable iv_name, cache
  if get_self.frozen?
    raise "It's frozen!"
  end

  if cache.klass == get_self.class && cache.index
    # CACHE HIT!!
    # set the instance variable
  else
    cache.klass = get_self.class
    cache.index = get_self.iv_index_table[iv_name]
    # set the instance variable
  end
end

After Shapes

def setinstancevariable iv_name, cache
  if cache.from_shape_id == get_self.shape_id
    # CACHE HIT!!
    # set the instance variable
  else
    if get_self.frozen?
      raise "It's frozen!"
    end

    cache.from_shape_id = get_self.shape_id
    # set the instance variable
  end
end
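
A hedged sketch of why the frozen check can move to the miss path: a frozen object has a different shape id, so it can never match a cache entry recorded for an unfrozen shape. Shape ids follow the previous slide (2 is @foo plus @bar, 3 adds @baz, 4 is the frozen version of 2). FROZEN_SHAPES, TRANSITIONS, FROZEN_CHECKS, Cache and setivar are hypothetical names, and only the one transition shown on the slide is modeled.

```ruby
# Hypothetical model of setinstancevariable with a shape-keyed cache.
FROZEN_SHAPES = [4].freeze
TRANSITIONS = { [2, :@baz] => { to: 3, index: 2 } }.freeze
FROZEN_CHECKS = [0] # counts how often the frozen check actually runs

Cache = Struct.new(:from_shape_id, :to_shape_id, :index)

def setivar(obj, name, value, cache)
  if cache.from_shape_id == obj[:shape_id]
    # Cache hit: a frozen object would carry a different shape id,
    # so no frozen check is needed on this path.
  else
    FROZEN_CHECKS[0] += 1
    if FROZEN_SHAPES.include?(obj[:shape_id])
      raise FrozenError, "can't modify frozen object"
    end
    edge = TRANSITIONS.fetch([obj[:shape_id], name])
    cache.from_shape_id = obj[:shape_id]
    cache.to_shape_id = edge[:to]
    cache.index = edge[:index]
  end
  obj[:slots][cache.index] = value
  obj[:shape_id] = cache.to_shape_id
end

cache = Cache.new # the cache for the "@baz = 3" instruction

a = { shape_id: 2, slots: [1, 2] }
setivar(a, :@baz, 3, cache) # miss: frozen check runs, cache is filled

b = { shape_id: 2, slots: [1, 2] }
setivar(b, :@baz, 3, cache) # hit: no frozen check

frozen = { shape_id: 4, slots: [1, 2] }
raised = false
begin
  setivar(frozen, :@baz, 3, cache) # miss, because 4 != 2
rescue FrozenError
  raised = true
end

p FROZEN_CHECKS[0] # 2: once for the first write, once for the frozen object
p raised           # true
```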

Slide 79

Slide 79 text

Frozen Checks only on Cache Misses

Slide 80

Slide 80 text

IV Write Performance Improvement

Micro Benchmark

require 'harness'

class TheClass
  def initialize
    @v0 = 1
    @v1 = 2
    @v3 = 3
    @levar = 1
  end

  def set_value_loop
    # 1M
    i = 0
    while i < 1000000
      # 10 times to de-emphasize loop overhead
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      @levar = i
      i += 1
    end
  end
end

obj = TheClass.new

run_benchmark(100) do
  obj.set_value_loop
end

Results

before: ruby 3.2.0dev (2022-09-28T14:51:38Z before-shapes a05b261464) [x86_64-linux]
after: ruby 3.2.0dev (2022-11-22T05:20:45Z master 20b9d7b9fd) [x86_64-linux]

bench    before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
setivar  64.0         0.7         53.0        2.5         1.21          1.19

Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

21% Faster

Slide 81

Slide 81 text

JIT Performance

Slide 82

Slide 82 text

Object Layout
All objects have 2 common fields: "flags" and "class"

Basic Object Layout
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16
24
32

T_OBJECT
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    Instance Variable
24    Instance Variable
32    Instance Variable

T_ARRAY
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    Array Element
24    Array Element
32    Array Element

Slide 83

Slide 83 text

Let's Check the Flags Field!

Slide 84

Slide 84 text

64 bits (width of a pointer)

Slide 85

Slide 85 text

Flags Bitmap Layout
Bottom 5 bits represent Object Type

Flags Bitmap: bits 31 down to 0; bits 4 to 0 hold the Object Type

enum ruby_value_type {
    RUBY_T_OBJECT   = 0x01, /**< @see struct ::RObject */
    RUBY_T_CLASS    = 0x02, /**< @see struct ::RClass and ::rb_cClass */
    RUBY_T_MODULE   = 0x03, /**< @see struct ::RClass and ::rb_cModule */
    RUBY_T_FLOAT    = 0x04, /**< @see struct ::RFloat */
    RUBY_T_STRING   = 0x05, /**< @see struct ::RString */
    RUBY_T_REGEXP   = 0x06, /**< @see struct ::RRegexp */
    RUBY_T_ARRAY    = 0x07, /**< @see struct ::RArray */
    RUBY_T_HASH     = 0x08, /**< @see struct ::RHash */
    RUBY_T_STRUCT   = 0x09, /**< @see struct ::RStruct */
    RUBY_T_BIGNUM   = 0x0a, /**< @see struct ::RBignum */
    RUBY_T_FILE     = 0x0b, /**< @see struct ::RFile */
    RUBY_T_DATA     = 0x0c, /**< @see struct ::RTypedData */
    RUBY_T_MATCH    = 0x0d, /**< @see struct ::RMatch */
    RUBY_T_COMPLEX  = 0x0e, /**< @see struct ::RComplex */
    RUBY_T_RATIONAL = 0x0f, /**< @see struct ::RRational */
}
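
Reading the object type out of the flags word is a single bit mask. The sketch below does it on plain Ruby integers, using type values copied from the enum above. VALUE_TYPES, TYPE_MASK and object_type are hypothetical names, and the upper bit set in the example is arbitrary, chosen only to show that it does not affect the result.

```ruby
# Type values copied from the ruby_value_type enum on the slide.
VALUE_TYPES = {
  0x01 => :T_OBJECT,
  0x05 => :T_STRING,
  0x07 => :T_ARRAY,
  0x08 => :T_HASH
}.freeze

TYPE_MASK = 0b11111 # bottom 5 bits

# Keep only the bottom 5 bits of a flags word and name the type.
def object_type(flags)
  VALUE_TYPES[flags & TYPE_MASK]
end

array_flags  = (1 << 13) | 0x07 # some upper bit set, type bits say T_ARRAY
object_flags = 0x01             # no upper bits set, type bits say T_OBJECT

p object_type(array_flags)  # :T_ARRAY
p object_type(object_flags) # :T_OBJECT
```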

Slide 86

Slide 86 text

Flags Bitmap Layout
Bottom 12 bits have a common "meaning" (see fl_type.h)

Flags Bitmap: bits 31 down to 0
- Object Type
- Object ID has been seen?

Object.new.object_id
[].object_id

Slide 87

Slide 87 text

Flags Bitmap Layout
Bottom 12 bits have a common "meaning" (see fl_type.h)

Flags Bitmap: bits 31 down to 0
- Object Type
- Object ID has been seen?

Object.new.object_id
[].object_id

Slide 88

Slide 88 text

Flags Bitmap Layout
Object Type gives upper bits meaning

Flags Bitmap: bits 31 down to 0
- Object Type
- Common Stuff (see fl_type.h)
- Object Extended?
- Depends on Object Type

Slide 89

Slide 89 text

T_OBJECT Layout

class Hello
  def initialize
    @foo = 1
    @bar = 2
    @baz = 3
    @hoge = 4
  end
end

Hello.new

T_OBJECT Layout
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    Instance Variable
24    Instance Variable
32    Instance Variable

T_OBJECT Extended Layout
Byte  Value
0     Flags (a 64 bit bitmap)
8     Pointer to Class
16    Pointer to Buffer
24
32

IV Array
Byte  Value
0     Instance Variable
8     Instance Variable
16    Instance Variable
24    Instance Variable
32    Instance Variable
...   ...

Slide 90

Slide 90 text

Flags Bitmap Layout Extended Bit means "read from external table" Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Object Extended? Depends on Object Type

Slide 91

Slide 91 text

JIT Compilation JIT compilation must write guards for assumptions class Hello def initialize @foo = 1 @bar = 2 @baz = 3 @hoge = 4 end def foo @foo + @bar end end What is the type? Is it embedded or extended? Is the IV Qundef? Is the Class correct?

Slide 92

Slide 92 text

Runtime Check Locations We need to test object type, extended bit, IV value Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Object Extended? Depends on Object Type Byte Value 0 Flags (a 64 bit bitmap) 8 Pointer to Class 16 Instance Variable 24 Instance Variable 32 Instance Variable Object Type Qundef? Right class?
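
To make the pre-shapes checks concrete, here is a rough Ruby model of the guards a compiled IV read has to pass. Everything in it is a stand-in: `FakeObject`, `QUNDEF`, `guarded_read`, and the position of `EXTENDED_BIT` are assumptions for illustration, not the VM's real data structures.

```ruby
QUNDEF = Object.new.freeze   # stand-in for the VM's internal "unset" marker
T_OBJECT = 0x01
TYPE_MASK = 0x1f             # bottom 5 bits hold the object type
EXTENDED_BIT = 1 << 13       # hypothetical bit position, for illustration only

# Hypothetical object model: a flags word, a class, and embedded IV slots.
FakeObject = Struct.new(:flags, :klass, :slots)

# Returns the IV value, or :side_exit when an assumption baked into the
# compiled code does not hold for this object.
def guarded_read(obj, expected_class, index)
  return :side_exit unless (obj.flags & TYPE_MASK) == T_OBJECT  # what is the type?
  return :side_exit unless obj.klass == expected_class          # is the class correct?
  return :side_exit unless (obj.flags & EXTENDED_BIT).zero?     # embedded or extended?
  value = obj.slots[index]
  value.equal?(QUNDEF) ? nil : value                            # is the IV Qundef?
end

embedded = FakeObject.new(T_OBJECT, :Hello, [1, 2, QUNDEF])
extended = FakeObject.new(T_OBJECT | EXTENDED_BIT, :Hello, [1, 2, 3])

p guarded_read(embedded, :Hello, 0)
p guarded_read(extended, :Hello, 0)
```

Four separate checks run before a single slot is read, which is the cost the shape-based approach sets out to reduce.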

Slide 93

Slide 93 text

Machine Code for reading one IV == BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ====================== # getinstancevariable # guard not immediate 0x55a658d0a6dd: test qword ptr [r13 + 0x18], 7 0x55a658d0a6e5: jne 0x55a660d0a0e5 0x55a658d0a6eb: cmp qword ptr [r13 + 0x18], 8 0x55a658d0a6f0: jbe 0x55a660d0a0fe 0x55a658d0a6f6: mov rax, qword ptr [r13 + 0x18] # guard known class 0x55a658d0a6fa: movabs rcx, 0x7fbb2af48f20 0x55a658d0a704: cmp qword ptr [rax + 8], rcx 0x55a658d0a708: jne 0x55a660d0a117 0x55a658d0a70e: mov rax, qword ptr [r13 + 0x18] 0x55a658d0a712: cmp qword ptr [rax + 0x10], 0 0x55a658d0a717: jbe 0x55a660d0a0cc # guard embedded getivar 0x55a658d0a71d: test word ptr [rax], 0x2000 0x55a658d0a722: je 0x55a660d0a130 0x55a658d0a728: cmp qword ptr [rax + 0x18], 0x34 0x55a658d0a72d: mov ecx, 8 0x55a658d0a732: cmovne rcx, qword ptr [rax + 0x18] 0x55a658d0a737: mov qword ptr [rbx], rcx Generated Machine Code

Slide 94

Slide 94 text

Use Shapes to Eliminate Checks

Slide 95

Slide 95 text

Shape ID Storage Shape id is stored in the upper 32 bits Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Depends on Object Type Shape ID

Slide 96

Slide 96 text

Class Check Isn't Necessary Shapes are independent of class Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Object Extended? Depends on Object Type Byte Value 0 Flags (a 64 bit bitmap) 8 Pointer to Class 16 Instance Variable 24 Instance Variable 32 Instance Variable Object Type Qundef? Right class? Shape ID

Slide 97

Slide 97 text

Handling "Undefined" Instance Variables Shapes care about IV set order class Hello def initialize(set_bar) @foo = 1 @bar = 2 if set_bar @baz = 3 end def foo if !instance_variable_defined?(:@bar) puts "oh!" end @foo + @bar.to_i end end p Hello.new(true).foo # => 3 p Hello.new(false).foo # => 1 Root id: 0 @foo id: 1 @bar id: 2 @baz id: 3 @baz id: 4 Shape 3 Shape 4

Slide 98

Slide 98 text

Handling "Undefined" Instance Variables Shape 3 has a "bar" instance variable class Hello def initialize(set_bar) @foo = 1 @bar = 2 if set_bar @baz = 3 end def foo if !instance_variable_defined?(:@bar) puts "oh!" end @foo + @bar.to_i end end p Hello.new(true).foo # => 3 p Hello.new(false).foo # => 1 Shape 3 Root id: 0 @foo id: 1 @bar id: 2 @baz id: 3 @baz id: 4

Slide 99

Slide 99 text

Handling "Undefined" Instance Variables Shape 4 doesn't have a "bar" instance variable class Hello def initialize(set_bar) @foo = 1 @bar = 2 if set_bar @baz = 3 end def foo if !instance_variable_defined?(:@bar) puts "oh!" end @foo + @bar.to_i end end p Hello.new(true).foo # => 3 p Hello.new(false).foo # => 1 Shape 4 Root id: 0 @foo id: 1 @bar id: 2 @baz id: 3 @baz id: 4
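
The two paths through the shape tree can be sketched in plain Ruby. This is a toy model, not the VM's implementation: the `Shape` class, its `transition` and `defined_iv?` methods, and the id counter are all hypothetical, chosen so the ids line up with the diagram.

```ruby
class Shape
  attr_reader :id, :name, :parent

  @next_id = 0
  class << self
    attr_accessor :next_id
  end

  def initialize(name = nil, parent = nil)
    @id = Shape.next_id
    Shape.next_id += 1
    @name = name       # the IV this shape adds (nil for the root)
    @parent = parent
    @children = {}
  end

  # Follow (or create) the edge for setting `name` on an object of this shape.
  def transition(name)
    @children[name] ||= Shape.new(name, self)
  end

  # An IV is defined if some shape on the path back to the root introduced it.
  def defined_iv?(name)
    shape = self
    while shape
      return true if shape.name == name
      shape = shape.parent
    end
    false
  end
end

root = Shape.new
# Hello.new(true): @foo, @bar, @baz
with_bar = root.transition(:@foo).transition(:@bar).transition(:@baz)
# Hello.new(false): @foo, @baz
without_bar = root.transition(:@foo).transition(:@baz)

puts "shape #{with_bar.id} has @bar? #{with_bar.defined_iv?(:@bar)}"
puts "shape #{without_bar.id} has @bar? #{without_bar.defined_iv?(:@bar)}"
```

Because "is @bar defined?" is answered by the shape alone, a guard on the shape id also covers the undefined-IV case.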

Slide 100

Slide 100 text

Class Check Isn't Necessary Shapes are independent of class Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Object Extended? Depends on Object Type Byte Value 0 Flags (a 64 bit bitmap) 8 Pointer to Class 16 Instance Variable 24 Instance Variable 32 Instance Variable Object Type Qundef? Shape ID

Slide 101

Slide 101 text

Multiple Possible Layouts Objects can vary in width, so there are 2 possible layouts class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end def foo @foo + @bar end end Hello.new Embedded Layout Byte Value 0 Flags (a 64 bit bitmap) 8 Pointer to Class 16 1 24 2 32 3 Extended Layout Byte Value 0 Flags (a 64 bit bitmap) 8 Pointer to Class 16 Pointer to Buffer 24 32 IV Array Byte Value 0 1 8 2 16 3 24 ... 32 ... ... ...

Slide 102

Slide 102 text

Multiple Possible Layouts "Extending" adds a shape transition class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end def foo @foo + @bar end end Hello.new Extended Layout Byte Value 0 Flags 8 Class 16 24 Byte Val 0 1 8 2 16 3 24 ... 32 ... ... ... IV Ptr 1 2 Root id: 0 @bar id: 2 @foo id: 1 EXTEND id: 3 @baz id: 4 Shape 4

Slide 103

Slide 103 text

Multiple Possible Layouts "Extending" adds a shape transition class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end def foo @foo + @bar end end Hello.new Root id: 0 @bar id: 2 @foo id: 1 EXTEND id: 3 @baz id: 4 @baz id: 5 Embedded Layout Byte Value 0 Flags 8 Class 16 24 32 2 3 1 Shape 5

Slide 104

Slide 104 text

Different Layouts Have Different Shapes JIT Compiler can differentiate based on shape id class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end def foo @foo + @bar end end Hello.new Root id: 0 @bar id: 2 @foo id: 1 EXTEND id: 3 @baz id: 4 @baz id: 5 Embedded Layout Byte Value 0 Flags 8 Class 16 1 24 2 32 3 Extended Layout Byte Value 0 Flags 8 Class 16 PTR 24 Byte Val 0 1 8 2 16 3 24 ... 32 ... ... ...
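
One way to picture the EXTEND transition is as an extra edge inserted when an object runs out of embedded slots. The sketch below assumes that behavior; `ShapeTree`, `set_iv`, and the `capacity` parameter are hypothetical names for illustration, and the real VM decides when to extend in its own way.

```ruby
# Hypothetical shape node: an id, the edge that led here, a parent, and children.
Shape = Struct.new(:id, :edge, :parent, :children) do
  # True if any edge on the path back to the root is the EXTEND transition.
  def extended?
    shape = self
    while shape
      return true if shape.edge == :EXTEND
      shape = shape.parent
    end
    false
  end

  # Number of instance variables set along the path back to the root.
  def iv_count
    count = 0
    shape = self
    while shape
      count += 1 if shape.edge.to_s.start_with?("@")
      shape = shape.parent
    end
    count
  end
end

class ShapeTree
  attr_reader :root

  def initialize
    @shapes = []
    @root = new_shape(nil, nil)
  end

  def transition(shape, edge)
    shape.children[edge] ||= new_shape(edge, shape)
  end

  # Set an IV on an object with `capacity` embedded slots. When the slots
  # are full, take the EXTEND edge first, then the edge for the IV itself.
  def set_iv(shape, name, capacity)
    if !shape.extended? && shape.iv_count >= capacity
      shape = transition(shape, :EXTEND)
    end
    transition(shape, name)
  end

  private

  def new_shape(edge, parent)
    shape = Shape.new(@shapes.size, edge, parent, {})
    @shapes << shape
    shape
  end
end

tree = ShapeTree.new
ivs = %i[@foo @bar @baz]
# Room for only two embedded IVs: setting the third forces an EXTEND.
small = ivs.reduce(tree.root) { |shape, iv| tree.set_iv(shape, iv, 2) }
# Room for three embedded IVs: no EXTEND needed.
large = ivs.reduce(tree.root) { |shape, iv| tree.set_iv(shape, iv, 3) }

puts "extended object ends at shape #{small.id}"
puts "embedded object ends at shape #{large.id}"
```

The same three IVs end at different shape ids depending on layout, so a shape id check also tells the JIT whether to read from the object or from the external buffer.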

Slide 105

Slide 105 text

Extended Check Isn't Necessary Shapes differ depending on embedded vs extended Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Object Extended? Depends on Object Type Object Type Shape ID

Slide 106

Slide 106 text

Different Types, Same Shape Different types can have the same shape, but IV storage is different class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end end Hello.new ary = [] ary.instance_variable_set(:@foo, 4) ary.instance_variable_set(:@bar, 5) ary.instance_variable_set(:@baz, 6) ary Sample Code Shape Tree Root id: 0 @foo id: 1 @bar id: 2 @baz id: 3 Shape 3 Shape 3

Slide 107

Slide 107 text

Different Types Store Instance Variables Differently.

Slide 108

Slide 108 text

Assign Shape at Allocation Time When a T_OBJECT is allocated, immediately set a new shape class Hello def initialize @foo = 1 @bar = 2 @baz = 3 end end Hello.new ary = [] ary.instance_variable_set(:@foo, 4) ary.instance_variable_set(:@bar, 5) ary.instance_variable_set(:@baz, 6) ary Sample Code Shape Tree Root id: 0 Shape 4 Shape 7 T_OBJECT id: 1 @foo id: 2 @bar id: 3 @baz id: 4 @foo id: 5 @bar id: 6 @baz id: 7
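
A small Ruby model shows how giving T_OBJECT its own transition at allocation keeps the two objects above on separate branches. `Shapes`, `Node`, and `initial_shape` are hypothetical names; the sketch only tries to reproduce the ids drawn in the tree.

```ruby
class Shapes
  # Hypothetical shape node: an id plus outgoing edges.
  Node = Struct.new(:id, :children)

  attr_reader :root

  def initialize
    @count = 0
    @root = make_node
  end

  # Follow (or create) the edge named `edge` from `node`.
  def transition(node, edge)
    node.children[edge] ||= make_node
  end

  # T_OBJECT instances leave the root as soon as they are allocated;
  # in this sketch every other type starts at the root itself.
  def initial_shape(type)
    type == :T_OBJECT ? transition(@root, :T_OBJECT) : @root
  end

  private

  def make_node
    node = Node.new(@count, {})
    @count += 1
    node
  end
end

shapes = Shapes.new
ivs = %i[@foo @bar @baz]

# Hello.new sets @foo, @bar, @baz on a T_OBJECT.
hello_shape = ivs.reduce(shapes.initial_shape(:T_OBJECT)) { |s, iv| shapes.transition(s, iv) }
# The array gets the same three IVs, but starts from the root.
array_shape = ivs.reduce(shapes.initial_shape(:T_ARRAY)) { |s, iv| shapes.transition(s, iv) }

puts "Hello instance: shape #{hello_shape.id}"
puts "Array instance: shape #{array_shape.id}"
```

With the type folded into the path, equal shape ids imply equal storage strategy, so the separate type check can go.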

Slide 109

Slide 109 text

Object Type Check Isn't Necessary Shapes differ depending on object type Flags Bitmap 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Object Type Common Stuff (see fl_type.h) Depends on Object Type Object Type Shape ID

Slide 110

Slide 110 text

Only Shape ID Check is Required

Slide 111

Slide 111 text

JIT Code Comparison Machine code for reading 1 instance variable == BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ====================== # getinstancevariable # guard not immediate 0x55ce5998b6dd: test qword ptr [r13 + 0x18], 7 0x55ce5998b6e5: jne 0x55ce6198b0e5 0x55ce5998b6eb: cmp qword ptr [r13 + 0x18], 8 0x55ce5998b6f0: jbe 0x55ce6198b0fe 0x55ce5998b6f6: mov rax, qword ptr [r13 + 0x18] # guard known class 0x55ce5998b6fa: movabs rcx, 0x7f09fd1e8f30 0x55ce5998b704: cmp qword ptr [rax + 8], rcx 0x55ce5998b708: jne 0x55ce6198b117 0x55ce5998b70e: mov rax, qword ptr [r13 + 0x18] 0x55ce5998b712: cmp qword ptr [rax + 0x10], 0 0x55ce5998b717: jbe 0x55ce6198b0cc # guard embedded getivar 0x55ce5998b71d: test word ptr [rax], 0x2000 0x55ce5998b722: je 0x55ce6198b130 0x55ce5998b728: cmp qword ptr [rax + 0x18], 0x34 0x55ce5998b72d: mov ecx, 8 0x55ce5998b732: cmovne rcx, qword ptr [rax + 0x18] 0x55ce5998b737: mov qword ptr [rbx], rcx Before Object Shapes == BLOCK 1/5, ISEQ RANGE [0,3), 40 bytes ====================== # getinstancevariable 0x5594850ba13a: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5594850ba13e: test al, 7 0x5594850ba141: jne 0x5594850bc090 0x5594850ba147: cmp rax, 0 0x5594850ba14b: je 0x5594850bc090 # guard shape 0x5594850ba151: cmp dword ptr [rax + 4], 0x19 0x5594850ba155: jne 0x5594850bc0a9 0x5594850ba15b: mov rax, qword ptr [rax + 0x10] 0x5594850ba15f: mov qword ptr [rbx], rax After Object Shapes

Slide 112

Slide 112 text

JIT Code Comparison Machine code for reading 1 instance variable == BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ====================== # getinstancevariable # guard not immediate 0x55ce5998b6dd: test qword ptr [r13 + 0x18], 7 0x55ce5998b6e5: jne 0x55ce6198b0e5 0x55ce5998b6eb: cmp qword ptr [r13 + 0x18], 8 0x55ce5998b6f0: jbe 0x55ce6198b0fe 0x55ce5998b6f6: mov rax, qword ptr [r13 + 0x18] # guard known class 0x55ce5998b6fa: movabs rcx, 0x7f09fd1e8f30 0x55ce5998b704: cmp qword ptr [rax + 8], rcx 0x55ce5998b708: jne 0x55ce6198b117 0x55ce5998b70e: mov rax, qword ptr [r13 + 0x18] 0x55ce5998b712: cmp qword ptr [rax + 0x10], 0 0x55ce5998b717: jbe 0x55ce6198b0cc # guard embedded getivar 0x55ce5998b71d: test word ptr [rax], 0x2000 0x55ce5998b722: je 0x55ce6198b130 0x55ce5998b728: cmp qword ptr [rax + 0x18], 0x34 0x55ce5998b72d: mov ecx, 8 0x55ce5998b732: cmovne rcx, qword ptr [rax + 0x18] 0x55ce5998b737: mov qword ptr [rbx], rcx Before Object Shapes == BLOCK 1/5, ISEQ RANGE [0,3), 40 bytes ====================== # getinstancevariable 0x5594850ba13a: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5594850ba13e: test al, 7 0x5594850ba141: jne 0x5594850bc090 0x5594850ba147: cmp rax, 0 0x5594850ba14b: je 0x5594850bc090 # guard shape 0x5594850ba151: cmp dword ptr [rax + 4], 0x19 0x5594850ba155: jne 0x5594850bc0a9 After Object Shapes Make sure it's shape 0x19

Slide 113

Slide 113 text

JIT Code Comparison Machine code for reading 1 instance variable == BLOCK 1/5, ISEQ RANGE [0,3), 93 bytes ====================== # getinstancevariable # guard not immediate 0x55ce5998b6dd: test qword ptr [r13 + 0x18], 7 0x55ce5998b6e5: jne 0x55ce6198b0e5 0x55ce5998b6eb: cmp qword ptr [r13 + 0x18], 8 0x55ce5998b6f0: jbe 0x55ce6198b0fe 0x55ce5998b6f6: mov rax, qword ptr [r13 + 0x18] # guard known class 0x55ce5998b6fa: movabs rcx, 0x7f09fd1e8f30 0x55ce5998b704: cmp qword ptr [rax + 8], rcx 0x55ce5998b708: jne 0x55ce6198b117 0x55ce5998b70e: mov rax, qword ptr [r13 + 0x18] 0x55ce5998b712: cmp qword ptr [rax + 0x10], 0 0x55ce5998b717: jbe 0x55ce6198b0cc # guard embedded getivar 0x55ce5998b71d: test word ptr [rax], 0x2000 0x55ce5998b722: je 0x55ce6198b130 0x55ce5998b728: cmp qword ptr [rax + 0x18], 0x34 0x55ce5998b72d: mov ecx, 8 0x55ce5998b732: cmovne rcx, qword ptr [rax + 0x18] 0x55ce5998b737: mov qword ptr [rbx], rcx Before Object Shapes == BLOCK 1/5, ISEQ RANGE [0,3), 40 bytes ====================== # getinstancevariable 0x5594850ba13a: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5594850ba13e: test al, 7 0x5594850ba141: jne 0x5594850bc090 0x5594850ba147: cmp rax, 0 0x5594850ba14b: je 0x5594850bc090 # guard shape 0x5594850ba151: cmp dword ptr [rax + 4], 0x19 0x5594850ba155: jne 0x5594850bc0a9 0x5594850ba15b: mov rax, qword ptr [rax + 0x10] 0x5594850ba15f: mov qword ptr [rbx], rax After Object Shapes Read the IV, and push on the stack
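
The "after" listing boils down to one comparison followed by a load. A loose Ruby analogy is a read site that remembers a shape id and a slot index. This is a sketch under assumptions: `Obj`, `SHAPE_TABLE`, and `IvarReadSite` are invented for illustration, and the real JIT emits machine code and side exits rather than a cache object.

```ruby
# Hypothetical model: an object is a shape id plus a flat slot array.
Obj = Struct.new(:shape_id, :slots)

# Hypothetical shape table: shape id => { IV name => slot index }.
SHAPE_TABLE = {
  0x19 => { :@foo => 0, :@bar => 1 },
  0x1a => { :@bar => 0, :@foo => 1 },
}.freeze

class IvarReadSite
  attr_reader :slow_lookups

  def initialize(name)
    @name = name
    @cached_shape_id = nil
    @cached_index = nil
    @slow_lookups = 0
  end

  def read(obj)
    # The single guard: does the object have the shape we compiled for?
    if obj.shape_id == @cached_shape_id
      return obj.slots[@cached_index]   # fast path: direct slot load
    end

    # Slow path: consult the shape table, then remember the answer.
    @slow_lookups += 1
    index = SHAPE_TABLE.fetch(obj.shape_id)[@name]
    return nil if index.nil?            # IV is not set for this shape
    @cached_shape_id = obj.shape_id
    @cached_index = index
    obj.slots[index]
  end
end

site = IvarReadSite.new(:@foo)
a = Obj.new(0x19, [1, 2])
b = Obj.new(0x19, [10, 20])

first = site.read(a)    # slow path, fills the cache
second = site.read(b)   # same shape, so only the guard and the load run
puts "read #{first} and #{second} with #{site.slow_lookups} slow lookup(s)"
```

An object with a different shape id fails the guard and falls back to the slow path, which plays the role of the side exit in the listing.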

Slide 114

Slide 114 text

Benchmark Comparison Measure the cost of fetching an instance variable class TheClass def initialize @v0 = 1 @v1 = 2 @v3 = 3 @levar = 1 end def get_value_loop sum = 0 # 1M i = 0 while i < 1000000 # 10 times to de-emphasize loop overhead sum += (@levar + @levar + @levar + @levar + @levar + @levar + @levar + @levar + @levar + @levar) i += 1 end return sum end end obj = TheClass.new run_benchmark(100) do obj.get_value_loop end

Slide 115

Slide 115 text

Benchmark Results

before: ruby 3.2.0dev (2022-09-28T14:51:38Z before-shapes a05b261464) +YJIT [x86_64-linux]
after: ruby 3.2.0dev (2022-11-22T05:20:45Z master 20b9d7b9fd) +YJIT [x86_64-linux]

-------  -----------  ----------  ----------  ----------  ------------  -------------
bench    before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
getivar  17.4         0.5         12.0        0.3         1.45          0.97
-------  -----------  ----------  ----------  ----------  ------------  -------------

Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

45% Speed up!
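
The 45% figure follows directly from the two timings reported for getivar. A quick check in Ruby, where the variable names are just for illustration:

```ruby
before_ms = 17.4   # getivar time before shapes, from the results above
after_ms  = 12.0   # getivar time after shapes

# before/after ratio: above 1 means the "after" build is faster.
ratio = before_ms / after_ms
speedup_percent = (ratio - 1) * 100

puts format("before/after: %.2f (%.0f%% speed up)", ratio, speedup_percent)
```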

Slide 116

Slide 116 text

Before Shapes: 3.76x

Slide 117

Slide 117 text

After Shapes: 5.42x

Slide 118

Slide 118 text

Future: 32 byte Objects

Slide 119

Slide 119 text

TL;DR

Slide 120

Slide 120 text

Fewer Checks

Slide 121

Slide 121 text

Faster Code

Slide 122

Slide 122 text

Thank You!!