Slide 1

Slide 1 text

Reducing Memory Usage in Ruby

Slide 2

Slide 2 text

HELLO!!!!

Slide 3

Slide 3 text

Aaron Patterson

Slide 4

Slide 4 text

@tenderlove

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Famous Programmer

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

GitHub

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Ruby Kaigi 2017: September 20

Slide 12

Slide 12 text

Ruby Kaigi 2018: May 31

Slide 13

Slide 13 text

Only 70% Complete!

>> last_year = Date.parse "September 20, 2017"
>> this_year = Date.parse "May 31, 2018"
>> ((this_year - last_year) / 365).to_f
=> 0.6931506849315069
>> sprintf "%f%%", (1.0 - ((this_year - last_year) / 365).to_f) * 100
=> "30.684932%"

Slide 14

Slide 14 text

Reducing Memory Usage in Ruby

Slide 15

Slide 15 text

Feature Caches

Slide 16

Slide 16 text

Direct ISeq Marking

Slide 17

Slide 17 text

Finding Memory Usage

Slide 18

Slide 18 text

Reading the Code

Slide 19

Slide 19 text

Malloc Stack Tracing

Slide 20

Slide 20 text

GC Malloc

Slide 21

Slide 21 text

ObjectSpace allocation_tracer (gem)

Slide 22

Slide 22 text

Malloc Stack Logging

Slide 23

Slide 23 text

Malloc Stack Logging

$ MallocStackLoggingNoCompact=1 \
  RAILS_ENV=production \
  bin/rails r 'p $$; GC.start; $stdin.getc'

MallocStackLoggingNoCompact=1: Enable the Logger
p $$: Print the PID
GC.start: Clean any garbage
$stdin.getc: Pause the process

Slide 24

Slide 24 text

Dump Malloc Logs

$ malloc_history [PID] -allEvents > malloc_log.log

Slide 25

Slide 25 text

Log File Size

$ ls -alh trunk_log.log
-rw-r--r--  1 aaron  staff  6.2G Mar 12 10:42 trunk_log.log

Slide 26

Slide 26 text

File Contents

ALLOC 0x7fb1fa600940-0x7fb1fa600b4f [size=528]: thread_7fff8b218340 | start | main | ruby_init | ruby_setup | Init_BareVM | rb_objspace_alloc | calloc | malloc_zone_calloc

FREE 0x7fb1fa603730: thread_7fff8b218340 | start | main | ruby_init | ruby_setup | rb_call_inits | Init_Encoding | rb_define_method | rb_add_method_cfunc | rb_add_method | rb_method_entry_make | rb_id_table_insert | rb_id_table_insert_key | hash_table_extend | ruby_xfree | ruby_sized_xfree | objspace_xfree | free

Slide 27

Slide 27 text

Reconciling Live Memory

allocs = {}
total = 0

File.open(ARGV[0], "r") do |f|
  f.each_line do |line|
    case line
    when /^(?:ALLOC)\s*([^\s]+)\s+\[size=(\d+)\]:/
      from, _to = $1.split('-', 2)
      size = $2.to_i
      total += size
      allocs[from] = size  # key each allocation by its start address
      puts total           # running total of live bytes
    when /^(?:FREE)\s*([^\s]+):\s/
      total -= allocs.fetch($1)
      allocs.delete $1
    end
  end
end

p allocs # allocations that were never freed
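The same replay can be wrapped in a method so it is easy to check against a few lines in the format shown on the File Contents slide. This is a sketch, not the script from the talk: the name `live_allocations` is made up, and it uses `Hash#delete` so that a FREE with no matching ALLOC is ignored rather than raising `KeyError`.

```ruby
# Hypothetical helper: replay ALLOC/FREE events from a malloc_history log
# and return the allocations that were never freed (start address => bytes).
def live_allocations(lines)
  allocs = {}
  lines.each do |line|
    case line
    when /^ALLOC\s*(\S+)\s+\[size=(\d+)\]:/
      range, size = $1, $2.to_i
      from, _to = range.split('-', 2) # key by the start address
      allocs[from] = size
    when /^FREE\s*(\S+):\s/
      allocs.delete($1)               # unknown addresses are simply ignored
    end
  end
  allocs
end

# Usage: ruby live.rb malloc_log.log
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  live = File.open(ARGV[0]) { |f| live_allocations(f.each_line) }
  puts "live bytes: #{live.values.sum}"
end
```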

Slide 28

Slide 28 text

Who calls malloc?

Slide 29

Slide 29 text

Top 20 malloc Callers

rb_ast_newnode 16%
new_insn_body 12%
iseq_setup 10%
prepare_iseq_build 9%
ary_resize_capa 8%
st_init_table_with_size 6%
io_fillbuf 6%
new_insn_send 5%
rb_ary_modify 5%
str_new0 4%
heap_assign_page 4%
rb_iseq_new_with_opt 3%
iseq_compile_each0 2%
local_push_gen 2%
CRYPTO_malloc 2%
rb_str_resize 1%
rb_str_buf_new 1%
__opendir_common 1%
rb_ast_new 1%
ruby_strdup 1%
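The slides don't say how this ranking was computed. One plausible approach, sketched here under assumptions: split each ALLOC stack on ` | ` (outermost frame first, as in the log excerpt), skip allocator frames, and credit the bytes to the first frame that remains. The `ALLOCATOR_FRAME` pattern is a guess, and weighting by bytes instead of by call count is also an assumption.

```ruby
# Assumed list of allocator frames to skip when looking for the real caller.
ALLOCATOR_FRAME = /\A(?:malloc|calloc|realloc|malloc_zone_\w+|ruby_x\w+|objspace_x\w+)\z/

# Returns [[caller, percent_of_bytes], ...], largest first.
def malloc_callers(lines)
  bytes = Hash.new(0)
  lines.each do |line|
    next unless line =~ /^ALLOC\s*\S+\s+\[size=(\d+)\]:\s*(.*)$/
    size, stack = $1.to_i, $2
    frames = stack.split(' | ')        # outermost frame first, innermost last
    site = frames.reverse.find { |f| !ALLOCATOR_FRAME.match?(f) }
    bytes[site] += size if site
  end
  total = bytes.values.sum
  bytes.sort_by { |_, b| -b }.map { |site, b| [site, (100.0 * b / total).round(1)] }
end
```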

Slide 30

Slide 30 text

Combine 2 Techniques

Slide 31

Slide 31 text

"Instrumentation" + "Read The Code"

Slide 32

Slide 32 text

Loaded Features Caching

Slide 33

Slide 33 text

Shared String Optimization

Slide 34

Slide 34 text

Shared Strings

x = '/a/b/c.rb'
a = x.dup
b = x[1, x.length - 1]

(diagram: x, a, and b all point into the same buffer "/ a / b / c . r b")

Slide 35

Slide 35 text

Not Shared Strings

x = '/a/b/c.rb'
a = x[0, 2]

(diagram: x points to "/ a / b / c . r b"; a points to its own copy, "/ a")

Slide 36

Slide 36 text

Shared String Rule: Always Copy To The End (If you can)
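A toy model of the rule, assuming only what the two diagrams show: a substring may reuse its parent's buffer only when it runs to the end of the parent, and anything else gets a fresh copy. `ToyString` is hypothetical and ignores other MRI details such as embedding short strings directly in the object.

```ruby
# Toy model (not MRI's implementation) of shared substrings.
class ToyString
  attr_reader :buffer, :offset, :length

  def initialize(buffer, offset = 0, length = buffer.length)
    @buffer, @offset, @length = buffer, offset, length
  end

  def to_s
    buffer[offset, length]
  end

  def substr(start, len)
    if start + len == length
      ToyString.new(buffer, offset + start, len) # runs to the end: share the buffer
    else
      ToyString.new(to_s[start, len])            # stops early: copy into a new buffer
    end
  end

  def shares_buffer_with?(other)
    buffer.equal?(other.buffer)
  end
end

x = ToyString.new("/a/b/c.rb")
b = x.substr(1, x.length - 1) # "a/b/c.rb", shared
a = x.substr(0, 2)            # "/a", copied
```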

Slide 37

Slide 37 text

Loaded Features?

Slide 38

Slide 38 text

$LOADED_FEATURES

before = $LOADED_FEATURES.dup
require 'foo'
after = $LOADED_FEATURES - before
p after # => ["/private/tmp/foo.rb"]

Slide 39

Slide 39 text

Requiring The Same File

require 'foo'
require 'foo'
require 'foo'
require 'foo'

Slide 40

Slide 40 text

What is "the same file"?

Slide 41

Slide 41 text

Requiring The Same File

require '/a/b/c.rb'
require '/a/b/c'

$LOAD_PATH.unshift "/"
require 'a/b/c.rb'
require 'a/b/c'

$LOAD_PATH.unshift "/a"
require 'b/c.rb'
require 'b/c'

$LOAD_PATH.unshift "/a/b"
require 'c.rb'
require 'c'

Slide 42

Slide 42 text

Array search is slow.

Slide 43

Slide 43 text

Cache Generation

Code: require '/a/b/c.rb'
Cache keys: /a/b/c.rb, /a/b/c, a/b/c.rb, a/b/c, b/c.rb, b/c, c.rb, c

Slide 44

Slide 44 text

Cache Structure

features_index = {
  '/a/b/c.rb' => 2,
  '/a/b/c'    => 2,
  'a/b/c.rb'  => 2,
  'a/b/c'     => 2,
  'b/c.rb'    => 2,
  'b/c'       => 2,
  'c.rb'      => 2,
  'c'         => 2
}
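A minimal sketch of why the hash beats an array search: every key a loaded feature can be required by maps straight to its position in the loaded features list, so checking a `require` is one lookup instead of a scan. The helper names are hypothetical, and storing an array of positions (instead of a single offset like `2`) is an assumption made because different files can share a short key such as `c`.

```ruby
# Hypothetical helper: every key under which a loaded feature can be found.
def feature_keys(feature)
  ext = File.extname(feature)                        # ".rb", or "" if none
  no_ext = feature[0, feature.length - ext.length]   # "/a/b/c"
  starts = [0] + (0...feature.length).select { |i| feature[i] == '/' }.map { |i| i + 1 }
  starts.uniq.flat_map do |s|
    ext.empty? ? [feature[s..-1]] : [feature[s..-1], no_ext[s..-1]]
  end
end

# Map each key to the positions of matching entries in the loaded features list.
def build_features_index(loaded_features)
  index = {}
  loaded_features.each_with_index do |feature, i|
    feature_keys(feature).each { |key| (index[key] ||= []) << i }
  end
  index
end

index = build_features_index(["/x/y.rb", "/x/z.rb", "/a/b/c.rb"])
p index["b/c"] # prints [2]
```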

Slide 45

Slide 45 text

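Generation Algorithm

def features_index_add(feature, index)
  # The extension starts at the last '.' after the last '/', if there is one
  ext = feature.rindex(/[.\/]/)
  ext = nil unless ext && feature[ext] == '.'

  pos = ext || feature.length
  loop do
    # Walk backwards to the previous '/'
    pos -= 1
    pos -= 1 while pos >= 0 && feature[pos] != '/'
    break if pos < 0

    # Key with the extension, e.g. "b/c.rb"
    short_feature = feature[pos + 1, feature.length - pos - 1] # New Ruby Object
    features_index_add_single(short_feature, index)

    if ext
      # slice out the file extension, e.g. "b/c"
      short_feature = feature[pos + 1, ext - pos - 1] # New Ruby Object + malloc
      features_index_add_single(short_feature, index)
    end
  end

  # Finally the full path, with and without its extension
  features_index_add_single(feature, index)
  features_index_add_single(feature[0, ext], index) if ext # New Ruby Object + malloc
end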

Slide 46

Slide 46 text

Key Generation

require '/a/b/c.rb' turns the feature "/ a / b / c . r b" into the keys:
/a/b/c.rb, /a/b/c, a/b/c.rb, a/b/c, b/c.rb, b/c, c.rb, c

Slide 47

Slide 47 text

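Key Generation

require '/a/b/c.rb' turns the feature "/ a / b / c . r b" into the keys:
/a/b/c.rb, /a/b/c, a/b/c.rb, a/b/c, b/c.rb, b/c, c.rb, c
Each key without the extension ("/ a / b / c", "a / b / c", "b / c", "c") comes from its own rb_substr( ) call.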

Slide 48

Slide 48 text

Reduce Mallocs With Shared Strings

Slide 49

Slide 49 text

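Key Generation

require '/a/b/c.rb' turns the feature "/ a / b / c . r b" into the keys:
/a/b/c.rb, /a/b/c, a/b/c.rb, a/b/c, b/c.rb, b/c, c.rb, c
One rb_substr( ) call creates "/ a / b / c"; the remaining keys without the extension are taken from it with rb_substr( ).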

Slide 50

Slide 50 text

Eliminating Ruby Objects

Slide 51

Slide 51 text

Cache Structure

Loaded Feature Cache (Hash) keys: /a/b/c.rb, /a/b/c, a/b/c.rb, a/b/c, b/c.rb, b/c, c.rb, c
Backing strings: "/ a / b / c . r b" and "/ a / b / c"

Slide 52

Slide 52 text

Cache Structure

Loaded Feature Cache (Hash), with keys pointing directly into the two backing strings "/ a / b / c . r b" and "/ a / b / c"
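A Ruby model of the layout above, under the assumption (consistent with the patch description) that every key is a suffix of one of two strings: the loaded feature itself, or the single extension-less copy that the patch keeps alive in `loaded_features_index_pool`. Because each key is a suffix, it can be stored as "buffer plus offset" instead of a new string. `KeyRef` and `key_refs` are hypothetical names; the real cache stores C pointers, not Ruby structs.

```ruby
# A key is a (buffer, offset) pair: the suffix of buffer starting at offset.
KeyRef = Struct.new(:buffer, :offset) do
  # Materialize the key (for display only; the real cache avoids this copy).
  def to_s
    buffer[offset..-1]
  end
end

def key_refs(feature)
  ext = File.extname(feature)    # ".rb", or "" if none
  starts = [0] + (0...feature.length).select { |i| feature[i] == '/' }.map { |i| i + 1 }
  starts = starts.uniq.select { |s| s < feature.length }
  refs = starts.map { |s| KeyRef.new(feature, s) }
  unless ext.empty?
    no_ext = feature[0, feature.length - ext.length] # the one extra string
    refs += starts.map { |s| KeyRef.new(no_ext, s) }
  end
  refs
end

refs = key_refs("/a/b/c.rb")
p refs.map(&:to_s)
```

All eight keys come out, yet only two buffers back them, which is the point of the change.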

Slide 53

Slide 53 text

Implementation

From bec1637da7fc5bafd9c91ba6443ad38c29ec656f Mon Sep 17 00:00:00 2001
From: Aaron Patterson
Date: Fri, 9 Feb 2018 13:14:27 -0800
Subject: [PATCH] Use shared substrings in feature index cache hash

Before this patch, `features_index_add` would use `rb_str_subseq` to get
a substring of the feature being added to the loaded features list.
`features_index_add_single` would use `ruby_strdup` to copy that string
and use it as a hash key in `loaded_features_index`.

This patch changes `features_index_add` to index in to the underlying
character array stored in the Ruby string, and use that as the hash key
without copying its contents.

The cache also needs keys that do not contain file extensions, so this
patch will allocate one new string that does not contain the file
extension, then indexes in to that character array rather than use
substrings.

The strings that do not have the file extension are added to a new array
on the VM `loaded_features_index_pool` to ensure liveness. The loaded
features array already ensures liveness of the strings *with* file
extensions.
---
 load.c    | 42 ++++++++++++++++++++++++++----------------
 vm.c      |  1 +
 vm_core.h |  1 +
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/load.c b/load.c
index fe1d0280bf..ec046db209 100644
--- a/load.c
+++ b/load.c
@@ -166,6 +166,12 @@ get_loaded_features_index_raw(void)
     return GET_VM()->loaded_features_index;
 }
 
+static VALUE
+get_loaded_features_index_pool_raw(void)
+{
+    return GET_VM()->loaded_features_index_pool;
+}
+
 static st_table *

Slide 54

Slide 54 text

Measure the Impact

Slide 55

Slide 55 text

Object Allocations

require 'allocation_tracer'

ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace {
  require 'a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r'
}

Slide 56

Slide 56 text

Output

Ruby 2.5:
{["features.rb", 6, :T_STRING]=>[91, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_DATA]=>[3, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_FILE]=>[1, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_ARRAY]=>[5, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_IMEMO]=>[3, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_HASH]=>[2, 0, 0, 0, 0, 0]}

Ruby 2.6:
{["features.rb", 6, :T_STRING]=>[50, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_DATA]=>[3, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_FILE]=>[1, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_ARRAY]=>[4, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_IMEMO]=>[3, 0, 0, 0, 0, 0],
 ["features.rb", 6, :T_HASH]=>[2, 0, 0, 0, 0, 0]}

Slide 57

Slide 57 text

Object Allocations (chart: allocations per type, T_STRING, T_DATA, T_FILE, T_ARRAY, T_IMEMO, T_HASH, Ruby 2.5 vs Ruby 2.6)

Slide 58

Slide 58 text

Malloc Logging (chart: Before vs After)

4.2% Savings!

Slide 59

Slide 59 text

The More You Load The More You Save!

Slide 60

Slide 60 text

Free When You Upgrade to Ruby 2.6!

Slide 61

Slide 61 text

https://bugs.ruby-lang.org/issues/14460

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text


Slide 64

Slide 64 text

Always Search The Issues!

Slide 65

Slide 65 text

Direct ISeq Marking

Slide 66

Slide 66 text

Ruby VM

Slide 67

Slide 67 text

Stack VM

Program (Instructions): [ :push, 3 ] [ :push, 5 ] [ :add ], stepped through by the PC; each instruction is an Instruction plus an Operand
Workspace (Stack): 3, 5, then 8

Slide 68

Slide 68 text

How do we get instructions?

Slide 69

Slide 69 text

Compiling Ruby Code

Slide 70

Slide 70 text

Processing Phases

Source Code (text) -> AST -> Linked List -> Byte Code

Slide 71

Slide 71 text

Processing Phases

Source Code (text) -> [Parsing] -> AST -> [Compiling] -> Linked List -> [Optimizations] -> Byte Code [Product]

Slide 72

Slide 72 text

Data Structures

Slide 73

Slide 73 text

AST: Abstract Syntax Tree

Slide 74

Slide 74 text

Source to AST

Ruby Code (text): 3 + 5
AST: a + node with children 3 and 5 (Ruby Objects! T_NODE)

Slide 75

Slide 75 text

AST to Linked List

AST (+ with children 3 and 5; Ruby Objects! T_NODE) -> Visit each node -> Linked List: Push 3 -> Push 5 -> Add
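A minimal sketch of the "visit" step, assuming two hypothetical node types in place of MRI's T_NODE objects and using a Ruby Array in place of the linked list. Children are visited first, then the operator is emitted, which gives exactly the Push 3, Push 5, Add order on the slide.

```ruby
# Hypothetical AST node types standing in for MRI's T_NODE objects.
Lit = Struct.new(:value)
Add = Struct.new(:left, :right)

# Post-order visit: emit code for both children, then for the operator.
def compile(node, out = [])
  case node
  when Lit
    out << [:push, node.value]
  when Add
    compile(node.left, out)
    compile(node.right, out)
    out << [:add]
  else
    raise ArgumentError, "unknown node: #{node.inspect}"
  end
  out
end

p compile(Add.new(Lit.new(3), Lit.new(5))) # prints [[:push, 3], [:push, 5], [:add]]
```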

Slide 76

Slide 76 text

Apply Optimizations = "hand wave"

("hand wave" = "I'm not going to explain this")

Slide 77

Slide 77 text

Optimization Pass

Linked List (Push 3 -> Push 5 -> Add) -> Optimization Pass -> Optimized Linked List (Push 3 -> Push 5 -> Add)

Slide 78

Slide 78 text

What is Byte Code?

Slide 79

Slide 79 text

Byte Code [ 123, 3, 123, 5, 456 ]

Slide 80

Slide 80 text

Byte Code Translation

Linked List (Push 3 -> Push 5 -> Add) -> Translate -> Byte Code [ 123, 3, 123, 5, 456 ]
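A sketch of the "Translate" arrow, assuming a hypothetical `Insn` linked-list node and the opcode numbers used on the slides (123 = push, 456 = add, 789 = print). The walk follows the list from head to tail and emits each opcode, followed by its operand when the instruction has one.

```ruby
# Opcode numbers taken from the slides.
OPCODES = { push: 123, add: 456, print: 789 }.freeze

# Hypothetical linked-list node: instruction name, optional operand, next node.
Insn = Struct.new(:name, :operand, :next_insn)

def translate(head)
  byte_code = []
  node = head
  while node
    byte_code << OPCODES.fetch(node.name)            # the opcode number
    byte_code << node.operand if node.name == :push  # push carries one operand
    node = node.next_insn
  end
  byte_code
end

list = Insn.new(:push, 3, Insn.new(:push, 5, Insn.new(:add)))
p translate(list) # prints [123, 3, 123, 5, 456]
```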

Slide 81

Slide 81 text

Byte Code Translation

Linked List (Push 3 -> Push 5 -> Add) -> Byte Code [ 123, 3, 123, 5, 456 ]
Ruby Objects! (T_NODE)
Ruby Objects! (IMEMO)

Slide 82

Slide 82 text

Let’s make a simple VM!

Slide 83

Slide 83 text

Stack VM

Program (Instructions): [ :push, 3 ] [ :push, 5 ] [ :add ], stepped through by the PC
Workspace: the Stack

Slide 84

Slide 84 text

Stack VM

Program (Instructions), as byte code:
[
  123, # push
  3,
  123, # push
  5,
  456  # add
]
Workspace (Stack): 3, 5, then 8; the PC walks the byte code

Slide 85

Slide 85 text

Simple VM

PUSH  = 123
ADD   = 456
PRINT = 789

byte_code = [ 123, 3, 123, 5, 456, 789 ]
pc = 0
stack = []

# Virtual Machine Loop
loop do
  case byte_code[pc]
  when nil then break
  when PUSH
    parameter = byte_code[pc + 1]
    stack.push parameter
    pc += 1 # Extra Increment: skip over the operand
  when ADD
    a = stack.pop
    b = stack.pop
    c = a + b
    stack.push c
  when PRINT
    puts stack.pop
  end
  pc += 1
end

Slide 86

Slide 86 text

Compilation VM

Slide 87

Slide 87 text

Hello World

Ruby: puts "hello" << "world"
AST: puts( <<( "hello", "world" ) ), where "hello" and "world" are Ruby Objects

Slide 88

Slide 88 text

Hello World

AST: puts( <<( "hello", "world" ) ) -> Linked List: Push "hello" -> Push "world" -> << -> puts
"hello" and "world" are Ruby Objects in both

Slide 89

Slide 89 text

Hello World

Linked List (Push "hello" -> Push "world" -> << -> puts) -> Translate -> Byte Code:
[
  123, # push
  111, # hello (Object Address)
  123, # push
  222, # world (Object Address)
  333, # <<
  444, # puts
]
111 and 222 are the addresses of the "hello" and "world" Ruby Objects.

Slide 90

Slide 90 text

Hello World

Byte Code with object addresses:
[
  123, # push
  111, # hello
  123, # push
  222, # world
  333, # <<
  444, # puts
]

is the same Byte Code as:
[
  123, # push
  "hello",
  123, # push
  "world",
  333, # <<
  444, # puts
]

Slide 91

Slide 91 text

VM Implementation

# Ruby: puts "hello" << "world"
PUSH   = 123
APPEND = 333
PRINT  = 444

byte_code = [
  123, # push
  "hello",
  123, # push
  "world",
  333, # <<
  444, # puts
]

def run_vm(pc, stack, byte_code)
  # Virtual Machine Loop
  loop do
    case byte_code[pc]
    when nil then break
    when PUSH
      parameter = byte_code[pc + 1]
      stack.push parameter
      pc += 1
    when APPEND
      b = stack.pop
      a = stack.pop
      c = a << b
      stack.push c
    when PRINT
      puts stack.pop
    end
    pc += 1
  end
end

run_vm(0, [], byte_code) # helloworld
run_vm(0, [], byte_code)

Slide 92

Slide 92 text

VM Run

$ ruby mini_vm.rb
helloworld
helloworldworld

Slide 93

Slide 93 text

Stack VM

Instructions: [ :push, "hello" ] [ :push, "world" ] [ :append ], stepped through by the PC
Stack: "hello", "world" (both are the Ruby objects stored in the program)

Slide 94

Slide 94 text

Stack VM

[ :append ] pops "world" and "hello" and pushes "helloworld". Because << mutates the string stored in the program, the first instruction is now [ :push, "helloworld" ].

Slide 95

Slide 95 text

Stack VM (second run)

[ :push, "helloworld" ] [ :push, "world" ] [ :append ] produces "helloworldworld", and the first instruction becomes [ :push, "helloworldworld" ].

Slide 96

Slide 96 text

Execute 3 Times

$ ruby mini_vm.rb
helloworld
helloworldworld
helloworldworldworld

Slide 97

Slide 97 text

New VM

PUSH   = 123
APPEND = 333
PRINT  = 444

byte_code = [
  123, # push
  "hello",
  123, # push
  "world",
  333, # <<
  444, # puts
]

def run_vm(pc, stack, byte_code)
  # Virtual Machine Loop
  loop do
    case byte_code[pc]
    when nil then break
    when PUSH
      parameter = byte_code[pc + 1]
      stack.push parameter.dup # Copy, so << can't modify the program
      pc += 1
    when APPEND
      b = stack.pop
      a = stack.pop
      c = a << b
      stack.push c
    when PRINT
      puts stack.pop
    end
    pc += 1
  end
end

Slide 98

Slide 98 text

Stack VM

Instructions: [ :push, "hello" ] [ :push, "world" ] [ :append ], stepped through by the PC
Stack: "hello" (copy), "world" (copy)

Slide 99

Slide 99 text

Object Allocations

puts "hello" << "world"

               String   IMEMO
Compile Time     2        1
Run Time         2

Slide 100

Slide 100 text

Object Allocations

# frozen_string_literal: true
puts "hello" + "world"

               String   IMEMO
Compile Time     2        1
Run Time         1
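A small check of the two String behaviors this table relies on, using `.freeze` in place of the magic comment so the snippet doesn't depend on being the first line of a file: a frozen literal can't be mutated (so the VM has no reason to copy it before pushing), and `String#+` builds a new, unfrozen string instead of changing either operand. Allocation counts themselves are not measured here.

```ruby
hello = "hello".freeze # stands in for a literal under frozen_string_literal
world = "world".freeze

result = hello + world # String#+ returns a new string; operands are untouched
puts result            # prints helloworld

appended = begin
  hello << world       # mutating a frozen string raises
  true
rescue FrozenError
  false
end
puts appended          # prints false
```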

Slide 101

Slide 101 text

NEAT!!!

Slide 102

Slide 102 text

Reducing Memory Usage

Slide 103

Slide 103 text

Byte Code is Stored on Instruction Sequences

Slide 104

Slide 104 text

Instruction Sequences are Ruby Objects

Slide 105

Slide 105 text

ISeq Layout

ISeq Object -> [ 123, 555, 123, 456, 333, 444, ]

Slide 106

Slide 106 text

ISeq Layout

ISeq Object -> [ 123, 555, 123, 456, 333, 444, ]
555 -> "hello", 456 -> "world"

Slide 107

Slide 107 text

"Mark Array"

Slide 108

Slide 108 text

ISeq Layout

ISeq Object -> [ 123, 555, 123, 456, 333, 444, ] (555 -> "hello", 456 -> "world")
ISeq Object -> Mark Array (an Array holding "hello" and "world")

Slide 109

Slide 109 text

ISeq GC

GC mark: ISeq Object -> Array -> "hello", "world"
(byte code: [ 123, 555, 123, 456, 333, 444, ])

Slide 110

Slide 110 text

Mark Array Problems

Slide 111

Slide 111 text

ISeq GC

ISeq Object -> [ 123, 555, 123, 456, 333, 444, ]: the byte code refers to "hello" and "world" through "Hidden" References
ISeq Object -> Array -> "hello", "world": Duplicated Information

Slide 112

Slide 112 text

Array Bloat

Slide 113

Slide 113 text

Bloat Graph: Array Size vs Array Capacity (chart: Size, Capacity, and Unused space plotted against Number of Elements)

Slide 114

Slide 114 text

ISeq GC

ISeq Object -> [ 123, 555, 123, 456, 333, 444, ]
ISeq Object -> Array -> "hello", "world"
The Array Lives Forever! (as long as the ISeq does)

Slide 115

Slide 115 text

Remove Bloat?

Slide 116

Slide 116 text

Remove the Array?

Slide 117

Slide 117 text

ISeq GC

Decode the byte code [ 123, 555, 123, 456, 333, 444, ] to find "hello" and "world", instead of going through the Array

Slide 118

Slide 118 text

Mark Loop

def mark_params(pc, byte_code)
  # Virtual Machine Loop, but marking instead of executing
  loop do
    case byte_code[pc]
    when nil then break
    when PUSH
      parameter = byte_code[pc + 1]
      gc_mark(parameter) # Mark the operand directly
      pc += 1
    when APPEND
      # no operand to mark
    when PRINT
      # no operand to mark
    end
    pc += 1
  end
end
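A runnable version of the mark loop, with the GC's mark function replaced by a block (an assumption for illustration; `gc_mark` itself is not a Ruby method). It decodes the byte code exactly as the VM would, skipping opcodes with no operands, and hands every `push` operand to the block, which recovers the same objects a mark array would have held.

```ruby
PUSH   = 123
APPEND = 333
PRINT  = 444

# Walk the byte code and yield every object operand; the block stands in
# for the garbage collector's mark function.
def mark_params(byte_code, pc = 0, &mark)
  while pc < byte_code.length
    case byte_code[pc]
    when PUSH
      mark.call(byte_code[pc + 1]) # the operand is an object reference
      pc += 1                      # skip over the operand
    when APPEND, PRINT
      # no operands, nothing to mark
    else
      raise ArgumentError, "unknown opcode #{byte_code[pc].inspect} at #{pc}"
    end
    pc += 1
  end
end

byte_code = [PUSH, "hello", PUSH, "world", APPEND, PRINT]
marked = []
mark_params(byte_code) { |obj| marked << obj }
p marked # prints ["hello", "world"]
```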

Slide 119

Slide 119 text

ISeq GC ISeq Object [ 123, 555, 123, 456, 333, 444, ] "hello" "world" Array

Slide 120

Slide 120 text

Mark Object Directly

Slide 121

Slide 121 text

Actual Code

commit 9e26858e8c32e7f4b6ae3bccf9896ea7b61ce335
Author: tenderlove
Date:   Mon Mar 19 18:21:54 2018 +0000

    Reverting r62775, this should fix i686 builds

    We need to mark default values for kwarg methods. This also fixes
    Bootsnap. IBF iseq loading needed to mark iseqs as "having markable
    objects".

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/compile.c b/compile.c
index 71f60b9b1b..68d4bf549a 100644
--- a/compile.c
+++ b/compile.c
@@ -562,15 +562,6 @@ APPEND_ELEM(ISEQ_ARG_DECLARE LINK_ANCHOR *const anchor, LINK_ELEMENT *before, LI
 #define APPEND_ELEM(anchor, before, elem) APPEND_ELEM(iseq, (anchor), (before), (elem))
 #endif
 
-static int
-iseq_add_mark_object(const rb_iseq_t *iseq, VALUE v)
-{
-    if (!SPECIAL_CONST_P(v)) {
-        rb_iseq_add_mark_object(iseq, v);
-    }
-    return COMPILE_OK;
-}
-
 static int
 iseq_add_mark_object_compile_time(const rb_iseq_t *iseq, VALUE v)
 {
@@ -749,6 +740,7 @@ rb_iseq_translate_threaded_code(rb_iseq_t *iseq)
         encoded[i] = (VALUE)table[insn];
         i += len;

Slide 122

Slide 122 text

Direct ISeq Marking https://bugs.ruby-lang.org/issues/14370

Slide 123

Slide 123 text

Real World

Slide 124

Slide 124 text

Basic Rails App: Number of Live Objects by Object Type, Ruby 2.5 vs Ruby 2.6 (chart; the largest drop is the Array Reduction)
Object types: T_IMEMO, T_STRING, T_ARRAY, T_CLASS, T_OBJECT, T_DATA, T_HASH, T_REGEXP, T_ICLASS, T_MODULE, T_RATIONAL, T_STRUCT, T_SYMBOL, T_BIGNUM, T_FLOAT, T_FILE, T_MATCH, T_COMPLEX

Slide 125

Slide 125 text

Ruby 2.5: 35k Arrays
Ruby 2.6: 8.5k Arrays

Slide 126

Slide 126 text

24.5% Reduction in Objects

Ruby 2.5: 267783
Ruby 2.6: 202181
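The percentages follow directly from the counts on these slides; a quick check of the arithmetic, reduction = (before - after) / before × 100. The array counts are the rounded "35k" and "8.5k" figures, so that result is approximate.

```ruby
# Percentage reduction from before to after, rounded to one decimal place.
def reduction(before, after)
  ((before - after).fdiv(before) * 100).round(1)
end

puts reduction(267_783, 202_181) # all live objects: prints 24.5
puts reduction(35_000, 8_500)    # arrays only: prints 75.7
```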

Slide 127

Slide 127 text

Process Memory: -6% (chart: Before vs After)

Slide 128

Slide 128 text

The More You Load The More You Save!

Slide 129

Slide 129 text

Free When You Upgrade to Ruby 2.6!

Slide 130

Slide 130 text

Conclusion

Slide 131

Slide 131 text

We learned about Memory Inspection!

Slide 132

Slide 132 text

We learned about VMs!

Slide 133

Slide 133 text

We learned about ourselves.

Slide 134

Slide 134 text

Upgrade to Ruby 2.6!

Slide 135

Slide 135 text

Thank You!!!