Reducing Memory Usage in Ruby

Presentation about two patches to reduce memory usage in Ruby applications

Aaron Patterson

May 25, 2018

Transcript

  1. Reducing Memory Usage in Ruby

  2. HELLO!!!!

  3. Aaron Patterson

  4. @tenderlove

  5. (no text)
  6. Famous Programmer

  7. (no text)
  8. GitHub

  9. (no text)
  10. (no text)
  11. Ruby Kaigi 2017: September 20

  12. Ruby Kaigi 2018: May 31

  13. Only 70% Complete!

    ```
    >> last_year = Date.parse "September 20, 2017"
    => #<Date: 2017-09-20 ((2458017j,0s,0n),+0s,2299161j)>
    >> this_year = Date.parse "May 31, 2018"
    => #<Date: 2018-05-31 ((2458270j,0s,0n),+0s,2299161j)>
    >> ((this_year - last_year) / 365).to_f
    => 0.6931506849315069
    >> sprintf "%f%%", (1.0 - ((this_year - last_year) / 365).to_f) * 100
    => "30.684932%"
    ```
  14. Reducing Memory Usage in Ruby

  15. Feature Caches

  16. Direct ISeq Marking

  17. Finding Memory Usage

  18. Reading the Code

  19. Malloc Stack Tracing

  20. GC Malloc

  21. ObjectSpace allocation_tracer (gem)

  22. Malloc Stack Logging

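  23. Malloc Stack Logging

    ```
    $ MallocStackLoggingNoCompact=1 \
      RAILS_ENV=production \
      bin/rails r 'p $$; GC.start; $stdin.getc'
    ```

    Annotations: `MallocStackLoggingNoCompact=1` enables the logger, `p $$` prints the PID, `GC.start` cleans up any garbage, and `$stdin.getc` pauses the process.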
  24. Dump Malloc Logs

    ```
    $ malloc_history [PID] -allEvents > malloc_log.log
    ```

  25. Log File Size

    ```
    $ ls -alh trunk_log.log
    -rw-r--r-- 1 aaron staff 6.2G Mar 12 10:42 trunk_log.log
    ```
  26. File Contents

    ```
    ALLOC 0x7fb1fa600940-0x7fb1fa600b4f [size=528]: thread_7fff8b218340 | start | main | ruby_init | ruby_setup | Init_BareVM | rb_objspace_alloc | calloc | malloc_zone_calloc
    FREE 0x7fb1fa603730: thread_7fff8b218340 | start | main | ruby_init | ruby_setup | rb_call_inits | Init_Encoding | rb_define_method | rb_add_method_cfunc | rb_add_method | rb_method_entry_make | rb_id_table_insert | rb_id_table_insert_key | hash_table_extend | ruby_xfree | ruby_sized_xfree | objspace_xfree | free
    ```
  27. Reconciling Live Memory

    ```ruby
    allocs = {}
    total = 0

    File.open(ARGV[0], "r") do |f|
      f.each_line do |line|
        case line
        when /^(?:ALLOC)\s*([^\s]+)\s+\[size=(\d+)\]:/
          size = $2.to_i
          from, _to = $1.split('-', 2) # ALLOC lines give an address range
          total += size
          allocs[from] = size
          puts total
        when /^(?:FREE)\s*([^\s]+):\s/
          total -= allocs.fetch($1)
          allocs.delete $1
        end
      end
    end

    p allocs # allocations that are still live at the end of the log
    ```
  28. Who calls malloc?

  29. Top 20 malloc Callers

    | Caller                  | Share |
    |-------------------------|-------|
    | rb_ast_newnode          | 16%   |
    | new_insn_body           | 12%   |
    | iseq_setup              | 10%   |
    | prepare_iseq_build      | 9%    |
    | ary_resize_capa         | 8%    |
    | st_init_table_with_size | 6%    |
    | io_fillbuf              | 6%    |
    | new_insn_send           | 5%    |
    | rb_ary_modify           | 5%    |
    | str_new0                | 4%    |
    | heap_assign_page        | 4%    |
    | rb_iseq_new_with_opt    | 3%    |
    | iseq_compile_each0      | 2%    |
    | local_push_gen          | 2%    |
    | CRYPTO_malloc           | 2%    |
    | rb_str_resize           | 1%    |
    | rb_str_buf_new          | 1%    |
    | __opendir_common        | 1%    |
    | rb_ast_new              | 1%    |
    | ruby_strdup             | 1%    |
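
    A minimal sketch of how a ranking like this could be computed from the `malloc_history` dump, assuming the ALLOC line format shown in slide 26: each allocation's bytes are credited to the nearest stack frame that is not an allocator or allocator wrapper. Weighting by bytes rather than by call count, the skip list `ALLOCATOR_FRAMES`, and the helper name `top_malloc_callers` are all assumptions; the talk does not say exactly how its chart was produced.

    ```ruby
    # Hypothetical list of allocator and allocator-wrapper frames to skip when
    # looking for the function that actually asked for memory.
    ALLOCATOR_FRAMES = %w[
      malloc calloc realloc malloc_zone_malloc malloc_zone_calloc malloc_zone_realloc
      ruby_xmalloc ruby_xmalloc2 ruby_xcalloc ruby_xrealloc ruby_xrealloc2
      objspace_xmalloc0 objspace_xcalloc objspace_xrealloc
    ].freeze

    # Matches "ALLOC <address range> [size=N]: frame | frame | ..." lines.
    ALLOC_LINE = /^ALLOC\s+\S+\s+\[size=(\d+)\]:\s*(.*)$/

    # Sum allocated bytes per nearest non-allocator frame and return the
    # top `limit` callers with their percentage of the counted bytes.
    def top_malloc_callers(lines, limit = 20)
      bytes = Hash.new(0)
      lines.each do |line|
        m = ALLOC_LINE.match(line)
        next unless m # skip FREE lines and anything else
        frames = m[2].split(" | ").map(&:strip)
        frame = frames.reverse.find { |f| !ALLOCATOR_FRAMES.include?(f) }
        bytes[frame] += m[1].to_i if frame
      end
      total = bytes.values.sum
      bytes.sort_by { |_, b| -b }.first(limit).map do |name, b|
        [name, (100.0 * b / total).round(1)]
      end
    end

    sample = [
      "ALLOC 0x7fb1fa600940-0x7fb1fa600b4f [size=528]: thread_7fff8b218340 | start | main | ruby_init | ruby_setup | Init_BareVM | rb_objspace_alloc | calloc | malloc_zone_calloc",
      "FREE 0x7fb1fa603730: thread_7fff8b218340 | start | main | ruby_init"
    ]
    p top_malloc_callers(sample) # => [["rb_objspace_alloc", 100.0]]
    ```

    For the full log, passing `File.foreach("malloc_log.log")` instead of an array streams the file line by line. The skip list decides what counts as a "caller", so it is worth adjusting for your platform and Ruby version.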
  30. Combine 2 Techniques

  31. "Instrumentation" + "Read The Code"

  32. Loaded Features Caching

  33. Shared String Optimization

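  34. Shared Strings

    ```ruby
    x = '/a/b/c.rb'
    a = x.dup
    b = x[1, x.length - 1]
    ```

    Diagram: `x`, `a`, and `b` all point into the same character buffer, `/a/b/c.rb`; `b` simply starts one character later.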
  35. Not Shared Strings

    ```ruby
    x = '/a/b/c.rb'
    a = x[0, 2]
    ```

    Diagram: `x` points at the buffer `/a/b/c.rb`, while `a` gets its own copy, `/a`.
  36. Shared String Rule: Always Copy To The End (If you can)
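
    A toy Ruby model of the rule, not MRI's actual implementation: a string is a view (buffer, offset, length) over a NUL-terminated buffer, and C code reading a string's characters generally expects a NUL right after the last one. A slice that runs to the end of its parent already has that terminator in the shared buffer, so it can share; any other slice needs its own copy. The class `ToyString` and its methods are hypothetical.

    ```ruby
    # Toy model of shared strings: a view over a NUL-terminated byte buffer.
    class ToyString
      attr_reader :buffer, :offset, :length

      # Build a string that owns a fresh, NUL-terminated buffer.
      def self.from(text)
        new(+"#{text}\0", 0, text.bytesize)
      end

      def initialize(buffer, offset, length)
        @buffer, @offset, @length = buffer, offset, length
      end

      # Like x[start, len]: share the buffer only when the slice ends where
      # this string ends, so the existing NUL terminator still follows it.
      def slice(start, len)
        if start + len == length
          ToyString.new(buffer, offset + start, len)   # shared
        else
          ToyString.from(to_s.byteslice(start, len))   # copied
        end
      end

      def shares_buffer_with?(other)
        buffer.equal?(other.buffer)
      end

      def to_s
        buffer.byteslice(offset, length)
      end
    end

    x = ToyString.from('/a/b/c.rb')
    b = x.slice(1, x.length - 1) # "a/b/c.rb" runs to the end
    a = x.slice(0, 2)            # "/a" stops early
    p [b.to_s, b.shares_buffer_with?(x)] # => ["a/b/c.rb", true]
    p [a.to_s, a.shares_buffer_with?(x)] # => ["/a", false]
    ```

    Slicing the whole string, as `x.dup` does in slide 34, also satisfies `start + len == length`, so in this model it shares too.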
  37. Loaded Features?

  38. $LOADED_FEATURES

    ```ruby
    before = $LOADED_FEATURES.dup
    require 'foo'
    after = $LOADED_FEATURES - before
    p after # => ["/private/tmp/foo.rb"]
    ```
  39. Requiring The Same File

    ```ruby
    require 'foo'
    require 'foo'
    require 'foo'
    require 'foo'
    ```
  40. What is "the same file"?

  41. Requiring The Same File

    ```ruby
    require '/a/b/c.rb'
    require '/a/b/c'

    $LOAD_PATH.unshift "/"
    require 'a/b/c.rb'
    require 'a/b/c'

    $LOAD_PATH.unshift "/a"
    require 'b/c.rb'
    require 'b/c'

    $LOAD_PATH.unshift "/a/b"
    require 'c.rb'
    require 'c'
    ```
  42. Array search is slow.

  43. Cache Generation

    The code `require '/a/b/c.rb'` generates the cache keys `/a/b/c.rb`, `/a/b/c`, `a/b/c.rb`, `a/b/c`, `b/c.rb`, `b/c`, `c.rb`, and `c`.
  44. Cache Structure

    ```ruby
    features_index = {
      '/a/b/c.rb' => 2,
      '/a/b/c'    => 2,
      'a/b/c.rb'  => 2,
      'a/b/c'     => 2,
      'b/c.rb'    => 2,
      'b/c'       => 2,
      'c.rb'      => 2,
      'c'         => 2
    }
    ```
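
    A rough sketch of why the cache pays off, under a simplified matching rule (a loaded path satisfies a request if it equals the request, or ends with `/` plus the request, with or without `.rb`). Without the index, every `require` scans the loaded-features array; with it, one hash lookup returns the array position. The real check in `load.c` also consults `$LOAD_PATH` and other extensions, and `linear_lookup` / `indexed_lookup` are hypothetical helpers.

    ```ruby
    loaded_features = ['/x/y.rb', '/p/q.rb', '/a/b/c.rb']

    # Without a cache: every require scans the whole array (O(n) per call).
    def linear_lookup(loaded_features, request)
      loaded_features.index do |path|
        [request, "#{request}.rb"].any? do |candidate|
          path == candidate || path.end_with?("/#{candidate}")
        end
      end
    end

    # With a cache shaped like the one above: every short form of a loaded
    # path maps to that path's position in the array.
    features_index = {
      '/a/b/c.rb' => 2, '/a/b/c' => 2, 'a/b/c.rb' => 2, 'a/b/c' => 2,
      'b/c.rb' => 2, 'b/c' => 2, 'c.rb' => 2, 'c' => 2
    }

    def indexed_lookup(features_index, request)
      features_index[request] # O(1) hash lookup
    end

    %w[/a/b/c.rb a/b/c b/c.rb c].each do |request|
      p [request, linear_lookup(loaded_features, request), indexed_lookup(features_index, request)]
    end
    # Each line shows position 2 from both lookups.
    ```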
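  45. Generation Algorithm

    ```ruby
    def features_index_add(feature, index)
      # Start of the file extension, if the last path component has one
      ext = feature.rindex('.')
      ext = nil if ext && feature.index('/', ext)
      p = ext ? ext : feature.length
      loop do
        p -= 1
        while p >= 0 && feature[p] != '/'
          p -= 1
        end
        break if p < 0 # no more '/' characters to the left
        short_feature = feature[p + 1, feature.length - p - 1] # New Ruby Object
        features_index_add_single(short_feature, index)
        if ext # slice out the file extension if there is one
          short_feature = feature[p + 1, ext - p - 1] # New Ruby Object + malloc
          features_index_add_single(short_feature, index)
        end
      end
      # The full feature, with and without its extension, is indexed too
      features_index_add_single(feature, index)
      features_index_add_single(feature[0, ext], index) if ext
    end
    ```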
  46. Key Generation

    `require '/a/b/c.rb'` (the characters `/ a / b / c . r b`) produces the keys `/a/b/c.rb`, `/a/b/c`, `a/b/c.rb`, `a/b/c`, `b/c.rb`, `b/c`, `c.rb`, and `c`.
  47. Key Generation

    The same keys, but the diagram shows `rb_substr()` producing each extension-less key (`/a/b/c`, `a/b/c`, `b/c`, `c`) as a separate copy.
  48. Reduce Mallocs With Shared Strings

  49. Key Generation

    The same keys, but now `rb_substr()` makes a single copy, `/a/b/c`, and the remaining extension-less keys are taken from that copy with `rb_substr()`.
  50. Eliminating Ruby Objects

  51. Cache Structure

    Loaded Feature Cache (Hash): the keys `/a/b/c.rb`, `/a/b/c`, `a/b/c.rb`, `a/b/c`, `b/c.rb`, `b/c`, `c.rb`, and `c` are string objects backed by two character buffers, `/a/b/c.rb` and `/a/b/c`.
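  52. Cache Structure

    Loaded Feature Cache (Hash): the intermediate key objects are gone, and the hash keys point directly into the two character buffers `/a/b/c.rb` and `/a/b/c`.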
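
    A Ruby model of the approach in the patch shown in slide 53, as a sketch rather than the C code: keep only two backing strings (the feature, and one copy without its extension) and describe each cache key as a `[buffer, offset, length]` triple. Every key is a suffix of one of the two buffers, which is exactly the shape the "copy to the end" rule allows to share. `offset_keys` is a hypothetical helper.

    ```ruby
    # Build cache keys as [buffer, offset, length] triples instead of new
    # strings. Only one extra string (the feature without its extension) is
    # allocated; every key is a suffix of one of the two buffers.
    def offset_keys(feature)
      ext = feature.rindex('.')
      ext = nil if ext && feature.index('/', ext)
      buffers = [feature]
      buffers << feature[0, ext] if ext # the single extra allocation

      keys = []
      buffers.each do |buf|
        keys << [buf, 0, buf.length] # the whole buffer
        buf.each_char.with_index do |ch, i|
          keys << [buf, i + 1, buf.length - i - 1] if ch == '/' # after each '/'
        end
      end
      [buffers, keys]
    end

    buffers, keys = offset_keys('/a/b/c.rb')
    p buffers.size # => 2
    # Materialize the keys only to display them:
    p keys.map { |buf, off, len| buf[off, len] }
    # => ["/a/b/c.rb", "a/b/c.rb", "b/c.rb", "c.rb", "/a/b/c", "a/b/c", "b/c", "c"]
    ```

    Only the display line builds new strings; the keys themselves are just positions into the two buffers.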
  53. Implementation

    ```
    From bec1637da7fc5bafd9c91ba6443ad38c29ec656f Mon Sep 17 00:00:00 2001
    From: Aaron Patterson <aaron.patterson@gmail.com>
    Date: Fri, 9 Feb 2018 13:14:27 -0800
    Subject: [PATCH] Use shared substrings in feature index cache hash

    Before this patch, `features_index_add` would use `rb_str_subseq` to
    get a substring of the feature being added to the loaded features
    list. `features_index_add_single` would use `ruby_strdup` to copy that
    string and use it as a hash key in `loaded_features_index`.

    This patch changes `features_index_add` to index in to the underlying
    character array stored in the Ruby string, and use that as the hash key
    without copying its contents. The cache also needs keys that do not
    contain file extensions, so this patch will allocate one new string
    that does not contain the file extension, then indexes in to that
    character array rather than use substrings.

    The strings that do not have the file extension are added to a new
    array on the VM `loaded_features_index_pool` to ensure liveness. The
    loaded features array already ensures liveness of the strings *with*
    file extensions.
    ---
     load.c    | 42 ++++++++++++++++++++++++++----------------
     vm.c      |  1 +
     vm_core.h |  1 +
     3 files changed, 28 insertions(+), 16 deletions(-)

    diff --git a/load.c b/load.c
    index fe1d0280bf..ec046db209 100644
    --- a/load.c
    +++ b/load.c
    @@ -166,6 +166,12 @@ get_loaded_features_index_raw(void)
         return GET_VM()->loaded_features_index;
     }

    +static VALUE
    +get_loaded_features_index_pool_raw(void)
    +{
    +    return GET_VM()->loaded_features_index_pool;
    +}
    +
     static st_table *
    ```
  54. Measure the Impact

  55. Object Allocations

    ```ruby
    require 'allocation_tracer'

    ObjectSpace::AllocationTracer.setup(%i{path line type})
    pp ObjectSpace::AllocationTracer.trace {
      require 'a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r'
    }
    ```
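
    The `allocation_tracer` gem gives a per-type breakdown. If you only need a total, a coarser stdlib-only alternative (a different technique from the one used in the talk) is to read the cumulative `GC.stat(:total_allocated_objects)` counter before and after a block; `count_allocations` is a hypothetical helper.

    ```ruby
    # Count the Ruby objects allocated while a block runs, using only core Ruby.
    # The counter is cumulative, so GC runs during the block do not reset it.
    def count_allocations
      before = GC.stat(:total_allocated_objects)
      yield
      GC.stat(:total_allocated_objects) - before
    end

    n = count_allocations { 100.times.map { |i| "string #{i}" } }
    puts "allocated #{n} objects"
    ```

    To repeat the talk's experiment, the block would wrap the `require` call; the count then covers everything that `require` allocates, not just the loaded-features cache.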
  56. Output

    Ruby 2.5:

    ```ruby
    {["features.rb", 6, :T_STRING]=>[91, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_DATA]=>[3, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_FILE]=>[1, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_ARRAY]=>[5, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_IMEMO]=>[3, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_HASH]=>[2, 0, 0, 0, 0, 0]}
    ```

    Ruby 2.6:

    ```ruby
    {["features.rb", 6, :T_STRING]=>[50, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_DATA]=>[3, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_FILE]=>[1, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_ARRAY]=>[4, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_IMEMO]=>[3, 0, 0, 0, 0, 0],
     ["features.rb", 6, :T_HASH]=>[2, 0, 0, 0, 0, 0]}
    ```
  57. Object Allocations

    (Bar chart: allocations by object type, T_STRING, T_DATA, T_FILE, T_ARRAY, T_IMEMO, and T_HASH, for Ruby 2.5 vs. Ruby 2.6; axis 0 to 100.)
  58. Malloc Logging (Before vs. After): 4.2% Savings!

  59. The More You Load The More You Save!

  60. Free When You Upgrade to Ruby 2.6!

  61. https://bugs.ruby-lang.org/issues/14460

  62. (no text)
  63. """"

  64. Always Search The Issues!

  65. Direct ISeq Marking

  66. Ruby VM

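  67. Stack VM

    Program: `[ :push, 3 ]`, `[ :push, 5 ]`, `[ :add ]`; each instruction is an opcode with an optional operand. As the PC advances, the stack holds 3, then 5, and `:add` leaves 8. (Diagram labels: Instructions, Stack, Program, Workspace, PC.)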
  68. How do we get instructions?

  69. Compiling Ruby Code

  70. Processing Phases: Source Code (text) → AST → Linked List → Byte Code

  71. Processing Phases

    Source Code (text) → AST (parsing) → Linked List (compiling, then optimizations) → Byte Code (the product)
  72. Data Structures

  73. AST: Abstract Syntax Tree

  74. Source to AST

    The Ruby code (text) `3 + 5` becomes an AST: a `+` node with the children `3` and `5`. The AST nodes are Ruby objects! (`T_NODE`)
  75. AST to Linked List

    Visiting each AST node (`+`, `3`, and `5`, all Ruby objects of type `T_NODE`) produces the linked list Push 3 → Push 5 → Add.
  76. Apply Optimizations = "hand wave"

  77. Optimization Pass

    Linked List (Push 3 → Push 5 → Add) → Optimized Linked List (Push 3 → Push 5 → Add)
  78. What is Byte Code?

  79. Byte Code [ 123, 3, 123, 5, 456 ]

  80. Byte Code Translation

    Linked List (Push 3 → Push 5 → Add) → Translate → Byte Code `[ 123, 3, 123, 5, 456 ]`
  81. Byte Code Translation

    Linked List (Push 3 → Push 5 → Add) → Byte Code `[ 123, 3, 123, 5, 456 ]`, with the callouts "Ruby Objects! (T_NODE)" and "Ruby Objects! (IMEMO)".
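
    A minimal sketch of the translation step, assuming the opcode numbers used on these slides (123 for push, 456 for add) and a linked list represented as an array of `[instruction, *operands]` entries. `OPCODES` and `translate` are illustrative names, not MRI internals.

    ```ruby
    OPCODES = { push: 123, add: 456 }.freeze

    # Flatten a list of [instruction, *operands] entries into byte code:
    # each instruction becomes its opcode number, followed by its operands.
    def translate(instructions)
      instructions.flat_map do |name, *operands|
        [OPCODES.fetch(name), *operands]
      end
    end

    linked_list = [[:push, 3], [:push, 5], [:add]]
    p translate(linked_list) # => [123, 3, 123, 5, 456]
    ```

    `OPCODES.fetch` raises `KeyError` for an instruction it does not know, which catches typos in the instruction list early.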
  82. Let’s make a simple VM!

  83. Stack VM

    Program: `[ :push, 3 ]`, `[ :push, 5 ]`, `[ :add ]` (diagram labels: Instructions, Stack, Program, Workspace, PC)
  84. Stack VM

    The same program as byte code; the stack holds 3, then 5, then 8 (diagram labels: Instructions, Stack, Program, Workspace, PC):

    ```ruby
    [
      123, # push
      3,
      123, # push
      5,
      456  # add
    ]
    ```
  85. Simple VM

    ```ruby
    PUSH  = 123
    ADD   = 456
    PRINT = 789

    byte_code = [123, 3, 123, 5, 456, 789]
    pc = 0
    stack = []

    # Virtual Machine Loop
    loop do
      case byte_code[pc]
      when nil then break
      when PUSH
        parameter = byte_code[pc + 1]
        stack.push parameter
        pc += 1 # Extra increment: skip over the operand
      when ADD
        a = stack.pop
        b = stack.pop
        c = a + b
        stack.push c
      when PRINT
        puts stack.pop
      end
      pc += 1
    end
    ```
  86. Compilation VM

  87. Hello World

    Ruby: `puts "hello" << "world"`. AST: `puts`, whose argument is `<<` applied to `hello` and `world`; the string literals are Ruby objects.
  88. Hello World

    AST (`puts`, `<<`, `hello`, `world`) → Linked List: Push "hello" → Push "world" → `<<` → `puts`. The two strings are still Ruby objects.
  89. Hello World

    Translating the linked list gives byte code whose operands are the object addresses of the two string objects:

    ```ruby
    [
      123, # push
      111, # hello (object address)
      123, # push
      222, # world (object address)
      333, # <<
      444, # puts
    ]
    ```
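  90. Hello World

    ```ruby
    # Byte code with object addresses
    [
      123, # push
      111, # hello
      123, # push
      222, # world
      333, # <<
      444, # puts
    ]

    # The same byte code, with the objects written in place of their addresses
    [
      123, # push
      "hello",
      123, # push
      "world",
      333, # <<
      444, # puts
    ]
    ```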
  91. VM Implementation

    ```ruby
    PUSH   = 123
    APPEND = 333
    PRINT  = 444

    byte_code = [
      123,     # push
      "hello",
      123,     # push
      "world",
      333,     # <<
      444,     # puts
    ]

    def run_vm(pc, stack, byte_code)
      # Virtual Machine Loop
      loop do
        case byte_code[pc]
        when nil then break
        when PUSH
          parameter = byte_code[pc + 1]
          stack.push parameter
          pc += 1
        when APPEND
          b = stack.pop
          a = stack.pop
          c = a << b
          stack.push c
        when PRINT
          puts stack.pop
        end
        pc += 1
      end
    end

    run_vm(0, [], byte_code) # helloworld
    run_vm(0, [], byte_code)
    ```

    Equivalent Ruby: `puts "hello" << "world"`
  92. VM Run

    ```
    $ ruby mini_vm.rb
    helloworld
    helloworldworld
    ```

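  93. Stack VM

    Program: `[ :push, "hello" ]`, `[ :push, "world" ]`, `[ :append ]`. The strings pushed onto the stack are the same Ruby objects that live in the program. (Diagram labels: Instructions, Stack, Program, Workspace, PC.)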
  94. Stack VM

    `:append` mutates `"hello"` into `"helloworld"`, so the program now effectively reads `[ :push, "helloworld" ]`, `[ :push, "world" ]`, `[ :append ]`.
  95. Stack VM

    On the next run, `"helloworld"` and `"world"` are pushed and appended, giving `"helloworldworld"`, and the first instruction becomes `[ :push, "helloworldworld" ]`.
  96. Execute 3 Times

    ```
    $ ruby mini_vm.rb
    helloworld
    helloworldworld
    helloworldworldworld
    ```

  97. New VM

    ```ruby
    PUSH   = 123
    APPEND = 333
    PRINT  = 444

    byte_code = [
      123,     # push
      "hello",
      123,     # push
      "world",
      333,     # <<
      444,     # puts
    ]

    def run_vm(pc, stack, byte_code)
      # Virtual Machine Loop
      loop do
        case byte_code[pc]
        when nil then break
        when PUSH
          parameter = byte_code[pc + 1]
          stack.push parameter.dup # Copy
          pc += 1
        when APPEND
          b = stack.pop
          a = stack.pop
          c = a << b
          stack.push c
        when PRINT
          puts stack.pop
        end
        pc += 1
      end
    end
    ```
  98. Stack VM

    Program: `[ :push, "hello" ]`, `[ :push, "world" ]`, `[ :append ]`. The stack now holds copies of `"hello"` and `"world"`, so the program itself is never modified.
  99. Object Allocations: `puts "hello" << "world"`

    |              | String | IMEMO |
    |--------------|--------|-------|
    | Compile Time | 2      | 1     |
    | Run Time     | 2      |       |
  100. Object Allocations

    ```ruby
    # frozen_string_literal: true
    puts "hello" + "world"
    ```

    |              | String | IMEMO |
    |--------------|--------|-------|
    | Compile Time | 2      | 1     |
    | Run Time     | 1      |       |
  101. NEAT!!!

  102. Reducing Memory Usage

  103. Byte Code is Stored on Instruction Sequences

  104. Instruction Sequences are Ruby Objects

  105. ISeq Layout

    An ISeq object holds the byte code `[ 123, 555, 123, 456, 333, 444, ]`.
  106. ISeq Layout

    An ISeq object holds the byte code `[ 123, 555, 123, 456, 333, 444, ]`, whose operands refer to the `"hello"` and `"world"` objects.
  107. "Mark Array"

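  108. ISeq Layout

    The ISeq object holds the byte code `[ 123, 555, 123, 456, 333, 444, ]`, which refers to `"hello"` and `"world"`; a separate Array, the mark array, also references `"hello"` and `"world"`.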
  109. ISeq GC

    When the GC marks the ISeq object, it marks the mark array, which in turn marks `"hello"` and `"world"`.
  110. Mark Array Problems

  111. ISeq GC

    The byte code and the mark array both reference `"hello"` and `"world"` (duplicated information), and the references from the byte code are "hidden" references.
  112. Array Bloat

  113. Bloat Graph: Array Size vs. Array Capacity

    (Chart: number of elements, 0 to 3,000, showing size, capacity, and unused slots.)
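
    The unused slots come from growing arrays in steps. A small simulation, assuming a policy that starts at 16 slots and doubles whenever the array is full (MRI's actual growth rule differs in its details), shows how much of an append-only array's storage can sit empty; `capacity_for` is a hypothetical helper.

    ```ruby
    # Capacity an append-only array would have after `size` pushes, assuming
    # it starts with `initial` slots and doubles whenever it runs out of room.
    def capacity_for(size, initial: 16)
      capacity = initial
      capacity *= 2 while capacity < size
      capacity
    end

    [100, 750, 1500, 2250, 3000].each do |size|
      cap = capacity_for(size)
      puts format("size=%4d capacity=%4d unused=%4d", size, cap, cap - size)
    end
    ```

    Under this policy a 1,500-element array sits in 2,048 slots, leaving 548 unused, and that slack stays allocated for as long as the array lives.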
  114. ISeq GC

    The ISeq object, its byte code `[ 123, 555, 123, 456, 333, 444, ]`, `"hello"`, `"world"`, and the mark array, which Lives Forever!
  115. Remove Bloat?

  116. Remove the Array?

  117. ISeq GC

    Instead of using the array, decode the byte code `[ 123, 555, 123, 456, 333, 444, ]` to find `"hello"` and `"world"`.
  118. Mark Loop

    ```ruby
    def mark_params(pc, byte_code)
      # Virtual Machine Loop
      loop do
        case byte_code[pc]
        when nil then break
        when PUSH
          parameter = byte_code[pc + 1]
          gc_mark(parameter) # Mark
          pc += 1
        when APPEND
        when PRINT
        end
        pc += 1
      end
    end
    ```
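
    To check that decoding finds the same objects a mark array would hold, here is a self-contained version of the mark loop. `gc_mark` is a stand-in lambda that just records its argument (a real collector would set a mark bit), the opcode constants are the ones from the mini VM, and `decode_marks` is a hypothetical name.

    ```ruby
    PUSH   = 123
    APPEND = 333
    PRINT  = 444

    # Walk the byte code the way the VM would, but instead of executing
    # instructions, report each PUSH operand to the (stand-in) marker.
    def decode_marks(byte_code)
      marked = []
      gc_mark = ->(obj) { marked << obj } # stand-in for the GC's mark function
      pc = 0
      loop do
        case byte_code[pc]
        when nil then break
        when PUSH
          gc_mark.call(byte_code[pc + 1])
          pc += 1 # skip over the operand
        when APPEND, PRINT
          # no operands, nothing to mark
        end
        pc += 1
      end
      marked
    end

    hello = "hello"
    world = "world"
    byte_code  = [PUSH, hello, PUSH, world, APPEND, PRINT]
    mark_array = [hello, world] # what the old design stored separately

    p decode_marks(byte_code) == mark_array # => true
    ```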
  119. ISeq GC

    The ISeq object, its byte code `[ 123, 555, 123, 456, 333, 444, ]`, `"hello"`, `"world"`, and the Array.
  120. Mark Object Directly

  121. Actual Code

    ```
    commit 9e26858e8c32e7f4b6ae3bccf9896ea7b61ce335
    Author: tenderlove <tenderlove@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
    Date:   Mon Mar 19 18:21:54 2018 +0000

        Reverting r62775, this should fix i686 builds

        We need to mark default values for kwarg methods. This also fixes
        Bootsnap. IBF iseq loading needed to mark iseqs as "having markable
        objects".

        git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

    diff --git a/compile.c b/compile.c
    index 71f60b9b1b..68d4bf549a 100644
    --- a/compile.c
    +++ b/compile.c
    @@ -562,15 +562,6 @@ APPEND_ELEM(ISEQ_ARG_DECLARE LINK_ANCHOR *const anchor, LINK_ELEMENT *before, LI
     #define APPEND_ELEM(anchor, before, elem) APPEND_ELEM(iseq, (anchor), (before), (elem))
     #endif

    -static int
    -iseq_add_mark_object(const rb_iseq_t *iseq, VALUE v)
    -{
    -    if (!SPECIAL_CONST_P(v)) {
    -        rb_iseq_add_mark_object(iseq, v);
    -    }
    -    return COMPILE_OK;
    -}
    -
     static int
     iseq_add_mark_object_compile_time(const rb_iseq_t *iseq, VALUE v)
     {
    @@ -749,6 +740,7 @@ rb_iseq_translate_threaded_code(rb_iseq_t *iseq)
             encoded[i] = (VALUE)table[insn];
             i += len;
    ```
  122. Direct ISeq Marking https://bugs.ruby-lang.org/issues/14370

  123. Real World

  124. Basic Rails App: Number of Live Objects

    (Bar chart: live objects by type, T_IMEMO, T_STRING, T_ARRAY, T_CLASS, T_OBJECT, T_DATA, T_HASH, T_REGEXP, T_ICLASS, T_MODULE, T_RATIONAL, T_STRUCT, T_SYMBOL, T_BIGNUM, T_FLOAT, T_FILE, T_MATCH, and T_COMPLEX, for Ruby 2.5 vs. Ruby 2.6; axis 0 to 70,000; callout: "Array Reduction".)
  125. Ruby 2.5: 35k Arrays, Ruby 2.6: 8.5k Arrays

  126. 34% Reduction in Objects (Ruby 2.5: 267783, Ruby 2.6: 202181)

  127. Process Memory (Before vs. After): -6%

  128. The More You Load The More You Save!

  129. Free When You Upgrade to Ruby 2.6!

  130. Conclusion (ŇɾεɾŇ)

  131. We learned about Memory Inspection!

  132. We learned about VMs!

  133. We learned about ourselves.

  134. Upgrade to Ruby 2.6!

  135. Thank You!!!