Slide 1

Slide 1 text

Profiling Ruby, or, How I Made Rails Start Up 2x Faster Greg Price Solano Labs Pivotal Labs SF 2013-04-02 1 / 30

Slide 2

Slide 2 text

Problem: Ruby startup time 2 / 30

Slide 3

Slide 3 text

(with 1.9.3-p194) $ date; bundle exec rails c Mon Oct 29 18:26:11 PDT 2012 Time.now [waiting, waiting...] 3 / 30

Slide 4

Slide 4 text

(with 1.9.3-p194) $ date; bundle exec rails c Mon Oct 29 18:26:11 PDT 2012 Time.now Loading development environment (Rails 3.0.17) irb(main):001:0> Time.now => 2012-10-29 18:26:28 -0700 3 / 30

Slide 5

Slide 5 text

Spoiler! (with my patches, as in Ruby 2.0) $ date; bundle exec rails c Thu Feb 7 18:36:45 PST 2013 Time.now 4 / 30

Slide 6

Slide 6 text

Spoiler! (with my patches, as in Ruby 2.0) $ date; bundle exec rails c Thu Feb 7 18:36:45 PST 2013 Time.now Loading development environment (Rails 3.0.19) irb(main):001:0> Time.now => 2013-02-07 18:36:54 -0800 4 / 30

Slide 7

Slide 7 text

Fixing performance problems 1. Measure the problem 2. Profile to locate the problem 3. Fix the problem 4. Repeat until fast 5 / 30

Slide 8

Slide 8 text

Step 1: Measure $ bundle exec rails runner \ ’p $LOADED_FEATURES.size’ 2125 6 / 30

Slide 9

Slide 9 text

Step 1: Measure $ mkdir reqtest $ for i in {1..10000}; do \ touch reqtest/$i.rb; done $ time ruby -I reqtest -e \ ’(1..10000).each {|i| require "#{i}"}’ 7 / 30

Slide 10

Slide 10 text

Step 1: Measure require N empty files time avg. / N (best of 5) file 2000 0.286s 0.1ms 4000 0.838 0.2 8000 2.775 0.3 16000 10.677 0.7 32000 42.531 1.3 8 / 30

Slide 11

Slide 11 text

Step 2: Profile $ perf record ruby -I reqtest -e \ ’(1..10000).each {|i| require "#{i}"}’ 9 / 30

Slide 12

Slide 12 text

Step 2: Profile $ perf report 10 / 30

Slide 13

Slide 13 text

Step 2: Profile $ perf report 58.68% ruby libc-2.15.so 0x25820 12.36% ruby ruby rb_string_value_ptr 7.90% ruby ruby rb_feature_p 6.19% ruby ruby loaded_feature_path 5.83% ruby [kernel] 0xc10359d8 10 / 30

Slide 14

Slide 14 text

Step 2: Profile $ perf report -d ruby 37.20% ruby rb_string_value_ptr 23.78% ruby rb_feature_p 18.62% ruby loaded_feature_path 3.18% ruby 0x108e0 0.83% ruby file_expand_path 11 / 30

Slide 15

Slide 15 text

Step 2a Call-graph profiling 12 / 30

Slide 16

Slide 16 text

Step 2a Call-graph profiling Ideally: $ perf record -g ruby ... $ perf report 12 / 30

Slide 17

Slide 17 text

Step 2a Call-graph profiling Ideally: $ perf record -g ruby ... $ perf report MRI adds a step 12 / 30

Slide 18

Slide 18 text

Step 2a: Call-Graph Profiling for MRI 1. Build the Ruby interpreter with make debugflags=’-ggdb -fno-omit-frame-pointer’ (small slowdown, < 3%) 13 / 30

Slide 19

Slide 19 text

Step 2a: Call-Graph Profiling for MRI 1. Build the Ruby interpreter with make debugflags=’-ggdb -fno-omit-frame-pointer’ (small slowdown, < 3%) then 2. perf record -g ruby ... 3. perf report (maybe -g to customize) 13 / 30

Slide 20

Slide 20 text

Step 2a: Call-Graph Profiling $ perf report -g graph,1 -d ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb_feature_p + 3.78% rb_require_safe + 27.41% ruby rb_feature_p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb_feature_p + 1.11% rb_require_safe 14 / 30

Slide 21

Slide 21 text

Step 2a: Call-Graph Profiling $ perf report -g graph,1 -d ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb feature p + 3.78% rb_require_safe + 27.41% ruby rb feature p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb feature p + 1.11% rb_require_safe 15 / 30

Slide 22

Slide 22 text

Step 2a: Call-Graph Profiling $ perf report -g graph,1 -d ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb feature p + 3.78% rb_require_safe + 27.41% ruby rb feature p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb feature p + 1.11% rb_require_safe 76.79% rb feature p() in total 15 / 30

Slide 23

Slide 23 text

Step 2: Profile For C code: (like MRI & extensions) Linux perf perf record $cmd perf report 16 / 30

Slide 24

Slide 24 text

Step 2: Profile For C code: (like MRI & extensions) Linux perf perf record $cmd perf report For Ruby code: perftools.rb (can’t fit invocation here) 16 / 30

Slide 25

Slide 25 text

Step 3: Fix load.c: static int rb_feature_p(const char *feature, const char *ext, int rb, int expanded, const char **fn) { VALUE v, features, p, load_path = 0; const char *f, *e; long i, len, elen, n; st_table *loading_tbl; st_data_t data; int type; if (fn) *fn = 0; if (ext) { elen = strlen(ext); len = strlen(feature) - elen; 17 / 30

Slide 26

Slide 26 text

Step 3: Fix rb_feature_p(const char *feature, const char *ext, int rb, int expanded, const char **fn) { features = /* $LOADED_FEATURES aka $" */; /* for v in features */ { f = StringValuePtr(v); if (loaded feature path(f, /*f.size*/, feature, /*feature.size*/)) return ’u’; } /* also check get_loading_table() */ return 0; } 18 / 30

Slide 27

Slide 27 text

Step 3: Fix Fix: hash table 19 / 30

Slide 28

Slide 28 text

Step 3: Fix Fix: hash table Complications: intricate match semantics $LOADED FEATURES mutable 19 / 30

Slide 29

Slide 29 text

Step 3: Fix Fix: hash table Complications: intricate match semantics $LOADED FEATURES mutable Complete fix: http://bugs.ruby-lang.org/issues/7158 19 / 30

Slide 30

Slide 30 text

Step 4: Repeat 20 / 30

Slide 31

Slide 31 text

Step 1: Measure (again) 21 / 30

Slide 32

Slide 32 text

Step 1: Measure (again) $ time ruby -I reqtest -e \ ’(1..10000).each {|i| require "#{i}"}’ time speedup avg. / N (best of 5) file 2000 0.145 2.0x 0.07ms 4000 0.241 3.5 0.06 8000 0.700 4.0 0.09 16000 2.547 4.2 0.16 32000 10.053 4.2 0.31 22 / 30

Slide 33

Slide 33 text

Step 1: Measure (again) $ bundle exec rails runner "p 1" Rails startup time, version best of 5 speedup v1_9_3_194 12.197s v1_9_3_194+index 8.688s 1.40x 23 / 30

Slide 34

Slide 34 text

Step 2: Profile (again) $ perf record -g bundle exec rails runner "p 1" $ perf report -g graph,3 -d ruby - 19.35% ruby [.] gc_mark - gc_mark + 3.34% mark_keyvalue + 2.17% mark_entry 1.34% gc_mark_children - 13.56% ruby [.] st_foreach - st_foreach + 5.14% gc_mark_children + 3.03% rb_mark_tbl + 9.84% ruby [.] gc_mark_children 6.06% ruby [.] iseq_mark 24 / 30

Slide 35

Slide 35 text

Step 2: Profile (again), better $ perf report --blackbox garbage collect 25 / 30

Slide 36

Slide 36 text

Step 2: Profile (again), better $ perf report --blackbox garbage collect Not available in stock perf; I made it up https://github.com/gnprice/linux/commit/perf Maybe in Linux 3.9? 25 / 30

Slide 37

Slide 37 text

Step 2: Profile (again), better $ perf record -g bundle exec rails runner 1 $ perf report --blackbox garbage_collect \ -g graph,3 -d ruby - garbage_collect + garbage_collect - vm_xmalloc - 82.87% str_new - 96.08% rb_usascii_str_new - 98.32% rb_file_expand_path rb_get_expanded_load_path rb_feature_p + rb_require_safe 26 / 30

Slide 38

Slide 38 text

Step 3: Fix (again) Cache the result of that work 27 / 30

Slide 39

Slide 39 text

Step 3: Fix (again) Cache the result of that work Complications: $LOAD PATH mutable (important!) cwd, etc, etc mutable too 27 / 30

Slide 40

Slide 40 text

Step 3: Fix (again) Cache the result of that work Complications: $LOAD PATH mutable (important!) cwd, etc, etc mutable too Complete fix: http://bugs.ruby-lang.org/issues/7158 27 / 30

Slide 41

Slide 41 text

Step 4: Repeat (again) 28 / 30

Slide 42

Slide 42 text

Step 1: Measure (last time) $ bundle exec rails runner "p 1" Rails startup time, version best of 5 speedup v1_9_3_194 12.197s v1_9_3_194+index 8.688s 1.40x v1_9_3_194+index+cache 5.538s 2.20x 29 / 30

Slide 43

Slide 43 text

End My patches in Ruby 2.0: http://bugs.ruby-lang.org/issues/7158 My --blackbox patch to Linux perf: http://github.com/gnprice/linux/commit/perf These slides: http://speakerdeck.com/gnprice/profiling-ruby 30 / 30