Profiling Ruby

Proﬁling Ruby, or, How I Made Rails Start Up 2x
Faster Greg Price Solano Labs Pivotal Labs SF 2013-04-02 1 / 30

Problem: Ruby startup time 2 / 30

(with 1.9.3-p194) $ date; bundle exec rails c Mon Oct
29 18:26:11 PDT 2012 Time.now [waiting, waiting...] 3 / 30

(with 1.9.3-p194) $ date; bundle exec rails c Mon Oct
29 18:26:11 PDT 2012 Time.now Loading development environment (Rails 3.0.17) irb(main):001:0> Time.now => 2012-10-29 18:26:28 -0700 3 / 30

Spoiler! (with my patches, as in Ruby 2.0) $ date;
bundle exec rails c Thu Feb 7 18:36:45 PST 2013 Time.now 4 / 30

Spoiler! (with my patches, as in Ruby 2.0) $ date;
bundle exec rails c Thu Feb 7 18:36:45 PST 2013 Time.now Loading development environment (Rails 3.0.19) irb(main):001:0> Time.now => 2013-02-07 18:36:54 -0800 4 / 30

Fixing performance problems 1. Measure the problem 2. Proﬁle to
locate the problem 3. Fix the problem 4. Repeat until fast 5 / 30

Step 1: Measure $ bundle exec rails runner \ ’p
$LOADED_FEATURES.size’ 2125 6 / 30

Step 1: Measure $ mkdir reqtest $ for i in
{1..10000}; do \ touch reqtest/$i.rb; done $ time ruby -I reqtest -e \ ’(1..10000).each {|i| require "#{i}"}’ 7 / 30

Step 1: Measure require N empty files time avg. /
N (best of 5) file 2000 0.286s 0.1ms 4000 0.838 0.2 8000 2.775 0.3 16000 10.677 0.7 32000 42.531 1.3 8 / 30

Step 2: Proﬁle $ perf record ruby -I reqtest -e
\ ’(1..10000).each {|i| require "#{i}"}’ 9 / 30

Step 2: Proﬁle $ perf report 10 / 30

Step 2: Proﬁle $ perf report 58.68% ruby libc-2.15.so 0x25820
12.36% ruby ruby rb_string_value_ptr 7.90% ruby ruby rb_feature_p 6.19% ruby ruby loaded_feature_path 5.83% ruby [kernel] 0xc10359d8 10 / 30

Step 2: Proﬁle $ perf report -d ruby 37.20% ruby
rb_string_value_ptr 23.78% ruby rb_feature_p 18.62% ruby loaded_feature_path 3.18% ruby 0x108e0 0.83% ruby file_expand_path 11 / 30

Step 2a Call-graph proﬁling 12 / 30

Step 2a Call-graph proﬁling Ideally: $ perf record -g ruby
... $ perf report 12 / 30

Step 2a Call-graph proﬁling Ideally: $ perf record -g ruby
... $ perf report MRI adds a step 12 / 30

Step 2a: Call-Graph Proﬁling for MRI 1. Build the Ruby
interpreter with make debugflags=’-ggdb -fno-omit-frame-pointer’ (small slowdown, < 3%) 13 / 30

Step 2a: Call-Graph Proﬁling for MRI 1. Build the Ruby
interpreter with make debugflags=’-ggdb -fno-omit-frame-pointer’ (small slowdown, < 3%) then 2. perf record -g ruby ... 3. perf report (maybe -g to customize) 13 / 30

Step 2a: Call-Graph Proﬁling $ perf report -g graph,1 -d
ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb_feature_p + 3.78% rb_require_safe + 27.41% ruby rb_feature_p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb_feature_p + 1.11% rb_require_safe 14 / 30

ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb feature p + 3.78% rb_require_safe + 27.41% ruby rb feature p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb feature p + 1.11% rb_require_safe 15 / 30

ruby - 32.16% ruby rb_string_value_ptr - rb_string_value_ptr + 28.38% rb feature p + 3.78% rb_require_safe + 27.41% ruby rb feature p - 22.10% ruby loaded_feature_path - loaded_feature_path + 21.00% rb feature p + 1.11% rb_require_safe 76.79% rb feature p() in total 15 / 30

Step 2: Proﬁle For C code: (like MRI & extensions)
Linux perf perf record $cmd perf report 16 / 30

Step 2: Proﬁle For C code: (like MRI & extensions)
Linux perf perf record $cmd perf report For Ruby code: perftools.rb (can’t ﬁt invocation here) 16 / 30

Step 3: Fix load.c: static int rb_feature_p(const char *feature, const
char *ext, int rb, int expanded, const char **fn) { VALUE v, features, p, load_path = 0; const char *f, *e; long i, len, elen, n; st_table *loading_tbl; st_data_t data; int type; if (fn) *fn = 0; if (ext) { elen = strlen(ext); len = strlen(feature) - elen; 17 / 30

Step 3: Fix rb_feature_p(const char *feature, const char *ext, int
rb, int expanded, const char **fn) { features = /* $LOADED_FEATURES aka $" */; /* for v in features */ { f = StringValuePtr(v); if (loaded feature path(f, /*f.size*/, feature, /*feature.size*/)) return ’u’; } /* also check get_loading_table() */ return 0; } 18 / 30

Step 3: Fix Fix: hash table 19 / 30

Step 3: Fix Fix: hash table Complications: intricate match semantics
$LOADED FEATURES mutable 19 / 30

Step 3: Fix Fix: hash table Complications: intricate match semantics
$LOADED FEATURES mutable Complete ﬁx: http://bugs.ruby-lang.org/issues/7158 19 / 30

Step 4: Repeat 20 / 30

Step 1: Measure (again) 21 / 30

Step 1: Measure (again) $ time ruby -I reqtest -e
\ ’(1..10000).each {|i| require "#{i}"}’ time speedup avg. / N (best of 5) file 2000 0.145 2.0x 0.07ms 4000 0.241 3.5 0.06 8000 0.700 4.0 0.09 16000 2.547 4.2 0.16 32000 10.053 4.2 0.31 22 / 30

Step 1: Measure (again) $ bundle exec rails runner "p
1" Rails startup time, version best of 5 speedup v1_9_3_194 12.197s v1_9_3_194+index 8.688s 1.40x 23 / 30

Step 2: Proﬁle (again) $ perf record -g bundle exec
rails runner "p 1" $ perf report -g graph,3 -d ruby - 19.35% ruby [.] gc_mark - gc_mark + 3.34% mark_keyvalue + 2.17% mark_entry 1.34% gc_mark_children - 13.56% ruby [.] st_foreach - st_foreach + 5.14% gc_mark_children + 3.03% rb_mark_tbl + 9.84% ruby [.] gc_mark_children 6.06% ruby [.] iseq_mark 24 / 30

Step 2: Proﬁle (again), better $ perf report --blackbox garbage
collect 25 / 30

Step 2: Proﬁle (again), better $ perf report --blackbox garbage
collect Not available in stock perf; I made it up https://github.com/gnprice/linux/commit/perf Maybe in Linux 3.9? 25 / 30

Step 2: Proﬁle (again), better $ perf record -g bundle
exec rails runner 1 $ perf report --blackbox garbage_collect \ -g graph,3 -d ruby - garbage_collect + garbage_collect - vm_xmalloc - 82.87% str_new - 96.08% rb_usascii_str_new - 98.32% rb_file_expand_path rb_get_expanded_load_path rb_feature_p + rb_require_safe 26 / 30

Step 3: Fix (again) Cache the result of that work
27 / 30

Complications: $LOAD PATH mutable (important!) cwd, etc, etc mutable too 27 / 30

Complications: $LOAD PATH mutable (important!) cwd, etc, etc mutable too Complete ﬁx: http://bugs.ruby-lang.org/issues/7158 27 / 30

Step 4: Repeat (again) 28 / 30

Step 1: Measure (last time) $ bundle exec rails runner
"p 1" Rails startup time, version best of 5 speedup v1_9_3_194 12.197s v1_9_3_194+index 8.688s 1.40x v1_9_3_194+index+cache 5.538s 2.20x 29 / 30

End My patches in Ruby 2.0: http://bugs.ruby-lang.org/issues/7158 My --blackbox patch
to Linux perf: http://github.com/gnprice/linux/commit/perf These slides: http://speakerdeck.com/gnprice/profiling-ruby 30 / 30

Profiling Ruby

Profiling Ruby

More Decks by Greg Price

Other Decks in Programming

Featured

Transcript