Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Profiling Ruby

Profiling Ruby

Ruby 2.0 starts applications in less than half the time taken by 1.9.3. We'll see the techniques I used to diagnose (and fix) the problem, which can be used for any performance problem in MRI or a C extension. The same principles apply to fixing performance problems in Ruby code.

(As delivered at Pivotal Labs in San Francisco, 2013-04-02.)

For a recording of the complete in-person presentation, see http://bostonrb.org/presentations/profiling-ruby (a slightly different version.)

Greg Price

April 02, 2013
Tweet

More Decks by Greg Price

Other Decks in Programming

Transcript

  1. Profiling Ruby, or,
    How I Made Rails Start Up 2x
    Faster
    Greg Price
    Solano Labs
    Pivotal Labs SF
    2013-04-02
    1 / 30

    View Slide

  2. Problem: Ruby startup time
    2 / 30

    View Slide

  3. (with 1.9.3-p194)
    $ date; bundle exec rails c
    Mon Oct 29 18:26:11 PDT 2012
    Time.now
    [waiting, waiting...]
    3 / 30

    View Slide

  4. (with 1.9.3-p194)
    $ date; bundle exec rails c
    Mon Oct 29 18:26:11 PDT 2012
    Time.now
    Loading development environment (Rails
    3.0.17)
    irb(main):001:0> Time.now
    => 2012-10-29 18:26:28 -0700
    3 / 30

    View Slide

  5. Spoiler!
    (with my patches, as in Ruby 2.0)
    $ date; bundle exec rails c
    Thu Feb 7 18:36:45 PST 2013
    Time.now
    4 / 30

    View Slide

  6. Spoiler!
    (with my patches, as in Ruby 2.0)
    $ date; bundle exec rails c
    Thu Feb 7 18:36:45 PST 2013
    Time.now
    Loading development environment (Rails
    3.0.19)
    irb(main):001:0> Time.now
    => 2013-02-07 18:36:54 -0800
    4 / 30

    View Slide

  7. Fixing performance problems
    1. Measure the problem
    2. Profile to locate the problem
    3. Fix the problem
    4. Repeat until fast
    5 / 30

    View Slide

  8. Step 1: Measure
    $ bundle exec rails runner \
    ’p $LOADED_FEATURES.size’
    2125
    6 / 30

    View Slide

  9. Step 1: Measure
    $ mkdir reqtest
    $ for i in {1..10000}; do \
    touch reqtest/$i.rb; done
    $ time ruby -I reqtest -e \
    ’(1..10000).each {|i| require "#{i}"}’
    7 / 30

    View Slide

  10. Step 1: Measure
    require N
    empty files
    time avg. /
    N (best of 5) file
    2000 0.286s 0.1ms
    4000 0.838 0.2
    8000 2.775 0.3
    16000 10.677 0.7
    32000 42.531 1.3
    8 / 30

    View Slide

  11. Step 2: Profile
    $ perf record ruby -I reqtest -e \
    ’(1..10000).each {|i| require "#{i}"}’
    9 / 30

    View Slide

  12. Step 2: Profile
    $ perf report
    10 / 30

    View Slide

  13. Step 2: Profile
    $ perf report
    58.68% ruby libc-2.15.so 0x25820
    12.36% ruby ruby rb_string_value_ptr
    7.90% ruby ruby rb_feature_p
    6.19% ruby ruby loaded_feature_path
    5.83% ruby [kernel] 0xc10359d8
    10 / 30

    View Slide

  14. Step 2: Profile
    $ perf report -d ruby
    37.20% ruby rb_string_value_ptr
    23.78% ruby rb_feature_p
    18.62% ruby loaded_feature_path
    3.18% ruby 0x108e0
    0.83% ruby file_expand_path
    11 / 30

    View Slide

  15. Step 2a
    Call-graph profiling
    12 / 30

    View Slide

  16. Step 2a
    Call-graph profiling
    Ideally:
    $ perf record -g ruby ...
    $ perf report
    12 / 30

    View Slide

  17. Step 2a
    Call-graph profiling
    Ideally:
    $ perf record -g ruby ...
    $ perf report
    MRI adds a step
    12 / 30

    View Slide

  18. Step 2a: Call-Graph Profiling for MRI
    1. Build the Ruby interpreter with
    make debugflags=’-ggdb -fno-omit-frame-pointer’
    (small slowdown, < 3%)
    13 / 30

    View Slide

  19. Step 2a: Call-Graph Profiling for MRI
    1. Build the Ruby interpreter with
    make debugflags=’-ggdb -fno-omit-frame-pointer’
    (small slowdown, < 3%)
    then
    2. perf record -g ruby ...
    3. perf report (maybe -g to customize)
    13 / 30

    View Slide

  20. Step 2a: Call-Graph Profiling
    $ perf report -g graph,1 -d ruby
    - 32.16% ruby rb_string_value_ptr
    - rb_string_value_ptr
    + 28.38% rb_feature_p
    + 3.78% rb_require_safe
    + 27.41% ruby rb_feature_p
    - 22.10% ruby loaded_feature_path
    - loaded_feature_path
    + 21.00% rb_feature_p
    + 1.11% rb_require_safe
    14 / 30

    View Slide

  21. Step 2a: Call-Graph Profiling
    $ perf report -g graph,1 -d ruby
    - 32.16% ruby rb_string_value_ptr
    - rb_string_value_ptr
    + 28.38% rb feature p
    + 3.78% rb_require_safe
    + 27.41% ruby rb feature p
    - 22.10% ruby loaded_feature_path
    - loaded_feature_path
    + 21.00% rb feature p
    + 1.11% rb_require_safe
    15 / 30

    View Slide

  22. Step 2a: Call-Graph Profiling
    $ perf report -g graph,1 -d ruby
    - 32.16% ruby rb_string_value_ptr
    - rb_string_value_ptr
    + 28.38% rb feature p
    + 3.78% rb_require_safe
    + 27.41% ruby rb feature p
    - 22.10% ruby loaded_feature_path
    - loaded_feature_path
    + 21.00% rb feature p
    + 1.11% rb_require_safe
    76.79% rb feature p() in total
    15 / 30

    View Slide

  23. Step 2: Profile
    For C code: (like MRI & extensions)
    Linux perf
    perf record $cmd
    perf report
    16 / 30

    View Slide

  24. Step 2: Profile
    For C code: (like MRI & extensions)
    Linux perf
    perf record $cmd
    perf report
    For Ruby code:
    perftools.rb
    (can’t fit invocation here)
    16 / 30

    View Slide

  25. Step 3: Fix
    load.c:
    static int
    rb_feature_p(const char *feature, const char *ext,
    int rb, int expanded, const char **fn)
    {
    VALUE v, features, p, load_path = 0;
    const char *f, *e;
    long i, len, elen, n;
    st_table *loading_tbl;
    st_data_t data;
    int type;
    if (fn) *fn = 0;
    if (ext) {
    elen = strlen(ext);
    len = strlen(feature) - elen;
    17 / 30

    View Slide

  26. Step 3: Fix
    rb_feature_p(const char *feature, const char *ext,
    int rb, int expanded, const char **fn)
    {
    features = /* $LOADED_FEATURES aka $" */;
    /* for v in features */ {
    f = StringValuePtr(v);
    if (loaded feature path(f, /*f.size*/,
    feature, /*feature.size*/))
    return ’u’;
    }
    /* also check get_loading_table() */
    return 0;
    }
    18 / 30

    View Slide

  27. Step 3: Fix
    Fix: hash table
    19 / 30

    View Slide

  28. Step 3: Fix
    Fix: hash table
    Complications:
    intricate match semantics
    $LOADED FEATURES mutable
    19 / 30

    View Slide

  29. Step 3: Fix
    Fix: hash table
    Complications:
    intricate match semantics
    $LOADED FEATURES mutable
    Complete fix:
    http://bugs.ruby-lang.org/issues/7158
    19 / 30

    View Slide

  30. Step 4: Repeat
    20 / 30

    View Slide

  31. Step 1: Measure (again)
    21 / 30

    View Slide

  32. Step 1: Measure (again)
    $ time ruby -I reqtest -e \
    ’(1..10000).each {|i| require "#{i}"}’
    time speedup avg. /
    N (best of 5) file
    2000 0.145 2.0x 0.07ms
    4000 0.241 3.5 0.06
    8000 0.700 4.0 0.09
    16000 2.547 4.2 0.16
    32000 10.053 4.2 0.31
    22 / 30

    View Slide

  33. Step 1: Measure (again)
    $ bundle exec rails runner "p 1"
    Rails startup time,
    version best of 5 speedup
    v1_9_3_194 12.197s
    v1_9_3_194+index 8.688s 1.40x
    23 / 30

    View Slide

  34. Step 2: Profile (again)
    $ perf record -g bundle exec rails runner "p 1"
    $ perf report -g graph,3 -d ruby
    - 19.35% ruby [.] gc_mark
    - gc_mark
    + 3.34% mark_keyvalue
    + 2.17% mark_entry
    1.34% gc_mark_children
    - 13.56% ruby [.] st_foreach
    - st_foreach
    + 5.14% gc_mark_children
    + 3.03% rb_mark_tbl
    + 9.84% ruby [.] gc_mark_children
    6.06% ruby [.] iseq_mark
    24 / 30

    View Slide

  35. Step 2: Profile (again), better
    $ perf report --blackbox garbage collect
    25 / 30

    View Slide

  36. Step 2: Profile (again), better
    $ perf report --blackbox garbage collect
    Not available in stock perf; I made it up
    https://github.com/gnprice/linux/commit/perf
    Maybe in Linux 3.9?
    25 / 30

    View Slide

  37. Step 2: Profile (again), better
    $ perf record -g bundle exec rails runner 1
    $ perf report --blackbox garbage_collect \
    -g graph,3 -d ruby
    - garbage_collect
    + garbage_collect
    - vm_xmalloc
    - 82.87% str_new
    - 96.08% rb_usascii_str_new
    - 98.32% rb_file_expand_path
    rb_get_expanded_load_path
    rb_feature_p
    + rb_require_safe
    26 / 30

    View Slide

  38. Step 3: Fix (again)
    Cache the result of that work
    27 / 30

    View Slide

  39. Step 3: Fix (again)
    Cache the result of that work
    Complications:
    $LOAD PATH mutable (important!)
    cwd, etc, etc mutable too
    27 / 30

    View Slide

  40. Step 3: Fix (again)
    Cache the result of that work
    Complications:
    $LOAD PATH mutable (important!)
    cwd, etc, etc mutable too
    Complete fix:
    http://bugs.ruby-lang.org/issues/7158
    27 / 30

    View Slide

  41. Step 4: Repeat (again)
    28 / 30

    View Slide

  42. Step 1: Measure (last time)
    $ bundle exec rails runner "p 1"
    Rails startup time,
    version best of 5 speedup
    v1_9_3_194 12.197s
    v1_9_3_194+index 8.688s 1.40x
    v1_9_3_194+index+cache 5.538s 2.20x
    29 / 30

    View Slide

  43. End
    My patches in Ruby 2.0:
    http://bugs.ruby-lang.org/issues/7158
    My --blackbox patch to Linux perf:
    http://github.com/gnprice/linux/commit/perf
    These slides:
    http://speakerdeck.com/gnprice/profiling-ruby
    30 / 30

    View Slide