
SLOMO

No one wants to be stuck in the slow lane, especially Rubyists. In this talk we'll look at the slow process of writing fast code. We'll walk through several real-world performance optimizations that may surprise you, then rewind to see how these slow spots were found and fixed. Come to this talk and we will "C" how fast your Ruby can "Go".

Richard Schneeman

November 18, 2016

Transcript

  1. View Slide

  2. View Slide

  3. Special thanks
    www.slomothemovie.com

    View Slide

  4. Introducing
    Dr. John Kitchin
    The neurologist

    View Slide

  5. Introducing
    Dr. John Kitchin
    The inspiration

    View Slide

  6. Introducing
    Dr. John Kitchin
    The skater

    View Slide

  7. What is your goal
    in life?

    View Slide

  8. Grow up

    View Slide

  9. Pick a career

    View Slide

  10. Buy a sports car

    View Slide

  11. Retire

    View Slide

  12. The faster the
    better

    View Slide

  13. Slomo

    View Slide

  14. Said “no”

    View Slide

  15. Introducing
    Schneems
    The narrator

    View Slide

  16. Sounds like
    “schnapps”

    View Slide

  17. The narrator
    said, explaining
    his own
    introduction

    View Slide

  18. View Slide

  19. View Slide

  20. Maintain
    Sprockets

    View Slide

  21. Those are all
    my hobbies

    View Slide

  22. Full time job:

    View Slide

  23. Resisting
    the Trump
    Administration

    View Slide

  24. I consider
    myself a very
    chill person

    View Slide

  25. Like to mix
    my yoga
    habits with
    resistance

    View Slide

  26. “ohm”

    View Slide

  27. BTW

    View Slide

  28. In this talk
    we will not
    use “nil”

    View Slide

  29. Naan

    View Slide

  30. Sorry for my
    naan jokes

    View Slide

  31. Sorry for my
    naan jokes

    View Slide

  32. This

    View Slide

  33. talk

    View Slide

  34. is

    View Slide

  35. about

    View Slide

  36. speed

    View Slide

  37. View Slide

  38. The fastest way to
    get where you’re
    going

    View Slide

  39. Depends on where
    you’re going

    View Slide

  40. To get there faster

    View Slide

  41. Let’s slow down

    View Slide

  42. Why would we
    want slower code?

    View Slide

  43. Measure it

    View Slide

  44. task "assets:profile" do
        puts "=============="
        StackProf.run(mode: :wall, out: "tmp/stackprof.dump") do
          Rake::Task["assets:precompile"].invoke
        end
        puts "Running: $ stackprof tmp/stackprof.dump"
        puts `stackprof tmp/stackprof.dump`
      end

    View Slide

  47. $ stackprof tmp/stackprof.dump

    View Slide

  48. Running: $ stackprof tmp/stackprof.dump
    ==================================
      Mode: wall(1000)
      Samples: 2083 (62.23% miss rate)
      GC: 282 (13.54%)
    ==================================
    TOTAL    (pct)   SAMPLES    (pct)   FRAME
      313  (15.0%)       313  (15.0%)   Set#include?
      307  (14.7%)       243  (11.7%)   Sprockets::DigestUtils#digest
      743  (35.7%)       137   (6.6%)   Kernel#require
      476  (22.9%)       124   (6.0%)   Kernel#require
      157   (7.5%)        92   (4.4%)   SassC::Rails::Importer#imports
      243  (11.7%)        81   (3.9%)   #
       67   (3.2%)        67   (3.2%)   NumericWithFormat#to_s
      160   (7.7%)        53   (2.5%)   PathUtils#atomic_write
       44   (2.1%)        44   (2.1%)   #
      239  (11.5%)        37   (1.8%)   Sprockets::Cache::FileStore#set

    View Slide

  49. Running: $ stackprof tmp/stackprof.dump
    ==================================
    Mode: wall(1000)
    Samples: 2083 (62.23% miss rate)
    GC: 282 (13.54%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    313 (15.0%) 313 (15.0%) Set#include?
    307 (14.7%) 243 (11.7%) Sprockets::DigestUtils#digest
    743 (35.7%) 137 (6.6%) Kernel#require
    476 (22.9%) 124 (6.0%) Kernel#require
    157 (7.5%) 92 (4.4%) SassC::Rails::Importer#imports
    243 (11.7%) 81 (3.9%) #
    67 (3.2%) 67 (3.2%) NumericWithFormat#to_s
    160 (7.7%) 53 (2.5%) PathUtils#atomic_write
    44 (2.1%) 44 (2.1%) #
    239 (11.5%) 37 (1.8%) Sprockets::Cache::FileStore#set
    FRAME: class and method

    View Slide

  50. Running: $ stackprof tmp/stackprof.dump
    ==================================
    Mode: wall(1000)
    Samples: 2083 (62.23% miss rate)
    GC: 282 (13.54%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    313 (15.0%) 313 (15.0%) Set#include?
    307 (14.7%) 243 (11.7%) Sprockets::DigestUtils#digest
    743 (35.7%) 137 (6.6%) Kernel#require
    476 (22.9%) 124 (6.0%) Kernel#require
    157 (7.5%) 92 (4.4%) SassC::Rails::Importer#imports
    243 (11.7%) 81 (3.9%) #
    67 (3.2%) 67 (3.2%) NumericWithFormat#to_s
    160 (7.7%) 53 (2.5%) PathUtils#atomic_write
    44 (2.1%) 44 (2.1%) #
    239 (11.5%) 37 (1.8%) Sprockets::Cache::FileStore#set
    TOTAL: total # of samples

    View Slide

  51. Running: $ stackprof tmp/stackprof.dump
    ==================================
    Mode: wall(1000)
    Samples: 2083 (62.23% miss rate)
    GC: 282 (13.54%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    313 (15.0%) 313 (15.0%) Set#include?
    307 (14.7%) 243 (11.7%) Sprockets::DigestUtils#digest
    743 (35.7%) 137 (6.6%) Kernel#require
    476 (22.9%) 124 (6.0%) Kernel#require
    157 (7.5%) 92 (4.4%) SassC::Rails::Importer#imports
    243 (11.7%) 81 (3.9%) #
    67 (3.2%) 67 (3.2%) NumericWithFormat#to_s
    160 (7.7%) 53 (2.5%) PathUtils#atomic_write
    44 (2.1%) 44 (2.1%) #
    239 (11.5%) 37 (1.8%) Sprockets::Cache::FileStore#set
    First (pct): percentage

    View Slide

  52. Running: $ stackprof tmp/stackprof.dump
    ==================================
    Mode: wall(1000)
    Samples: 2083 (62.23% miss rate)
    GC: 282 (13.54%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    313 (15.0%) 313 (15.0%) Set#include?
    307 (14.7%) 243 (11.7%) Sprockets::DigestUtils#digest
    743 (35.7%) 137 (6.6%) Kernel#require
    476 (22.9%) 124 (6.0%) Kernel#require
    157 (7.5%) 92 (4.4%) SassC::Rails::Importer#imports
    243 (11.7%) 81 (3.9%) #
    67 (3.2%) 67 (3.2%) NumericWithFormat#to_s
    160 (7.7%) 53 (2.5%) PathUtils#atomic_write
    44 (2.1%) 44 (2.1%) #
    239 (11.5%) 37 (1.8%) Sprockets::Cache::FileStore#set
    SAMPLES: total at TOP of stack

    View Slide

  53. Running: $ stackprof tmp/stackprof.dump
    ==================================
    Mode: wall(1000)
    Samples: 2083 (62.23% miss rate)
    GC: 282 (13.54%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    313 (15.0%) 313 (15.0%) Set#include?
    307 (14.7%) 243 (11.7%) Sprockets::DigestUtils#digest
    743 (35.7%) 137 (6.6%) Kernel#require
    476 (22.9%) 124 (6.0%) Kernel#require
    157 (7.5%) 92 (4.4%) SassC::Rails::Importer#imports
    243 (11.7%) 81 (3.9%) #
    67 (3.2%) 67 (3.2%) NumericWithFormat#to_s
    160 (7.7%) 53 (2.5%) PathUtils#atomic_write
    44 (2.1%) 44 (2.1%) #
    239 (11.5%) 37 (1.8%) Sprockets::Cache::FileStore#set
    Second (pct): percent at TOP of stack

    View Slide

  55. 26% of total
    execution samples

    View Slide
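Each percentage in the stackprof table is a frame's sample count divided by the total number of samples (2083 in this run). The sketch below reproduces that arithmetic. It assumes the 26% figure combines the two frames with the most top-of-stack samples (`Set#include?` and `Sprockets::DigestUtils#digest`); that pairing is inferred from the numbers and is not stated on the slide. The helper name `percent_of_total` is hypothetical.

```ruby
# Sample counts copied from the stackprof output above.
TOTAL_SAMPLES = 2083

# Top-of-stack ("SAMPLES" column) counts for the two hottest frames.
# Assumption: these are the two frames behind the 26% figure.
SELF_SAMPLES = {
  "Set#include?" => 313,
  "Sprockets::DigestUtils#digest" => 243
}.freeze

# Share of all samples, as a percentage rounded to one decimal place.
def percent_of_total(samples, total = TOTAL_SAMPLES)
  (samples * 100.0 / total).round(1)
end

SELF_SAMPLES.each do |frame, samples|
  puts "#{frame}: #{percent_of_total(samples)}%"
end

combined = percent_of_total(SELF_SAMPLES.values.sum)
puts "combined: #{combined}%"
```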

  56. What is calling
    Set#include?
    so much?

    View Slide

  57. $ stackprof tmp/stackprof.dump --method Set#include?
    Set#include? (/Users/richardschneeman/.rubies/ruby-2.3.1/lib/ruby/2.3.0/set.rb:214)
      samples: 313 self (15.0%) / 313 total (15.0%)
      callers:
        312 ( 99.7%) Sprockets::ProcessorUtils#valid_processor_metadata_value?
          1 (  0.3%) Sprockets::Utils#dfs_paths
      code:
                                  | 214 | def include?(o)
        313 (15.0%) / 313 (15.0%) | 215 |   @hash[o]
                                  | 216 | end

    View Slide

  59. What is calling
    valid_processor_metadata_value?
    so much?

    View Slide

  60. $ stackprof tmp/stackprof.dump --method Sprockets::ProcessorUtils#valid_processor_metadata_value?
    Sprockets::ProcessorUtils#valid_processor_metadata_value? (/Users/richardschneeman/.gem/ruby/2.3.1/bundler/gems/sprockets-3b0d6732c13f/lib/sprockets/processor_utils.rb:170)
      samples: 24 self (1.2%) / 2129 total (102.2%)
      callers:
        1793 ( 84.2%) Sprockets::ProcessorUtils#valid_processor_metadata_value?
         336 ( 15.8%) Sprockets::ProcessorUtils#validate_processor_result!
      callees (2105 total):
        1793 ( 85.2%) Sprockets::ProcessorUtils#valid_processor_metadata_value?
         312 ( 14.8%) Set#include?
      code:
                                 | 170 | def valid_processor_metadata_value?(value)
         261 (12.5%) /  2 (0.1%) | 171 |   if VALID_METADATA_VALUE_TYPES.include?(value.class)
                                 | 172 |     true
          61  (2.9%) /  8 (0.4%) | 173 |   elsif VALID_METADATA_COMPOUND_TYPES.include?(value.class)
        1806 (86.7%) / 13 (0.6%) | 174 |     value.all? { |v| valid_processor_metadata_value?(v) }
                                 | 175 |   else
           1  (0.0%) /  1 (0.0%) | 176 |     false
                                 | 177 |   end

    View Slide

  63. def valid_processor_metadata_value?(value)
        if VALID_METADATA_VALUE_TYPES.include?(value.class)
          true
        elsif VALID_METADATA_COMPOUND_TYPES.include?(value.class)
          value.all? { |v| valid_processor_metadata_value?(v) }
        else
          false
        end
      end

    View Slide

  64. Set#include?

    View Slide

  65. # File set.rb, line 214
      def include?(o)
        @hash[o]
      end

    View Slide

  66. Set is powered by
    a Hash under the
    hood

    View Slide

  67. Skip the Set, just
    use a Hash

    View Slide

  68. But why?

    View Slide

  69. Ruby has
    optimized
    instructions for
    hash calls

    View Slide

  70. code = "
    foo = Hash.new
    foo[:bar]
    "
    puts RubyVM::InstructionSequence.compile(code).disasm
    # 0000 trace 1 ( 2)
    # 0002 getinlinecache 9,
    # 0005 getconstant :Hash
    # 0007 setinlinecache
    # 0009 opt_send_without_block ,
    # 0012 setlocal_OP__WC__0 2
    # 0014 trace 1 ( 3)
    # 0016 getlocal_OP__WC__0 2
    # 0018 putobject :bar
    # 0020 opt_aref ,
    # 0023 leave

    View Slide

  71. Compare to Set

    View Slide

  72. code = "
    s = Set.new
    s.include?(:bar)
    "
    puts RubyVM::InstructionSequence.compile(code).disasm
    # 0000 trace 1 ( 2)
    # 0002 getinlinecache 9,
    # 0005 getconstant :Set
    # 0007 setinlinecache
    # 0009 opt_send_without_block ,
    # 0012 setlocal_OP__WC__0 2
    # 0014 trace 1 ( 3)
    # 0016 getlocal_OP__WC__0 2
    # 0018 putobject :bar
    # 0020 opt_send_without_block ,
    # 0023 leave

    View Slide

  73. View Slide

  74. insns.def

    View Slide

  75. /**
        @c optimize
        @e []
        @j optimized recv[obj].
       */
      DEFINE_INSN
      opt_aref
      (CALL_INFO ci, CALL_CACHE cc)
      (VALUE recv, VALUE obj)
      (VALUE val)
      {
          if (!SPECIAL_CONST_P(recv)) {
              if (RBASIC_CLASS(recv) == rb_cArray && BASIC_OP_UNREDEFINED_P(BOP_AREF, ARRAY_REDEFINED_OP_FLAG) && FIXNUM_P(obj)) {
                  val = rb_ary_entry(recv, FIX2LONG(obj));
              }
              else if (RBASIC_CLASS(recv) == rb_cHash && BASIC_OP_UNREDEFINED_P(BOP_AREF, HASH_REDEFINED_OP_FLAG)) {
                  val = rb_hash_aref(recv, obj);
              }
              else {
                  goto INSN_LABEL(normal_dispatch);
              }
          }
          else {
            INSN_LABEL(normal_dispatch):
              PUSH(recv);
              PUSH(obj);
              CALL_SIMPLE_METHOD(recv);
          }
      }

    View Slide

  77. Versus

    View Slide

  78. /**
        @c optimize
        @e Invoke method without block
        @j Invoke method without block
       */
      DEFINE_INSN
      opt_send_without_block
      (CALL_INFO ci, CALL_CACHE cc)
      (...)
      (VALUE val) // inc += -ci->orig_argc;
      {
          struct rb_calling_info calling;
          calling.blockptr = NULL;
          vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
          CALL_METHOD(&calling, ci, cc);
      }

    View Slide

  80. Ruby optimizes
    Hash calls by
    Skipping Method
    Lookup

    View Slide
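One way to check this claim on your own machine is a micro-benchmark with the standard `benchmark` library. This is a minimal sketch, not the measurement used in the talk: the `TYPES` list and the iteration count are assumptions chosen for illustration, and the gap between the two calls depends heavily on the Ruby version (the slides use Ruby 2.3.1), so no expected timings are shown.

```ruby
require "benchmark"
require "set"

# Hypothetical list of classes, standing in for the Sprockets constants.
TYPES = [String, Symbol, Integer, TrueClass, FalseClass, NilClass].freeze

TYPES_SET  = Set.new(TYPES).freeze
TYPES_HASH = TYPES.each_with_object({}) { |type, hash| hash[type] = true }.freeze

ITERATIONS = 1_000_000

Benchmark.bm(14) do |bm|
  # Regular method dispatch to Set#include?, which then reads its Hash.
  bm.report("Set#include?") { ITERATIONS.times { TYPES_SET.include?(String) } }
  # Hash#[] written as recv[obj], the form the opt_aref instruction handles.
  bm.report("Hash#[]")      { ITERATIONS.times { TYPES_HASH[String] } }
end
```

Note one behavioral difference when swapping the two: `Set#include?` returns `false` for a missing member, while `Hash#[]` returns `nil`. Both are falsy, so an `if` check behaves the same.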

  81. BTW

    View Slide

  82. Don’t subclass
    Hash

    View Slide

  83. /**
        @c optimize
        @e []
        @j optimized recv[obj].
       */
      DEFINE_INSN
      opt_aref
      (CALL_INFO ci, CALL_CACHE cc)
      (VALUE recv, VALUE obj)
      (VALUE val)
      {
          if (!SPECIAL_CONST_P(recv)) {
              if (RBASIC_CLASS(recv) == rb_cArray && BASIC_OP_UNREDEFINED_P(BOP_AREF, ARRAY_REDEFINED_OP_FLAG) && FIXNUM_P(obj)) {
                  val = rb_ary_entry(recv, FIX2LONG(obj));
              }
              else if (RBASIC_CLASS(recv) == rb_cHash && BASIC_OP_UNREDEFINED_P(BOP_AREF, HASH_REDEFINED_OP_FLAG)) {
                  val = rb_hash_aref(recv, obj);
              }
              else {
                  goto INSN_LABEL(normal_dispatch);
              }
          }
          else {
            INSN_LABEL(normal_dispatch):
              PUSH(recv);
              PUSH(obj);
              CALL_SIMPLE_METHOD(recv);
          }
      }
    You lose speed

    View Slide
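The fast path in the C source above applies only when the receiver's class is exactly `Hash` (`RBASIC_CLASS(recv) == rb_cHash`), so an instance of a subclass falls through to normal dispatch. The sketch below shows the class check from the Ruby side and times both receivers. `PlainSubclass` is a hypothetical stand-in for classes like `HashWithIndifferentAccess`; the size of any difference depends on your Ruby version, so no timings are quoted.

```ruby
require "benchmark"

# Hypothetical subclass that adds no behavior of its own.
class PlainSubclass < Hash
end

plain = { bar: 1 }
sub = PlainSubclass.new
sub[:bar] = 1

# The fast path compares against Hash itself, not "any kind of Hash".
puts plain.class.equal?(Hash) # true
puts sub.class.equal?(Hash)   # false
puts sub.is_a?(Hash)          # true

ITERATIONS = 1_000_000

Benchmark.bm(10) do |bm|
  bm.report("Hash")     { ITERATIONS.times { plain[:bar] } }
  bm.report("subclass") { ITERATIONS.times { sub[:bar] } }
end
```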


  84. “I don’t subclass
    hash”

    View Slide

  85. Hash
    With
    Indifferent
    Access

    View Slide

  86. Hashie

    View Slide

  87. Rack::Utils::HeaderHash

    View Slide

  88. View Slide

  89. Rack is
    23% faster
    without
    HeaderHash

    View Slide

  90. Don’t subclass
    Hash

    View Slide

  91. View Slide

  92. Switch Set to a
    hash

    View Slide

  93. VALID_METADATA_VALUE_TYPES_HASH = VALID_METADATA_VALUE_TYPES.each_with_object({}) do |type, hash|
        hash[type] = true
      end.freeze
      def valid_processor_metadata_value?(value)
        if VALID_METADATA_VALUE_TYPES_HASH[value.class]
          true
        elsif VALID_METADATA_COMPOUND_TYPES_HASH[value.class]
          value.all? { |v| valid_processor_metadata_value?(v) }
        else
          false
        end
      end

    View Slide
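The slide defines only one of the two lookup hashes, so here is a self-contained sketch of the same idea that runs on its own. The contents of the two type lists are assumptions made for illustration (the real constants live in Sprockets), and `types_to_lookup_hash` is a hypothetical helper name.

```ruby
require "set"

# Assumed type lists; the real ones are defined in Sprockets.
VALID_METADATA_VALUE_TYPES    = Set.new([String, Symbol, Integer, TrueClass, FalseClass, NilClass]).freeze
VALID_METADATA_COMPOUND_TYPES = Set.new([Array, Hash, Set]).freeze

# Build a frozen { type => true } Hash from any collection of classes.
def types_to_lookup_hash(types)
  types.each_with_object({}) { |type, hash| hash[type] = true }.freeze
end

VALID_METADATA_VALUE_TYPES_HASH    = types_to_lookup_hash(VALID_METADATA_VALUE_TYPES)
VALID_METADATA_COMPOUND_TYPES_HASH = types_to_lookup_hash(VALID_METADATA_COMPOUND_TYPES)

def valid_processor_metadata_value?(value)
  if VALID_METADATA_VALUE_TYPES_HASH[value.class]
    true
  elsif VALID_METADATA_COMPOUND_TYPES_HASH[value.class]
    # Hash#all? yields [key, value] pairs, which are validated as Arrays.
    value.all? { |v| valid_processor_metadata_value?(v) }
  else
    false
  end
end

puts valid_processor_metadata_value?(["a", :b, { "c" => [1, nil] }]) # true
puts valid_processor_metadata_value?([Object.new])                    # false
```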

  94. Skips Method
    Lookup

    View Slide

  95. Did it help?

    View Slide

  96. 18.325s
    17.981s

    View Slide

  97. 1.8% Faster!

    View Slide

  98. Bugger

    View Slide

  99. View Slide

  100. Keep gliding

    View Slide

  101. Round 2

    View Slide

  102. TOTAL    (pct)   SAMPLES    (pct)   FRAME
        2328 (109.6%)       362  (17.0%)   Sprockets::ProcessorUtils#valid_processor_metadata_value?
         348  (16.4%)       256  (12.1%)   Sprockets::DigestUtils#digest
         486  (22.9%)       106   (5.0%)   Kernel#require
          97   (4.6%)        97   (4.6%)   ActiveSupport::NumericWithFormat#to_s
         123   (5.8%)        94   (4.4%)   Sprockets::PathUtils#atomic_write
         581  (27.4%)        76   (3.6%)   Kernel#require
          61   (2.9%)        61   (2.9%)   # .mechanism
         193   (9.1%)        52   (2.4%)   Sprockets::Cache::FileStore#set
          95   (4.5%)        48   (2.3%)   SassC::Rails::Importer#imports
          36   (1.7%)        36   (1.7%)   ExecJS::ExternalRuntime#exec_runtime
          59   (2.8%)        25   (1.2%)   Kernel#require
          75   (3.5%)        25   (1.2%)   Module#delegate

    View Slide

  103. $ stackprof tmp/stackprof.dump --method Sprockets::DigestUtils#digest
    # . . .
    Sprockets::DigestUtils#digest (lib/sprockets/digest_utils.rb:46)
      samples: 4 self (0.2%) / 7 total (0.3%)
      callers:
        5 (  71.4%) Sprockets::Cache#expand_key
        2 (  28.6%) Sprockets::Loader#load_from_unloaded
      callees (3 total):
        3 ( 100.0%) Sprockets::DigestUtils#digest_class
      code:
                            | 46 | def digest(obj)
        4 (0.2%) / 1 (0.0%) | 47 |   digest = digest_class.new
                            | 48 |   queue = [obj]
                            | 49 |
                            | 50 |   while queue.length > 0
                            | 51 |     obj = queue.shift
                            | 52 |     klass = obj.class
                            | 53 |
        2 (0.1%) / 2 (0.1%) | 54 |     if klass == String
                            | 55 |       digest << obj
                            | 56 |     elsif klass == Symbol
                            | 57 |       digest << 'Symbol'
                            | 58 |       digest << obj.to_s
                            | 59 |     elsif klass == Fixnum
    View Slide

  105. def digest(obj)
         digest = digest_class.new
         queue = [obj]
         while queue.length > 0
           obj = queue.shift
           klass = obj.class
           if klass == String
             digest << obj
           elsif klass == Symbol
             digest << 'Symbol'
             digest << obj.to_s
           elsif klass == Fixnum
             digest << 'Fixnum'
             digest << obj.to_s
           elsif klass == Bignum
             digest << 'Bignum'
             digest << obj.to_s
           elsif klass == TrueClass
             digest << 'TrueClass'
           elsif klass == FalseClass
             digest << 'FalseClass'

    View Slide

  106. Let me paraphrase

    View Slide

  107. if String

    View Slide

  108. if String
    elsif Symbol

    View Slide

  109. if String
    elsif Symbol
    elsif Fixnum

    View Slide

  110. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum

    View Slide

  111. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass

    View Slide

  112. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass

    View Slide

  113. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass

    View Slide

  114. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass
    elsif Array

    View Slide

  115. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass
    elsif Array
    elsif Hash

    View Slide

  116. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass
    elsif Array
    elsif Hash
    elsif Set

    View Slide

  117. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass
    elsif Array
    elsif Hash
    elsif Set
    elsif Encoding

    View Slide

  118. If we pass in a Set
    object, we must
    make 10
    comparisons

    View Slide

  119. if String
    elsif Symbol
    elsif Fixnum
    elsif Bignum
    elsif TrueClass
    elsif FalseClass
    elsif NilClass
    elsif Array
    elsif Hash
    elsif Set
    elsif Encoding
    Expand
    and
    Iterate

    View Slide

  120. if/elsif is
    hidden
    iteration

    View Slide

  121. How do we go
    faster?

    View Slide

  122. Get rid of iteration

    View Slide

  123. Case statements

    View Slide

  124. or

    View Slide

  125. Hash lookups

    View Slide

  126. Before

    View Slide

  127. def digest(obj)
         digest = digest_class.new
         queue = [obj]
         while queue.length > 0
           obj = queue.shift
           klass = obj.class
           if klass == String
             digest << obj
           elsif klass == Symbol
             digest << 'Symbol'
             digest << obj.to_s
           elsif klass == Fixnum
             digest << 'Fixnum'
             digest << obj.to_s
           elsif klass == Bignum
             digest << 'Bignum'
             digest << obj.to_s
           elsif klass == TrueClass
             digest << 'TrueClass'
           elsif klass == FalseClass
             digest << 'FalseClass'
           elsif klass == NilClass
             digest << 'NilClass'.freeze
           elsif klass == Array
             digest << 'Array'
             queue.concat(obj)
           elsif klass == Hash
             digest << 'Hash'
             queue.concat(obj.sort)
           elsif klass == Set
             digest << 'Set'
             queue.concat(obj.to_a)
           elsif klass == Encoding
             digest << 'Encoding'
             digest << obj.name
           else
             raise TypeError, "couldn't digest #{klass}"
           end
         end
         digest.digest
       end

    View Slide

  128. After

    View Slide

  129. def digest(obj)
    digest = digest_class.new
    ADD_VALUE_TO_DIGEST[obj.class].call(obj, digest)
    digest.digest
    end

    View Slide

  130. Store logic in a
    Hash

    View Slide

  131. def digest(obj)
    digest = digest_class.new
    ADD_VALUE_TO_DIGEST[obj.class].call(obj, digest)
    digest.digest
    end
    Constant time lookup

    View Slide

  132. Logic Lives in
    Lambdas

    View Slide

  133. ADD_VALUE_TO_DIGEST = {
         String => ->(val, digest) { digest << val },
         FalseClass => ->(val, digest) { digest << 'FalseClass'.freeze },
         TrueClass => ->(val, digest) { digest << 'TrueClass'.freeze },
         NilClass => ->(val, digest) { digest << 'NilClass'.freeze },
         Symbol => ->(val, digest) {
           digest << 'Symbol'.freeze
           digest << val.to_s
         },
         Integer => ->(val, digest) {
           digest << 'Integer'.freeze
           digest << val.to_s
         },
         Array => ->(val, digest) {
           digest << 'Array'.freeze
           val.each do |element|
             ADD_VALUE_TO_DIGEST[element.class].call(element, digest)
           end
         },

    View Slide

  137. zOMG WAT

    View Slide

  138. Recursive
    hash
    is
    Recursive

    View Slide
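The slide cuts the table off after the `Array` entry. Below is a runnable sketch of how the whole dispatch table could look. The `Hash`, `Set` and `Encoding` entries are assumptions modeled on the branches of the earlier if/elsif version, `Digest::SHA256` is an assumed stand-in for `digest_class`, and `add_value_to_digest` is a hypothetical helper that restores the `TypeError` the original raised for unknown classes.

```ruby
require "digest"
require "set"

ADD_VALUE_TO_DIGEST = {
  String     => ->(val, digest) { digest << val },
  FalseClass => ->(_val, digest) { digest << "FalseClass" },
  TrueClass  => ->(_val, digest) { digest << "TrueClass" },
  NilClass   => ->(_val, digest) { digest << "NilClass" },
  Symbol     => ->(val, digest) { digest << "Symbol" << val.to_s },
  Integer    => ->(val, digest) { digest << "Integer" << val.to_s },
  Array      => ->(val, digest) {
    digest << "Array"
    val.each { |element| add_value_to_digest(element, digest) }
  },
  # Assumed entries from here on, mirroring the if/elsif version.
  Hash       => ->(val, digest) {
    digest << "Hash"
    val.sort.each { |pair| add_value_to_digest(pair, digest) }
  },
  Set        => ->(val, digest) {
    digest << "Set"
    val.each { |element| add_value_to_digest(element, digest) }
  },
  Encoding   => ->(val, digest) { digest << "Encoding" << val.name }
}.freeze

# Look up the handler for a value's class, failing loudly for unknown types.
def add_value_to_digest(value, digest)
  handler = ADD_VALUE_TO_DIGEST.fetch(value.class) do
    raise TypeError, "couldn't digest #{value.class}"
  end
  handler.call(value, digest)
end

def digest(obj)
  digest = Digest::SHA256.new # assumption: stands in for digest_class.new
  add_value_to_digest(obj, digest)
  digest.digest
end

# Print the raw digest bytes as hex for readability.
puts digest(["a", :b, 1, nil, { "k" => [true] }]).unpack1("H*")
```

The lambdas can refer to the table (here through the helper) before it is fully built because their bodies only run when called, by which time the constant is defined.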

  139. Faster?

    View Slide

  140. 14.9s
    12.54s

    View Slide

  141. 16% faster asset
    compilation with
    no cache

    View Slide
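The "percent faster" figures quoted for the two optimizations can be reproduced from the before and after timings. This sketch assumes the improvement is measured as time saved relative to the original run; the slides do not say which formula was used, and `percent_faster` is a hypothetical helper name.

```ruby
# Percentage reduction in wall-clock time going from `before` to `after`.
def percent_faster(before, after)
  (before - after) / before * 100.0
end

# Timings in seconds, copied from the slides.
set_to_hash     = percent_faster(18.325, 17.981)
digest_dispatch = percent_faster(14.9, 12.54)

puts format("Set -> Hash lookup:        %.2f%%", set_to_hash)
puts format("if/elsif -> Hash dispatch: %.2f%%", digest_dispatch)
```

Under that assumption the results land close to the 1.8% and 16% shown on the slides.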

  142. Repeat for all
    supported types

    View Slide

  143. Who likes
    audience
    participation?

    View Slide

  144. Can we go faster?

    View Slide

  145. ADD_VALUE_TO_DIGEST[String]
    ADD_VALUE_TO_DIGEST[Hash]
    ADD_VALUE_TO_DIGEST[true]
    ADD_VALUE_TO_DIGEST[false]
    ADD_VALUE_TO_DIGEST[nil]
    ADD_VALUE_TO_DIGEST[Array]
    ADD_VALUE_TO_DIGEST[Set]
    ADD_VALUE_TO_DIGEST[Integer]

    View Slide

  146. String
    Hash
    true
    false
    nil
    Array
    Set
    Integer
    All
    Constants

    View Slide

  147. Hash#compare_by_identity

    View Slide

  148. Compare object,
    not value

    View Slide

  149. String
    Hash
    true
    false
    nil
    Array
    Set
    Integer
    All
    Constants

    View Slide

  150. View Slide

  151. 7% speed
    improvement

    View Slide
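A short sketch of what `Hash#compare_by_identity` changes. By default a Hash matches keys by value (`#hash` and `#eql?`); after `compare_by_identity` it matches only the very same object. Classes are unique objects, so a class-keyed dispatch table behaves the same either way. The `HANDLERS` table and its values are hypothetical.

```ruby
# Default Hash: two equal Strings are the same key.
by_value = {}
by_value["name".dup] = 1

# Identity Hash: only the exact same object is the same key.
by_identity = {}.compare_by_identity
key = "name".dup
by_identity[key] = 1

puts by_value["name".dup].inspect    # 1
puts by_identity["name".dup].inspect # nil
puts by_identity[key].inspect        # 1

# Hypothetical class-keyed table: each class is a single object,
# so identity lookups find the same entries as value lookups.
HANDLERS = { String => :string, Integer => :integer }.compare_by_identity.freeze

puts HANDLERS["text".class].inspect # :string
puts HANDLERS.compare_by_identity?  # true
```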

  152. Let’s slow it down

    View Slide

  153. What did we do?

    View Slide

  154. Identify a goal

    View Slide

  155. View Slide

  156. View Slide

  157. Gather potentially
    useful tools

    View Slide

  158. View Slide

  159. View Slide

  160. View Slide

  161. View Slide

  162. Iterate on the
    problem

    View Slide

  163. View Slide

  164. View Slide

  165. View Slide

  166. View Slide

  167. RubyVM::InstructionSequence.compile(code).disasm

    View Slide

  168. Shoulders of
    giants

    View Slide

  169. View Slide

  170. The more you
    understand about
    _how_ your code
    works, the faster you
    can make it

    View Slide

  171. Find a speed
    buddy

    View Slide

  172. (or buddies)

    View Slide

  173. “C”
    how fast your
    program will go

    View Slide

  174. gem 'sassc-rails'

    View Slide

  175. Sassc uses
    libsass

    View Slide

  176. 31% faster

    View Slide

  177. Sprockets 4 will
    support sassc out
    of the box
    (but not yet)

    View Slide

  178. Don’t abandon
    your language for
    speed

    View Slide

  179. Find the slow
    parts and tag team
    them with other
    languages

    View Slide

  180. C-extensions

    View Slide

  181. Use Rust
    with helix

    View Slide

  182. Use Go
    with gorb

    View Slide

  183. p.s. have you tried
    JRuby?

    View Slide

  184. What about
    cached compile
    times

    View Slide

  185. View Slide

  186. Bootscale
    caches
    require lookups

    View Slide

  187. Without it, the more
    gems on your
    system the LONGER
    `require` takes

    View Slide

  188. 6s cached compile
    times
    2s cached compile
    times

    View Slide

  189. 300% faster

    View Slide

  190. View Slide

  191. Don’t be an
    asshole

    View Slide

  192. Don’t apologize
    for the
    programming
    language you love

    View Slide

  193. Set your own
    goals

    View Slide

  194. Go at your own
    speed

    View Slide

  195. These are the
    good days of our
    programming life

    View Slide

  196. Live them to the
    fullest

    View Slide

  197. View Slide

  198. View Slide

  199. View Slide

  200. View Slide