
High Performance Template Engine


RubyKaigi 2015
http://rubykaigi.org/2015

High Performance Template Engine
A guide to optimizing your Ruby code

Kohei Suzuki, Takashi Kokubun

Takashi Kokubun

December 11, 2015



Transcript

  1. Kohei Suzuki, Takashi Kokubun
    High Performance Template Engine
    A guide to optimizing your Ruby code


  2. Self introduction
    • Kohei Suzuki
    • @eagletmt
    • Developer Productivity Group, Cookpad Inc.
    • Favorite library: pathname


  3. Self introduction
    • Takashi Kokubun
    • @k0kubun
    • Developer Productivity Group, Cookpad Inc.
    • Favorite library: ripper


  4. ✤ What is a template engine?
    ✤ Template Engine Examples
    • Template Engine Internals
    • Performance
    • How to optimize Ruby code?
    • What did we do for high performance
    template engines?


  5. What is a Template Engine?
    • Template engines render text (typically HTML)
    by combining data with a template written in
    a template language
    • ERB, Haml, Slim, ...


  6. ERB
    • ERB is a template engine included in the
    Ruby standard library
    <%= @title %>

    <%- @items.each do |item| %>
    <%= item %>
    <% end %>


  7. ERB
    • Rendered output
    It works!

    item 1
    item 2
    item 3
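A minimal sketch of rendering an ERB template like the one above with the standard library. The HTML tags, the local variable names (`title`, `items`), and the choice of `trim_mode: '-'` are assumptions made for illustration; the slide's template uses instance variables instead.

```ruby
require 'erb'

# Assumed template: the markup is illustrative, not taken from the slide.
template = <<~TEMPLATE
  <h1><%= title %></h1>
  <ul>
  <%- items.each do |item| -%>
    <li><%= item %></li>
  <%- end -%>
  </ul>
TEMPLATE

# trim_mode '-' enables the <%- ... -%> forms, which trim surrounding whitespace.
html = ERB.new(template, trim_mode: '-').result_with_hash(
  title: 'It works!',
  items: ['item 1', 'item 2', 'item 3']
)
puts html
```

`result_with_hash` exposes the hash keys as local variables inside the template, which keeps the example free of any surrounding object.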


  8. Haml
    • Haml is an elegant, structured (X)HTML/XML
    templating engine
    %h1{ class: 'title' }= @title
    %ul
      - @items.each do |item|
        %li.item= item


  9. Haml
    • Rendered output
    It works!

    Item 1
    Item 2
    Item 3


  10. Slim
    • Slim is a fast, lightweight template engine for
    Ruby
    h1 class='title' = @title
    ul
      - @items.each do |item|
        li.item= item


  11. Slim
    • Rendered output
    It works!

    Item 1
    Item 2
    Item 3


  12. ✤ What is a template engine?
    • Template Engine Examples
    ✤ Template Engine Internals
    • Performance
    • How to optimize Ruby code?
    • What did we do for high performance
    template engines?


  13. Template Engine Internals
    • Template engines compile templates into Ruby
    code
    Template --compile--> Ruby code
    %h1 It works!
    _hamlout.push_text("<h1>It works!</h1>\n", 0, false);


  14. Template Engine Internals
    • Ruby code renders HTML
    Ruby code --render--> HTML
    _hamlout.push_text("<h1>It works!</h1>\n", 0, false);
    <h1>It works!</h1>
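The two steps (compile, then render) can be observed with ERB from the standard library, used here as a stand-in because Haml is not assumed to be installed. `ERB#src` returns the generated Ruby source; the template string and the `title` variable are made up for this sketch.

```ruby
require 'erb'

erb = ERB.new("<h1><%= title %></h1>\n")

# Step 1 (compile): the template becomes plain Ruby source code.
ruby_code = erb.src
puts ruby_code

# Step 2 (render): evaluating that source against a binding produces the HTML.
title = 'It works!'
html = eval(ruby_code, binding)
puts html
```

The generated source differs from Haml's `_hamlout.push_text(...)` calls, but the shape is the same: a sequence of appends to an output buffer.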


  15. Haml Example
    %a{href: 'http://rubykaigi.org/2015'}
    %a{ href: 'http://rubykaigi.org/2015' }


  16. Haml Example
    %a{href: 'http://rubykaigi.org/2015'}
    _hamlout.push_text(
      "<a#{_hamlout.attributes({}, nil, href: 'http://rubykaigi.org/2015')}></a>\n",
      0,
      false
    );


  17. ✤ What is a template engine?
    • Template Engine Examples
    • Template Engine Internals
    ✤ Performance
    • How to optimize Ruby code?
    • What did we do for high performance
    template engines?


  18. Haml vs Slim
    • Haml has nice syntax, but its implementation
    is not very performant
    • Slim's syntax is not as nice, but it has a great,
    performant implementation


  19. Faster Haml Engine
    • We love the Haml language, so we each
    implemented a faster Haml engine
    independently
    • https://github.com/eagletmt/faml
    • https://github.com/k0kubun/hamlit


  20. • What is a template engine?
    ✤ How to optimize Ruby code?
    • What did we do for high performance
    template engines?


  21. Optimize your Ruby code
    • YOUR CODE IS SLOW
    • if you don't know how to write fast code


  22. 3 steps of optimization
    1. Benchmark
    2. Profiling
    3. Improvement


  23. • What is a template engine?
    ✤ How to optimize Ruby code?
    ✤ Benchmark
    • Profiling
    • Improvement
    • What did we do for high performance
    template engines?


  24. Why is benchmarking necessary?
    • To measure performance accurately
    • Profilers have overhead
    • Even if it is fast in the profiler, it may benchmark slow
    • For continuous improvement
    • You can't detect performance regressions without
    benchmarks


  25. How to benchmark?
    • Use benchmark-ips gem
    • Show a result in an easy-to-understand way
    Rendering of slim/benchmarks with HTML escaped
    hamlit v2.0.1: 122622.3 i/s
    faml v0.7.1: 94239.1 i/s - 1.30x slower
    slim v3.0.6: 89143.0 i/s - 1.38x slower
    erubis v2.7.0: 65047.8 i/s - 1.89x slower
    haml v5.0.0.beta.2: 14363.6 i/s - 8.54x slower
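To make the "i/s" and "x slower" figures concrete, here is a rough sketch of how such numbers can be derived using only the standard library. `iterations_per_second` is a hypothetical helper, not part of benchmark-ips, and it skips the warm-up and statistics that the gem provides.

```ruby
require 'benchmark'

# Hypothetical helper: call the block repeatedly for about `seconds`
# and return the measured iterations per second.
def iterations_per_second(seconds: 0.2)
  count = 0
  elapsed = Benchmark.realtime do
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      yield
      count += 1
    end
  end
  count / elapsed
end

results = {
  'Array#join'    => iterations_per_second { ['hello', 1234].join },
  'interpolation' => iterations_per_second { "#{'hello'}#{1234}" },
}

# Report each entry relative to the fastest one, as benchmark-ips does.
fastest = results.values.max
results.sort_by { |_, ips| -ips }.each do |name, ips|
  note = ips == fastest ? '' : format(' - %.2fx slower', fastest / ips)
  puts format('%-15s %12.1f i/s%s', name, ips, note)
end
```

For real measurements the benchmark-ips gem remains the better choice; this only shows where the reported ratios come from.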


  26. What to measure?
    • Sometimes a problem has a trade-off
    • trade-off between compilation time and
    rendering time
    Rendering of haml/test/templates/standard.haml
    hamlit v2.0.1: 12351.8 i/s (0.081ms)
    faml v0.7.0: 9713.4 i/s (0.103ms) - 1.27x slower
    haml v5.0.0.beta.2: 2296.5 i/s (0.435ms) - 5.38x slower


  27. What to measure?
    • Sometimes a problem has a trade-off
    • trade-off between compilation time and
    rendering time
    Compilation of haml/test/templates/standard.haml
    haml v5.0.0.beta.2: 388.2 i/s (2.576ms)
    hamlit v2.0.1: 193.7 i/s (5.163ms) - 2.00x slower
    faml v0.7.0: 188.0 i/s (5.320ms) - 2.07x slower
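Compilation and rendering can be timed separately. The sketch below does this with ERB as a stand-in engine; the template, the `items` data, and the iteration count are arbitrary choices for illustration.

```ruby
require 'benchmark'
require 'erb'

source = "<ul>\n<% items.each do |item| %><li><%= item %></li>\n<% end %></ul>\n"
items = (1..100).to_a
n = 200

# Compilation: parse the template and generate Ruby source every time.
compile_time = Benchmark.realtime { n.times { ERB.new(source).src } }

# Rendering: compile once into a method, then only run the generated code.
view = Class.new
ERB.new(source).def_method(view, 'render(items)')
renderer = view.new
render_time = Benchmark.realtime { n.times { renderer.render(items) } }

puts format('compile: %.3f ms per run', compile_time / n * 1000)
puts format('render:  %.3f ms per run', render_time / n * 1000)
```

In a long-running application the template is compiled once and rendered many times, which is why spending more time in compilation to produce faster rendering code can be a good trade.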


  28. • What is a template engine?
    ✤ How to optimize Ruby code?
    • Benchmark
    ✤ Profiling
    • Improvement
    • What did we do for high performance
    template engines?


  29. Fundamental Rule of Optimisation
    • Don't guess, measure
    • It's a waste of time to optimize trivial things
    • The bottleneck may change at any time


  30. Recommended profilers
    • stackprof gem
    • rblineprof gem
    • For details, see RubyKaigi 2014 "Ruby 2.1 in Production":
    http://rubykaigi.org/2014/presentation/S-AmanGupta


  31. stackprof usage in Hamlit repo
    • To search the entire stack to find the
    bottlenecks in template compilation
    $ bin/stackprof test/haml/templates/standard.haml
    ==================================
    Mode: wall(1)
    Samples: 8034 (70.35% miss rate)
    GC: 787 (9.80%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    498 (6.2%) 498 (6.2%) Temple::Mixins::CompiledDispatcher#disp
    893 (11.1%) 319 (4.0%) Ripper::Lexer#lex
    2999 (37.3%) 237 (2.9%) Hamlit::HTML#dispatcher
    2070 (25.8%) 220 (2.7%) Temple::Filters::ControlFlow#dispatcher
    4600 (57.3%) 189 (2.4%) Hamlit::Escapable#dispatcher
    164 (2.0%) 164 (2.0%) Temple::Mixins::CompiledDispatcher::Dis
    174 (2.2%) 160 (2.0%) block in Temple::ImmutableMap#[]


  32. rblineprof usage in Hamlit repo
    • To find bottlenecks in the compiled
    template code
    $ bin/lineprof test/haml/templates/standard.haml
    [Lineprof] ======================================================================
    /private/var/folders/my/syd7zn_d495dmjm7_y8lqby80000gp/T/compiled20151204-39353-9l8fvy
                 | 16 ; _hamlit_compiler1 = ( 1 + 9 + 8 + 2 #numbers should work and this should be ignored;
    0.2ms 200    | 17 ; ); _buf << (::Hamlit::Utils.escape_html(((_hamlit_compiler1).to_s))); _buf << ("\n\nid='body'> Quotes should be loved! Just like people!\n".freeze);
    57.5ms 100   | 18 ; 120.times do |number|;
                 | 19 ; _hamlit_compiler2 = ( number;
    31.5ms 24000 | 20 ; ); _buf <<


  33. • What is a template engine?
    ✤ How to optimize Ruby code?
    • Benchmark
    • Profiling
    ✤ Improvement
    • What did we do for high performance
    template engines?


  34. How to improve
    1. Don't guess, measure (again)
    • Profiler tells you what to optimize
    • Benchmark tells you which code is faster


  35. How to improve
    2. Keep this iteration
    1. Benchmark → 2. Profiling → 3. Improvement → (back to 1. Benchmark)


  36. How to improve
    3. Learn from others
    • We'll show you examples of template
    engine optimization


  37. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    ✤ Faml side
    • Hamlit side


  38. Faml
    • @eagletmt started faml development as a
    complete replacement of haml
    • High compatibility with improved performance
    • Basic ideas for high performance:
    • Follow Slim
    • Perform optimization at compile time


  39. Slim's Benchmark
    [Bar chart: Compiled benchmark (i/s) for erb, slim ugly, and haml ugly; axis from 0 to 80000]
    https://travis-ci.org/slim-template/slim/jobs/94130074#L188-L195


  40. Why does Slim perform well?
    • Slim uses the Temple gem as its backend
    • Temple performs generic optimizations
    automatically
    • I decided to use Temple as the backend too
    • https://github.com/judofyr/temple


  41. Haml
    • Haml generates naive Ruby code
    %a{ href: 'http://rubykaigi.org/2015' }
    _hamlout.buffer << "<a#{_hamlout.attributes(
      {},
      nil,
      href: 'http://rubykaigi.org/2015'
    )
    }></a>\n";


  42. Slim
    • Slim generates a static string literal at
    compile time
    a href='http://rubykaigi.org/2015'
    _buf = []; _buf <<
    ("<a href=\"http://rubykaigi.org/2015\"></a>".freeze);
    ; _buf = _buf.join


  43. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    ✤ Faml side
    ✤ Attribute Optimization
    • Faster Runtime Attribute Builder
    • Hamlit side


  44. Static Analysis
    • Haml should also be compiled into a static
    string literal like Slim
    • But a Ruby parser is required to achieve it
    %a{ href: 'http://rubykaigi.org/2015' }
    %a{ :href=>'http://rubykaigi.org/2015' }
    %a{ 'href'=>'http://rubykaigi.org/2015' }
    %a{ 'href': 'http://rubykaigi.org/2015' }


  45. parser gem
    • https://github.com/whitequark/parser
    • Ruby parser, used by RuboCop, Transpec, ...
    • Easy to use
    • AST with rich source code information


  46. Attribute Optimization
    • Faml categorizes attributes into 3 types by
    parsing Ruby code
    • Static
    • Dynamic
    • Runtime
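A rough sketch of such a classification. It uses Ripper from the standard library as a stand-in for the parser gem that Faml uses, and it is not Faml's actual implementation. The helper names and the list of "dynamic" node types are assumptions, and the list is not exhaustive.

```ruby
require 'ripper'

# Illustrative, non-exhaustive list of node types that are only known at runtime.
def dynamic_node_types
  %i[vcall var_ref call fcall command command_call method_add_arg string_embexpr]
end

# A node is treated as static when no dynamic node type appears anywhere inside it.
def static_node?(node)
  (Array(node).flatten & dynamic_node_types).empty?
end

# Returns :static, :dynamic, or :runtime for the source of a Hash literal.
def classify_attributes(source)
  sexp = Ripper.sexp(source)
  hash = sexp && sexp[1][0]
  return :runtime unless hash.is_a?(Array) && hash[0] == :hash && hash[1]

  pairs = hash[1][1]
  return :runtime unless pairs.all? { |pair| pair[0] == :assoc_new && static_node?(pair[1]) }

  pairs.all? { |pair| static_node?(pair[2]) } ? :static : :dynamic
end

p classify_attributes("{ href: 'http://rubykaigi.org/2015' }")
p classify_attributes("{ href: url }")
p classify_attributes("{ key => url }")
```

Anything the sketch cannot prove to be static falls back to `:runtime`, which is the safe default: the attribute is still rendered correctly, only more slowly.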


  47. Static Attribute
    • Both key and value are static
    • Fastest
    • No operations at runtime
    %a{ href: 'http://rubykaigi.org/2015' }


  48. Dynamic Attribute
    %a{ href: url }
    • Key is static, but the value is dynamic
    • Relatively fast
    • Escape url and concatenate it at runtime


  49. Runtime Attribute
    • Key and value are dynamic
    • Slow
    • Build the whole attribute list at runtime
    %a{ key => url }
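The difference in runtime work between the three categories can be sketched as follows. The method names are hypothetical and `ERB::Util.html_escape` stands in for each engine's own escaping routine.

```ruby
require 'erb'

# Static: key and value are known at compile time, so the compiler can emit
# one frozen literal and nothing is computed at runtime.
def static_attr
  " href='http://rubykaigi.org/2015'".freeze
end

# Dynamic: the key is baked into literals; only the value is escaped at runtime.
def dynamic_attr(url)
  " href='" + ERB::Util.html_escape(url) + "'"
end

# Runtime: keys are unknown until runtime, so the list is built from a Hash.
def runtime_attrs(hash)
  hash.map { |key, value| " #{key}='#{ERB::Util.html_escape(value)}'" }.join
end

puts static_attr
puts dynamic_attr('http://rubykaigi.org/2015')
puts runtime_attrs('href' => 'http://rubykaigi.org/2015')
```

All three produce the same text for this input; what changes is how much of the work is left for every render.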


  50. Multiple line attributes
    • Sometimes optimization is impossible
    • Dynamic attributes?
    %a{ class: 'link',
      href: url }


  51. Multiple line attributes
    %a{ class: 'link',
    href: url }


  52. Line Numbers
    • We have to keep line numbers
    • for correct backtrace
    • (for correct __LINE__ value)


  53. Line Numbers
    • It has to be compiled as runtime attributes
    1 %a{ class: 'link',
    2 href: url }
    1 _buf << ("<a".freeze); _buf << (::Faml::AttributeBuilder.build("'", true, nil, class: 'link',
    2 href: url )); _buf << ("></a>\n".freeze);
    3 ; _buf = _buf.join


  54. Line Numbers
    1 %a{ class: 'link',
    2 href: url }


  55. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    ✤ Faml side
    • Attribute Optimization
    ✤ Faster Runtime Attribute Builder
    • Hamlit side


  56. C extension
    • C is faster than Ruby!
    • If performance is really important, writing C
    extension is a good choice.


  57. C extension
    • I wrote runtime attribute builder in C++
    • Ruby version (before v0.1.0)
    • 41889.8 i/s
    • C++ version (v0.7.1)
    • 90168.6 i/s


  58. In Production
    • Cookpad http://cookpad.com
    • Cookpad Blog https://cookpad-blog.jp
    • Cookpad Video https://cookpad-video.jp


  59. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    • Faml side
    ✤ Hamlit side


  60. Hamlit
    • Designed to defeat Slim
    • I've heard many people say they are “migrating
    from Haml to Slim because it's faster.”
    • Hamlit means “Haml it” (write it with Haml)


  61. Hamlit is faster than Slim
    [Bar chart: Slim's compiled benchmark with HTML-escaping (i/s) for Hamlit, Faml, Slim, and Haml; axis from 0 to 140000]
    https://travis-ci.org/k0kubun/hamlit/jobs/93928561#L247-L251


  62. Hamlit’s strategy
    • Reduce string allocation and concatenation by:
    • compiling string interpolation
    • dropping unused behaviors


  63. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    • Faml side
    ✤ Hamlit side
    ✤ Compiling string interpolation
    • Dropping unused behaviors


  64. How to compile template?
    • We should care about:
    1. String allocation
    2. String concatenation


  65. 1. String allocation
    • Utilize frozen string literal
    • Thanks to Temple::Generator, static string
    is frozen automatically!
    • Slim, Faml and Hamlit use this
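The effect of a frozen literal can be checked directly. In the sketch below the method names are made up, and `String.new` is used to force a fresh allocation for comparison.

```ruby
# A frozen literal: CRuby reuses one shared String object for it.
def frozen_greeting
  'It works!'.freeze
end

# For comparison: this allocates a new String on every call.
def fresh_greeting
  String.new('It works!')
end

# equal? compares object identity, not content.
puts frozen_greeting.equal?(frozen_greeting)
puts fresh_greeting.equal?(fresh_greeting)
```

Generated template code runs on every render, so reusing one object for each static fragment avoids an allocation per fragment per request.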


  66. 2. String concatenation
    • String interpolation is fast
    require 'benchmark/ips'

    Benchmark.ips do |x|
      x.report("Array#join") { ['hello', 1234].join }
      x.report("interpolation") { "#{'hello'}#{1234}" }
      x.compare!
    end


  67. 2. String concatenation
    • String interpolation is fast
    $ ruby bench.rb
    Comparison:
    interpolation: 1115751.8 i/s
    Array#join: 507283.5 i/s - 2.20x slower


  68. How should we compile
    interpolated String?
    • Suppose that you are a Ruby interpreter,
    what code would be pleasant?
    - year = 2015
    %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}


  69. How should we compile
    interpolated String?
    - year = 2015
    %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
    Haml
    year = 2015
    _hamlout.buffer <<
    "<a#{_hamlout.attributes({}, nil, href: "http://rubykaigi.org/#{year}")}>#{
    "RubyKaigi #{Haml::Helpers.html_escape((year))}"
    }</a>\n";



  71. How should we compile
    interpolated String?
    - year = 2015
    %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
    Faml
    _buf = []; year = 2015;
    ; _buf << ("<a".freeze); _faml_html1 = ("http://rubykaigi.org/#{year}"); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze);
    _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << (">RubyKaigi ".freeze); _buf <<
    (::Temple::Utils.escape_html((year))); _buf << ("</a>\n".freeze);
    ; _buf = _buf.join



  73. How should we compile
    interpolated String?
    - year = 2015
    %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
    Hamlit
    _buf = []; year = 2015;
    ; _buf << ("<a href='http://rubykaigi.org/".freeze); _buf << (::Hamlit::Utils.escape_html((year)));
    _buf << ("'>RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year)));
    ; _buf << ("</a>\n".freeze); _buf = _buf.join
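The idea of compiling string interpolation can be illustrated with a toy compiler. `compile_interpolation` is a hypothetical helper, not Hamlit's code: it only handles simple `#{...}` expressions without nested braces and uses `ERB::Util.html_escape` in place of Hamlit's escaping.

```ruby
require 'erb'

# Hypothetical mini-compiler: turns text containing #{...} into Ruby source
# that appends frozen static parts and escaped dynamic parts separately.
def compile_interpolation(text)
  code = +'_buf = []; '
  text.split(/(\#\{[^}]*\})/).each do |part|
    next if part.empty?

    if part.start_with?('#{')
      # Dynamic part: emit the inner expression wrapped in an escape call.
      code << "_buf << (::ERB::Util.html_escape((#{part[2..-2]}))); "
    else
      # Static part: emit a frozen string literal.
      code << "_buf << (#{part.inspect}.freeze); "
    end
  end
  code << '_buf.join'
end

src = compile_interpolation('RubyKaigi #{year}')
puts src

year = 2015
puts eval(src, binding)
```

Splitting at compile time means the static text never passes through the escaping routine or a runtime interpolation; only the dynamic expression does.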



  75. How should we compile
    interpolated String?
    Comparison:
    hamlit v2.0.1: 301640.2 i/s
    faml v0.7.1: 199001.5 i/s - 1.52x slower
    haml v5.0.0.beta.2: 14714.4 i/s - 20.50x slower


  76. Tips to write faster code
    • Don't allocate string
    • Reduce string concatenation
    • Fastest way to concatenate string is
    not to concatenate string


  77. • What is a template engine?
    • How to optimize Ruby code?
    ✤ What did we do for high performance
    template engines?
    • Faml side
    ✤ Hamlit side
    • Compiling string interpolation
    ✤ Dropping unused behaviors


  78. Dropping unused behavior
    • Since Haml and Slim have rich syntax and
    behaviors in attributes, rendering attributes
    is a bottleneck
    • In other words, it is an opportunity for optimization


  79. Dropping unused behavior
    • To optimize attribute rendering, Faml and
    Hamlit drop some unused behavior
    • Let's see how they are different!


  80. Dropped features in Hamlit
    • Hamlit supports the following features only for
    a limited set of attributes
    • Data attribute hyphenation
    • Boolean attribute


  81. Data attribute hyphenation
    • In Haml, nested Hash is expanded with
    hyphen for all attributes


    %div{ foo: { bar: 'baz' } }
    %div{ data: { bar: 'baz' } }
    Haml


  82. Data attribute hyphenation
    • In Faml and Hamlit, data attribute hyphenation
    is supported only for the data attribute
    Haml
    Faml, Hamlit
    %div{ foo: { bar: 'baz' } }
    %div{ data: { bar: 'baz' } }


  83. Data attribute hyphenation
    • Hyphenating data attribute is expensive
    • So we dropped it to generate faster code
    in non-data attributes
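Hyphenation means flattening a nested Hash into hyphen-joined attribute names, which requires walking the value at runtime. A minimal sketch, with `hyphenate` as a hypothetical helper rather than any engine's real API:

```ruby
# Hypothetical helper: expand a nested Hash into hyphen-joined attribute names.
def hyphenate(prefix, value, out = {})
  if value.is_a?(Hash)
    value.each { |key, nested| hyphenate("#{prefix}-#{key}", nested, out) }
  else
    out[prefix.to_s] = value
  end
  out
end

p hyphenate(:data, { bar: 'baz' })
p hyphenate(:data, { user: { id: 1, name: 'x' } })
```

Supporting this for every attribute means every dynamic value needs a type check and possibly a recursive walk, which is the cost Faml and Hamlit avoid outside of `data`.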


  84. Data attribute hyphenation
    • No code to hyphenate Hash
    ; _buf << ("<input".freeze); case ((_hamlit_compiler1 = (disabled)));
    when true;
    _buf << (" disabled".freeze);
    when false, nil;
    else;
    _buf << (" disabled='".freeze);
    _buf <<
    (::Hamlit::Utils.escape_html((_hamlit_compiler1)));
    _buf << ("'".freeze);
    end
    ; _buf << (">\n".freeze);


  85. Data attribute hyphenation
    https://travis-ci.org/k0kubun/hamlit/jobs/96207038#L257-L260
    - disabled = false
    %input{ disabled: disabled }
    - disabled = true
    %input{ disabled: disabled }
    Comparison:
    hamlit v2.0.1: 819212.4 i/s (0.001ms)
    faml v0.7.1: 614993.4 i/s (0.002ms) - 1.33x slower
    haml v5.0.0.beta.2: 15073.2 i/s (0.066ms) - 54.35x slower
    • Benchmark for non-data attribute


  86. Boolean support
    • Only with Hamlit, non-boolean attributes are
    not deleted by falsey values (nil, false)
    Haml, Faml
    Hamlit
    %a{ href: false }
    %a{ disabled: false }





  87. Boolean support
    • Only with Hamlit, non-boolean attributes are
    not deleted by falsey values (nil, false)
    • It means that Hamlit doesn't need to check
    and concatenate value on runtime for non-
    boolean attributes
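The two behaviours can be contrasted in a short sketch. Both method names are hypothetical, and `ERB::Util.html_escape` stands in for the engines' own escaping.

```ruby
require 'erb'

# Haml/Faml style: every attribute value is checked for true / false / nil.
def boolean_aware_attr(name, value)
  case value
  when true then " #{name}"
  when false, nil then ''
  else " #{name}='#{ERB::Util.html_escape(value)}'"
  end
end

# Hamlit style for non-boolean attributes such as href: no check, always render.
def plain_attr(name, value)
  " #{name}='#{ERB::Util.html_escape(value)}'"
end

p boolean_aware_attr('disabled', true)
p boolean_aware_attr('href', false)
p plain_attr('href', false)
```

Because the non-boolean path has no branch, the attribute name and quotes can be merged into the neighbouring static strings at compile time.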


  88. Boolean support
    • Faml compilation for non-boolean attribute
    _buf = []; url = 'http://rubykaigi.org/2015';
    ; _buf << ("<a".freeze); _faml_html1 = (url);
    case (_faml_html1);
    when true;
    _buf << (" href".freeze);
    when false, nil;
    else;
    _buf << (" href='".freeze);
    _buf << (::Temple::Utils.escape_html((_faml_html1)));
    _buf << ("'".freeze);
    end;
    _buf << ("></a>\n".freeze);


  89. Boolean support
    • Hamlit compilation for non-boolean attribute
    _buf = []; url = 'http://rubykaigi.org/2015';
    ; _buf << ("<a href='".freeze); _buf << (::Hamlit::Utils.escape_html((url))); _buf << ("'></a>\n".freeze);
    _buf = _buf.join


  90. Boolean support
    Comparison:
    hamlit v2.0.1: 407851.9 i/s (0.002ms)
    faml v0.7.1: 223612.4 i/s (0.004ms) - 1.82x slower
    haml v5.0.0.beta.2: 21823.1 i/s (0.046ms) - 18.69x slower


  91. But does it really work?
    • Also in Rails tag helpers, false is not
    deleted for non-boolean attributes
    = content_tag :input, '', value: false


  92. But does it really work?
    • It could pass 20,000+ tests in the world's
    largest Rails application!
    https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith


  93. Why is Hamlit the fastest?
    • Faml and Slim have boolean support for all attributes
    • So Hamlit is faster in non-boolean attributes
    • Give up trivial things to make things better!


  94. Comparison of Haml engines
    • Haml
    • Slow and rarely maintained now
    • I sent a patch to replace backend, but not merged
    • Faml
    • Fast and highly compatible
    • Hamlit
    • Fastest and slightly incompatible


  95. Conclusion
    • How to improve performance
    • Benchmark, Profiling, Improvement
    • Real examples of improvements
    • Faml and Hamlit
    • Try our faster Haml engines!
