High Performance Template Engine

RubyKaigi 2015
http://rubykaigi.org/2015

High Performance Template Engine
A guide to optimizing your Ruby code

Kohei Suzuki, Takashi Kokubun

Takashi Kokubun

December 11, 2015
Transcript

  1. Kohei Suzuki, Takashi Kokubun High Performance Template Engine A guide

    to optimizing your Ruby code
  2. Self introduction • Kohei Suzuki • @eagletmt • Developer Productivity

    Group,
 Cookpad Inc. • Favorite library: pathname
  3. Self introduction • Takashi Kokubun • @k0kubun • Developer Productivity

    Group,
 Cookpad Inc. • Favorite library: ripper
  4. ✤ What is a template engine? ✤ Template Engine Examples

    • Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  5. What is a Template Engine? • Template engines render text

    (typically HTML) by combining data with a template written in a template language • ERB, Haml, Slim, ...
  6. ERB • ERB is a template engine included in the

    Ruby standard library <h1 class='title'><%= @title %></h1> <ul> <%- @items.each do |item| %> <li class='item'><%= item %></li> <% end %> </ul>
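As a minimal sketch, a template like the one on this slide can be rendered with the ERB class from the standard library. The Page class and its context method are names invented for this illustration, and trim_mode: '-' is assumed so that the <%- and -%> tags do not leave blank lines behind.

```ruby
require 'erb'

template = <<~TEMPLATE
  <h1 class='title'><%= @title %></h1>
  <ul>
  <%- @items.each do |item| -%>
    <li class='item'><%= item %></li>
  <%- end -%>
  </ul>
TEMPLATE

# Hypothetical view object: the template reads @title and @items from it.
class Page
  def initialize(title, items)
    @title = title
    @items = items
  end

  # Expose a binding so ERB can see the instance variables.
  def context
    binding
  end
end

page = Page.new('It works!', ['item 1', 'item 2', 'item 3'])
html = ERB.new(template, trim_mode: '-').result(page.context)
puts html
```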
  7. <h1 class='title'>It works!</h1> <ul> <li class='item'>item 1</li> <li class='item'>item 2</li>

    <li class='item'>item 3</li> </ul> ERB • Rendered output
  8. Haml %h1{ class: 'title' }= @title %ul - @items.each do

    |item| %li.item= item • Haml is an elegant, structured (X)HTML/XML templating engine
  9. Haml <h1 class='title'>It works!</h1> <ul> <li class='item'>Item 1</li> <li class='item'>Item

    2</li> <li class='item'>Item 3</li> </ul> • Rendered output
  10. Slim h1 class='title' = @title ul - @items.each do |item|

    li.item= item • Slim is a fast, lightweight template engine for Ruby
  11. Slim <h1 class='title'>It works!</h1> <ul> <li class='item'>Item 1</li> <li class='item'>Item

    2</li> <li class='item'>Item 3</li> </ul> • Rendered output
  12. ✤ What is a template engine? • Template Engine Examples

    ✤ Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  13. Template Engine Internals • Template engines compile templates into Ruby

    code Template Ruby code compile %h1 It works! _hamlout.push_text( "<h1>It works!</h1>\n" , 0, false);
  14. Template Engine Internals • Ruby code renders HTML Ruby code

    HTML render _hamlout.push_text( "<h1>It works!</h1>\n" , 0, false); <h1>It works!</h1>
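The two steps (compile, then render) can be observed with ERB from the standard library, used here only as a stand-in because Haml's _hamlout buffer is not available without the gem. This is a sketch of the general idea, not of Haml's internals: ERB#src returns the generated Ruby code, and evaluating that code produces the HTML.

```ruby
require 'erb'

# Step 1: compile. ERB#src returns the generated Ruby source as a String.
ruby_code = ERB.new("<h1><%= title %></h1>\n").src
puts ruby_code

# Step 2: render. Evaluating the generated code with a binding that
# provides `title` produces the HTML.
title = 'It works!'
html = eval(ruby_code, binding)
puts html
```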
  15. Haml Example %a{href: 'http://rubykaigi.org/2015'} %a{ href: 'http://rubykaigi.org/2015' }

  16. Haml Example %a{href: 'http://rubykaigi.org/2015'} _hamlout.push_text( "<a#{_hamlout.attributes( {}, nil, href: 'http://rubykaigi.org/2015'

    )}></a>\n", 0, false );
  17. ✤ What is a template engine? • Template Engine Examples

    • Template Engine Internals ✤ Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  18. Haml vs Slim • Haml has nice syntax, but its

    implementation is not very performant • Slim's syntax is not as nice, but it has a great, performant implementation
  19. Faster Haml Engine • We love the Haml language, so we

    each implemented a faster Haml engine independently • https://github.com/eagletmt/faml • https://github.com/k0kubun/hamlit
  20. • What is a template engine? ✤ How to optimize

    Ruby code? • What did we do for high performance template engines?
  21. Optimize your Ruby code • YOUR CODE IS SLOW •

    if you don't know how to write fast code
  22. 3 steps of optimization 1. Benchmark 2. Profiling 3. Improvement

  23. • What is a template engine? ✤ How to optimize

    Ruby code? ✤ Benchmark • Profiling • Improvement • What did we do for high performance template engines?
  24. Why is benchmarking necessary? • To measure performance accurately •

    Profilers have overhead • Code that looks fast under a profiler may still be slow in a benchmark • For continuous improvement • You can't detect a performance regression without a benchmark
  25. How to benchmark? • Use the benchmark-ips gem • It shows

    results in an easy-to-understand way
    Rendering of slim/benchmarks with HTML escaped
    hamlit v2.0.1: 122622.3 i/s
    faml v0.7.1: 94239.1 i/s - 1.30x slower
    slim v3.0.6: 89143.0 i/s - 1.38x slower
    erubis v2.7.0: 65047.8 i/s - 1.89x slower
    haml v5.0.0.beta.2: 14363.6 i/s - 8.54x slower
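The idea behind an iterations-per-second report can be sketched with the standard library alone. This is not benchmark-ips itself (which also does warm-up and error estimation); iterations_per_second is a hypothetical helper written for this illustration, and the two candidates are arbitrary examples.

```ruby
# Count how many times the block runs in roughly `duration` seconds.
def iterations_per_second(duration: 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  count / (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
end

items = %w[item1 item2 item3]
results = {
  'map + join' => iterations_per_second { items.map { |i| "<li>#{i}</li>" }.join },
  'each + <<'  => iterations_per_second do
    buf = +''
    items.each { |i| buf << "<li>#{i}</li>" }
    buf
  end
}

# Print the fastest candidate first and express the others relative to it.
fastest = results.values.max
results.sort_by { |_, ips| -ips }.each do |name, ips|
  note = ips == fastest ? '' : format(' - %.2fx slower', fastest / ips)
  puts format('%-12s %12.1f i/s%s', name, ips, note)
end
```

The numbers depend on the machine and Ruby version, so no expected output is shown.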
  26. What to measure? • Sometimes a problem has a trade-off

    • trade-off between compilation time and rendering time Rendering of haml/test/templates/standard.haml hamlit v2.0.1: 12351.8 i/s (0.081ms) faml v0.7.0: 9713.4 i/s (0.103ms) - 1.27x slower haml v5.0.0.beta.2: 2296.5 i/s (0.435ms) - 5.38x slower
  27. What to measure? • Sometimes a problem has a trade-off

    • trade-off between compilation time and rendering time Compilation of haml/test/templates/standard.haml haml v5.0.0.beta.2: 388.2 i/s (2.576ms) hamlit v2.0.1: 193.7 i/s (5.163ms) - 2.00x slower faml v0.7.0: 188.0 i/s (5.320ms) - 2.07x slower
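To make the trade-off concrete, compilation and rendering can be timed separately. The sketch below uses ERB from the standard library as a stand-in for the Haml engines; the template, the seconds helper, and the render(items) method name are assumptions made for this example.

```ruby
require 'erb'

template = "<ul>\n<% items.each do |item| %><li><%= item %></li>\n<% end %></ul>\n"

# Wall-clock seconds spent in the block.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Compilation: template text -> Ruby code -> method definition.
compile_time = seconds do
  100.times { ERB.new(template).def_method(Class.new, 'render(items)') }
end

# Rendering: call the already compiled method.
view_class = Class.new
ERB.new(template).def_method(view_class, 'render(items)')
view = view_class.new
render_time = seconds do
  100.times { view.render([1, 2, 3]) }
end

html = view.render([1, 2, 3])
puts format('compile: %.3f ms per template', compile_time * 10)
puts format('render:  %.3f ms per call', render_time * 10)
```

An engine that does more work at compile time usually pays for it once per template, while rendering cost is paid on every request.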
  28. • What is a template engine? ✤ How to optimize

    Ruby code? • Benchmark ✤ Profiling • Improvement • What did we do for high performance template engines?
  29. Fundamental Rule of Optimisation • Don't guess, measure • It's

    a waste of time to optimize trivial things • The bottleneck may change at any time
  30. Recommended profilers • stackprof gem • rblineprof gem • For

    details, see RubyKaigi 2014 "Ruby 2.1 in Production": http://rubykaigi.org/2014/presentation/S-AmanGupta
  31. stackprof usage in Hamlit repo • To search the entire

    stack to find the bottlenecks in template compilation
    $ bin/stackprof test/haml/templates/standard.haml
    ==================================
    Mode: wall(1)
    Samples: 8034 (70.35% miss rate)
    GC: 787 (9.80%)
    ==================================
    TOTAL (pct) SAMPLES (pct) FRAME
    498 (6.2%) 498 (6.2%) Temple::Mixins::CompiledDispatcher#disp
    893 (11.1%) 319 (4.0%) Ripper::Lexer#lex
    2999 (37.3%) 237 (2.9%) Hamlit::HTML#dispatcher
    2070 (25.8%) 220 (2.7%) Temple::Filters::ControlFlow#dispatcher
    4600 (57.3%) 189 (2.4%) Hamlit::Escapable#dispatcher
    164 (2.0%) 164 (2.0%) Temple::Mixins::CompiledDispatcher::Dis
    174 (2.2%) 160 (2.0%) block in Temple::ImmutableMap#[]
  32. rblineprof usage in Hamlit repo • To find bottlenecks in

    the compiled template code $ bin/lineprof test/haml/templates/standard.haml [Lineprof] ====================================================================== /private/var/folders/my/syd7zn_d495dmjm7_y8lqby80000gp/T/ compiled20151204-39353-9l8fvy | 16 ; _hamlit_compiler1 = ( 1 + 9 + 8 + 2 #numbers should work and this should be ignored; 0.2ms 200 | 17 ; ); _buf << (::Hamlit::Utils.escape_html(((_hamlit_compiler1).to_s))); _buf << ("\n</div>\n<div id='body'> Quotes should be loved! Just like people!</div>\n".freeze); 57.5ms 100 | 18 ; 120.times do |number|; | 19 ; _hamlit_compiler2 = ( number; 31.5ms 24000 | 20 ; ); _buf <<
  33. • What is a template engine? ✤ How to optimize

    Ruby code? • Benchmark • Profiling ✤ Improvement • What did we do for high performance template engines?
  34. How to improve 1. Don't guess, measure (again) • Profiler

    tells you what to optimize • Benchmark tells you which code is faster
  35. How to improve 2. Keep this iteration going •

    1. Benchmark, 2. Profiling, 3. Improvement, then back to 1.
  36. How to improve 3. Learn from others • We'll show

    you examples of template engine optimization
  37. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Hamlit side
  38. Faml • @eagletmt started faml development as a complete replacement

    for Haml • High compatibility with improved performance • Basic ideas for high performance: • Follow Slim • Perform optimization at compile time
  39. Slim's Benchmark • [Bar chart: compiled benchmark (i/s) for erb, slim ugly, haml ugly]

    https://travis-ci.org/slim-template/slim/jobs/94130074#L188-L195
  40. Why does Slim perform well? • Slim uses Temple gem

    as backend • Temple performs generic optimization automatically • I decided to use Temple as backend • https://github.com/judofyr/temple
  41. • Haml generates naive Ruby code Haml %a{ href: 'http://rubykaigi.org/2015'

    } _hamlout.buffer << "<a#{ _hamlout.attributes( {}, nil, href: 'http://rubykaigi.org/2015' ) }></a>\n";
  42. Slim a href='http://rubykaigi.org/2015' _buf = []; _buf << ("<a href=\"http://rubykaigi.org/2015\"></a>".freeze);

    ; _buf = _buf.join • Slim generates a static string literal at compile time
  43. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side ✤ Attribute Optimization • Faster Runtime Attribute Builder • Hamlit side
  44. Static Analysis • Haml should also be compiled into a static

    string literal, like Slim • But a Ruby parser is required to achieve this, because the same attribute can be written in several ways: %a{ href: 'http://rubykaigi.org/2015' } %a{ :href=>'http://rubykaigi.org/2015' } %a{ 'href'=>'http://rubykaigi.org/2015' } %a{ 'href': 'http://rubykaigi.org/2015' }
  45. parser gem • https://github.com/whitequark/parser • Ruby parser, used by RuboCop,

    Transpec, ... • Easy to use • AST with rich source code information
  46. Attribute Optimization • Faml categorizes attributes into 3 types by

    parsing Ruby code • Static • Dynamic • Runtime
  47. Static Attribute • Both key and value are static •

    Fastest • No operations at runtime %a{ href: 'http://rubykaigi.org/2015' } <a href='http://rubykaigi.org/2015'></a>
  48. Dynamic Attribute • Key is static, but the value is

    dynamic • Relatively fast • Escapes url and concatenates it at runtime %a{ href: url } <a href='http://rubykaigi.org/2015'></a>
  49. Runtime Attribute • Both key and value are dynamic • Slow

    • Builds the whole attribute list at runtime %a{ key => url } <a href='http://rubykaigi.org/2015'></a>
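The three categories can be approximated with a short classifier. Faml uses the parser gem; the sketch below swaps in Ripper from the standard library so that it runs without extra gems, and classify_attributes, static_node?, and the two type lists are hypothetical names and simplifications made for this illustration (for example, true, false, and nil values are conservatively treated as dynamic).

```ruby
require 'ripper'

STATIC_KEY_TYPES = %i[@label symbol_literal string_literal dyna_symbol].freeze
STATIC_VALUE_TYPES = %i[string_literal symbol_literal @int @float].freeze

# A node counts as static when it is a literal of an allowed type and
# contains no string interpolation anywhere inside it.
def static_node?(node, allowed_types)
  allowed_types.include?(node[0]) && !node.flatten.include?(:string_embexpr)
end

# Returns :static, :dynamic, or :runtime for the source of a Hash literal.
def classify_attributes(hash_source)
  sexp = Ripper.sexp(hash_source)
  hash_node = sexp && sexp[1][0]
  unless hash_node && hash_node[0] == :hash
    raise ArgumentError, "not a Hash literal: #{hash_source}"
  end

  pairs = hash_node[1] ? hash_node[1][1] : []
  keys_static = pairs.all? do |pair|
    pair[0] == :assoc_new && static_node?(pair[1], STATIC_KEY_TYPES)
  end
  return :runtime unless keys_static
  return :dynamic unless pairs.all? { |pair| static_node?(pair[2], STATIC_VALUE_TYPES) }

  :static
end

p classify_attributes("{ href: 'http://rubykaigi.org/2015' }")
p classify_attributes('{ href: url }')
p classify_attributes('{ key => url }')
```

A static result means the whole attribute string can be emitted at compile time; a dynamic result needs only escaping at runtime; a runtime result falls back to a general attribute builder.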
  50. Multiple line attributes • Sometimes optimization is impossible • Dynamic attributes?

    %a{ class: 'link', href: url }
  51. Multiple line attributes %a{ class: 'link', href: url }

  52. Line Numbers • We have to keep line numbers •

    for correct backtrace • (for correct __LINE__ value)
  53. Line Numbers • It has to be compiled as runtime attributes

    1 %a{ class: 'link', 2 href: url } 1 _buf << ("<a".freeze); _buf << (::Faml::AttributeBuilder.build("'", true, nil, class: 'link', 2 href: url )); _buf << ("></a>\n".freeze); 3 ; _buf = _buf.join
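Why the generated code must keep the template's line breaks can be shown with plain eval. In this sketch, reported_line is a hypothetical helper and template.haml is only a label passed to eval; the generated code is hand-written rather than produced by Faml.

```ruby
# Evaluate generated code as if it came from template.haml and return
# the line number reported in the backtrace when it raises.
def reported_line(generated_code)
  eval(generated_code, binding, 'template.haml', 1)
  nil
rescue RuntimeError => e
  e.backtrace.first[/\Atemplate\.haml:(\d+)/, 1].to_i
end

# One generated line per template line: the error is reported on line 3.
line_preserving = "_buf = []\n_buf << '<a'\nraise 'error on template line 3'\n"

# Everything collapsed onto one line: the error is reported on line 1.
collapsed = "_buf = []; _buf << '<a'; raise 'error on template line 3'\n"

puts reported_line(line_preserving)
puts reported_line(collapsed)
```

If the compiler merged a multi-line attribute hash into one generated line, every later line in the template would be reported with the wrong number.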
  54. Line Numbers 1 %a{ class: 'link', 2 href: url }

  55. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Attribute Optimization ✤ Faster Runtime Attribute Builder • Hamlit side
  56. C extension • C is faster than Ruby! • If

    performance is really important, writing C extension is a good choice.
  57. C extension • I wrote runtime attribute builder in C++

    • Ruby version (before v0.1.0) • 41889.8 i/s • C++ version (v0.7.1) • 90168.6 i/s
  58. In Production • Cookpad http://cookpad.com • Cookpad Blog https://cookpad-blog.jp •

    Cookpad Video https://cookpad-video.jp
  59. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side
  60. Hamlit • Designed to defeat Slim • I've heard many

    people say “migrating from Haml to Slim because it's faster.” • Hamlit means “Haml it” (write it with Haml)
  61. Hamlit is faster than Slim • [Bar chart: Slim's compiled benchmark with HTML-escaping (i/s) for Hamlit, Faml, Slim, Haml]

    https://travis-ci.org/k0kubun/hamlit/jobs/93928561#L247-L251
  62. Hamlit’s strategy • Reduce string allocation and concatenation by: •

    compiling string interpolation • dropping unused behaviors
  63. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side ✤ Compiling string interpolation • Dropping unused behaviors
  64. How to compile template? • We should care about: 1.

    String allocation 2. String concatenation
  65. 1. String allocation • Use frozen string literals • Thanks

    to Temple::Generator, static strings are frozen automatically! • Slim, Faml and Hamlit use this
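The effect of a frozen literal can be checked in plain Ruby without Temple. In this small sketch, frozen_fragment is a made-up method standing in for a piece of compiled template code: calling .freeze directly on a string literal lets Ruby reuse one String object instead of allocating a new one on every call.

```ruby
# Stands in for a static fragment inside compiled template code.
def frozen_fragment
  "<li class='item'>".freeze
end

first = frozen_fragment
second = frozen_fragment

# Both calls return the very same object, so no new String is allocated.
puts first.equal?(second)
puts first.frozen?
```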
  66. 2. String concatenation • String interpolation is fast require 'benchmark/ips' Benchmark.ips do

    |x| x.report("Array#join") { ['hello', 1234].join } x.report("interpolation") { "#{'hello'}#{1234}" } x.compare! end
  67. 2. String concatenation • String interpolation is fast $ ruby

    bench.rb Comparison: interpolation: 1115751.8 i/s Array#join: 507283.5 i/s - 2.20x slower
  68. How should we compile interpolated String? • Suppose that you

    are a Ruby interpreter: what code would be pleasant to run? - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  69. How should we compile interpolated String? year = 2015 _hamlout.buffer

    << "<a#{_hamlout.attributes({}, nil, href: "http://rubykaigi.org/#{year}" )}>#{ "RubyKaigi #{Haml::Helpers.html_escape((year))}" }</a>\n"; Haml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  70. How should we compile interpolated String? year = 2015 _hamlout.buffer

    << "<a#{_hamlout.attributes({}, nil, href: "http://rubykaigi.org/#{year}" )}>#{ "RubyKaigi #{Haml::Helpers.html_escape((year))}" }</a>\n"; Haml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  71. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a".freeze); _faml_html1 = ("http://rubykaigi.org/#{year}"); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze); _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << (">RubyKaigi ".freeze); _buf << (::Temple::Utils.escape_html((year))); _buf << ("</a>\n".freeze); ; _buf = _buf.join Faml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  72. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a".freeze); _faml_html1 = ("http://rubykaigi.org/#{year}"); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze); _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << (">RubyKaigi ".freeze); _buf << (::Temple::Utils.escape_html((year))); _buf << ("</a>\n".freeze); ; _buf = _buf.join Faml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  73. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a href='http://rubykaigi.org/".freeze); _buf << (::Hamlit::Utils.escape_html((year))); _buf << ("'>RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year))); ; _buf << ("</a>\n".freeze); _buf = _buf.join Hamlit - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  74. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a href='http://rubykaigi.org/".freeze); _buf << (::Hamlit::Utils.escape_html((year))); _buf << ("'>RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year))); ; _buf << ("</a>\n".freeze); _buf = _buf.join Hamlit - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
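The Hamlit output above can be imitated with a toy compiler that splits an interpolated string into static and dynamic parts and merges neighbouring static parts at compile time. This is a sketch under simplifying assumptions, not Hamlit's implementation: split_interpolation, merge_static, and generate are hypothetical names, the regular expression does not handle nested braces, and ERB::Util.html_escape stands in for Hamlit::Utils.escape_html.

```ruby
require 'erb'

# Split an interpolated string into [:static, text] and [:dynamic, code]
# parts. Simplification: nested braces are not supported.
def split_interpolation(literal)
  literal.split(/(#\{[^}]*\})/).reject(&:empty?).map do |part|
    match = part.match(/\A#\{(.*)\}\z/)
    match ? [:dynamic, match[1]] : [:static, part]
  end
end

# Merge neighbouring static parts so each run becomes one string literal.
def merge_static(parts)
  parts.each_with_object([]) do |(type, value), merged|
    if type == :static && merged.last && merged.last[0] == :static
      merged.last[1] += value
    else
      merged << [type, value]
    end
  end
end

# Generate Ruby code that pushes each part onto a buffer.
def generate(parts)
  code = +'_buf = []; '
  merge_static(parts).each do |type, value|
    statement =
      if type == :static
        "_buf << (#{value.inspect}.freeze); "
      else
        "_buf << (::ERB::Util.html_escape((#{value}))); "
      end
    code << statement
  end
  code << '_buf = _buf.join'
end

parts = [[:static, "<a href='"]] +
        split_interpolation('http://rubykaigi.org/#{year}') +
        [[:static, "'>RubyKaigi "]] +
        split_interpolation('#{year}') +
        [[:static, "</a>\n"]]

code = generate(parts)
puts code

year = 2015
html = eval(code, binding)
puts html
```

The generated code contains only five buffer pushes: three frozen static strings and two escaped values, with no intermediate interpolated String.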
  75. How should we compile interpolated String? Comparison: hamlit v2.0.1: 301640.2

    i/s faml v0.7.1: 199001.5 i/s - 1.52x slower haml v5.0.0.beta.2: 14714.4 i/s - 20.50x slower
  76. Tips to write faster code • Don't allocate strings •

    Reduce string concatenation • The fastest way to concatenate strings is not to concatenate them at all
  77. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side • Compiling string interpolation ✤ Dropping unused behaviors
  78. Dropping unused behavior • Since Haml and Slim have rich

    syntax and behaviors in attributes, rendering attributes is a bottleneck • In other words, it is a chance for optimization
  79. Dropping unused behavior • To optimize attribute rendering, Faml and

    Hamlit drop some unused behavior • Let's see how they are different!
  80. Dropped features in Hamlit • Hamlit supports the following features only for

    a limited set of attributes • Data attribute hyphenation • Boolean attribute
  81. Data attribute hyphenation • In Haml, nested Hash is expanded

    with hyphen for all attributes <div foo-bar='baz'></div> <div data-bar='baz'></div> %div{ foo: { bar: 'baz' } } %div{ data: { bar: 'baz' } } Haml
  82. Data attribute hyphenation • In Faml and Hamlit, hyphenation is supported

    only for the data attribute • %div{ foo: { bar: 'baz' } } • Haml: <div foo-bar='baz'></div> • Faml, Hamlit: <div foo='{:bar=&gt;&quot;baz&quot;}'></div> • %div{ data: { bar: 'baz' } } • Haml: <div data-bar='baz'></div> • Faml, Hamlit: <div data-bar='baz'></div>
  83. Data attribute hyphenation • Hyphenating nested Hashes is expensive •

    So we dropped it for non-data attributes to generate faster code
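The difference between the two behaviors can be sketched as two small functions. The names hyphenate and hyphenate_data_only are invented for this example, and the code only flattens attribute Hashes; it is not taken from Haml, Faml, or Hamlit.

```ruby
# Haml-style: nested Hashes are expanded with hyphens for every attribute.
def hyphenate(attributes, prefix = nil)
  attributes.each_with_object({}) do |(key, value), flat|
    name = prefix ? "#{prefix}-#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(hyphenate(value, name))
    else
      flat[name] = value
    end
  end
end

# Faml/Hamlit-style: only the data attribute is expanded; any other
# attribute keeps its value as-is, so no recursive walk is needed for it.
def hyphenate_data_only(attributes)
  attributes.each_with_object({}) do |(key, value), flat|
    if key.to_s == 'data' && value.is_a?(Hash)
      flat.merge!(hyphenate(value, 'data'))
    else
      flat[key.to_s] = value
    end
  end
end

p hyphenate({ foo: { bar: 'baz' } })
p hyphenate_data_only({ data: { bar: 'baz' } })
p hyphenate_data_only({ foo: { bar: 'baz' } })
```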
  84. ; _buf << ("<input".freeze); case ((_hamlit_compiler1 = (disabled))); when true;

    _buf << (" disabled".freeze); when false, nil; else; _buf << (" disabled='".freeze); _buf << (::Hamlit::Utils.escape_html((_hamlit_compiler1))); _buf << ("'".freeze); end ; _buf << (">\n".freeze); Data attribute hyphenation • No code to hyphenate Hash
  85. Data attribute hyphenation https://travis-ci.org/k0kubun/hamlit/jobs/96207038#L257-L260 - disabled = false %input{ disabled:

    disabled } - disabled = true %input{ disabled: disabled } Comparison: hamlit v2.0.1: 819212.4 i/s (0.001ms) faml v0.7.1: 614993.4 i/s (0.002ms) - 1.33x slower haml v5.0.0.beta.2: 15073.2 i/s (0.066ms) - 54.35x slower • Benchmark for non-data attribute
  86. Boolean support • Only in Hamlit, non-boolean attributes are not

    deleted by falsey values (nil, false) • %a{ href: false } • Haml, Faml: <a></a> • Hamlit: <a href=''></a> • %a{ disabled: false } • Haml, Faml: <a></a> • Hamlit: <a></a>
  87. Boolean support • Only with Hamlit, non-boolean attributes are not

    deleted by falsey values (nil, false) • This means that Hamlit doesn't need to check and concatenate the value at runtime for non-boolean attributes
  88. Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf

    << ("<a".freeze); _faml_html1 = (url); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze); _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << ("></a>\n".freeze); • Faml compilation for non-boolean attribute
  89. Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf

    << ("<a href='".freeze); _buf << (::Hamlit::Utils.escape_html((url))); _buf << ("'></a>\n".freeze); _buf = _buf.join • Hamlit compilation for non-boolean attribute
  90. Comparison: hamlit v2.0.1: 407851.9 i/s (0.002ms) faml v0.7.1: 223612.4 i/s

    (0.004ms) - 1.82x slower haml v5.0.0.beta.2: 21823.1 i/s (0.046ms) - 18.69x slower Boolean support
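The compile-time decision behind these numbers can be sketched as follows: emit the case/when check only for attributes known to be boolean, and a plain escape-and-append for everything else. BOOLEAN_ATTRIBUTES is an assumed, abbreviated list, compile_attribute and render_open_tag are hypothetical helpers, and ERB::Util.html_escape stands in for the engines' own escape functions. What is emitted for a falsey value on the non-boolean path is deliberately not exercised here.

```ruby
require 'erb'

# Assumed, abbreviated list; the real engines ship their own lists.
BOOLEAN_ATTRIBUTES = %w[checked disabled readonly selected].freeze

# Return Ruby code that appends one attribute to _buf.
def compile_attribute(name, value_code)
  if BOOLEAN_ATTRIBUTES.include?(name)
    # Boolean attribute: the value must be inspected at runtime.
    <<~RUBY
      case (_value = (#{value_code}))
      when true then _buf << #{" #{name}".inspect}
      when false, nil then nil
      else _buf << #{" #{name}='".inspect} << ::ERB::Util.html_escape(_value) << "'"
      end
    RUBY
  else
    # Non-boolean attribute: no runtime check, just escape and append.
    <<~RUBY
      _buf << #{" #{name}='".inspect} << ::ERB::Util.html_escape((#{value_code})) << "'"
    RUBY
  end
end

# Compile and evaluate an opening tag with a single attribute.
def render_open_tag(tag, name, value_code, scope)
  code = "_buf = ['<#{tag}']\n#{compile_attribute(name, value_code)}_buf << '>'\n_buf.join\n"
  eval(code, scope)
end

url = 'http://rubykaigi.org/2015'
disabled = false
link = render_open_tag('a', 'href', 'url', binding)
input_off = render_open_tag('input', 'disabled', 'disabled', binding)
disabled = true
input_on = render_open_tag('input', 'disabled', 'disabled', binding)

puts link
puts input_off
puts input_on
```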
  91. But does it really work? • Also in Rails tag

    helpers, false is not deleted for non-boolean attributes = content_tag :input, '', value: false <input value='false'></input>
  92. But does it really work? • It passed 20,000+ tests in the world's largest

    Rails application! https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith
  93. Why is Hamlit the fastest? • Faml and Slim have

    boolean support for all attributes • So Hamlit is faster for non-boolean attributes • Give up trivial things to make things better!
  94. Comparison of Haml engines • Haml • Slow and rarely

    maintained now • I sent a patch to replace the backend, but it was not merged • Faml • Fast and highly compatible • Hamlit • Fastest and slightly incompatible
  95. Conclusion • How to improve performance • Benchmark, Profiling, Improvement

    • Real examples of improvements • Faml and Hamlit • Try our faster Haml engines!