
High Performance Template Engine

RubyKaigi 2015
http://rubykaigi.org/2015

High Performance Template Engine
A guide to optimizing your Ruby code

Kohei Suzuki, Takashi Kokubun

Takashi Kokubun

December 11, 2015

Transcript

  1. Kohei Suzuki, Takashi Kokubun High Performance Template Engine A guide

    to optimizing your Ruby code
  2. Self introduction • Kohei Suzuki • @eagletmt • Developer Productivity

    Group,
 Cookpad Inc. • Favorite library: pathname
  3. Self introduction • Takashi Kokubun • @k0kubun • Developer Productivity

    Group,
 Cookpad Inc. • Favorite library: ripper
  4. ✤ What is a template engine? ✤ Template Engine Examples

    • Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  5. What is a Template Engine? • Template engines render text

    (typically HTML) by combining data with a template written in a template language • ERB, Haml, Slim, ...
  6. ERB • ERB is a template engine included in the

    Ruby standard library <h1 class='title'><%= @title %></h1> <ul> <%- @items.each do |item| %> <li class='item'><%= item %></li> <% end %> </ul>
  7. <h1 class='title'>It works!</h1> <ul> <li class='item'>item 1</li> <li class='item'>item 2</li>

    <li class='item'>item 3</li> </ul> ERB • Rendered output
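The ERB slides above can be reproduced with the standard library alone. This is a minimal sketch, not the speakers' code: the `view` object and its data are assumptions made for illustration, and `trim_mode: '-'` is passed so that the `<%-` tag is accepted and the control lines leave no blank lines behind.

```ruby
require 'erb'

# Template modelled on the ERB slide; "-%>" trims the newline after control tags.
template = <<~'TEMPLATE'
  <h1 class='title'><%= @title %></h1>
  <ul>
  <%- @items.each do |item| -%>
  <li class='item'><%= item %></li>
  <%- end -%>
  </ul>
TEMPLATE

# Hypothetical view object that carries the instance variables used above.
view = Object.new
view.instance_variable_set(:@title, 'It works!')
view.instance_variable_set(:@items, ['item 1', 'item 2', 'item 3'])

# Render the template with `view` as self, so @title and @items resolve.
html = ERB.new(template, trim_mode: '-').result(view.instance_eval { binding })
puts html
```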
  8. Haml %h1{ class: 'title' }= @title %ul - @items.each do

    |item| %li.item= item • Haml is an elegant, structured (X)HTML/XML templating engine
  9. Haml <h1 class='title'>It works!</h1> <ul> <li class='item'>Item 1</li> <li class='item'>Item

    2</li> <li class='item'>Item 3</li> </ul> • Rendered output
  10. Slim h1 class='title' = @title ul - @items.each do |item|

    li.item= item • Slim is a fast, lightweight template engine for Ruby
  11. Slim <h1 class='title'>It works!</h1> <ul> <li class='item'>Item 1</li> <li class='item'>Item

    2</li> <li class='item'>Item 3</li> </ul> • Rendered output
  12. ✤ What is a template engine? • Template Engine Examples

    ✤ Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  13. Template Engine Internals • Template engines compile templates into Ruby

    code Template Ruby code compile %h1 It works! _hamlout.push_text( "<h1>It works!</h1>\n" , 0, false);
  14. Template Engine Internals • Ruby code renders HTML Ruby code

    HTML render _hamlout.push_text( "<h1>It works!</h1>\n" , 0, false); <h1>It works!</h1>
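The same two steps (compile, then render) can be observed with ERB from the standard library. This sketch uses ERB instead of Haml, so the generated code looks different from the `_hamlout.push_text` call on the slides; the template and the `title` variable are made up for illustration.

```ruby
require 'erb'

template = "<h1><%= title %></h1>\n"

# Step 1 (compile): the template becomes a String of Ruby source code.
ruby_code = ERB.new(template).src

# Step 2 (render): evaluating that Ruby code produces the HTML.
title = 'It works!'
rendered = eval(ruby_code, binding)

puts ruby_code
puts rendered
```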
  15. Haml Example %a{href: 'http://rubykaigi.org/2015'} %a{ href: 'http://rubykaigi.org/2015' }

  16. Haml Example %a{href: 'http://rubykaigi.org/2015'} _hamlout.push_text( "<a#{_hamlout.attributes( {}, nil, href: 'http://rubykaigi.org/2015'

    )}></a>\n", 0, false );
  17. ✤ What is a template engine? • Template Engine Examples

    • Template Engine Internals ✤ Performance • How to optimize Ruby code? • What did we do for high performance template engines?
  18. Haml vs Slim • Haml has nice syntax, but its

    implementation is not very performant • Slim's syntax is not as nice, but it has a great, performant implementation
  19. Faster Haml Engine • We love the Haml language, so we

    both implemented faster Haml engines individually • https://github.com/eagletmt/faml • https://github.com/k0kubun/hamlit
  20. • What is a template engine? ✤ How to optimize

    Ruby code? • What did we do for high performance template engines?
  21. Optimize your Ruby code • YOUR CODE IS SLOW •

    if you don't know how to write fast code
  22. 3 steps of optimization 1. Benchmark 2. Profiling 3. Improvement

  23. • What is a template engine? ✤ How to optimize

    Ruby code? ✤ Benchmark • Profiling • Improvement • What did we do for high performance template engines?
  24. Why is benchmarking necessary? • To measure performance accurately •

    Profilers have overhead • Code that looks fast under a profiler may still be slow in a benchmark • For continuous improvement • You can't detect performance regressions without benchmarks
  25. How to benchmark? • Use the benchmark-ips gem • It shows

    results in an easy-to-understand way Rendering of slim/benchmarks with HTML escaped hamlit v2.0.1: 122622.3 i/s faml v0.7.1: 94239.1 i/s - 1.30x slower slim v3.0.6: 89143.0 i/s - 1.38x slower erubis v2.7.0: 65047.8 i/s - 1.89x slower haml v5.0.0.beta.2: 14363.6 i/s - 8.54x slower
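To make the "i/s" unit concrete, here is a rough standard-library sketch of what an iterations-per-second measurement does. It is not how benchmark-ips is implemented (that gem also warms up and reports error margins); the helper name `iterations_per_second` and the two measured snippets are assumptions for illustration.

```ruby
# Run the block repeatedly for about `seconds` and return iterations per second.
# The clock is read on every iteration, which adds some overhead of its own.
def iterations_per_second(seconds: 0.2)
  clock = Process::CLOCK_MONOTONIC
  started = Process.clock_gettime(clock)
  count = 0
  elapsed = 0.0
  while elapsed < seconds
    yield
    count += 1
    elapsed = Process.clock_gettime(clock) - started
  end
  count / elapsed
end

words = %w[high performance template engine]

join_ips = iterations_per_second { words.join(' ') }
append_ips = iterations_per_second do
  buffer = +''
  words.each { |word| buffer << word << ' ' }
end

printf("Array#join: %.1f i/s\n", join_ips)
printf("String#<<:  %.1f i/s\n", append_ips)
```

For real comparisons, prefer benchmark-ips itself, since a single short run like this is noisy.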
  26. What to measure? • Sometimes a problem has a trade-off

    • trade-off between compilation time and rendering time Rendering of haml/test/templates/standard.haml hamlit v2.0.1: 12351.8 i/s (0.081ms) faml v0.7.0: 9713.4 i/s (0.103ms) - 1.27x slower haml v5.0.0.beta.2: 2296.5 i/s (0.435ms) - 5.38x slower
  27. What to measure? • Sometimes a problem has a trade-off

    • trade-off between compilation time and rendering time Compilation of haml/test/templates/standard.haml haml v5.0.0.beta.2: 388.2 i/s (2.576ms) hamlit v2.0.1: 193.7 i/s (5.163ms) - 2.00x slower faml v0.7.0: 188.0 i/s (5.320ms) - 2.07x slower
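Compilation and rendering can be timed separately. The sketch below does this with ERB from the standard library rather than with the Haml engines on the slides; `time_ms`, the template, and the `view` object are assumptions for illustration. `ERB#def_method` compiles the template into a method once, so that calling the method afterwards measures rendering only.

```ruby
require 'erb'

# Average wall-clock time of the block in milliseconds over `runs` executions.
def time_ms(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0 / runs
end

template = "<ul>\n<% items.each do |item| %><li><%= item %></li>\n<% end %></ul>\n"
items = %w[a b c]

# Compilation: template text -> Ruby source -> method definition.
compile_ms = time_ms(200) do
  ERB.new(template).def_method(Object.new.singleton_class, 'render(items)')
end

# Rendering: call the already compiled method.
view = Object.new
ERB.new(template).def_method(view.singleton_class, 'render(items)')
render_ms = time_ms(200) { view.render(items) }

printf("compile: %.3fms, render: %.3fms\n", compile_ms, render_ms)
```

In a long-running application a template is compiled once and rendered many times, which is why rendering speed usually matters more than compilation speed.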
  28. • What is a template engine? ✤ How to optimize

    Ruby code? • Benchmark ✤ Profiling • Improvement • What did we do for high performance template engines?
  29. Fundamental Rule of Optimisation • Don't guess, measure • It's

    a waste of time to optimize trivial things • The bottleneck may change at any time
  30. Recommended profilers • stackprof gem • rblineprof gem • For

    details: RubyKaigi 2014 "Ruby 2.1 in Production" http://rubykaigi.org/2014/presentation/S-AmanGupta
  31. stackprof usage in Hamlit repo • To search the entire

    stack to find the bottlenecks in template compilation $ bin/stackprof test/haml/templates/standard.haml ================================== Mode: wall(1) Samples: 8034 (70.35% miss rate) GC: 787 (9.80%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 498 (6.2%) 498 (6.2%) Temple::Mixins::CompiledDispatcher#disp 893 (11.1%) 319 (4.0%) Ripper::Lexer#lex 2999 (37.3%) 237 (2.9%) Hamlit::HTML#dispatcher 2070 (25.8%) 220 (2.7%) Temple::Filters::ControlFlow#dispatcher 4600 (57.3%) 189 (2.4%) Hamlit::Escapable#dispatcher 164 (2.0%) 164 (2.0%) Temple::Mixins::CompiledDispatcher::Dis 174 (2.2%) 160 (2.0%) block in Temple::ImmutableMap#[]
  32. rblineprof usage in Hamlit repo • To find bottlenecks in

    the compiled template code $ bin/lineprof test/haml/templates/standard.haml [Lineprof] ====================================================================== /private/var/folders/my/syd7zn_d495dmjm7_y8lqby80000gp/T/ compiled20151204-39353-9l8fvy | 16 ; _hamlit_compiler1 = ( 1 + 9 + 8 + 2 #numbers should work and this should be ignored; 0.2ms 200 | 17 ; ); _buf << (::Hamlit::Utils.escape_html(((_hamlit_compiler1).to_s))); _buf << ("\n</div>\n<div id='body'> Quotes should be loved! Just like people!</div>\n".freeze); 57.5ms 100 | 18 ; 120.times do |number|; | 19 ; _hamlit_compiler2 = ( number; 31.5ms 24000 | 20 ; ); _buf <<
  33. • What is a template engine? ✤ How to optimize

    Ruby code? • Benchmark • Profiling ✤ Improvement • What did we do for high performance template engines?
  34. How to improve 1. Don't guess, measure (again) • Profiler

    tells you what to optimize • Benchmark tells you which code is faster
  35. How to improve 2. Keep this iteration going:

    1. Benchmark, 2. Profiling, 3. Improvement, then back to 1.
  36. How to improve 3. Learn from others • We'll show

    you examples of template engine optimization
  37. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Hamlit side
  38. Faml • @eagletmt started faml development as a complete replacement

    of haml • High compatibility with improved performance • Basic ideas for high performance: • Follow Slim • Perform optimization at compile time
  39. Slim's Benchmark • Compiled benchmark (i/s)

    [Bar chart comparing erb, slim ugly, and haml ugly; axis from 0 to 80000 i/s] https://travis-ci.org/slim-template/slim/jobs/94130074#L188-L195
  40. Why does Slim perform well? • Slim uses the Temple gem

    as its backend • Temple performs generic optimizations automatically • I decided to use Temple as the backend too • https://github.com/judofyr/temple
  41. • Haml generates naive Ruby code Haml %a{ href: 'http://rubykaigi.org/2015'

    } _hamlout.buffer << "<a#{ _hamlout.attributes( {}, nil, href: 'http://rubykaigi.org/2015' ) }></a>\n";
  42. Slim a href='http://rubykaigi.org/2015' _buf = []; _buf << ("<a href=\"http://rubykaigi.org/2015\"></a>".freeze);

    ; _buf = _buf.join • Slim generates a static string literal at compile time
  43. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side ✤ Attribute Optimization • Faster Runtime Attribute Builder • Hamlit side
  44. Static Analysis • Haml should also be compiled into a static

    string literal like Slim • But a Ruby parser is required to achieve it, because the same attribute can be written in several ways: %a{ href: 'http://rubykaigi.org/2015' } %a{ :href=>'http://rubykaigi.org/2015' } %a{ 'href'=>'http://rubykaigi.org/2015' } %a{ 'href': 'http://rubykaigi.org/2015' }
  45. parser gem • https://github.com/whitequark/parser • Ruby parser, used by RuboCop,

    Transpec, ... • Easy to use • AST with rich source code information
  46. Attribute Optimization • Faml categorizes attributes into 3 types by

    parsing Ruby code • Static • Dynamic • Runtime
  47. Static Attribute • Both key and value are static •

    Fastest • No operations at runtime %a{ href: 'http://rubykaigi.org/2015' } <a href='http://rubykaigi.org/2015'></a>
  48. Dynamic Attribute %a{ href: url } • Key is static,

    but the value is dynamic • Relatively fast • Escape url and concatenate it at runtime <a href='http://rubykaigi.org/2015'></a>
  49. Runtime Attribute • Key and value are dynamic • Slow

    • Build the whole attribute list at runtime %a{ key => url } <a href='http://rubykaigi.org/2015'></a>
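The three categories can be sketched with a Ruby parser. Faml uses the parser gem; the sketch below swaps in Ripper from the standard library so it runs without extra gems, and it is not Faml's actual implementation. The helper names and the whitelist of "literal" node types are assumptions, and nested Hash values are simply treated as dynamic.

```ruby
require 'ripper'

# True when the subtree contains only whitelisted (literal) node types.
def only_literals?(node, allowed)
  return true unless node.is_a?(Array)
  return false if node.first.is_a?(Symbol) && !allowed.include?(node.first)

  node.all? { |child| only_literals?(child, allowed) }
end

# Collect the top-level [:assoc_new, key, value] nodes of the Hash literal.
def collect_pairs(node, found = [])
  return found unless node.is_a?(Array)

  if node.first == :assoc_new
    found << node
  else
    node.each { |child| collect_pairs(child, found) }
  end
  found
end

# Classify an attribute Hash literal as :static, :dynamic, or :runtime.
def classify_attributes(source)
  allowed = %i[string_literal string_content string_add @tstring_content
               symbol_literal symbol dyna_symbol xstring_add xstring_new
               @label @ident @kw @const]
  sexp = Ripper.sexp(source)
  raise ArgumentError, "cannot parse: #{source}" if sexp.nil?

  pairs = collect_pairs(sexp)
  return :runtime unless pairs.all? { |_, key, _| only_literals?(key, allowed) }
  return :dynamic unless pairs.all? { |_, _, value| only_literals?(value, allowed) }

  :static
end

puts classify_attributes("{ href: 'http://rubykaigi.org/2015' }")
puts classify_attributes("{ href: url }")
puts classify_attributes("{ key => url }")
```

Because the classification works on the syntax tree, all four ways of writing a static `href` end up in the same category.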
  50. • Sometimes optimization is impossible • Dynamic attributes? Multiple line

    attributes %a{ class: 'link', href: url }
  51. Multiple line attributes %a{ class: 'link', href: url }

  52. Line Numbers • We have to keep line numbers •

    for correct backtrace • (for correct __LINE__ value)
  53. Line Numbers • It has to be compiled as runtime attributes

    1 %a{ class: 'link', 2 href: url } 1 _buf << ("<a".freeze); _buf << (::Faml::AttributeBuilder.build("'", true, nil, class: 'link', 2 href: url )); _buf << ("></a>\n".freeze); 3 ; _buf = _buf.join
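Why the generated code must keep the template's line breaks can be shown with plain `eval`. In this sketch the file name `template.haml`, the helper `reported_error_line`, and the missing method `undefined_url` are all hypothetical; the point is only that the line Ruby reports follows the generated code, not the template.

```ruby
# Evaluate generated code as if it came from template.haml and return the
# line number that Ruby attributes to the failure.
def reported_error_line(generated_code)
  eval(generated_code, binding, 'template.haml', 1)
  nil
rescue NameError => e
  e.backtrace.map { |frame| frame[/\Atemplate\.haml:(\d+)/, 1] }.compact.first.to_i
end

# `undefined_url` stands in for a helper that is missing at render time.
preserving = "attrs = { class: 'link',\n  href: undefined_url }"
collapsed  = "attrs = { class: 'link', href: undefined_url }"

puts "line breaks kept:    error reported on line #{reported_error_line(preserving)}"
puts "collapsed to 1 line: error reported on line #{reported_error_line(collapsed)}"
```

If the compiler collapsed the two template lines into one, the backtrace would point at line 1 even though the failing expression sits on line 2 of the template.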
  54. Line Numbers 1 %a{ class: 'link', 2 href: url }

  55. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Attribute Optimization ✤ Faster Runtime Attribute Builder • Hamlit side
  56. C extension • C is faster than Ruby! • If

    performance is really important, writing a C extension is a good choice.
  57. C extension • I wrote the runtime attribute builder in C++

    • Ruby version (before v0.1.0) • 41889.8 i/s • C++ version (v0.7.1) • 90168.6 i/s
  58. In Production • Cookpad http://cookpad.com • Cookpad Blog https://cookpad-blog.jp •

    Cookpad Video https://cookpad-video.jp
  59. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side
  60. Hamlit • Designed to defeat Slim • I've heard many

    people say they are “migrating from Haml to Slim because it's faster.” • Hamlit means “Haml it” (write it with Haml)
  61. Slim's compiled benchmark with HTML-escaping (i/s)

    [Bar chart comparing Hamlit, Faml, Slim, and Haml; axis from 0 to 140000 i/s] https://travis-ci.org/k0kubun/hamlit/jobs/93928561#L247-L251 • Hamlit is faster than Slim
  62. Hamlit’s strategy • Reduce string allocation and concatenation by: •

    compiling string interpolation • dropping unused behaviors
  63. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side ✤ Compiling string interpolation • Dropping unused behaviors
  64. How to compile template? • We should care about: 1.

    String allocation 2. String concatenation
  65. 1. String allocation • Utilize frozen string literals • Thanks

    to Temple::Generator, static strings are frozen automatically! • Slim, Faml and Hamlit use this
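The effect of `.freeze` on a string literal can be checked directly. In this small sketch the method names are made up; it relies on Ruby (2.1 and later) reusing one object for a literal that is frozen in place.

```ruby
# A literal with .freeze is deduplicated: every call returns the same object.
def frozen_literal
  "<h1>It works!</h1>\n".freeze
end

# A plain literal normally allocates a new String on every call
# (unless frozen string literals are enabled for the file).
def plain_literal
  "<h1>It works!</h1>\n"
end

same_object = frozen_literal.equal?(frozen_literal)
puts "frozen literal reused: #{same_object}"
puts "plain literal reused:  #{plain_literal.equal?(plain_literal)}"
```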
  66. 2. String concatenation • String interpolation is fast Benchmark.ips do

    |x| x.report("Array#join") { ['hello', 1234].join } x.report("interpolation") { "#{'hello'}#{1234}" } x.compare! end
  67. 2. String concatenation • String interpolation is fast $ ruby

    bench.rb Comparison: interpolation: 1115751.8 i/s Array#join: 507283.5 i/s - 2.20x slower
  68. How should we compile interpolated String? • Suppose you

    were the Ruby interpreter: what code would you prefer to run? - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  69. How should we compile interpolated String? year = 2015 _hamlout.buffer

    << "<a#{_hamlout.attributes({}, nil, href: "http://rubykaigi.org/#{year}" )}>#{ "RubyKaigi #{Haml::Helpers.html_escape((year))}" }</a>\n"; Haml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  71. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a".freeze); _faml_html1 = ("http://rubykaigi.org/#{year}"); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze); _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << (">RubyKaigi ".freeze); _buf << (::Temple::Utils.escape_html((year))); _buf << ("</a>\n".freeze); ; _buf = _buf.join Faml - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  73. How should we compile interpolated String? _buf = []; year

    = 2015; ; _buf << ("<a href='http://rubykaigi.org/".freeze); _buf << (::Hamlit::Utils.escape_html((year))); _buf << ("'>RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year))); ; _buf << ("</a>\n".freeze); _buf = _buf.join Hamlit - year = 2015 %a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}
  75. How should we compile interpolated String? Comparison: hamlit v2.0.1: 301640.2

    i/s faml v0.7.1: 199001.5 i/s - 1.52x slower haml v5.0.0.beta.2: 14714.4 i/s - 20.50x slower
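The idea behind compiling string interpolation can be sketched with `Ripper.lex`: split the literal into static text and dynamic expressions at compile time, then emit one buffer append per part. This is an illustration, not Hamlit's implementation. The helper names are assumptions, `ERB::Util.html_escape` stands in for the engines' own escape functions, and static text is assumed to contain no escape sequences.

```ruby
require 'erb'
require 'ripper'

# Split a double-quoted literal into [:static, text] and [:dynamic, code] parts.
def split_interpolation(literal)
  parts = []
  depth = 0
  expression = +''
  Ripper.lex(literal).each do |_position, event, token, _state|
    if event == :on_embexpr_beg
      expression << token if depth > 0 # nested interpolation stays in the code
      depth += 1
    elsif event == :on_embexpr_end
      depth -= 1
      if depth.zero?
        parts << [:dynamic, expression]
        expression = +''
      else
        expression << token
      end
    elsif depth > 0
      expression << token
    elsif event == :on_tstring_content
      parts << [:static, token]
    end
  end
  parts
end

# Emit buffer code in the style shown on the slides.
def generate_code(parts)
  statements = parts.map do |type, value|
    if type == :static
      "_buf << #{value.dump}.freeze"
    else
      "_buf << ERB::Util.html_escape((#{value}))"
    end
  end
  ['_buf = []', *statements, '_buf = _buf.join'].join('; ')
end

parts = split_interpolation(%q("http://rubykaigi.org/#{year}"))
code = generate_code(parts)

year = 2015
result = eval(code, binding)

puts code
puts result
```

The static part becomes a frozen literal, and only the dynamic part is escaped and appended at runtime, so no intermediate interpolated String is built.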
  76. Tips to write faster code • Don't allocate strings unnecessarily •

    Reduce string concatenation • The fastest way to concatenate strings is not to concatenate them at all
  77. • What is a template engine? • How to optimize

    Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side • Compiling string interpolation ✤ Dropping unused behaviors
  78. Dropping unused behavior • Since Haml and Slim have rich

    syntax and behaviors in attributes, rendering attributes is a bottleneck • In other words, it is an optimization opportunity
  79. Dropping unused behavior • To optimize attribute rendering, Faml and

    Hamlit drop some unused behavior • Let's see how they are different!
  80. Dropped features in Hamlit • Hamlit supports the following features only for

    a limited set of attributes • Data attribute hyphenation • Boolean attribute
  81. Data attribute hyphenation • In Haml, a nested Hash is expanded

    with hyphens for all attributes • %div{ foo: { bar: 'baz' } } renders <div foo-bar='baz'></div> • %div{ data: { bar: 'baz' } } renders <div data-bar='baz'></div>
  82. Data attribute hyphenation • In Faml and Hamlit, hyphenation is supported

    only for the data attribute • %div{ foo: { bar: 'baz' } } renders <div foo-bar='baz'></div> in Haml, but <div foo='{:bar=&gt;&quot;baz&quot;}'></div> in Faml and Hamlit • %div{ data: { bar: 'baz' } } renders <div data-bar='baz'></div> in all three
  83. Data attribute hyphenation • Hyphenating nested Hashes is expensive •

    So we dropped it for non-data attributes to generate faster code
  84. ; _buf << ("<input".freeze); case ((_hamlit_compiler1 = (disabled))); when true;

    _buf << (" disabled".freeze); when false, nil; else; _buf << (" disabled='".freeze); _buf << (::Hamlit::Utils.escape_html((_hamlit_compiler1))); _buf << ("'".freeze); end ; _buf << (">\n".freeze); Data attribute hyphenation • No code to hyphenate Hash
  85. Data attribute hyphenation https://travis-ci.org/k0kubun/hamlit/jobs/96207038#L257-L260 - disabled = false %input{ disabled:

    disabled } - disabled = true %input{ disabled: disabled } Comparison: hamlit v2.0.1: 819212.4 i/s (0.001ms) faml v0.7.1: 614993.4 i/s (0.002ms) - 1.33x slower haml v5.0.0.beta.2: 15073.2 i/s (0.066ms) - 54.35x slower • Benchmark for non-data attribute
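The difference between hyphenating every attribute and hyphenating only `data` can be sketched as follows. This is an illustration under assumptions, not code from Haml, Faml, or Hamlit: `hyphenate`, `render_attributes`, and the `hyphenate_all:` flag are made-up names, and `ERB::Util.html_escape` stands in for the engines' escape functions.

```ruby
require 'erb'

# Flatten a nested Hash into hyphenated attribute names.
def hyphenate(prefix, value)
  return { prefix => value } unless value.is_a?(Hash)

  value.each_with_object({}) do |(key, nested), flat|
    flat.merge!(hyphenate("#{prefix}-#{key}", nested))
  end
end

# hyphenate_all: true mimics Haml; false hyphenates only the data attribute.
def render_attributes(attributes, hyphenate_all:)
  attributes.flat_map do |key, value|
    name = key.to_s
    expanded = if hyphenate_all || name == 'data'
                 hyphenate(name, value)
               else
                 { name => value }
               end
    expanded.map { |attr, v| " #{attr}='#{ERB::Util.html_escape(v.to_s)}'" }
  end.join
end

puts "<div#{render_attributes({ foo: { bar: 'baz' } }, hyphenate_all: true)}></div>"
puts "<div#{render_attributes({ foo: { bar: 'baz' } }, hyphenate_all: false)}></div>"
puts "<div#{render_attributes({ data: { bar: 'baz' } }, hyphenate_all: false)}></div>"
```

When only `data` needs the recursive walk, every other attribute can take the short path, which is what allows the compiler to emit the simpler code shown on the slides.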
  86. Boolean support • Only with Hamlit, non-boolean attributes are not

    deleted by falsey values (nil, false) • %a{ href: false } renders <a></a> in Haml and Faml, but <a href=''></a> in Hamlit • %a{ disabled: false } renders <a></a> in all three
  87. Boolean support • Only with Hamlit, non-boolean attributes are not

    deleted by falsey values (nil, false) • It means that Hamlit doesn't need to check and concatenate the value at runtime for non-boolean attributes
  88. Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf

    << ("<a".freeze); _faml_html1 = (url); case (_faml_html1); when true; _buf << (" href".freeze); when false, nil; else; _buf << (" href='".freeze); _buf << (::Temple::Utils.escape_html((_faml_html1))); _buf << ("'".freeze); end; _buf << ("></a>\n".freeze); • Faml compilation for non-boolean attribute
  89. Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf

    << ("<a href='".freeze); _buf << (::Hamlit::Utils.escape_html((url))); _buf << ("'></a>\n".freeze); _buf = _buf.join • Hamlit compilation for non-boolean attribute
  90. Comparison: hamlit v2.0.1: 407851.9 i/s (0.002ms) faml v0.7.1: 223612.4 i/s

    (0.004ms) - 1.82x slower haml v5.0.0.beta.2: 21823.1 i/s (0.046ms) - 18.69x slower Boolean support
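The two code paths compared above can be written as ordinary Ruby methods. This is a simplified sketch with made-up method names, using `ERB::Util.html_escape` in place of the engines' escape functions. It does not try to reproduce Hamlit's exact output for falsey values on non-boolean attributes.

```ruby
require 'erb'

# Path with boolean support: the value must be inspected at runtime.
def render_with_boolean_check(key, value)
  case value
  when true then " #{key}"
  when false, nil then ''
  else " #{key}='#{ERB::Util.html_escape(value)}'"
  end
end

# Path without boolean support: the value is always escaped and emitted.
def render_without_check(key, value)
  " #{key}='#{ERB::Util.html_escape(value)}'"
end

puts "<input#{render_with_boolean_check('disabled', true)}>"
puts "<input#{render_with_boolean_check('disabled', false)}>"
puts "<a#{render_without_check('href', 'http://rubykaigi.org/2015')}></a>"
```

Choosing the path at compile time requires knowing which attribute names are boolean, which is why the check is kept only for a limited list of attributes.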
  91. But does it really work? • Also in Rails tag

    helpers, false is not deleted for non-boolean attributes = content_tag :input, '', value: false <input value='false'></input>
  92. But does it really work? • It passed 20,000+ tests in the world's largest

    Rails application! https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith
  93. Why is Hamlit the fastest? • Faml and Slim have

    boolean support for all attributes • So Hamlit is faster for non-boolean attributes • Give up trivial things to make things better!
  94. Comparison of Haml engines • Haml • Slow and rarely

    maintained now • I sent a patch to replace the backend, but it was not merged • Faml • Fast and highly compatible • Hamlit • Fastest and slightly incompatible
  95. Conclusion • How to improve performance • Benchmark, Profiling, Improvement

    • Real examples of improvements • Faml and Hamlit • Try our faster Haml engines!