Slide 1

Slide 1 text

Kohei Suzuki, Takashi Kokubun High Performance Template Engine A guide to optimizing your Ruby code

Slide 2

Slide 2 text

Self introduction • Kohei Suzuki • @eagletmt • Developer Productivity Group,
 Cookpad Inc. • Favorite library: pathname

Slide 3

Slide 3 text

Self introduction • Takashi Kokubun • @k0kubun • Developer Productivity Group,
 Cookpad Inc. • Favorite library: ripper

Slide 4

Slide 4 text

✤ What is a template engine? ✤ Template Engine Examples • Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?

Slide 5

Slide 5 text

What is a Template Engine? • Template engines render text (typically HTML) by combining data with a template written in a template language • ERB, Haml, Slim, ...

Slide 6

Slide 6 text

ERB • ERB is a template engine included in the Ruby standard library

<%= @title %>

    <%- @items.each do |item| %>
  • <%= item %>
  • <% end %>

Slide 7

Slide 7 text

It works!

  • item 1
  • item 2
  • item 3
ERB • Rendered output
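The ERB example can be reproduced with the standard library alone. This is a minimal sketch: the extracted slide lost its HTML tags, so the `<h1>`, `<ul>`, and `<li>` markup and the `Page` class below are assumptions made for illustration.

```ruby
require "erb"

# Template in the spirit of the slide; the HTML tags are assumed.
TEMPLATE = <<~ERB
  <h1><%= @title %></h1>
  <ul>
  <%- @items.each do |item| -%>
    <li><%= item %></li>
  <%- end -%>
  </ul>
ERB

# Hypothetical view object that supplies @title and @items to the template.
class Page
  def initialize(title, items)
    @title = title
    @items = items
  end

  # Expose this object's binding so ERB can read its instance variables.
  def template_binding
    binding
  end
end

page = Page.new("It works!", ["item 1", "item 2", "item 3"])
# trim_mode "-" enables the <%- and -%> markers used in the template.
html = ERB.new(TEMPLATE, trim_mode: "-").result(page.template_binding)
puts html
```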

Slide 8

Slide 8 text

Haml • Haml is an elegant, structured (X)HTML/XML templating engine

%h1{ class: 'title' }= @title
%ul
  - @items.each do |item|
    %li.item= item

Slide 9

Slide 9 text

Haml

It works!

  • Item 1
  • Item 2
  • Item 3
• Rendered output

Slide 10

Slide 10 text

Slim • Slim is a fast, lightweight template engine for Ruby

h1 class='title' = @title
ul
  - @items.each do |item|
    li.item= item

Slide 11

Slide 11 text

Slim

It works!

  • Item 1
  • Item 2
  • Item 3
• Rendered output

Slide 12

Slide 12 text

✤ What is a template engine? • Template Engine Examples ✤ Template Engine Internals • Performance • How to optimize Ruby code? • What did we do for high performance template engines?

Slide 13

Slide 13 text

Template Engine Internals • Template engines compile templates into Ruby code

Template: %h1 It works!

Compiled Ruby code: _hamlout.push_text("<h1>It works!</h1>\n", 0, false);

Slide 14

Slide 14 text

Template Engine Internals • The compiled Ruby code renders HTML

Ruby code: _hamlout.push_text("<h1>It works!</h1>\n", 0, false);

Rendered HTML:

It works!
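The two steps, compile and then render, can be observed with ERB from the standard library, which exposes its generated Ruby code through `ERB#src`. This is a sketch using ERB rather than Haml, so the generated code looks different from the `_hamlout.push_text` call on the slides.

```ruby
require "erb"

# Step 1 (compile): turn the template into a string of Ruby source code.
ruby_code = ERB.new("<h1><%= title %></h1>\n").src
puts ruby_code

# Step 2 (render): evaluating that Ruby code produces the HTML.
title = "It works!"
html = eval(ruby_code, binding)
puts html
```

Because rendering only runs the generated code, the quality of that code decides how fast a template engine is at render time.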

Slide 15

Slide 15 text

Haml Example %a{href: 'http://rubykaigi.org/2015'} %a{ href: 'http://rubykaigi.org/2015' }

Slide 16

Slide 16 text

Haml Example %a{href: 'http://rubykaigi.org/2015'} _hamlout.push_text( "\n", 0, false );

Slide 17

Slide 17 text

✤ What is a template engine? • Template Engine Examples • Template Engine Internals ✤ Performance • How to optimize Ruby code? • What did we do for high performance template engines?

Slide 18

Slide 18 text

Haml vs Slim • Haml has nice syntax, but its implementation is not very performant • Slim's syntax is not as nice, but it has a great, performant implementation

Slide 19

Slide 19 text

Faster Haml Engine • We love the Haml language, so we each implemented a faster Haml engine independently • https://github.com/eagletmt/faml • https://github.com/k0kubun/hamlit

Slide 20

Slide 20 text

• What is a template engine? ✤ How to optimize Ruby code? • What did we do for high performance template engines?

Slide 21

Slide 21 text

Optimize your Ruby code • YOUR CODE IS SLOW • if you don't know how to write fast code

Slide 22

Slide 22 text

3 steps of optimization 1. Benchmark 2. Profiling 3. Improvement

Slide 23

Slide 23 text

• What is a template engine? ✤ How to optimize Ruby code? ✤ Benchmark • Profiling • Improvement • What did we do for high performance template engines?

Slide 24

Slide 24 text

Why is benchmarking necessary? • To measure performance accurately • Profilers have overhead • Code that looks fast under a profiler may still be slow in a benchmark • For continuous improvement • You can't detect performance regressions without a benchmark

Slide 25

Slide 25 text

How to benchmark? • Use the benchmark-ips gem • It shows results in an easy-to-understand way

Rendering of slim/benchmarks with HTML escaped
hamlit v2.0.1: 122622.3 i/s
faml v0.7.1: 94239.1 i/s - 1.30x slower
slim v3.0.6: 89143.0 i/s - 1.38x slower
erubis v2.7.0: 65047.8 i/s - 1.89x slower
haml v5.0.0.beta.2: 14363.6 i/s - 8.54x slower
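To make the "i/s" (iterations per second) unit concrete, here is a standard-library-only sketch. It is not how benchmark-ips is implemented: the gem also does warm-up, repeated sampling, and the "x slower" comparison. The helper name `iterations_per_second` is hypothetical.

```ruby
# Hypothetical helper: run the block repeatedly for about `seconds`
# and return how many iterations completed per second.
def iterations_per_second(seconds: 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  count = 0
  while clock.call - started < seconds
    yield
    count += 1
  end
  count / (clock.call - started)
end

join_ips = iterations_per_second { ["hello", 1234].join }
interp_ips = iterations_per_second { "#{'hello'}#{1234}" }
printf("Array#join:    %.1f i/s\n", join_ips)
printf("interpolation: %.1f i/s\n", interp_ips)
```

The clock is read on every iteration here, which adds overhead to very small blocks. For real measurements, the benchmark-ips gem named on the slide is the better tool.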

Slide 26

Slide 26 text

What to measure? • Sometimes a problem has a trade-off • trade-off between compilation time and rendering time

Rendering of haml/test/templates/standard.haml
hamlit v2.0.1: 12351.8 i/s (0.081ms)
faml v0.7.0: 9713.4 i/s (0.103ms) - 1.27x slower
haml v5.0.0.beta.2: 2296.5 i/s (0.435ms) - 5.38x slower

Slide 27

Slide 27 text

What to measure? • Sometimes a problem has a trade-off • trade-off between compilation time and rendering time

Compilation of haml/test/templates/standard.haml
haml v5.0.0.beta.2: 388.2 i/s (2.576ms)
hamlit v2.0.1: 193.7 i/s (5.163ms) - 2.00x slower
faml v0.7.0: 188.0 i/s (5.320ms) - 2.07x slower
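Compilation and rendering are separate phases, so they can be timed separately. The sketch below does this with ERB and `Benchmark.realtime` from the standard library. The template, the `ListView` class, and the run count are assumptions for illustration, and the numbers it prints say nothing about Haml, Faml, or Hamlit.

```ruby
require "erb"
require "benchmark"

RUNS = 1_000
TEMPLATE_SOURCE = "<ul>\n<% items.each do |item| %><li><%= item %></li>\n<% end %></ul>\n"

# Compilation: template text -> Ruby source. Typically paid once per template.
compile_seconds = Benchmark.realtime do
  RUNS.times { ERB.new(TEMPLATE_SOURCE).src }
end

# Rendering: run code that is already compiled. Paid on every render.
class ListView; end
ERB.new(TEMPLATE_SOURCE).def_method(ListView, "render(items)")
view = ListView.new
render_seconds = Benchmark.realtime do
  RUNS.times { view.render(%w[a b c]) }
end

puts format("compile: %.4f ms per run", compile_seconds * 1000 / RUNS)
puts format("render:  %.4f ms per run", render_seconds * 1000 / RUNS)
```

An engine can reasonably spend more time in the first phase if that makes the second phase cheaper, which is the trade-off the two result tables show.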

Slide 28

Slide 28 text

• What is a template engine? ✤ How to optimize Ruby code? • Benchmark ✤ Profiling • Improvement • What did we do for high performance template engines?

Slide 29

Slide 29 text

Fundamental Rule of Optimisation • Don't guess, measure • It's a waste of time to optimize trivial things • The bottleneck may change at any time

Slide 30

Slide 30 text

Recommended profilers • stackprof gem • rblineprof gem • For details, see the RubyKaigi 2014 talk "Ruby 2.1 in Production": http://rubykaigi.org/2014/presentation/S-AmanGupta

Slide 31

Slide 31 text

stackprof usage in Hamlit repo • To search the entire stack to find the bottlenecks in template compilation

$ bin/stackprof test/haml/templates/standard.haml
==================================
  Mode: wall(1)
  Samples: 8034 (70.35% miss rate)
  GC: 787 (9.80%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       498   (6.2%)         498   (6.2%)     Temple::Mixins::CompiledDispatcher#disp
       893  (11.1%)         319   (4.0%)     Ripper::Lexer#lex
      2999  (37.3%)         237   (2.9%)     Hamlit::HTML#dispatcher
      2070  (25.8%)         220   (2.7%)     Temple::Filters::ControlFlow#dispatcher
      4600  (57.3%)         189   (2.4%)     Hamlit::Escapable#dispatcher
       164   (2.0%)         164   (2.0%)     Temple::Mixins::CompiledDispatcher::Dis
       174   (2.2%)         160   (2.0%)     block in Temple::ImmutableMap#[]

Slide 32

Slide 32 text

rblineprof usage in Hamlit repo • To find bottlenecks in the compiled template code

$ bin/lineprof test/haml/templates/standard.haml
[Lineprof] ======================================================================
/private/var/folders/my/syd7zn_d495dmjm7_y8lqby80000gp/T/compiled20151204-39353-9l8fvy
                 | 16 ; _hamlit_compiler1 = ( 1 + 9 + 8 + 2 #numbers should work and this should be ignored;
   0.2ms     200 | 17 ; ); _buf << (::Hamlit::Utils.escape_html(((_hamlit_compiler1).to_s))); _buf << ("\n\n Quotes should be loved! Just like people! \n".freeze);
  57.5ms     100 | 18 ; 120.times do |number|;
                 | 19 ; _hamlit_compiler2 = ( number;
  31.5ms   24000 | 20 ; ); _buf <<

Slide 33

Slide 33 text

• What is a template engine? ✤ How to optimize Ruby code? • Benchmark • Profiling ✤ Improvement • What did we do for high performance template engines?

Slide 34

Slide 34 text

How to improve 1. Don't guess, measure (again) • Profiler tells you what to optimize • Benchmark tells you which code is faster

Slide 35

Slide 35 text

How to improve 2. Keep iterating through the cycle: Benchmark, then Profiling, then Improvement, then Benchmark again

Slide 36

Slide 36 text

How to improve 3. Learn from others • We'll show you examples of template engine optimization

Slide 37

Slide 37 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Hamlit side

Slide 38

Slide 38 text

Faml • @eagletmt started faml development as a complete replacement of haml • High compatibility with improved performance • Basic ideas for high performance: • Follow Slim • Perform optimization at compile time

Slide 39

Slide 39 text

Slim's Benchmark • [Bar chart: compiled benchmark (i/s) for erb, slim ugly, and haml ugly; axis from 0 to 80000] https://travis-ci.org/slim-template/slim/jobs/94130074#L188-L195

Slide 40

Slide 40 text

Why does Slim perform well? • Slim uses the Temple gem as its backend • Temple performs generic optimizations automatically • I decided to use Temple as Faml's backend too • https://github.com/judofyr/temple

Slide 41

Slide 41 text

• Haml generates naive Ruby code Haml %a{ href: 'http://rubykaigi.org/2015' } _hamlout.buffer << "\n";

Slide 42

Slide 42 text

Slim a href='http://rubykaigi.org/2015' _buf = []; _buf << ("".freeze); ; _buf = _buf.join • Slim generates a static string literal at compile time

Slide 43

Slide 43 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side ✤ Attribute Optimization • Faster Runtime Attribute Builder • Hamlit side

Slide 44

Slide 44 text

Static Analysis • Haml should also be compiled into a static string literal like Slim • But a Ruby parser is required to achieve it, because the same attribute can be written in several ways

%a{ href: 'http://rubykaigi.org/2015' }
%a{ :href=>'http://rubykaigi.org/2015' }
%a{ 'href'=>'http://rubykaigi.org/2015' }
%a{ 'href': 'http://rubykaigi.org/2015' }

Slide 45

Slide 45 text

parser gem • https://github.com/whitequark/parser • Ruby parser, used by RuboCop, Transpec, ... • Easy to use • AST with rich source code information

Slide 46

Slide 46 text

Attribute Optimization • Faml categorizes attributes into 3 types by parsing Ruby code • Static • Dynamic • Runtime

Slide 47

Slide 47 text

Static Attribute %a{ href: 'http://rubykaigi.org/2015' } • Both key and value are static • Fastest • No operations at runtime

Slide 48

Slide 48 text

Dynamic Attribute %a{ href: url } • Key is static, but the value is dynamic • Relatively fast • Escape url and concatenate it at runtime

Slide 49

Slide 49 text

Runtime Attribute %a{ key => url } • Key and value are dynamic • Slow • Build the whole attribute list at runtime
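The three categories can be sketched with a parser from the standard library. Faml itself uses the parser gem. The code below swaps in Ripper so that it runs without extra gems, and the `DYNAMIC_NODES` list and all method names are assumptions of this sketch, not Faml's implementation. The node list is deliberately rough and not exhaustive.

```ruby
require "ripper"

# Node types that mean "this needs a runtime value". A rough, non-exhaustive
# list chosen for this sketch; literals such as true/false/nil appear as
# var_ref in Ripper, so they are conservatively treated as non-static.
DYNAMIC_NODES = %i[
  vcall var_ref call fcall command command_call method_add_arg
  method_add_block string_embexpr binary unary aref xstring_literal
].freeze

# True when the subtree contains none of the dynamic node types.
def static_node?(node)
  (Array(node).flatten & DYNAMIC_NODES).empty?
end

# Collect [key, value] subtrees of the outermost hash literal.
def assoc_pairs(sexp)
  return [] unless sexp.is_a?(Array)
  return [[sexp[1], sexp[2]]] if sexp[0] == :assoc_new

  sexp.flat_map { |child| assoc_pairs(child) }
end

# Classify the Ruby source of an attribute hash as :static, :dynamic, or :runtime.
def classify_attributes(source)
  sexp = Ripper.sexp(source)
  # Unparsable code and **splat entries are left to the runtime builder.
  return :runtime if sexp.nil? || sexp.flatten.include?(:assoc_splat)

  pairs = assoc_pairs(sexp)
  return :runtime unless pairs.all? { |key, _value| static_node?(key) }

  pairs.all? { |_key, value| static_node?(value) } ? :static : :dynamic
end

puts classify_attributes("{ href: 'http://rubykaigi.org/2015' }")
puts classify_attributes("{ href: url }")
puts classify_attributes("{ key => url }")
```

All four spellings of the static `href` attribute from the Static Analysis slide classify as static here, which is the point of using a parser instead of pattern matching on the template text. The sketch ignores the line-number rule for attributes that span multiple lines.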

Slide 50

Slide 50 text

Multiple line attributes • Sometimes optimization is impossible • Dynamic attributes?

%a{ class: 'link',
    href: url }

Slide 51

Slide 51 text

Multiple line attributes

%a{ class: 'link',
    href: url }

Slide 52

Slide 52 text

Line Numbers • We have to keep line numbers • for correct backtrace • (for correct __LINE__ value)

Slide 53

Slide 53 text

Line Numbers • It has to be compiled as runtime attributes

1 %a{ class: 'link',
2     href: url }

1 buf << ("\n".freeze);
3 ; _buf = _buf.join

Slide 54

Slide 54 text

Line Numbers

1 %a{ class: 'link',
2     href: url }
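Why line numbers matter can be shown with plain `eval`. In this sketch, `missing_helper` is a hypothetical undefined method standing in for an error raised by code on template line 2, and `template.haml` is only a label passed to `eval`. Neither comes from Faml or Hamlit.

```ruby
# Generated code for a two-line template, once with the line break kept
# and once with everything joined onto a single line.
KEEPS_LINES = "_buf = []\n_buf << missing_helper\n_buf.join"
JOINS_LINES = "_buf = []; _buf << missing_helper; _buf.join"

# Evaluate compiled code as if it came from template.haml and return the
# line number that the resulting backtrace reports for that file.
def reported_line(compiled_code)
  eval(compiled_code, binding, "template.haml", 1)
rescue NameError => e
  e.backtrace_locations.find { |loc| loc.path == "template.haml" }.lineno
end

puts reported_line(KEEPS_LINES) # 2: matches the template line
puts reported_line(JOINS_LINES) # 1: points at the wrong template line
```

A compiler that merges attributes from several template lines into one static string has to emit the remaining newlines somewhere, otherwise backtraces and `__LINE__` drift.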

Slide 55

Slide 55 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? ✤ Faml side • Attribute Optimization ✤ Faster Runtime Attribute Builder • Hamlit side

Slide 56

Slide 56 text

C extension • C is faster than Ruby! • If performance is really important, writing C extension is a good choice.

Slide 57

Slide 57 text

C extension • I wrote the runtime attribute builder in C++ • Ruby version (before v0.1.0): 41889.8 i/s • C++ version (v0.7.1): 90168.6 i/s

Slide 58

Slide 58 text

In Production • Cookpad http://cookpad.com • Cookpad Blog https://cookpad-blog.jp • Cookpad Video https://cookpad-video.jp

Slide 59

Slide 59 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side

Slide 60

Slide 60 text

Hamlit • Designed to defeat Slim • I've heard many people say they are "migrating from Haml to Slim because it's faster." • Hamlit means "Haml it" (write it with Haml)

Slide 61

Slide 61 text

Slim's compiled benchmark with HTML-escaping (i/s) • Hamlit is faster than Slim • [Bar chart comparing Hamlit, Faml, Slim, and Haml; axis from 0 to 140000] https://travis-ci.org/k0kubun/hamlit/jobs/93928561#L247-L251

Slide 62

Slide 62 text

Hamlit’s strategy • Reduce string allocation and concatenation by: • compiling string interpolation • dropping unused behaviors

Slide 63

Slide 63 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side ✤ Compiling string interpolation • Dropping unused behaviors

Slide 64

Slide 64 text

How to compile template? • We should care about: 1. String allocation 2. String concatenation

Slide 65

Slide 65 text

1. String allocation • Utilize frozen string literal • Thanks to Temple::Generator, static string is frozen automatically! • Slim, Faml and Hamlit use this
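The effect of a frozen string literal can be checked with object identity. This is a small sketch with hypothetical method names. It assumes the file has no `# frozen_string_literal: true` magic comment, since that comment would make the plain literal behave like the frozen one.

```ruby
# Each call evaluates a plain string literal, which allocates a new String.
def plain_tag
  "<li>"
end

# A literal followed by .freeze is returned as one shared frozen String
# (an optimization available since Ruby 2.1), so no allocation per call.
def frozen_tag
  "<li>".freeze
end

puts plain_tag.equal?(plain_tag)   # false: two different objects
puts frozen_tag.equal?(frozen_tag) # true: the same object every time
```

A template is mostly static text, so emitting every static chunk as a frozen literal removes most per-render string allocations.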

Slide 66

Slide 66 text

2. String concatenation • String interpolation is fast

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("Array#join") { ['hello', 1234].join }
  x.report("interpolation") { "#{'hello'}#{1234}" }
  x.compare!
end

Slide 67

Slide 67 text

2. String concatenation • String interpolation is fast

$ ruby bench.rb
Comparison:
interpolation: 1115751.8 i/s
Array#join: 507283.5 i/s - 2.20x slower

Slide 68

Slide 68 text

How should we compile interpolated String? • Suppose you are the Ruby interpreter: what code would you prefer to run?

- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Slide 69

Slide 69 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Haml:
year = 2015
_hamlout.buffer << "#{ "RubyKaigi #{Haml::Helpers.html_escape((year))}" }\n";

Slide 70

Slide 70 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Haml:
year = 2015
_hamlout.buffer << "#{ "RubyKaigi #{Haml::Helpers.html_escape((year))}" }\n";

Slide 71

Slide 71 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Faml:
_buf = []; year = 2015;
; _buf << ("RubyKaigi ".freeze); _buf << (::Temple::Utils.escape_html((year))); _buf << ("\n".freeze);
; _buf = _buf.join

Slide 72

Slide 72 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Faml:
_buf = []; year = 2015;
; _buf << ("RubyKaigi ".freeze); _buf << (::Temple::Utils.escape_html((year))); _buf << ("\n".freeze);
; _buf = _buf.join

Slide 73

Slide 73 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Hamlit:
_buf = []; year = 2015;
; _buf << ("RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year)));
; _buf << ("\n".freeze); _buf = _buf.join

Slide 74

Slide 74 text

How should we compile interpolated String?

Template:
- year = 2015
%a{ href: "http://rubykaigi.org/#{year}" } RubyKaigi #{year}

Compiled by Hamlit:
_buf = []; year = 2015;
; _buf << ("RubyKaigi ".freeze); _buf << (::Hamlit::Utils.escape_html((year)));
; _buf << ("\n".freeze); _buf = _buf.join

Slide 75

Slide 75 text

How should we compile interpolated String?

Comparison:
hamlit v2.0.1: 301640.2 i/s
faml v0.7.1: 199001.5 i/s - 1.52x slower
haml v5.0.0.beta.2: 14714.4 i/s - 20.50x slower

Slide 76

Slide 76 text

Tips to write faster code • Don't allocate strings • Reduce string concatenation • The fastest way to concatenate strings is not to concatenate them at all

Slide 77

Slide 77 text

• What is a template engine? • How to optimize Ruby code? ✤ What did we do for high performance template engines? • Faml side ✤ Hamlit side • Compiling string interpolation ✤ Dropping unused behaviors

Slide 78

Slide 78 text

Dropping unused behavior • Since Haml and Slim have rich syntax and behaviors in attributes, rendering attributes is a bottleneck • In other words, it is an opportunity for optimization

Slide 79

Slide 79 text

Dropping unused behavior • To optimize attribute rendering, Faml and Hamlit drop some unused behavior • Let's see how they are different!

Slide 80

Slide 80 text

Dropped features in Hamlit • Hamlit supports the following features only for a limited set of attributes • Data attribute hyphenation • Boolean attribute

Slide 81

Slide 81 text

Data attribute hyphenation • In Haml, a nested Hash is expanded with hyphens for all attributes

Haml:
%div{ foo: { bar: 'baz' } }
%div{ data: { bar: 'baz' } }

Slide 82

Slide 82 text

Data attribute hyphenation • In Faml and Hamlit, hyphenation is supported only for the data attribute

Haml compared with Faml and Hamlit:
%div{ foo: { bar: 'baz' } }
%div{ data: { bar: 'baz' } }

Slide 83

Slide 83 text

Data attribute hyphenation • Hyphenating data attribute is expensive • So we dropped it to generate faster code in non-data attributes
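The cost difference can be illustrated with a small sketch. Both helpers are hypothetical and written for this example, not taken from any of the three engines. How Faml and Hamlit actually render a Hash value for a non-data attribute is not shown on the slides, so the second helper simply leaves it untouched.

```ruby
# Hypothetical helper: flatten a nested Hash into hyphen-joined attribute
# names. This recursive walk is the work Haml does for every attribute.
def hyphenate(prefix, value)
  return { prefix.to_s => value } unless value.is_a?(Hash)

  value.each_with_object({}) do |(key, nested), flat|
    flat.merge!(hyphenate("#{prefix}-#{key}", nested))
  end
end

# Hypothetical helper: apply the expansion only to the data attribute,
# so every other attribute skips the recursive walk.
def expand_attributes(attributes)
  attributes.each_with_object({}) do |(name, value), flat|
    if name.to_s == "data"
      flat.merge!(hyphenate(name, value))
    else
      flat[name.to_s] = value
    end
  end
end

p hyphenate(:foo, { bar: "baz" })
p expand_attributes({ data: { bar: "baz" }, foo: { bar: "baz" } })
```

The first call yields a single `foo-bar` entry. The second yields `data-bar` and leaves `foo` as an unexpanded Hash, which is the kind of behavior the limited engines can skip at runtime.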

Slide 84

Slide 84 text

; _buf << ("\n".freeze); Data attribute hyphenation • No code to hyphenate Hash

Slide 85

Slide 85 text

Data attribute hyphenation • Benchmark for non-data attribute
https://travis-ci.org/k0kubun/hamlit/jobs/96207038#L257-L260

- disabled = false
%input{ disabled: disabled }
- disabled = true
%input{ disabled: disabled }

Comparison:
hamlit v2.0.1: 819212.4 i/s (0.001ms)
faml v0.7.1: 614993.4 i/s (0.002ms) - 1.33x slower
haml v5.0.0.beta.2: 15073.2 i/s (0.066ms) - 54.35x slower

Slide 86

Slide 86 text

Boolean support • Only with Hamlit, non-boolean attributes are not deleted by falsey values (nil, false) Haml, Faml Hamlit %a{ href: false } %a{ disabled: false }

Slide 87

Slide 87 text

Boolean support • Only with Hamlit, non-boolean attributes are not deleted by falsey values (nil, false) • It means that Hamlit doesn't need to check the value before concatenating it at runtime for non-boolean attributes
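The saving can be sketched as two attribute renderers. Everything here is hypothetical: the list of boolean attributes is a small illustrative subset, the method names are invented, and the exact markup the real engines emit is not visible in the extracted slides.

```ruby
require "erb"

# Hypothetical subset of HTML boolean attributes, for illustration only.
BOOLEAN_ATTRIBUTES = %w[checked disabled readonly selected].freeze

# Every attribute value is inspected at runtime: true renders the bare
# name, false/nil drops the attribute, anything else is escaped.
def render_with_boolean_check(name, value)
  case value
  when true then " #{name}"
  when false, nil then ""
  else " #{name}='#{ERB::Util.html_escape(value)}'"
  end
end

# Only known boolean attributes take the checking path; every other
# attribute is escaped and emitted without looking at true/false/nil.
def render_limited_boolean(name, value)
  return render_with_boolean_check(name, value) if BOOLEAN_ATTRIBUTES.include?(name)

  " #{name}='#{ERB::Util.html_escape(value)}'"
end

puts render_with_boolean_check("href", false).inspect
puts render_limited_boolean("href", false).inspect
puts render_limited_boolean("disabled", false).inspect
```

Since the attribute name is usually known at compile time, the second renderer lets a compiler emit the attribute name as static text and only escape the value at runtime.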

Slide 88

Slide 88 text

Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf << ("\n".freeze); • Faml compilation for non-boolean attribute

Slide 89

Slide 89 text

Boolean support _buf = []; url = 'http://rubykaigi.org/2015'; ; _buf << ("\n".freeze); _buf = _buf.join • Hamlit compilation for non-boolean attribute

Slide 90

Slide 90 text

Boolean support

Comparison:
hamlit v2.0.1: 407851.9 i/s (0.002ms)
faml v0.7.1: 223612.4 i/s (0.004ms) - 1.82x slower
haml v5.0.0.beta.2: 21823.1 i/s (0.046ms) - 18.69x slower

Slide 91

Slide 91 text

But does it really work? • Also in Rails tag helpers, false is not deleted for non-boolean attributes = content_tag :input, '', value: false

Slide 92

Slide 92 text

But does it really work? • It passed 20,000+ tests in the world's largest Rails application! https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-largest-rails-monolith

Slide 93

Slide 93 text

Why is Hamlit the fastest? • Faml and Slim have boolean support for all attributes • So Hamlit is faster for non-boolean attributes • Give up trivial things to make things better!

Slide 94

Slide 94 text

Comparison of Haml engines • Haml • Slow and rarely maintained now • I sent a patch to replace the backend, but it was not merged • Faml • Fast and highly compatible • Hamlit • Fastest and slightly incompatible

Slide 95

Slide 95 text

Conclusion • How to improve performance • Benchmark, Profiling, Improvement • Real examples of improvements • Faml and Hamlit • Try our faster Haml engines!