Slide 1

Slide 1 text

3x Rails Akira Matsuda RAILSCONF 2016

Slide 2

Slide 2 text

3x Rails? Akira Matsuda RAILSCONF 2016

Slide 3

Slide 3 text

Matz @ RubyKaigi 2015

Slide 4

Slide 4 text

Ruby 3x3 Matz: “Ruby 3.0 will be 3 times faster!”

Slide 5

Slide 5 text

3x Rails? Wait until Ruby 3.0 release Run your Rails app on
 Ruby 3.0 Done.

Slide 6

Slide 6 text

self name: Akira GitHub: amatsuda Twitter: @a_matsuda

Slide 7

Slide 7 text

Ruby

Slide 8

Slide 8 text

Rails

Slide 9

Slide 9 text

Gems kaminari active_decorator motorhead stateful_enum action_args (asakusarb)

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Asakusa.rb Since 2008 356 meetups 30+ Ruby Core committers Attendees from 19 different countries

Slide 12

Slide 12 text

RubyKaigi Chief Organizer

Slide 13

Slide 13 text

RubyKaigi 2015

Slide 14

Slide 14 text

RubyKaigi 2016 September 8..10 Kyoto (not in Tokyo!)

Slide 15

Slide 15 text

Kyoto (京都)

Slide 16

Slide 16 text

RubyKaigi - Venue

Slide 17

Slide 17 text

RubyKaigi - Main Hall

Slide 18

Slide 18 text

RubyKaigi - Hall B

Slide 19

Slide 19 text

RubyKaigi - Garden

Slide 20

Slide 20 text

RubyKaigi 2016 CFP is open! Tickets are available! http://rubykaigi.org/

Slide 21

Slide 21 text

begin

Slide 22

Slide 22 text

Speeding Up
 the Rails Framework

Slide 23

Slide 23 text

Know the Speed

Slide 24

Slide 24 text

How Can We Measure the Speed?

Slide 25

Slide 25 text

Use benchmark-ips You can benchmark anything inside the block
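To make "iterations per second" concrete, here is a rough stdlib-only sketch of the kind of number benchmark-ips reports. It is not benchmark-ips itself (the gem also warms up and reports the deviation), and `rough_ips` is a made-up helper name.

```ruby
# A rough stand-in for an "iterations per second" measurement.
# `rough_ips` is a hypothetical helper, not part of benchmark-ips.
def rough_ips(seconds = 0.2)
  iterations = 0
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    yield
    iterations += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    # Stop once the time budget is used up and report the rate
    return iterations / elapsed if elapsed >= seconds
  end
end

ips = rough_ips { [3, 1, 2].sort }
puts format('%.1f i/s', ips)
```

Anything you put inside the block gets measured, which is the same idea as the block passed to `x.report`.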

Slide 26

Slide 26 text

For Example If you want to very roughly benchmark the whole Rails app's request processing...

Slide 27

Slide 27 text

A Horrible Way to Benchmark Rails’ Request Processing

# This is not very beautiful code, but it's just an example...
# And it kinda works...
require 'benchmark/ips'

Rails.application.config.after_initialize do
  Rails.application.extend Module.new {
    def call(e)
      super
      Benchmark.ips do |x|
        x.report('Rails.application#call') do
          super
        end
      end
      super
    end
  }
end

Slide 28

Slide 28 text

How Can We Improve
 This Score?

Slide 29

Slide 29 text

My 1st Assumption GC

Slide 30

Slide 30 text

Ruby Is Slow Because Ruby GC Is Slow Everyone knows this fact, right? I heard GC takes 30% of the whole Response time in Rails

Slide 31

Slide 31 text

Let's Observe the GC GC.stat gc_tracer
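A minimal sketch of observing the GC with `GC.stat`: take a snapshot before and after a block and print the difference. The helper name `gc_runs_during` is hypothetical; the three keys are ones that `GC.stat` reports on MRI.

```ruby
# Report how many GC runs happened while the block was executing.
# `gc_runs_during` is a hypothetical helper name.
def gc_runs_during
  before = GC.stat
  yield
  after = GC.stat
  %i[count minor_gc_count major_gc_count].map do |key|
    [key, after[key] - before[key]]
  end.to_h
end

# Allocate a lot of short-lived strings so the GC has something to do
diff = gc_runs_during { 200_000.times { 'garbage' + 'string' } }
p diff
```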

Slide 32

Slide 32 text

Adding GC.stat to the ips Code

  Rails.application.config.after_initialize do
    Rails.application.extend Module.new {
      def call(e)
        super
+       p before: GC.stat
        Benchmark.ips do |x|
          x.report('Rails.application#call') do
            super
          end
        end
+       p after: GC.stat
        super
      end
    }
  end

Slide 33

Slide 33 text

The GC.stat + ips Result (scaffold index, 100 AR models)

{:before=>{:count=>53, ..., :minor_gc_count=>44, :major_gc_count=>9, …}}
Warming up --------------------------------------
Rails.application#call     4.000  i/100ms
Calculating -------------------------------------
Rails.application#call     45.233  (± 4.4%) i/s -    228.000  in   5.052723s
{:after=>{:count=>107, ..., :minor_gc_count=>95, :major_gc_count=>12, ...}}

Slide 34

Slide 34 text

The GC.stat + ips Result 
 (scaffold index, 100 AR models) GC is surely happening

Slide 35

Slide 35 text

Let's Stop the GC RUBY_GC_HEAP_INIT_SLOTS=1000000 + GC.disable & stat

Slide 36

Slide 36 text

Result

{:before=>{:count=>3, ..., :minor_gc_count=>1, :major_gc_count=>2, …}}
Warming up --------------------------------------
Rails.application#call     4.000  i/100ms
Calculating -------------------------------------
Rails.application#call     50.449  (± 5.9%) i/s -    252.000  in   5.008789s
{:after=>{:count=>5, ..., :minor_gc_count=>1, :major_gc_count=>4, ...}}

Slide 37

Slide 37 text

Summary (GC) The GC adds about 10% overhead
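The "about 10%" can be reproduced from the two ips scores on Slides 33 and 36. This is only back-of-the-envelope arithmetic on those two numbers; the error margins (± 4.4% and ± 5.9%) are ignored, and the GC was not completely silent in the second run.

```ruby
with_gc    = 45.233 # i/s with the GC running (Slide 33)
without_gc = 50.449 # i/s with the GC (mostly) stopped (Slide 36)

# Convert throughput to time per request, then compare
ms_with_gc    = 1000.0 / with_gc
ms_without_gc = 1000.0 / without_gc
gc_share = (ms_with_gc - ms_without_gc) / ms_with_gc

puts format('%.1f ms vs %.1f ms per request', ms_with_gc, ms_without_gc)
puts format('GC share: %.1f%%', gc_share * 100)
```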

Slide 38

Slide 38 text

History of Ruby GC Improvement 1.9.3: Lazy Sweep (nari3) 2.0 : Bitmap Marking (nari3) 2.1 : RGen GC (ko1) 2.2 : Incremental GC (ko1),
 2 age RGenGC (ko1),
 Symbol GC (nari3)

Slide 39

Slide 39 text

Hats Off to ko1! ko1 keeps delivering an amazing amount and quality of Ruby internals improvements!

Slide 40

Slide 40 text

Garbage Strings

Slide 41

Slide 41 text

Garbage Strings Used to Be a Big Concern There was a trend of putting `.freeze` here and there all over the code I thought that made our codebase super ugly I no longer wanted to see PRs that add `.freeze` to String literals in the framework So I proposed the magic comment (ruby 2.3): # frozen-string-literal: true Not mainly to improve the speed, but to stop people from polluting the code!

Slide 42

Slide 42 text

# frozen-string-literal: true Has anyone tried this? Maybe Rails will become a little bit faster if you put this in all .rb files in Rails Then we could remove all explicit `.freeze` calls We need to add some `.dup` calls though
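A small sketch of why some `.dup` calls would be needed. The explicit `.freeze` below stands in for what the magic comment does to every literal in a file; the method names are made up for illustration.

```ruby
# With `# frozen-string-literal: true` at the top of a file, every string
# literal behaves as if `.freeze` had been called on it, as done by hand here.
SEPARATOR = ', '.freeze

# Read-only use of a frozen string needs no change
def join_names(names)
  names.join(SEPARATOR)
end

# Mutation needs an unfrozen copy, hence the `.dup`
def greeting(name)
  buffer = 'Hello, '.dup
  buffer << name
end

puts join_names(%w[Akira Matz])
puts greeting('Matz')
```

Appending to `SEPARATOR` directly would raise an error, which is the case the extra `.dup` calls guard against.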

Slide 43

Slide 43 text

Anyway,

Slide 44

Slide 44 text

Garbage Strings Garbage strings no longer affect your app's throughput Let's stop caring about that now!

Slide 45

Slide 45 text

Another Ruby Myth

Slide 46

Slide 46 text

Ruby Is Slow Because It's a Scripting Language?

Slide 47

Slide 47 text

Ruby 2.3 New Features!
RubyVM::InstructionSequence#to_binary(extra_data = nil)
RubyVM::InstructionSequence.load_from_binary(binary)
RubyVM::InstructionSequence.load_from_binary_extra_data(binary)

Slide 48

Slide 48 text

Ruby 2.3 New Features! You can precompile Ruby code now!
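A minimal round trip through the new API, assuming MRI 2.3 or later: compile a snippet, serialize the bytecode, load it back, and run it. The snippet being compiled is just an example, and as far as I know the binary is only meant to be loaded by the same Ruby build that produced it.

```ruby
source = '[1, 2, 3].map { |n| n * 3 }'

# Compile once and serialize the bytecode...
iseq   = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary

# ...then load the bytecode later without parsing the source again
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval
p result
```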

Slide 49

Slide 49 text

yomikomu? See ko1's talk tomorrow!

Slide 50

Slide 50 text

So,

Slide 51

Slide 51 text

Which Part of the App Takes Time? Let’s profile!

Slide 52

Slide 52 text

stackprof A sampling call-stack profiler Flamegraph support https://github.com/tmm1/stackprof

Slide 53

Slide 53 text

peek-rblineprof Shows how much time each line of your Rails application takes throughout a request https://github.com/peek/peek-rblineprof

Slide 54

Slide 54 text

TracePoint

Slide 55

Slide 55 text

You Can Simply Count the Numbers of Method Calls Without Adding a Gem Use Ruby's built in TracePoint API (ko1)

Slide 56

Slide 56 text

Counting Method Calls Using TracePoint

class MethodCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    calls = []
    trace = TracePoint.new(:call, :c_call) do |tp|
      calls << [tp.defined_class, tp.method_id, tp.lineno]
    end
    trace.enable
    ret = @app.call env
    trace.disable
    pp calls.group_by(&:itself).map {|k, v| {k => v.length}}.sort_by {|h| -h.values.first}
    ret
  end
end

use MethodCounter
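The same counting idea also works outside Rack. The sketch below counts calls made while an arbitrary block runs; `count_calls` is a hypothetical helper, and it uses a Hash counter instead of collecting every call into an Array.

```ruby
# Count method calls (Ruby and C) made while the block runs.
# `count_calls` is a hypothetical helper name.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call, :c_call) do |tp|
    counts[[tp.defined_class, tp.method_id]] += 1
  end
  # The block form of #enable switches tracing off again afterwards
  trace.enable { yield }
  counts
end

counts = count_calls { 3.times { 'a'.upcase } }

# Print the most frequently called methods first
counts.sort_by { |_, n| -n }.first(5).each do |(klass, meth), n|
  puts "#{klass}##{meth}: #{n}"
end
```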

Slide 57

Slide 57 text

Top 10 Method Calls on the Scaffold Index (100 AR models)

{[ActiveSupport::SafeBuffer, :html_safe?, 212] => 1622},
{[Object, :html_safe?, 123] => 1213},
{[Set, :include?, 214] => 1137},
{[CGI::Escape, :escapeHTML, 39] => 913},
{[#, :unwrapped_html_escape, 34] => 913},
{[ActiveSupport::Multibyte::Unicode, :tidy_bytes, 245] => 913},
{[String, :scrub, 248] => 912},
{[ActiveRecord::AttributeSet, :[], 9] => 900},
{[ActiveRecord::LazyAttributeHash, :[], 39] => 900},
{[ActiveRecord::AttributeSet, :fetch_value, 41] => 900},

Slide 58

Slide 58 text

However, These are well-known theories that you might have heard of before I’ll take a different approach today I’ll show you some problems known (to me) through my experience

Slide 59

Slide 59 text

Rails Consists of
 M, V, and C Which one of
 M, V, or C is working heavily?

Slide 60

Slide 60 text

How about ActionPack? ActionPack sits on top of Rack Let's see if we could find a bottleneck in the middleware stack Or maybe we could compose a minimum Rack middleware stack for our app?

Slide 61

Slide 61 text

Minimum Rack Middleware Stack This is what rails-api (which has been merged into Rails 5) does Let's see which Rack middleware takes time

Slide 62

Slide 62 text

Measuring Each Rack Middleware

# Again, a very roughly implemented monkey-patch that kinda works...
$RACK_BENCH_BEFORE_CALL = $RACK_BENCH_AFTER_CALL = nil

module RackBench
  def call(*)
    p "#{self.class}: before call" => Time.now - $RACK_BENCH_BEFORE_CALL if $RACK_BENCH_BEFORE_CALL
    $RACK_BENCH_BEFORE_CALL = Time.now
    ret = super
    p "#{self.class}: after call" => Time.now - $RACK_BENCH_AFTER_CALL if $RACK_BENCH_AFTER_CALL
    $RACK_BENCH_AFTER_CALL = Time.now
    ret
  end
end

Rails.configuration.middleware.each do |m|
  if m.klass.respond_to? :prepend
    m.klass.prepend RackBench
  else
    m.klass.singleton_class.prepend RackBench
  end
end

Slide 63

Slide 63 text

Measuring Each Rack Middleware - Result

{"ActionDispatch::Static: before call" => 8.0e-06}
{"ActionDispatch::Executor: before call" => 0.000107}
{"AS::Cache::Strategy::LocalCache::Middleware: before call" => 0.002279}
{"Rack::Runtime: before call" => 1.9e-05}
{"Rack::MethodOverride: before call" => 8.0e-06}
{"ActionDispatch::RequestId: before call" => 1.0e-05}
{"Rails::Rack::Logger: before call" => 4.8e-05}
{"ActionDispatch::ShowExceptions: before call" => 0.00023}
{"ActionDispatch::DebugExceptions: before call" => 1.0e-05}
{"ActionDispatch::RemoteIp: before call" => 1.0e-05}
{"ActionDispatch::Callbacks: before call" => 1.5e-05}
{"ActionDispatch::Cookies: before call" => 1.6e-05}
{"ActionDispatch::Session::CookieStore: before call" => 1.0e-05}
{"Rack::Head: before call" => 3.1e-05}
{"Rack::ConditionalGet: before call" => 5.0e-06}
{"Rack::ETag: before call" => 6.0e-06}

Slide 64

Slide 64 text

Measuring Each Rack Middleware - Result (2)

{"Rack::ConditionalGet: after call" => 2.4e-05}
{"Rack::Head: after call" => 5.0e-06}
{"ActionDispatch::Session::CookieStore: after call" => 0.000269}
{"ActionDispatch::Cookies: after call" => 7.8e-05}
{"ActionDispatch::Callbacks: after call" => 4.0e-06}
{"ActionDispatch::RemoteIp: after call" => 1.3e-05}
{"ActionDispatch::DebugExceptions: after call" => 6.0e-06}
{"ActionDispatch::ShowExceptions: after call" => 2.0e-06}
{"Rails::Rack::Logger: after call" => 2.2e-05}
{"ActionDispatch::RequestId: after call" => 1.0e-05}
{"Rack::MethodOverride: after call" => 3.0e-06}
{"Rack::Runtime: after call" => 1.4e-05}
{"AS::Cache::Strategy::LocalCache::Middleware: after call" => 7.0e-06}
{"ActionDispatch::Executor: after call" => 5.0e-06}
{"ActionDispatch::Static: after call" => 2.0e-06}
{"Rack::Sendfile: after call" => 7.0e-06}

Slide 65

Slide 65 text

There's No Slow Middleware in the Default Stack It wouldn't be that effective even if we could speed up or remove some Rack middleware

Slide 66

Slide 66 text

Back to the Method Calls List Again,

Slide 67

Slide 67 text

Top 10 Method Calls on the Scaffold Index (100 AR models)

{[ActiveSupport::SafeBuffer, :html_safe?, 212] => 1622},
{[Object, :html_safe?, 123] => 1213},
{[Set, :include?, 214] => 1137},
{[CGI::Escape, :escapeHTML, 39] => 913},
{[#, :unwrapped_html_escape, 34] => 913},
{[ActiveSupport::Multibyte::Unicode, :tidy_bytes, 245] => 913},
{[String, :scrub, 248] => 912},
{[ActiveRecord::AttributeSet, :[], 9] => 900},
{[ActiveRecord::LazyAttributeHash, :[], 39] => 900},
{[ActiveRecord::AttributeSet, :fetch_value, 41] => 900},

Slide 68

Slide 68 text

ActionView

Slide 69

Slide 69 text

ActionView Has Some Performance Problems, For Sure

Slide 70

Slide 70 text

ActionView
 Template Rendering Flow Template lookup Template compilation Template rendering

Slide 71

Slide 71 text

Speeding Up
 Template Lookup

Slide 72

Slide 72 text

Current Implementation of Template Lookup

# AV/template/resolver.rb
module ActionView
  class PathResolver < Resolver #:nodoc:
    def find_template_paths(query)
      Dir[query].uniq.reject do |filename|
        File.directory?(filename) ||
          # deals with case-insensitive file systems.
          !File.fnmatch(query, filename, File::FNM_EXTGLOB)
      end
    end
  end
end

Slide 73

Slide 73 text

Current Implementation The Resolver queries the filesystem on every template rendering It queries with a Bash-like globbing format

Slide 74

Slide 74 text

Couldn't We Speed This Up? By default, AV uses a Resolver called “OptimizedResolver” Maybe we can create “MoreOptimizedResolver”?

Slide 75

Slide 75 text

MoreOptimizedResolver - Concept Why don't we cache all filenames, and perform the template search in memory?
 (in production env)
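A minimal sketch of the concept, not the code of the gem itself: scan the view directory once, keep the relative filenames in memory, and answer lookups with `File.fnmatch` against that list. `CachedTemplateIndex` and its methods are hypothetical names.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical sketch: one filesystem scan at boot, lookups in memory afterwards.
class CachedTemplateIndex
  def initialize(root)
    # Remember every file below root, as a path relative to root
    @paths = Dir[File.join(root, '**', '*')]
             .select { |f| File.file?(f) }
             .map { |f| f[(root.length + 1)..-1] }
  end

  # Same glob-like matching as the default resolver, but no disk access
  def find(pattern)
    @paths.select { |path| File.fnmatch(pattern, path, File::FNM_EXTGLOB) }
  end
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'users'))
  FileUtils.touch(File.join(root, 'users', 'index.html.erb'))
  FileUtils.touch(File.join(root, 'users', 'show.html.erb'))

  index = CachedTemplateIndex.new(root)
  p index.find('users/index.html.{erb,haml}')
end
```

The cache never notices new or deleted files, which is why this only makes sense in the production env.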

Slide 76

Slide 76 text

MoreOptimizedResolver - Implementation https://github.com/amatsuda/more_optimized_resolver

Slide 77

Slide 77 text

Benchmark

require 'benchmark/ips'

view = Class.new(ActionView::Base).new('.')
path, _prefix, *args = view.lookup_context.send(:args_for_lookup, 'foo', [], false, [], {})
resolver = ::ActionView::OptimizedFileSystemResolver.new '.'

Benchmark.ips do |x|
  x.report('default') { resolver.find_all(path, '', *args) }
end

Slide 78

Slide 78 text

Benchmark Result

# The original Resolver (OptimizedResolver)
Warming up --------------------------------------
             default    17.000  i/100ms
Calculating -------------------------------------
             default    179.250  (± 3.3%) i/s -    901.000  in   5.031392s

# MoreOptimizedResolver
Warming up --------------------------------------
             default   320.000  i/100ms
Calculating -------------------------------------
             default      3.266k (± 2.8%) i/s -     16.640k in   5.099793s

Slide 79

Slide 79 text

Benchmark Result 18x faster than the AV default Resolver!
 (in a micro benchmark)

Slide 80

Slide 80 text

Inline render partial - Concept render_partial is basically slow Because it looks up the template, creates another buffer, and runs another template compilation & rendering for each partial We don't always need a new context for a partial Simply concatenating templates (just like PHP's include) would be enough in some cases e.g. `<%= render 'footer' %>`

Slide 81

Slide 81 text

Inline render partial - Implementation WIP

Slide 82

Slide 82 text

Another Idea The `render` method makes too many assumptions Maybe we can give more hints to `render` to make template resolution faster?

Slide 83

Slide 83 text

render :path - Concept Maybe `render` can accept full_path template name so that it doesn't have to scan through all PathSets?

Slide 84

Slide 84 text

render :path - API

render path: __dir__ + 'foo'
render relative: 'foo'

Slide 85

Slide 85 text

render :path - Implementation Unimplemented

Slide 86

Slide 86 text

Parallelized render partial - Concept `render_collection` could be parallelized Partials are basically individual We could render all of them at once using Threads
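The concept in plain Ruby, as a sketch only: `render_partial` below is a hypothetical stand-in lambda, not ActionView's renderer. One thread is started per item, and `Thread#value` waits for each thread and returns its result, so the output order is preserved.

```ruby
# Hypothetical stand-in for rendering one partial
render_partial = ->(user) { "<li>#{user}</li>" }

users = %w[alice bob carol]

# Start one thread per item, then collect the results in the original order
html = users.map { |user| Thread.new { render_partial.call(user) } }
            .map(&:value)
            .join("\n")

puts html
```

In a real app, every thread that touches ActiveRecord would check out its own database connection, so the number of threads needs a limit.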

Slide 87

Slide 87 text

Parallelized render partial - Result I tried, But with this patch, the number of ActiveRecord connections very easily bloats up And that very often causes a "Too many connections" error

Slide 88

Slide 88 text

Remote Render Partial - Concept We sometimes want heavy partials to be rendered lazily Would be nice if we could render via Ajax (or ActiveJob, maybe)

Slide 89

Slide 89 text

Remote Render Partial - Implementation https://github.com/amatsuda/ljax_rails (I forgot what "ljax" stands for)

Slide 90

Slide 90 text

ljax_rails - API <%= render 'users', remote: true %>

Slide 91

Slide 91 text

ljax_rails - Result Kind of works I’m not using it though

Slide 92

Slide 92 text

Template Rendering IMO the most unneeded effort in AV template rendering is Encoding support

Slide 93

Slide 93 text

Current Implementation

# template/handlers/erb.rb
module ActionView
  class Template
    module Handlers
      class ERB
        def call(template)
          # First, convert to BINARY, so in case the encoding is
          # wrong, we can still find an encoding tag
          # (<%# encoding %>) inside the String using a regular
          # expression
          template_source = template.source.dup.force_encoding(Encoding::ASCII_8BIT)

          erb = template_source.gsub(ENCODING_TAG, '')
          encoding = $2

          erb.force_encoding valid_encoding(template.source.dup, encoding)

          # Always make sure we return a String in the default_internal
          erb.encode!

          self.class.erb_implementation.new(
            erb,
            :escape => (self.class.escape_whitelist.include? template.type),
            :trim => (self.class.erb_trim_mode == "-")
          ).src
        end
      end
    end
  end
end

Slide 94

Slide 94 text

What We Do for the Multi Encoding Support
`.dup` the given template source
`.force_encoding` the source to Binary
Extract the magic encoding comment from the template source
`.dup` the given template source again
`.force_encoding` the template source if a magic comment was found
`.force_encoding` the ERB template
`.encode!` the ERB template

Slide 95

Slide 95 text

Who Needs This Encoding Support? Who actually writes a non-UTF8 view file? Who actually puts an encoding magic comment in the view files? We see some test cases concerning Shift JIS encoded templates, but I'm sure nobody does this in Japan

Slide 96

Slide 96 text

Current Status 99.9% of Rails apps in the world do not require this feature But this default behavior puts the brakes on everyone’s apps [citation needed]

Slide 97

Slide 97 text

My Suggestion No Encoding conversion! Let’s assume that everybody writes their template in UTF-8 If that’s too aggressive, maybe we could extract this feature to a gem

Slide 98

Slide 98 text

UTF-8 Only ERBHandler - The Patch

  def call(template)
-   # 4 lines of comments
-   template_source = template.source.dup.force_encoding(Encoding::ASCII_8BIT)
-
-   erb = template_source.gsub(ENCODING_TAG, '')
-   encoding = $2
-
-   erb.force_encoding valid_encoding(template.source.dup, encoding)
-
-   # Always make sure we return a String in the default_internal
-   erb.encode!
+   erb = template.source

    self.class.erb_implementation.new(
      erb,
      :escape => (self.class.escape_whitelist.include? template.type),
      :trim => (self.class.erb_trim_mode == "-")
    ).src
  end

Slide 99

Slide 99 text

UTF-8 Only ERBHandler - Benchmark (200 lines ERB)

require 'benchmark/ips'

view = Class.new(ActionView::Base).new('.')
template = view.lookup_context.find_template('foo')
erb = ::ActionView::Template::Handlers::ERB.new

Benchmark.ips do |x|
  x.report('default or patched') { erb.call template }
end

Slide 100

Slide 100 text

UTF-8 Only ERBHandler - Benchmark Result

# The Original ERBHandler
Warming up --------------------------------------
             default   836.000  i/100ms
Calculating -------------------------------------
             default      8.582k (± 4.9%) i/s -     43.472k in   5.077812s

# Patched ERBHandler
Warming up --------------------------------------
             default     1.281k i/100ms
Calculating -------------------------------------
             default     13.229k (± 6.9%) i/s -     66.612k in   5.058864s

Slide 101

Slide 101 text

UTF-8 Only ERBHandler - Benchmark Result 1.5x faster!

Slide 102

Slide 102 text

Only 1.5x? This process includes erb template => ruby compilation One more thing.
 Memory consumption has to be reduced

Slide 103

Slide 103 text

Profiling the Memory Usage memory_profiler gem https://github.com/SamSaffron/memory_profiler

Slide 104

Slide 104 text

Profiling the Memory Usage of the ERB Handler

require 'memory_profiler'

view = Class.new(ActionView::Base).new('.')
template = view.lookup_context.find_template('foo')
erb = ::ActionView::Template::Handlers::ERB.new

report = MemoryProfiler.report do
  erb.call template
end
report.pretty_print

Slide 105

Slide 105 text

Memory Usage Result

# The Original ERBHandler
allocated memory by class
-----------------------------------
      1989  String
       640  MatchData
       232  Hash
       160  Array
       144  ActionView::Template::Handlers::Erubis
        80  Symbol
        40  Range

# Patched ERBHandler
allocated memory by class
-----------------------------------
      1660  String
       640  MatchData
       232  Hash
       160  Array
       144  ActionView::Template::Handlers::Erubis
        80  Symbol
        40  Range

Slide 106

Slide 106 text

Memory Usage Memory usage is also very important If we could reduce this, we would be able to put more workers in a webapp container

Slide 107

Slide 107 text

So, I’d Like to Propose Removing the Encoding Support from Rails Maybe in Rails 6?

Slide 108

Slide 108 text

BTW, This was about the ERB Handler

Slide 109

Slide 109 text

If You're Using Haml There are faster alternative implementations Faml: https://github.com/eagletmt/faml Hamlit: https://github.com/k0kubun/hamlit

Slide 110

Slide 110 text

Just Bundle Either of These Gems, Then You’ll Get the Speed! (Taken from Hamlit’s README)

Slide 111

Slide 111 text

AS::SafeBuffer As we saw in the method calls count, SafeBuffer is heavily used in ActionView

Slide 112

Slide 112 text

AS::SafeBuffer Very ad hoc implementation Every String has a flag inside Every template String concatenation is performed here

Slide 113

Slide 113 text

Faster AS::SafeBuffer I tried to use Object#tainted flag... but this didn't work Maybe we could make a faster SafeBuffer in C?

Slide 114

Slide 114 text

I18n Alternative I18n is unnecessarily complex (e.g. who uses a non-Yaml backend?) What if we make a simple I18n alternative that does nothing but just a simple Hash lookup?
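What "nothing but a simple Hash lookup" could look like, as a sketch. `TinyI18n` is a hypothetical class, not the WIP implementation, and it skips everything else I18n does (interpolation, pluralization, fallbacks).

```ruby
# Hypothetical minimal translator: a nested Hash plus a dotted key.
class TinyI18n
  def initialize(translations)
    @translations = translations
  end

  def t(key, locale: :en)
    path = key.to_s.split('.').map(&:to_sym)
    # Hash#dig (ruby 2.3) walks the nested Hash in one call
    @translations.dig(locale, *path) || "translation missing: #{locale}.#{key}"
  end
end

i18n = TinyI18n.new({
  en: { users: { title: 'Users' } },
  ja: { users: { title: 'ユーザー' } }
})

puts i18n.t('users.title')
puts i18n.t('users.title', locale: :ja)
puts i18n.t('users.nope')
```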

Slide 115

Slide 115 text

I18n Alternative - implementation WIP Almost working, but some tests are still failing

Slide 116

Slide 116 text

ActiveRecord

Slide 117

Slide 117 text

Reducing Arel Objects Current AR query creates so many Arel Node objects Since AR 4, AR caches Arel Nodes in memory (AdequateRecord)

Slide 118

Slide 118 text

Reducing Arel Objects - Concept If the query is simple enough, directly compose an SQL statement, just like we were doing in AR1 and 2 If the query is not simple enough, fall back to `super` (AR default behavior) No caching! (because building the whole query is as cheap as computing a cache key)

Slide 119

Slide 119 text

Reducing Arel Objects - Implementation WIP Almost working on Rails 4, not working on Rails 5 The product is called Arenai Arel => アレル No Arel => アレない

Slide 120

Slide 120 text

Arenai find - Implementation

# The code is a little bit shortened for the presentation slide
module Arenai
  module Base
    def find(*ids)
      return super unless ids.length == 1
      return super if block_given? || primary_key.nil? || default_scopes.any? || columns_hash.include?(inheritance_column) || ids.first.kind_of?(Array)

      id = ids.first
      return super if !((Fixnum === id) || (String === id))

      # SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 1]]
      find_by_sql("SELECT #{quoted_table_name}.* FROM #{quoted_table_name} WHERE #{quoted_table_name}.#{connection.quote_column_name primary_key} = $1", [[columns_hash[primary_key], id]]).first
    end
  end
end

Slide 121

Slide 121 text

Arenai - Expectation Get the AR1 speed back Less Object creation Less memory consumption

Slide 122

Slide 122 text

AR Object creation AR 5 creates an Object for each attribute in a model instance This brings flexibility But it's sometimes too much For example, think of a batch system that selects 100,000 records that have 20 columns each. That would create 2,000,000 "attribute" Objects I'm thinking of a plugin that can reduce this Object creation somehow

Slide 123

Slide 123 text

A Plugin or Patch Doing This Nothing has been done yet

Slide 124

Slide 124 text

model.present?

# We sometimes do this, but…
if @current_user.present?
  ...
end

Slide 125

Slide 125 text

Never Call model.present? model.present? causes a massive number of method calls Guess how many. 3? 5?

Slide 126

Slide 126 text

Method Calls That Happen When You Hit @user.present?

Object#present?
Object#blank?
ActiveRecord::AttributeMethods#respond_to?
ActiveModel::AttributeMethods#respond_to?
Kernel#respond_to?
Kernel#respond_to_missing?
Kernel#respond_to?
Kernel#respond_to_missing?
Symbol#to_s
ActiveModel::AttributeMethods#matched_attribute_method
Kernel#class
ActiveModel::AttributeMethods::ClassMethods#attribute_method_matchers_matching
ActiveModel::AttributeMethods::ClassMethods#attribute_method_matchers_cache
Concurrent::Collection::MriMapBackend#compute_if_absent
Concurrent::Collection::NonConcurrentMapBackend#[]
Mutex#synchronize
Concurrent::Collection::NonConcurrentMapBackend#compute_if_absent
Hash#fetch
##attribute_method_matchers
Symbol#to_proc
Enumerable#partition
Array#each
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#plain?
Array#reverse
Array#flatten
Array#map
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
##new
Struct#initialize
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher#match
Regexp#=~
##new
Struct#initialize
Array#compact
Enumerable#detect
Array#each
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher::AttributeMethodMatch#attr_name
ActiveRecord::AttributeMethods::PrimaryKey#attribute_method?
ActiveRecord::AttributeMethods#attribute_method?
ActiveRecord::AttributeSet#key?
ActiveRecord::LazyAttributeHash#key?
Hash#key?
Hash#key?
Hash#key?
ActiveModel::AttributeMethods::ClassMethods::AttributeMethodMatcher::AttributeMethodMatch#attr_name
ActiveRecord::AttributeMethods::PrimaryKey#attribute_method?
ActiveRecord::AttributeMethods#attribute_method?
ActiveRecord::AttributeSet#key?
ActiveRecord::LazyAttributeHash#key?
Hash#key?
Hash#key?
Hash#key?
NilClass#nil?

Slide 127

Slide 127 text

85 Method Calls! It became significantly more costly after the AR 4 and 5 refactorings What a trap!

Slide 128

Slide 128 text

https://github.com/rails/rails/pull/23394 I suggested a patch to fix this situation, but the proposal was turned down Because Rails is expecting you all to be careful enough never to walk into this trap

Slide 129

Slide 129 text

Or You Can Monkey-Patch

module ActiveRecord
  class Base
    def present?
      true
    end

    def blank?
      false
    end
  end
end

Slide 130

Slide 130 text

Other Kinds of Performance Concerns Development env Booting up the App Running tests

Slide 131

Slide 131 text

A Rails App Booting Process Bundles the gems Requires the libraries Loads the app Runs the "Initializers"

Slide 132

Slide 132 text

Maybe There's an Initializer Taking Too Much Time

  # railties/lib/rails/initializable.rb
  module Rails
    module Initializable
      def run_initializers(group=:default, *args)
        return if instance_variable_defined?(:@ran)
        initializers.tsort_each do |initializer|
+         now = Time.now
          initializer.run(*args) if initializer.belongs_to?(group)
+         p initializer.name => Time.now - now if Time.now - now > 0.1
        end
        @ran = true
      end
    end
  end

Slide 133

Slide 133 text

Which Gem Takes Time When Being required?

  # bundler/lib/bundler/runtime.rb
  module Bundler
    class Runtime < Environment
      def require(*groups)
        groups.map!(&:to_sym)
        groups = [:default] if groups.empty?

        @definition.dependencies.each do |dep|
          # Skip the dependency if it is not in any of the requested
          # groups
          next unless (dep.groups & groups).any? && dep.current_platform?

+         now = Time.now
          required_file = nil
          ...
          end
+         p dep.name => Time.now - now
        end
      end
    end
  end

Slide 134

Slide 134 text

This Way, I Found Dozens of Gems That Slow Down Our App Boot (And so I sent dozens of patches) Most of them just didn't use `AS.on_load` properly e.g.) https://github.com/zdennis/activerecord-import/pull/136

Slide 135

Slide 135 text

There Are Some Gems That Shouldn't Be required via Bundler e.g.) pry-doc
 Actually, pry-* Just add `require: false` to each of them in your Gemfile

Slide 136

Slide 136 text

Prying out the Mechanism pry-doc takes 0.2sec to load on my MBP(SSD) When booting pry, it scans through all the installed gems and tries to require every gem that matches pry-* (see lib/pry/plugins.rb) You do not at all have to require pry-* via Bundler. You might not need them until you boot pry

Slide 137

Slide 137 text

Squashing All Gems Into One Directory Every RubyGem has its own path, and its own namespace inside Can’t all these gems be merged into one directory so we could make $LOAD_PATH shorter, then make require faster?

Slide 138

Slide 138 text

bundle-squash - Implementation https://github.com/amatsuda/bundle-squash

Slide 139

Slide 139 text

bundle-squash Still not perfectly working with Rails No significant performance improvement :<

Slide 140

Slide 140 text

Kernel#require vs Kernel#require_relative ko1 once told me that require_relative must be faster Can we speed up Rails boot by replacing `require` => `require_relative`?

Slide 141

Slide 141 text

require_relative branch I tried. https://github.com/amatsuda/rails/tree/require_relative No significant speed improvement :<

Slide 142

Slide 142 text

autoload in production We had better avoid autoloading in production, especially on a forked process Let’s make sure that we’re not autoloading anything

Slide 143

Slide 143 text

Detecting autoload

TracePoint.new(:call, :c_call) do |tp|
  if tp.method_id == :autoload
    b = tp.binding
    case tp.event
    when :call
      p [tp.lineno, tp.defined_class, b.local_variable_get(:const_name)]
    when :c_call
      if b.local_variable_defined?(:const_name)
        p [tp.lineno, tp.defined_class, b.local_variable_get(:const_name), b.local_variable_get(:full), b.local_variable_get(:path)]
      else
        p [tp.lineno, tp.defined_class, b.local_variables]
        puts caller
        puts
      end
    end
  end
end.enable

Slide 144

Slide 144 text

I Found 2 Occurrences When posting a form: rack-2.0.0.alpha/lib/rack/multipart.rb:8 rack-2.0.0.alpha/lib/rack/multipart.rb:9 There’ll probably be more?

Slide 145

Slide 145 text

Speeding Up Tests

Slide 146

Slide 146 text

INSERT INTO schema_migrations We noticed that our app with 600 tables took 1 minute to create all tables in CircleCI

Slide 147

Slide 147 text

What Was Happening

"INSERT INTO schema_migrations (version) VALUES ('20160504000000');"
"INSERT INTO schema_migrations (version) VALUES ('20160504000001');"
"INSERT INTO schema_migrations (version) VALUES ('20160504000002');"
…
(600 SQLs)

Slide 148

Slide 148 text

We Changed This to… "INSERT INTO schema_migrations (version) VALUES ('20160504000000'), ('20160504000001'), ('20160504000002');" (1 SQL!)
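The difference between the two approaches, sketched as plain string building. This is an illustration, not the code from the Rails commit; real code should quote values through the database adapter, and the interpolation here is only acceptable because the versions are digit strings.

```ruby
versions = %w[20160504000000 20160504000001 20160504000002]

# Before: one statement (and one round trip) per migration version
single_inserts = versions.map do |version|
  "INSERT INTO schema_migrations (version) VALUES ('#{version}');"
end

# After: all versions in a single multi-row INSERT
values = versions.map { |version| "('#{version}')" }.join(', ')
bulk_insert = "INSERT INTO schema_migrations (version) VALUES #{values};"

puts single_inserts
puts bulk_insert
```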

Slide 149

Slide 149 text

This Commit Is Included in Rails 5 https://github.com/rails/rails/commit/42dd233 This patch was provided by MoneyForward (@ppworks and me)

Slide 150

Slide 150 text

If You Feel Like Your Database Cleaning Is Slow database_cleaner’s delete and truncate strategies delete (truncate) all tables That is unbearably slow if you have hundreds of tables ✕ thousands of test cases

Slide 151

Slide 151 text

In Such Case, Use database_rewinder

Slide 152

Slide 152 text

database_rewinder - Concept It remembers the names of the tables that were inserted into during each test And deletes only from those tables
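A sketch of the concept only; this is not database_rewinder's actual code. `InsertedTableTracker` is a hypothetical class that is handed every SQL statement a test runs, remembers the tables named in INSERT statements, and then produces DELETE statements for just those tables.

```ruby
require 'set'

# Hypothetical sketch of the idea; not database_rewinder's actual code.
class InsertedTableTracker
  INSERT_PATTERN = /\AINSERT\s+INTO\s+[`"]?(\w+)[`"]?/i

  def initialize
    @tables = Set.new
  end

  # Call this with every SQL statement the test executes
  def record(sql)
    match = INSERT_PATTERN.match(sql.lstrip)
    @tables << match[1] if match
  end

  # Only the tables that were written to need cleaning
  def cleanup_statements
    statements = @tables.map { |table| "DELETE FROM #{table};" }
    @tables.clear
    statements
  end
end

tracker = InsertedTableTracker.new
tracker.record('INSERT INTO "users" ("name") VALUES (\'akira\')')
tracker.record('SELECT * FROM posts')
puts tracker.cleanup_statements
```

With hundreds of tables, a test that inserted into two of them gets two DELETEs instead of hundreds.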

Slide 153

Slide 153 text

database_rewinder - Implementation https://github.com/amatsuda/database_rewinder

Slide 154

Slide 154 text

ActiveSupport

Slide 155

Slide 155 text

Some Slow Parts in AS Multibyte timezones (AS::TimeWithZone is unbearably slow) How can we not load them?

Slide 156

Slide 156 text

AS::Multibyte Consists of Multibyte::Chars and Multibyte::Unicode Loads the whole Unicode database file (AS/values/unicode_tables.dat), which consumes time and memory I’m not sure if we still need this (doesn’t ruby have this?) I suppose we Japanese don’t use most of the features provided here

Slide 157

Slide 157 text

AS::TimeWithZone Known as a slower version of Time

Slide 158

Slide 158 text

Time vs AS::TimeWithZone

Benchmark.ips do |x|
  x.report('Time') { Time.now }
  x.report('Time.zone.now') { Time.zone.now }
  x.compare!
end

Slide 159

Slide 159 text

AS::TimeWithZone is 25x Slower Than Time!

Warming up --------------------------------------
                Time   145.872k i/100ms
       Time.zone.now     8.557k i/100ms
Calculating -------------------------------------
                Time      2.209M (± 7.6%) i/s -     11.086M in   5.048168s
       Time.zone.now     88.469k (± 4.3%) i/s -    444.964k in   5.039123s

Comparison:
                Time:  2209154.5 i/s
       Time.zone.now:    88468.8 i/s - 24.97x slower

Slide 160

Slide 160 text

If You’re 100% Sure What You Are Doing Maybe you could speed up your app by using Time instead of AS::TimeWithZone

Slide 161

Slide 161 text

Boosting with C Extensions Sometimes, reimplementing a performance hotspot in C can boost performance

Slide 162

Slide 162 text

Boosting with C Extensions CGI.escapeHTML (ruby 2.3) CGI.escape (ruby 2.4) fast_blank (SamSaffron’s gem) hwia (HashWithIndifferentAccess in C for Rails 2)

Slide 163

Slide 163 text

Use Newest Ruby If you’re still using ruby < 2.3 You’ll get the performance for free just by updating ruby

Slide 164

Slide 164 text

Conclusion

Slide 165

Slide 165 text

Maybe What We Need Is More Flexibility and Modularity YMMV

Slide 166

Slide 166 text

YMMV There will be no one single bottleneck for every app Some apps might have 1000 models, some apps might have 3000 lines of routes.rb If you feel your Rails app is slow, you need to find your solution

Slide 167

Slide 167 text

Rails is Omakase It’s a really good thing for newbies that we don’t need any special configuration But in some cases we need some special customization on certain parts

Slide 168

Slide 168 text

Maybe What We Need Is More Flexibility and Modularity I know of software designed that way It used to be called “Merb”!

Slide 169

Slide 169 text

Everybody Let’s Hack! There remain so many problems So many possibilities for improvement Everyone, do reveal your hacks! Let there be more alternatives That would bring the missing “Merbism” back to the community

Slide 170

Slide 170 text

end