$30 off During Our Annual Pro Sale. View Details »

Rails Performance Issues and Solutions

Akira Matsuda
September 29, 2019

Rails Performance Issues and Solutions

Slides for my keynote at RubyConf Indonesia 2019 in Jakarta https://ruby.id/conf/2019/ #RubyconfID

Akira Matsuda

September 29, 2019
Tweet

More Decks by Akira Matsuda

Other Decks in Programming

Transcript

  1. Rails Performance
    Issues

    and

    Solutions
    Akira Matsuda

    View Slide

  2. Akira Matsuda

    View Slide

  3. From Japan
    !

    View Slide

  4. amatsuda on GH

    View Slide

  5. A Rails Committer &

    A Ruby Committer

    View Slide

  6. Rails Commits This
    Month

    View Slide

  7. Mostly Micro-Optimizations
    + Keyword Arguments Fixes

    View Slide

  8. The Keyword
    Arguments Fixes

    View Slide

  9. The Ruby Team Is Going to Introduce
    Some Keyword Arguments
    Incompatibilities in Ruby 3

    And I'm investigating the new
    specification on the Rails
    codebase

    View Slide

  10. I'm Not Going to Talk About
    The Details About This Topic

    View Slide

  11. But Anyway I Work on Such
    Things Between Ruby and Rails

    View Slide

  12. A Gem Author

    View Slide

  13. kaminari, active_decorator,
    action_args, heavens_door, erd,
    i18n_generators, etc

    View Slide

  14. But I'm Not Going to Talk
    About These Gems Today

    View Slide

  15. A Conference
    Organizer

    View Slide

  16. The Organizer of
    RubyKaigi

    https:/
    /rubykaigi.org/

    View Slide

  17. But I'm Not Going to Talk
    About That Conference Today

    View Slide

  18. Today I'm Going to Talk
    About the Rails Performance

    View Slide

  19. Chapter 1

    View Slide

  20. Let's Start with a
    Benchmark!

    View Slide

  21. ruby -v

    ruby 2.6.3p62 (2019-04-16
    revision 67580) [x86_64-
    darwin18]

    View Slide

  22. rails -v

    Rails 6.0.0

    View Slide

  23. rails new

    $ rails new rubyconfid --skip-
    spring --skip-bootsnap

    View Slide


  24. # Gemfile
    @@ -31,8 +31,6 @@ group :development, :test do
    end
    group :development do
    - # Access an interactive console on exception pages or by calling 'console'
    anywhere in the code.
    - gem 'web-console', '>= 3.3.0'
    gem 'listen', '>= 3.0.5', '< 3.2'
    end
    @@ -43,6 +41,3 @@ group :test do
    # Easy installation and use of web drivers to run system tests with browsers
    gem 'webdrivers'
    end
    -
    -# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
    -gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

    View Slide

  25. scaffold

    $ rails g scaffold post title

    View Slide

  26. Prepare 50 "Hello"
    Records

    $ rails r "50.times { Post.create!
    title: 'hello' }"

    View Slide

  27. Starting Up the Rails Server
    in `production` Mode

    $ rails assets:precompile

    $ cp db/development.sqlite3 db/production.sqlite3

    $ env RAILS_SERVE_STATIC_FILES=1 rails s -e production

    View Slide

  28. How Can We Measure the
    Application Performance?

    View Slide

  29. Maybe We Can Use
    time Command?

    $ time curl http:/
    /localhost:3000/posts/ > /dev/null

    View Slide

  30. Maybe We Can Use
    Something Like ab or gatling?

    $ ab -n 100 -c 10 -l http:/
    /127.0.0.1:3000/posts

    View Slide

  31. These Approaches Have
    Certain HTTP Overhead

    View Slide

  32. So, Instead,

    View Slide

  33. How About Benchmarking from
    Inside the Application Using Ruby?

    View Slide

  34. The Default Benchmarking
    Library in Ruby

    The stdlib benchmark.rb

    View Slide

  35. benchmark.rb Usage
    Benchmark.realtime do
    ...
    end

    View Slide

  36. Which Part of the App
    Would We Measure?

    Maybe the controllers'
    `process_action` method?

    View Slide

  37. A Very Rough Monkey-patch for
    Benchmarking `process_action`
    # config/initializers/perf.rb
    ActionController::Base.prepend(
    Module.new {
    def process_action(*)
    result = nil
    p Benchmark.realtime { result = super }
    result
    end
    }
    )

    View Slide

  38. config/initializers/*.rb

    I usually draft this sort of quick
    monkey-patches in `config/
    initializers/`

    No need to explicitly `require`

    No need to give a name

    Maybe nobody else does this, but
    works for me...

    View Slide

  39. Now the App Prints Each
    Request Processing Time

    But `ab`ing with `-n 100` prints
    100 lines

    What we want is a summary of
    100 runs

    View Slide

  40. How About Iterating
    Inside the Request

    Rather than making hundreds
    of requests?

    View Slide

  41. Other Benchmarking Libraries
    for Doing This in Ruby

    benchmark-ips

    benchmark_driver

    View Slide

  42. A Benchmarking Script that
    Iterates the `process_action`
    # config/initializers/perf.rb
    ActionController::Base.prepend(
    Module.new {
    def process_action(*)
    Benchmark.ips do |x|
    x.report('process_action') { super }
    end
    super
    end
    }
    )

    View Slide

  43. Usage

    Start up the server and request
    from outside just once

    View Slide

  44. Like This

    $ curl http:/
    /localhost:3000/posts/

    View Slide

  45. This Would Return Pure Execution
    Time of the Rails Land Alone

    View Slide

  46. Result
    Warming up --------------------------------------
    process_action 218.000 i/100ms
    Calculating -------------------------------------
    process_action 2.321k (± 7.5%) i/s - 11.554k in 5.009905s

    View Slide

  47. Or We Could Include the Rack
    Middleware Stack and
    `before_action` and Such
    # config/initializers/perf.rb
    Rails::Engine.prepend(
    Module.new {
    def call(*)
    Benchmark.ips do |x|
    x.report('call') { super }
    end
    super
    end
    }
    )

    View Slide

  48. Result
    Warming up --------------------------------------
    call 22.000 i/100ms
    Calculating -------------------------------------
    call 240.911 (± 7.1%) i/s - 1.210k in 5.048689s

    View Slide

  49. This Means That the Action
    Runs 240 Times per Second

    Seems like quite an accurate
    number than the time
    command one

    View Slide

  50. Now We've Got the Basic
    Benchmarking Tool

    View Slide

  51. This Approach Is Quite
    Flexible

    We can `Benchmark.ips` any
    part of the application for
    measuring its speed

    So we can

    "devide and conquer"

    View Slide

  52. Benchmarking only the "M"
    in the Whole MVC Stack
    Benchmark.ips do |x|
    Model.find(...)
    end

    View Slide

  53. Benchmarking only the "V"
    in the Whole MVC Stack
    Benchmark.ips do |x|
    render(...)
    end

    View Slide

  54. How Can We Measure
    the "C"?

    View Slide

  55. In Fact, We Can Create a Simple
    Action That Does Not find nor
    render

    View Slide

  56. Route to /posts/ok
    # config/routes.rb
    Rails.application.routes.draw do
    - resources :posts
    + resources :posts do
    + collection do
    + get :ok
    + end
    + end
    # For details on the DSL available within this file, see
    https://guides.rubyonrails.org/routing.html
    end

    View Slide

  57. The Controller Action
    # app/controllers/posts_controller.rb
    class PostsController < ApplicationController
    before_action :set_post, only:
    [:show, :edit, :update, :destroy]
    + def ok
    + head :ok
    + end
    +
    # GET /posts
    # GET /posts.json
    def index

    View Slide

  58. These Results Tell Us
    That...

    Finding a record is slow

    Rendering it is also slow

    Request handling is slow as well

    View Slide

  59. There's No Single Bottle
    Neck in This Simple App

    View Slide

  60. Conclusion

    View Slide

  61. Rails Is Slow Because
    Everything Is Slow!

    View Slide

  62. Chapter 1 End

    View Slide

  63. Chapter 2

    View Slide

  64. Let's Make Everything
    Faster!

    View Slide

  65. Why Rails Is Slow, and How
    Can We Make Rails Faster?

    View Slide

  66. There Are Some "Numbers" That
    We Can Get from the Ruby Process

    Time elapsed

    Memory usage

    Method invocations

    System calls

    etc...

    View Slide

  67. The Number of Object
    Allocations

    Today, we're going to focus on
    the number of Object
    allocations

    View Slide

  68. Why Does the Number
    of Allocations Matters?

    Allocating Objects in Ruby is a
    very heavy process

    Plus, it causes GC pressure

    The number would not be
    affected by the environment

    View Slide

  69. Actually, Rails 6 Shows
    This in the Log

    Completed 200 OK in 58ms
    (Views: 9.0ms | ActiveRecord:
    2.6ms | Allocations: 7601)

    View Slide

  70. How Can We Measure
    Object Allocations?
    $ ruby -e "c = GC.stat(:total_allocated_objects); 100.times
    { _ = 'hello' }; p GC.stat(:total_allocated_objects) - c"
    100

    View Slide

  71. Quite Simple, Right?

    View Slide

  72. Number of Objects Allocated
    During a Rails Request
    # config/initializers/perf.rb
    Rails::Engine.prepend(
    Module.new {
    def call(*)
    c = GC.stat(:total_allocated_objects)
    result = super
    p GC.stat(:total_allocated_objects) - c
    result
    end
    }
    )

    View Slide

  73. Result

    $ curl http:/
    /localhost:3000/posts/ok

    #=> 691

    View Slide

  74. Or If You'd Like a Little
    Bit More Detailed Report

    View Slide

  75. Gems for Profiling Object
    Allocations & Memory Usage

    memory_profiler

    allocation_stats

    allocation_tracer

    etc.

    View Slide

  76. bundle
    memory_profiler
    # Gemfile
    gem 'byebug', platforms: [:mri, :mingw, :x64_mingw]
    end
    gem 'benchmark-ips'
    +gem 'memory_profiler'

    View Slide

  77. An Utility Method That
    Prints the Memory Usage
    module Prof
    def self.mem
    result = nil
    MemoryProfiler.report { result =
    yield }.pretty_print(retained_strings: 0,
    allocated_strings: 100, normalize_paths: true)
    result
    end
    end

    View Slide

  78. Then Profile
    Rails::Engine.prepend(
    Module.new {
    def call(*)
    Prof.mem { super }
    end
    }
    )

    View Slide

  79. curl the ok Action

    $ curl http:/
    /localhost:3000/posts/ok

    (abandoning the first trial)

    View Slide

  80. Result
    Total allocated: 63830 bytes (599 objects)
    Total retained: 21391 bytes (134 objects)
    allocated memory by gem
    -----------------------------------
    27104 actionpack-6.0.0/lib
    22574 activesupport-6.0.0/lib
    5718 rack-2.0.7/lib
    1976 actionview-6.0.0/lib
    1944 logger
    1520 ipaddr
    1157 railties-6.0.0/lib
    677 securerandom
    504 concurrent-ruby-1.1.5/lib
    320 activerecord-6.0.0/lib
    176 monitor
    80 turbolinks-5.2.1
    40 i18n-1.6.0/lib
    40 uri
    allocated memory by file
    -----------------------------------
    7128 actionpack-6.0.0/lib/action_controller/renderer.rb
    3702 rack-2.0.7/lib/rack/utils.rb
    3624 actionpack-6.0.0/lib/action_dispatch/http/response.rb
    ...

    View Slide

  81. It Turned Out That,

    a simple action that just returns
    a "200 OK" header creates 599
    objects and consumes 63830
    bytes of memory.

    View Slide

  82. Isn't That Too Much?

    View Slide

  83. Let's Try Reducing This
    As Much As Possible

    View Slide

  84. The memory_profiler
    Result Indicates That,

    actionpack allocates the most

    Especially,
    ActionController::Renderer

    View Slide

  85. But Who Creates the
    Renderer?

    View Slide

  86. We Actually Aren't
    Rendering Anything!

    We're just returning the "200
    OK" header

    View Slide

  87. Let's See Who Is
    Instantiating AC::Renderer
    # actionpack/lib/action_controller/renderer.rb
    @@ -63,6 +63,7 @@ module ActionController
    # It will be merged with the default Rack environment
    defined by
    # +ActionController::Renderer::DEFAULTS+.
    def initialize(controller, env, defaults)
    + puts caller
    @controller = controller
    @defaults = defaults
    @env = normalize_keys defaults.merge(env)

    View Slide

  88. Result
    .../actionpack-6.0.0/lib/action_controller/renderer.rb:54:in `new'
    .../actionpack-6.0.0/lib/action_controller/renderer.rb:54:in `new'
    .../actiontext-6.0.0/lib/action_text/engine.rb:50:in `block (3 levels)
    in '
    .../activesupport-6.0.0/lib/active_support/callbacks.rb:429:in
    `instance_exec'

    View Slide

  89. action_text/
    engine.rb??

    We aren't using action_text in
    this action!

    View Slide

  90. An Initializer in
    ActionText::Engine
    # actiontext-6.0.0/lib/action_text/engine.rb
    initializer "action_text.renderer" do |app|
    app.executor.to_run { ActionText::Content.renderer =
    ApplicationController.renderer }
    app.executor.to_complete { ActionText::Content.renderer =
    ApplicationController.renderer }
    ActiveSupport.on_load(:action_text_content) do
    self.renderer = ApplicationController.renderer
    end
    ActiveSupport.on_load(:action_controller_base) do
    before_action { ActionText::Content.renderer =
    ApplicationController.renderer.new(request.env) }
    end
    end

    View Slide

  91. So,

    View Slide

  92. Action Text 6.0.0
    Includes This Bug

    View Slide

  93. When Action Text Is Bundled and
    Required, It Instantiates An AC::Renderer
    Before Each Action In Every Controller

    View Slide

  94. The Solution

    View Slide

  95. Skip Requiring actiontext/
    railtie (unless you use it)
    # config/application.rb
    require_relative 'boot'
    -require 'rails/all'
    +%w(
    + active_record/railtie
    + active_storage/engine
    + action_controller/railtie
    + action_view/railtie
    + action_mailer/railtie
    + active_job/railtie
    + action_cable/engine
    + action_mailbox/engine
    + rails/test_unit/railtie
    + sprockets/railtie
    +).each do |railtie|
    + begin
    + require railtie
    + rescue LoadError
    + end
    +end

    View Slide

  96. Then Let's Measure the
    Memory Usage Again

    View Slide

  97. Result
    Total allocated: 63830 bytes (599 objects)
    !
    Total allocated: 55560 bytes (574 objects)

    View Slide

  98. The Memory Usage
    Improves 8000+ Bytes

    View Slide

  99. I Filed This Issue

    https:/
    /github.com/rails/
    rails/issues/36963

    View Slide

  100. So, This Would Be Fixed
    in 6.0.1, Hopefully

    View Slide

  101. What If We Skip Requiring Some
    More Other Components That We
    Don't Use Here?
    # config/application.rb
    @@ -2,13 +2,10 @@ require_relative 'boot'
    %w(
    active_record/railtie
    - active_storage/engine
    action_controller/railtie
    action_view/railtie
    action_mailer/railtie
    active_job/railtie
    - action_cable/engine
    - action_mailbox/engine

    View Slide

  102. Result

    Total allocated: 55560 bytes
    (574 objects)



    Total allocated: 55560 bytes
    (574 objects)

    View Slide

  103. Nothing Changed!

    View Slide

  104. Benchmark Result for
    Each Action (Allocation)

    ok: 55560 bytes (574 objects)

    show: 127210 bytes (1301 objects)

    new 173108 bytes (1657 objects)

    index: 597728 bytes (6333 objects)

    View Slide

  105. Now We Got the
    Numbers

    We need such numbers to see
    how we're optimizing things

    View Slide

  106. So, Let's Start
    Optimizing Rails

    View Slide

  107. Now We Have a List of Files
    that Consume Memory
    allocated memory by file
    -----------------------------------
    3702 rack-2.0.7/lib/rack/utils.rb
    3624 actionpack-6.0.0/lib/action_dispatch/http/response.rb
    2944 activesupport-6.0.0/lib/active_support/callbacks.rb
    2728 activesupport-6.0.0/lib/active_support/
    hash_with_indifferent_access.rb
    2720 actionpack-6.0.0/lib/action_dispatch/middleware/cookies.rb
    2144 activesupport-6.0.0/lib/active_support/messages/rotator.rb
    1944 ruby/lib/logger.rb
    1856 activesupport-6.0.0/lib/active_support/parameter_filter.rb
    1680 activesupport-6.0.0/lib/active_support/inflector/methods.rb
    1520 ruby/lib/ipaddr.rb
    1480 actionpack-6.0.0/lib/action_dispatch/journey/router.rb
    1280 actionpack-6.0.0/lib/action_controller/metal/
    instrumentation.rb

    View Slide

  108. rack, actionpack,
    activesupport, ruby...

    View Slide

  109. These Things Are Part
    of Your Application

    View Slide

  110. You Can Make Your Applications
    Fast by Making Rails Fast

    View Slide

  111. Rails Is Just a Ruby Program
    That You Can Patch

    View Slide

  112. First, Set Up gem-src

    View Slide

  113. Then Re-install rails

    View Slide

  114. gem-src Will Turn These
    Gems Into Git Repos

    View Slide

  115. And Now You're Ready
    to Commit Your Patches

    $ cd (bundle show actionpack)

    % cd `bundle show actionpack`
    (zsh)

    View Slide

  116. Just Read These Files in
    the Memory Usage List

    View Slide

  117. And Try to Reduce Object
    Creations as Much as Possible

    View Slide

  118. For Example,

    View Slide

  119. Regexp#=~ vs
    Regexp#match?

    View Slide

  120. Regexp#=~
    $ ruby -rmemory_profiler -e "s = 'ABC'; MemoryProfiler.report { s =~ /
    A/ }.pretty_print"
    Total allocated: 208 bytes (2 objects)
    allocated memory by location
    -----------------------------------
    208 -e:1
    allocated memory by class
    -----------------------------------
    168 MatchData
    40 String
    allocated objects by class
    -----------------------------------
    1 MatchData
    1 String

    View Slide

  121. Regexp#match?
    $ ruby -rmemory_profiler -e "s = 'ABC';
    MemoryProfiler.report { s.match?(/A/) }.pretty_print"
    Total allocated: 0 bytes (0 objects)
    Total retained: 0 bytes (0 objects)
    allocated memory by gem
    -----------------------------------
    NO DATA

    View Slide

  122. match? Is Way More Efficient
    Where We Don't Need the
    MatchData Instance

    View Slide

  123. The Patch

    View Slide

  124. Hash#merge vs
    Hash#[]=

    View Slide

  125. Hash#merge
    $ ruby -rmemory_profiler -e "h1, h2, h3 = {a: 1}, {a: 2},
    {a: 3}; MemoryProfiler.report
    { h1.merge(h2.merge(h3)) }.pretty_print"
    Total allocated: 336 bytes (2 objects)
    allocated memory by class
    -----------------------------------
    336 Hash
    allocated objects by class
    -----------------------------------
    2 Hash

    View Slide

  126. Hash#[]=
    $ ruby -rmemory_profiler -e "h1, h2 = {a: 1}, {a: 2};
    MemoryProfiler.report { h = h1.merge(h2); h[:a] = 3;
    h }.pretty_print"
    Total allocated: 168 bytes (1 objects)
    allocated memory by class
    -----------------------------------
    168 Hash
    allocated objects by class
    -----------------------------------
    1 Hash

    View Slide

  127. We'd Better Avoid
    Unnecessary Hash#merge

    View Slide

  128. The Patch

    View Slide

  129. Hash#fetch vs
    (Hash#[] || default)

    View Slide

  130. Hash#fetch Taking
    Two Arguments
    $ ruby -rmemory_profiler -e "h = {x: 1}; MemoryProfiler.report
    { h.fetch(:x, 'hello') }.pretty_print"
    Total allocated: 40 bytes (1 objects)
    allocated memory by class
    -----------------------------------
    40 String
    allocated objects by class
    -----------------------------------
    1 String

    View Slide

  131. Hash#[] || default
    $ ruby -rmemory_profiler -e "h = {x: 1};
    MemoryProfiler.report { h[:x] || 'hello' }.pretty_print"
    Total allocated: 0 bytes (0 objects)
    allocated memory by class
    -----------------------------------
    NO DATA

    View Slide

  132. The Patch

    View Slide

  133. I Pushed About 60 Micro-
    Optimization Patches Like These to
    Rails Since the 6.0.0 Release

    Since I decided to talk about
    Rails performance in this
    conference

    And I think I could reduce

    5ʙ10% memory consumption

    View Slide

  134. I Tried to cherry-pick These
    Commits to 6.0.0 to Show You the
    Result

    View Slide

  135. But I Gave Up

    We had too many conflicts

    View Slide

  136. So Instead, Let's Bundle Edge Rails
    and See How Things Are Improved
    in the Current master
    # Gemfile
    -gem 'rails', '~> 6.0.0'
    +gem 'rails', path: 'PATH/TO/RAILS'

    View Slide

  137. Result (posts/ok)

    55560 bytes (574 objects)



    54995 bytes (555 objects)

    View Slide

  138. View Slide

  139. The Improvement Was
    Indeed Only 1% Since 6.0.0

    1% of the memory usage

    3% of the number of the
    allocated objects

    View Slide

  140. Maybe Because We've Been Adding
    Other Heavy Stuffs to master?

    View Slide

  141. What About Other
    Actions?

    View Slide

  142. Result: posts/show

    127931 bytes (1305 objects)



    124773 bytes (1271 objects)

    View Slide

  143. BTW Why Does posts/show
    Consume 70K More Bytes Than
    posts/ok?

    View Slide

  144. Because It Uses Whole
    M + V + C Stack

    Where posts/ok uses only "C"

    View Slide

  145. posts/show = posts/ok
    + find + render

    View Slide

  146. In Order to Divide the
    Problem,

    View Slide

  147. It'd Be Better to Measure
    M and V Separately

    View Slide

  148. Measuring Active
    Record Finder Alone
    # app/controllers/posts_controller.rb
    private
    # Use callbacks to share common setup or constraints
    between actions.
    def set_post
    - @post = Post.find(params[:id])
    + Prof.mem do
    + @post = Post.find(params[:id])
    + end
    end

    View Slide

  149. Measuring Action View
    Renderer Alone
    # app/controllers/posts_controller.rb
    def show
    + Prof.mem { default_render }
    end

    View Slide

  150. Action Controller's
    default_render Method

    A method that is called when
    no explicit render was called

    View Slide

  151. So Easy, Right?

    View Slide

  152. Result: default_render

    40968 bytes (452 objects)

    out of

    124773 bytes (1271 objects)

    View Slide

  153. Approx. 1/3

    View Slide

  154. But Why Does Action View
    Consume 40KB Memory for
    Composing An 1.5KB HTML?

    View Slide

  155. According to the
    Profiling Results

    View Slide

  156. Things That Consume
    Memory in Action View

    Strings and String-ish Objects

    Template engines

    Template lookup

    I18n

    UrlHelper

    View Slide

  157. What Are the String-ish
    Objects?

    SafeBuffer object

    Multibyte::Chars object

    View Slide

  158. ActiveSupport::SafeBuffer

    A variety of String class that
    can distinguish "safe" string
    and "unsafe" string

    Inherits String class to hold a
    flag as an ivar inside

    View Slide

  159. Can't We Use the taint
    Flag for This?

    I tried, but it didn't work well

    Because we cannot create
    "tainted by default" Strings

    View Slide

  160. BTW Who Here Knows
    What "the taint" Flag Is?

    Who actually have ever used it?

    View Slide

  161. It's a Flag That Every Ruby
    Object Has But Nobody Uses

    View Slide

  162. So, the Ruby Core Team Just
    Decided to Remove the Flag
    Around Ruby 3.0

    View Slide

  163. ActiveSupport::SafeBuffer

    We may be able to reduce so
    much garbage in the views if we
    can improve this

    Maybe we can create some kind
    of extension (that reuses the
    flag bit in String?) in the future

    View Slide

  164. Template Engines

    Template Engines tend to
    create so many garbage Strings
    while composing an HTML
    body String

    View Slide

  165. Why Don't We Try "The Fastest
    Template Engine in the World"
    Here?

    View Slide

  166. You Know What?

    View Slide

  167. It's Called
    "string_template"

    https:/
    /github.com/
    amatsuda/string_template

    View Slide

  168. Its Mark Up Syntax Is Ruby's String
    Interpolation That Everybody
    Knows

    It's simple

    View Slide

  169. It Doesn't Internally Create
    the Heavy SafeBuffer Objects

    And fast

    View Slide

  170. But Has the Tradeoff

    It has no auto HTML safety
    guard

    We need to manually call `h()`
    just like we used to do before
    Rails 3

    View Slide

  171. Stringification
    -<%= notice %>
    +#{h notice }

    Title:
    - <%= @post.title %>
    + #{h @post.title }

    -<%= link_to 'Edit', edit_post_path(@post) %> |
    -<%= link_to 'Back', posts_path %>
    +#{ link_to 'Edit', edit_post_path(@post) } |
    +#{ link_to 'Back', posts_path }

    View Slide

  172. Result: After Stringifying
    the Templates

    40968 bytes (452 objects)



    40467 bytes (454 objects)

    View Slide

  173. Memory Usage Decreased, But
    Allocated Objects Increased...!

    Because string_template

    could create some extra
    String / SafeBuffer objects
    when concatenating strings
    with explicit `h()` call

    View Slide

  174. string_template

    Maybe efficient in some particular cases

    Inside heavy loops

    With lots of string interpolations

    That needs no HTML escaping

    But Action View is not generally
    optimized for this template engine

    View Slide

  175. Now, Let's Take a Look
    at the "new" Action

    View Slide

  176. Result: posts/new

    165836 bytes (1613 objects)

    View Slide

  177. Who Consumes That
    Much Memory?
    allocated memory by file
    -----------------------------------
    8088 i18n-1.6.0/lib/i18n/interpolate/ruby.rb
    7272 i18n-1.6.0/lib/i18n/backend/base.rb
    6396 actionview/lib/action_view/helpers/tag_helper.rb
    5088 actionview/lib/action_view/lookup_context.rb
    5060 rack-2.0.7/lib/rack/utils.rb
    4944 activesupport/lib/active_support/core_ext/string/output_safety.rb
    4552 actionview/lib/action_view/helpers/form_helper.rb

    View Slide

  178. Another Newcomer "I18n" Is the
    Top Memory Consumer Here!

    View Slide

  179. I18n.t(:hello) => "͜Μʹͪ͸"

    A very simple function that
    looks up the dictionary with a
    key, and returns a String

    View Slide

  180. Why Is I18n.t Heavy?

    Is it doing something more than
    just a Hash lookup?

    View Slide

  181. Let's See What's
    Happening
    $ env RAILS_ENV=production rails r 'I18n.t(:hello);
    Prof.mem { I18n.t :hello }'

    View Slide

  182. Result
    Total allocated: 1336 bytes (9 objects)
    allocated memory by class
    -----------------------------------
    928 Hash
    368 Array
    40 String
    allocated objects by class
    -----------------------------------
    4 Array
    4 Hash
    1 String

    View Slide

  183. It Turned Out That,

    View Slide

  184. I18n Creates Some Hash and Array
    Objects Per Each I18n.t Invocation

    Especially for manipulating
    given keys and options

    View Slide

  185. How Can We Reduce
    These Object Creations?

    View Slide

  186. The Solution

    Wrap I18n.t and cache

    View Slide

  187. Like This?
    module I18nCache
    def self.extended(obj)
    obj.instance_variable_set :@i18n_cache, {}
    end
    def translate(*args, **options)
    cache = @i18n_cache[I18n.locale] ||= {}
    cache_key = args << options
    cache[cache_key] || begin
    result = super
    cache[cache_key] = result if result
    result
    ennd
    alias t translate
    end
    I18n.extend I18nCache

    View Slide

  188. This Monkey-Patch May Not
    Be Production-Ready Yet

    I'm aware that we at least need to
    handle `:count` option

    There may be some more corner
    cases

    If this approach is too aggressive,
    maybe we could monkey-patch
    `Backend#lookup`

    View Slide

  189. Anyway, Now I18n.t
    Became Much Cheaper
    $ env RAILS_ENV=production rails r 'I18n.t(:hello); Prof.mem
    { I18n.t :hello }'
    Total allocated: 272 bytes (2 objects)
    allocated memory by class
    -----------------------------------
    232 Hash
    40 Array
    allocated objects by class
    -----------------------------------
    1 Array
    1 Hash

    View Slide

  190. Result: I18n.t(:hello)

    Total allocated: 1336 bytes (9 objects)



    Total allocated: 272 bytes (2 objects)

    View Slide

  191. Applying This Monkey-
    Patch to the Rails App

    View Slide

  192. Result: posts/new
    Memory Usage

    165836 bytes (1613 objects)



    143306 bytes (1476 objects)

    View Slide

  193. Next, Let's Investigate
    the index Action

    Which usually becomes the
    main hotspot in the app

    View Slide

  194. Result: 50 "Hello"
    Records

    Total allocated: 580397 bytes
    (6170 objects)

    View Slide

  195. It Consumes This Much Memory for
    Fetching and Showing

    "Hello" 50 Times

    View Slide

  196. Allocated Objects by
    Class
    allocated objects by class
    -----------------------------------
    3235 String
    1300 Array
    1019 Hash
    162 ActiveSupport::SafeBuffer
    100 ActiveModel::Attribute::FromDatabase
    50 ActiveModel::AttributeSet
    50 ActiveModel::LazyAttributeHash
    50 Post

    View Slide

  197. ActiveModel::Attribute::FromDatab
    ase, ActiveModel::AttributeSet,
    ActiveModel::LazyAttributeHash

    These are created through the
    Attributes API

    View Slide

  198. So Here's a Super Monkey-Patch
    That Could Reduce These Objects

    https:/
    /github.com/
    amatsuda/
    lightweight_attributes

    View Slide

  199. lightweight_attributes'
    Approach

    AR model has too many features

    But in most cases, what we need is "data
    transfer object" instances that are "read only"

    Such objects can be made more lightweight
    by dropping some heavy features

    e.g. dirty tracking, type casting

    View Slide

  200. How to Use
    lightweight_attributes

    Bundle the gem.

    Done.

    View Slide

  201. Allocated Objects by Class:
    with lightweight_attributes
    allocated objects by class
    -----------------------------------
    3187 String
    1293 Array
    996 Hash
    162 ActiveSupport::SafeBuffer
    52 Class
    50 LightweightAttributes::AttributeSet
    50 Post

    View Slide

  202. Allocated Objects by
    Class: Before => After
    3235 String
    1300 Array
    1019 Hash
    162 ActiveSupport::SafeBuffer
    100 ActiveModel::Attribute::FromDatabase
    50 ActiveModel::AttributeSet
    50 ActiveModel::LazyAttributeHash
    50 Post
    !
    3187 String
    1293 Array
    996 Hash
    162 ActiveSupport::SafeBuffer
    52 Class
    50 LightweightAttributes::AttributeSet
    50 Post

    View Slide

  203. Yay! We Could Reduce
    So Many Objects!!

    View Slide

  204. Result: Total Allocated with
    lightweight_attributes

    Total allocated: 580397 bytes
    (6170 objects)



    Total allocated: 586565 bytes
    (5981 objects)

    View Slide

  205. Allocated Objects Decreased,
    But Memory Usage Increased...!

    View Slide

  206. Probably Because We're Not
    Accessing All Fields in This
    Example

    The default implementation
    lazily type casts

    The lightweight_attributes
    version immediately type casts

    View Slide

  207. Anyway,

    View Slide

  208. What Consumes Memory
    in the index Action?
    allocated memory by location
    -----------------------------------
    35032 actionpack/lib/action_controller/metal/url_for.rb:42
    35032 actionpack/lib/action_dispatch/routing/route_set.rb:199
    35032 actionpack/lib/action_dispatch/routing/route_set.rb:231
    27200 rubyconfid/app/views/posts/index.html.erb:19
    26000 lightweight_attributes/lib/lightweight_attributes/attribute_set.rb:17
    25587 activesupport/lib/active_support/core_ext/string/output_safety.rb:175
    25368 actionpack/lib/action_dispatch/journey/visitors.rb:49
    23432 actionview/lib/action_view/helpers/url_helper.rb:683
    18920 actionpack/lib/action_dispatch/journey/visitors.rb:38

    View Slide

  209. Mostly from UrlHelper

    View Slide

  210. Improving UrlHelper Would
    Greatly Improve the index Action
    Performance

    We need your ideas!

    View Slide

  211. Another Thing That Is Allocated Per
    Each Action and Can Be Cached

    View Slide

  212. ActiveSupport
    Inflector

    post => posts

    post => Post

    etc.

    View Slide

  213. But We Cannot Just Cache
    Every Inflector Calls

    That'll cause a memory bloat
    vulnerability

    View Slide

  214. What If We Cache Only What's
    Included in the DB Schema?

    View Slide

  215. The Monkey-Patches
    for Inflector
    def camelize(term, uppercase_first_letter = true)
    return super unless uppercase_first_letter
    if (cached = InflectorCache.camelize_cache[term])
    cached.dup
    else
    super
    ennd
    def underscore(camel_cased_word)
    if (cached = InflectorCache.underscore_cache[camel_cased_word])
    cached.dup
    else
    super
    ennd

    View Slide

  216. And Another Monkey-Patch That
    Creates the Cache from Table
    Name
    module SchemaStatementsExtension
    def columns(table_name)
    result = super
    unless InflectorCache.camelize_cache[table_name]
    InflectorCache.camelize_cache[table_name] = table_name.camelize
    end
    unless InflectorCache.underscore_cache[table_name]
    InflectorCache.underscore_cache[table_name] = table_name.underscor
    end
    result
    ennd
    ActiveRecord::ConnectionAdapters::AbstractAdapter.prepend
    SchemaStatementsExtension

    View Slide

  217. Result: posts/ok with
    Inflector Cache

    50569 bytes (533 objects)

    =>

    49235 bytes (520 objects)

    View Slide

  218. Yet Another Thing That
    Can Be Cached

    View Slide

  219. Action View Template
    Lookup

    View Slide

  220. There Are Three Kinds
    of Template Lookup

    Layout

    Main tamplate

    Partial

    View Slide

  221. Caching Layout
    Resolution

    `:layout` per action is almost
    always non dynamic

    So it can be cached

    View Slide

  222. The Monkey-Patch Can
    Be Something Like This
    module CachedLayout
    private def _layout(lookup_context, formats)
    cache = self.class.instance_variable_get :@_layout_cache
    implied_layout_name =
    self.class.send(:_implied_layout_name)
    prefixes = /\blayouts/.match?(implied_layout_name) ? [] :
    ["layouts"]
    cache[[implied_layout_name, prefixes, formats]] ||= super
    ennd

    View Slide

  223. Only if Given `:layout` Is
    `nil` or String
    def _write_layout_method # :nodoc:
    super
    if _layout.nil? || (String === _layout)
    @_layout_cache = {}
    prepend CachedLayout
    ennd

    View Slide

  224. Result: Layout Cache

    116586 bytes (1227 objects)



    113323 bytes (1189 objects)

    View Slide

  225. Before Layout Cache
    allocated memory by file
    -----------------------------------
    4916 rack-2.0.7/lib/rack/utils.rb
    3839 actionpack/lib/action_dispatch/middleware/static.rb
    3821 ruby/lib/logger.rb
    3800 actionpack/lib/action_dispatch/http/response.rb
    3760 actionview/lib/action_view/lookup_context.rb

    View Slide

  226. After Layout Cache
    allocated memory by file
    -----------------------------------
    4884 rack-2.0.7/lib/rack/utils.rb
    3839 actionpack/lib/action_dispatch/middleware/static.rb
    3821 ruby/lib/logger.rb
    3800 actionpack/lib/action_dispatch/http/response.rb
    3688 activesupport/lib/active_support/callbacks.rb
    3552 actionpack/lib/action_dispatch/middleware/cookies.rb
    3504 activesupport/lib/active_support/hash_with_indifferent_access.rb
    3465 activesupport/lib/active_support/json/encoding.rb
    3458 actionview/lib/action_view/helpers/tag_helper.rb
    3311 activesupport/lib/active_support/message_encryptor.rb
    2851 activesupport/lib/active_support/core_ext/string/output_safety.rb
    2384 activesupport/lib/active_support/messages/rotator.rb
    2344 activesupport/lib/active_support/core_ext/hash/keys.rb
    2299 rubyconfid/app/views/layouts/application.html.string
    2198 activesupport/lib/active_support/core_ext/object/json.rb
    2080 actionview/lib/action_view/lookup_context.rb

    View Slide

  227. For Caching Partials

    Here's an experimental
    repository

    https:/
    /github.com/
    amatsuda/turbo_partial

    View Slide

  228. turbo_partial

    `render_partial` takes many
    parameters

    So it can't simply be cached

    turbo_partial adds `render_relative`
    and `render_absolute` features to
    `render_partial`

    View Slide

  229. Rendering with
    turbo_partial
    - render 'form', post: @post
    + render './form', post: @post

    View Slide

  230. I'm Skipping the Result Because It
    Wasn't That Much Efficient in This
    Case...

    View Slide

  231. Anyway,

    View Slide

  232. Who Are Allocating
    Objects Now? (post/ok)
    allocated memory by location
    -----------------------------------
    3451 actionpack/lib/action_dispatch/middleware/static.rb:95
    1347 ruby/lib/logger.rb:597
    1288 actionpack/lib/action_dispatch/http/response.rb:437
    1160 activesupport/lib/active_support/notifications/instrumenter.rb:67
    1152 activesupport/lib/active_support/callbacks.rb:405

    View Slide

  233. Logging

    Creates so many String objects
    that are usually much longer
    than the response body

    View Slide

  234. ActiveSupport::

    Instrumentation

    Creates large payload Hash
    objects and `dup`s them

    View Slide

  235. ActiveSupport::Callbacks

    Creates many Procs and things

    View Slide

  236. And What Is This
    Binding Thing?
    allocated memory by class
    -----------------------------------
    14440 Hash
    9987 String
    9688 Array
    2520 MatchData
    1728 ActionDispatch::Request
    1608 Thread::Backtrace
    1160 ActiveSupport::HashWithIndifferentAccess
    1040 Proc
    944 Rack::Utils::HeaderHash
    912 Class
    512 Regexp
    352 Binding

    View Slide

  237. It's Created Through
    `binding.local_variable_get`

    View Slide

  238. binding.local_variable_
    get

    A magical idiom to get a local
    variable that has a name that
    conflicts with reserved words

    View Slide

  239. This Doesn't Usually Happen, But
    Keyword Arguments Can Create
    This
    def do_something(if: nil, unless: nil)
    if if
    ...
    end
    if unless
    ...
    ennd

    View Slide

  240. Which Causes
    SyntaxError

    View Slide

  241. I Found This When I Was Trying to
    Introduce Keyword Arguments in
    Rails

    View Slide

  242. I Asked @ko1 How to
    Avoid This SyntaxError

    View Slide

  243. Then He Introduced
    This New Magic

    View Slide

  244. binding.local_variable_
    get
    def do_something(if: nil, unless: nil)
    if binding.local_variable_get(:if)
    ...
    end
    if binding.local_variable_get(:unless)
    ...
    end
    end

    View Slide

  245. Rails Started Using This in
    ActiveSupport::HashWithIndifferen
    tAccess Since 6.0

    View Slide

  246. But It Turned Out That It
    Allocates a Binding Object

    I hope we can fix this in Ruby 2.7

    View Slide

  247. HashWithIndifferentAc
    cess

    A variety of Hash that treats
    Symbol keys and String keys as the
    same things

    Converts all Symbol keys to String
    internally

    Abused too much in the Rails
    internal

    View Slide

  248. Maybe We Can Reduce
    HashWithIndifferentAccess

    By defaulting this to be Symbol
    based

    Thanks to Symbol GC from
    @nari3

    Or by implementing this in C

    View Slide

  249. Maybe We Can Reduce
    HashWithIndifferentAccess

    By introducing more keyword
    arguments

    View Slide

  250. Another Thing That Is Making
    This Benchmark Slow

    Thread synchronization

    View Slide

  251. Current Rails Codebase
    Synchronizes Threads Too
    Defensively

    Because not all supported
    Ruby implementations has GIL
    (GVL)

    View Slide

  252. Monitor#synchronize

    posts/show calls this 25 times
    per each request

    We could skip some of these in
    MRI

    View Slide

  253. Ruby 2.6.4 or Current
    Master

    `Monitor#synchronize` creates
    some extra Hash objects

    View Slide

  254. Today's Benchmark
    Result Becomes Like This
    allocated memory by file
    -----------------------------------
    7472 ruby/lib/monitor.rb
    4908 rack-2.0.7/lib/rack/utils.rb
    3839 actionpack/lib/action_dispatch/middleware/static.rb
    3821 ruby/lib/logger.rb
    3800 actionpack/lib/action_dispatch/http/response.rb
    3688 activesupport/lib/active_support/callbacks.rb
    3552 actionpack/lib/action_dispatch/middleware/cookies.rb

    View Slide

  255. We Need to Fix This in
    Ruby 2.7

    View Slide

  256. Summary

    View Slide

  257. posts/ok

    55560 bytes (574 objects)



    50299 bytes (531 objects)

    (-9.5%)

    View Slide

  258. show

    127210 bytes (1301 objects)



    113047 bytes (1192 objects)

    (-11.1%)

    View Slide

  259. new

    173108 bytes (1657 objects)



    139605 bytes (1435 objects)

    (-19.4%)

    View Slide

  260. index

    597728 bytes (6333 objects)



    569575 bytes (6092 objects)

    (-4.7%)

    View Slide

  261. Future Works

    More patches on Cookies,
    Instrumentation, UrlHelper, etc.

    Publish these monkey-patches
    as a gem

    View Slide

  262. Actually I Baked Some of These
    Silly Monkey-Patches into a Gem

    https:/
    /github.com/
    amatsuda/speed_king

    View Slide

  263. Wrap Up

    Rails is a huge software that has no obvious
    bottleneck

    But we can somehow improve it by micro-
    optimizations

    I already pushed so many patches in Rails
    master, so Rails 6.1 has to be faster than 6.0

    After such micro-optimizations, today's
    benchmark highlighted some performance
    hotspots

    View Slide

  264. Wrap Up (2)

    Try speed_king gem (not released yet)

    But don't in production...

    View Slide

  265. Wrap up (3)

    Rails is such a fun thing to play with!

    Performance tuning is like a game
    aiming for a high-score

    Everybody can play with it for free!
    Since it's open sourced!

    I'm waiting for you to join this game!

    View Slide

  266. end

    View Slide