Turbo Boosting Real-world Applications

Slides for RailsConf 2018 talk "Turbo Boosting Real-world Applications" http://railsconf.com/program/sessions#session-596

Akira Matsuda

April 17, 2018

Transcript

  1. Turbo Boosting
    Real-world
    Applications
    Akira Matsuda

  2. Turbo Boosting

  3. Real-world Applications

  4. Question

  5. Is Your Application Fast
    Enough?

  6. My Answer
    No. My app is not.

  7. When We Started a Project with a
    Simple Scaffold, It Wasn't That Slow

  8. But Our Production App
    Today Is Slow
    I guess this applies to any and
    all Rails applications

  9. Is That Essentially
    Because Ruby Is Slow?
    I don't think so

  10. Ruby Is Already Doing
    Very Well
    Even if we completely disable
    Ruby GC, we don't actually get
    that much performance gain
    Freezing Strings in your
    application code may not solve
    the performance problem

  11. The Real Problem Lies in the
    Framework Architecture
    And some very slow
    components inside the
    framework

  12. Typical Performance Diagram

    (taken from https://www.skylight.io/)

  13. These Are All Serially
    Executed in the Main Thread
    For example, while querying
    the DB, Ruby is doing nothing.
    Just waiting.

  14. In Other Words, These Are
    All Blocking Operations

  15. What If We Can Perform Them
    Without Blocking the Main Thread?

  16. In Parallel?

  17. Non-blocking?

  18. Menu
    Turbo Boosting External API Calls
    Turbo Boosting DB Queries
    Turbo Boosting Partial Renderings
    Turbo Boosting Lazy Attributes
    Turbo Boosting Named Urls

  19. Turbo Boosting External
    API Calls

  20. Turbo Boosting External
    API Calls
    Let's start with the easiest one

  21. API Calls
    Typically via HTTP
    Actually call some outside APIs
    Or microservices

  22. "Microservices"
    Microservices will not solve
    your performance problem
    They can be a solution for your
    scalability problem
    They rather add some extra
    network overhead to your app

  23. Problem
    Calling external APIs makes
    your application slow

  24. While Waiting for the
    HTTP Response
    The API call blocks the main
    thread
    The CPU does nothing while
    waiting for the response

  25. Can We Make This Non-
    blocking?
    By doing the work in a
    background thread?

  26. Example

  27. Example
    The client has to call a heavy
    API 3 times
    Each API call takes 1 second

  28. The API
    # Sleeps 1 second and says 'Hello'
    % rackup -b "run ->(e) { sleep 1; [200, {}, ['Hello']] }"

  29. The Client Code
    % ruby -rhttpclient -e "t = Time.now;
    3.times { p HTTPClient.new.get('http://localhost:9292/').content };
    p Time.now - t"

  30. Result
    % ruby -rhttpclient -e "t = Time.now;
    3.times { p HTTPClient.new.get('http://localhost:9292/').content };
    p Time.now - t"
    #=> This takes 3 seconds

  31. Solution

  32. Using Threads
    % ruby -rhttpclient -e "t = Time.now;
    3.times.map { Thread.new { HTTPClient.new.get('http://localhost:9292/') } }.each {|t| p t.value.content };
    p Time.now - t"

  33. Using Threads
    % ruby -rhttpclient -e "t = Time.now;
    3.times.map { Thread.new { HTTPClient.new.get('http://localhost:9292/') } }.each {|t| p t.value.content };
    p Time.now - t"
    #=> This finishes in 1 second!

  34. "Future Pattern"
    Thread.new { (do something) }.value
    Thread#value waits for the block
    to finish (internally with
    Thread#join)
    You can do anything else in the
    main thread while other threads
    are running

  35. "Future Pattern"
    Usually we wrap this Thread
    with a "future object"

  36. "Future Pattern"
    future = Future.execute { some_background_tasks }
    do_some_heavy_tasks_in_the_main_thread
    value = future.value # join the background thread
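
    A minimal sketch of such a future object, assuming nothing beyond plain Ruby. SimpleFuture and slow_call are hypothetical names made up for this illustration (not a library API), and sleep stands in for the HTTP call.

```ruby
# A hypothetical minimal future object wrapping a Thread (not a library API).
class SimpleFuture
  # Start the block in a background thread right away.
  def self.execute(&block)
    new(&block)
  end

  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Thread#value joins the thread and returns the block's result.
  def value
    @thread.value
  end
end

# Stand-in for a slow API call; sleeps instead of doing real HTTP.
def slow_call(n)
  sleep 0.1
  "Hello #{n}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
futures = 3.times.map { |i| SimpleFuture.execute { slow_call(i) } }
results = futures.map(&:value) # join the background threads
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p results
puts format("elapsed: %.2fs", elapsed)
```

    The three calls overlap, so the elapsed time should be close to one call rather than three.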

  37. Turbo Boosting External
    API Calls Using Threads
    Push an I/O blocking task to a child Thread
    The main thread can do some other heavy tasks
    I know the reality is not that simple
    For example, in many cases, you will be
    caching some results on the client side. In such
    cases, you need to synchronize the threads
    before caching
    But anyway, think about using threads. This is
    the basic idea
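
    One way the synchronization mentioned above could look, as a sketch only. ThreadSafeCache is a hypothetical class for this illustration, and sleep stands in for the slow API call.

```ruby
# Hypothetical client-side cache shared by several threads.
class ThreadSafeCache
  def initialize
    @store = {}
    @lock = Mutex.new
  end

  # Returns the cached value, computing it at most once per key.
  def fetch(key)
    @lock.synchronize do
      @store.fetch(key) { @store[key] = yield }
    end
  end
end

cache = ThreadSafeCache.new
computations = 0

threads = 5.times.map do
  Thread.new do
    cache.fetch(:greeting) do
      computations += 1 # only runs while the lock is held
      sleep 0.05        # stand-in for the slow API call
      "Hello"
    end
  end
end

values = threads.map(&:value)
p values.uniq
p computations
```

    Holding the lock while computing keeps the sketch simple, but it also serializes every cache miss. A real client would probably want finer-grained locking.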

  38. Turbo Boosting DB
    Queries

  39. DB Queries Are So
    Time Consuming
    Obviously, the most time-consuming
    tasks in most real-world
    Rails apps
    It's essentially just another kind
    of I/O blocking task

  40. While Active Record Is Waiting for the
    DB Server Response, the Ruby Process
    Is Doing Nothing!

  41. How AR Deals with
    Connections
    AR pools the DB connections
    Each HTTP request kicks one
    Ruby Thread (or Process) in the
    app server
    AR checks out a connection
    from the pool for each Thread

  42. So, One Request Uses Only One
    Connection, Although There Are So
    Many More Pooled Connections

  43. Problem

  44. DB Query Blocks the
    Main Thread
    When you throw a query to the
    DB, you need to wait until you
    get the results back

  45. Solution

  46. Querying in a Child
    Thread
    Maybe we can apply the same
    pattern as in the API case?

  47. Example

  48. A Very Heavy Finder
    Query
    class User < ApplicationRecord
      def self.heavy_find(id)
        select('*, sleep(id)').where(id: id).first
      end
    end

  49. Takes 3 Seconds for
    heavy_finding User 1 and 2
    % rails r "User.first;
    p Benchmark.realtime {
    p User.heavy_find(1).name, User.heavy_find(2).name
    }"
    "user 1"
    "user 2"
    3.129794000182301

  50. We Can Do This in 2
    Seconds Using Threads!
    % rails r "User.first;
    p Benchmark.realtime {
    t1 = Thread.new { User.heavy_find(1).name };
    t2 = Thread.new { User.heavy_find(2).name };
    p t1.value, t2.value
    }"
    "user 1"
    "user 2"
    2.0408139997161925

  51. This Is Great! Why Doesn't Active
    Record Do This by Default?

  52. Problem with This
    Approach
    Each Thread automatically
    establishes a new connection
    You'd better use with_connection
    to explicitly check out and release
    a connection in a Thread
    User.connection.pool.with_connection { ... }
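
    To show the check-out / yield / check-in shape that with_connection gives you, here is a toy pool built on Queue. ToyPool is hypothetical and is NOT how Active Record's pool is implemented. The "connections" are just strings and sleep stands in for a query.

```ruby
# Toy connection pool built on Queue; NOT Active Record's implementation.
class ToyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Check out a connection, yield it, and always check it back in.
  def with_connection
    conn = @available.pop # blocks while the pool is exhausted
    yield conn
  ensure
    @available << conn if conn
  end

  def idle
    @available.size
  end
end

pool = ToyPool.new(2)
threads = 4.times.map do |i|
  Thread.new do
    pool.with_connection do |conn|
      sleep 0.05 # stand-in for a query
      "job #{i} used #{conn}"
    end
  end
end

results = threads.map(&:value)
puts results
p pool.idle
```

    Four threads share two connections here. The ensure clause is what guarantees that every connection goes back to the pool, even if the block raises.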

  53. So This Checks Out 3
    Connections...
    % rails r "User.first;
    p Benchmark.realtime {
    t1 = Thread.new { User.connection.pool.with_connection { User.heavy_find(1).name } };
    t2 = Thread.new { User.connection.pool.with_connection { User.heavy_find(2).name } };
    p t1.value, t2.value; p User.connection.pool.stat
    }"
    "user 1"
    "user 2"
    {:size=>5, :connections=>3, :busy=>1, :dead=>2, :idle=>0, :waiting=>0, :checkout_timeout=>5}
    2.0807580002583563

  54. Implementation

  55. I Baked This into an
    Experimental Plugin

  56. With the Following
    2 APIs:
    # Fires the query in a background Thread. Joins at #records call
    AR::Relation#future
    # e.g. @posts = current_user.posts.future
    # Runs the block in a background Thread, checking out a new
    # AR connection and releasing it. Returns a Future object
    FutureRecords.future(&block)

  57. GH/amatsuda/
    future_records
    Very roughly implemented
    No tests, no documentation, no
    comments
    But it works
    Actually, it's already used in our
    production app at Money Forward
    Please be careful not to exhaust all the
    connections in the connection pool

  58. Future Improvements
    Introduce a thread pool instead
    of Thread.new for performance
    and safety
    I'll explain this later through
    another example

  59. Two Other Possible
    Approaches
    Don't check out a new
    connection per Thread. Share
    the main connection
    Use an asynchronous connection

  60. Sharing the Main
    Connection
    Mutex.synchronize { Pass the main
    connection to a child thread when
    querying }
    Cannot run queries in parallel.
    Less performance gain
    Maybe we can use Thread + Fiber
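
    A small sketch of why sharing one connection gains little: the Mutex serializes the "queries". The lock and the query lambda are stand-ins invented for this illustration, with sleep in place of a real query.

```ruby
# One shared "connection" guarded by a Mutex; sleep stands in for a query.
connection_lock = Mutex.new
query = lambda do |seconds|
  connection_lock.synchronize do
    sleep seconds
    :done
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 2.times.map { Thread.new { query.call(0.1) } }
statuses = threads.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p statuses
puts format("two 0.1s queries took %.2fs", elapsed)
```

    Even with two threads, the second query cannot start until the first releases the lock, so the total is roughly the sum of both.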

  61. Async Connection
    For DB adapters that have
    an asynchronous query API
    e.g. mysql2, postgres

  62. Async Connection
    Example (mysql2)
    client.query(some_very_heavy_query, async: true) # This method immediately returns nil
    # and once the query finishes,
    result = client.async_result # This returns a normal ResultSet

  63. You Need to Create a Mechanism
    to Detect When the Query Is Done
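
    One possible shape for such a mechanism, sketched with plain IO.select. This does not use mysql2 at all: a pipe stands in for the DB socket and a thread plays the DB server, both of which are assumptions made for this illustration.

```ruby
# A pipe stands in for the DB socket; the writer thread plays the DB server.
reader, writer = IO.pipe

server = Thread.new do
  sleep 0.1              # the "query" is running
  writer.write("result") # the "server" sends the result
  writer.close
end

ticks = 0
# Poll with a short timeout so the main thread can do other work in between.
until IO.select([reader], nil, nil, 0.01)
  ticks += 1 # stand-in for useful work done while waiting
end

result = reader.read
server.join
reader.close

p result
puts "did #{ticks} units of other work while waiting"
```

    IO.select returns nil on timeout and a non-nil value once the reader is ready, which is the "query is done" signal in this sketch.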

  64. Async Connection + Active
    Record + EventMachine
    I could kind of make this work locally, but it required
    super crazy monkey-patches on AR::Relation,
    FinderMethods, connection adapters, etc.
    Also, maybe we need to create another connection
    pool instance that handles async connections
    There's an existing library for doing this. Check out
    the em-synchrony project if you're interested in this
    approach
    I personally don't want my production Rails app to
    heavily depend on EM though

  65. Turbo Boosting Partial
    Renderings

  66. We Often Have Slow
    Partial Templates
    render_partial of course blocks
    the main thread
    And in most cases partials do
    not depend on each other
    So, we may be able to render
    them asynchronously

  67. With Ajax?
    Rails Ajax
    I guess everybody comes up
    with this idea and has
    implemented their own plugin

  68. And Here's My
    Implementation
    <%= render @users, remote: true %>

  69. GH/amatsuda/
    ljax_rails
    Actually I did this 5.years.ago
    And realized that this is not really a good
    approach
    Because the partial needs an extra
    route and a controller. It’s like creating a
    whole API just for a partial template
    It adds another huge overhead for the Ajax
    roundtrip, especially on narrowband

  70. Instead, Let's Think About
    Simply Threading render_partial
    Future pattern again
    Wouldn't this work perfectly as long as
    AR connections are not involved?

  71. Initial Implementation
    module AsyncRenderer
      def render(context, options, block)
        if options.delete(:async) || options[:locals]&.delete(:async)
          FuturePartial.new { super }
        else
          super
        end
      end
    end

    class FuturePartial
      def initialize(&block)
        @thread = Thread.new(&block)
      end

      def to_s
        @thread.value
      end
    end

    ActionView::PartialRenderer.prepend AsyncRenderer

  72. Let's Measure!
    Add <% sleep 1 %> to a
    parent template and a partial,
    and see how the performance
    changes

  73. Like This
    # routes.rb
    resources :users do
      collection do
        get :a
      end
    end
    # a.html.erb
    A
    <%= render 'b', async: true %>
    <% sleep 1 %>
    # _b.html.erb
    B
    <% sleep 1 %>

  74. The Result
    This kinda works! It seems to
    return correct HTML.
    But, NO performance gain. AT
    ALL.

  75. Why?

  76. Let's See What’s Actually
    Executed at the Ruby Level

  77. Action View Compiles Each
    Template to a Ruby Method

  78. Let's Check the Compiled
    Template Source
    Maybe the easiest way to show
    the Ruby code is to add
    something like
    puts source; puts
    at the bottom of the bundled
    actionview gem's
    ActionView::Template#compile

  79. The Source
    def _app_views_users_a_html_erb__247788595159253739_70287467036600(local_assigns, output_buffer)
    _old_virtual_path, @virtual_path = @virtual_path, "users/a";_old_output_buffer = @output_buffer;;@output_buffer = output_buffer || ActionView::OutputBuffer.new;@output_buffer.safe_append='A
    '.freeze;@output_buffer.append=( render 'b', async: true );@output_buffer.safe_append='
    '.freeze; sleep 1
    @output_buffer.to_s
    ensure
    @virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
    end

  80. @output_buffer.append=
    @output_buffer.append=( render 'b', async: true );

  81. @output_buffer.append=
    Creates a future object via
    render async: true, then appends
    the future object to the buffer

  82. Implementation of
    @output_buffer.append=
    module ActionView
      class OutputBuffer < ActiveSupport::SafeBuffer #:nodoc:
        ...
        def <<(value)
          return self if value.nil?
          super(value.to_s)
        end
        alias :append= :<<
        ...

  83. Immediate to_s Call is
    Happening
    @output_buffer.append= calls
    to_s on the future object
    immediately after its creation
    That in turn makes the main
    thread join the background Thread

  84. But Why Do We Need to
    Call to_s There?
    Because ActionView::OutputBuffer
    < ActiveSupport::SafeBuffer < String
    You need to make sure that the
    value is_a String before <<
    Otherwise, it may cause an error,
    or an unexpected result

    View Slide

  85. Like This
    '' << :x
    #=> no implicit conversion of Symbol into String (TypeError)
    '' << 10
    #=> "\n"

  86. How Can We Make Future
    Partial Objects Live Longer?
    An immediate to_s call is
    inevitable as long as the buffer
    is_a String
    What if we store the view
    fragments in an Array, then
    concat them at the very end?

  87. The Array Buffer
    module ArrayBuffer
      def initialize(*)
        super
        @values = []
      end

      def <<(value)
        @values << value unless value.nil?
        self
      end
      alias :append= :<<

      def to_s
        @values.join # or something like that
      end
      ...
    end

    ActionView::OutputBuffer.prepend AsyncPartial::ArrayBuffer
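
    A standalone sketch of the difference, runnable without Rails. EagerBuffer, LazyBuffer and this FuturePartial are hypothetical stand-ins written for this illustration, not the Action View classes. EagerBuffer calls to_s on append like a String-based buffer must, while LazyBuffer keeps fragments in an Array.

```ruby
# Hypothetical stand-ins; not the Action View classes.
class FuturePartial
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  def to_s
    @thread.value # joins the background thread
  end
end

# Stringifies on append, like a String-based buffer has to.
class EagerBuffer
  def initialize
    @string = +""
  end

  def <<(value)
    @string << value.to_s
    self
  end

  def to_s
    @string
  end
end

# Keeps the fragments as-is and stringifies only at the end.
class LazyBuffer
  def initialize
    @values = []
  end

  def <<(value)
    @values << value
    self
  end

  def to_s
    @values.join
  end
end

def render_with(buffer)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  buffer << "A" << FuturePartial.new { sleep 0.1; "B" }
  sleep 0.1 # the parent template's own slow work
  html = buffer.to_s
  [html, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

eager_html, eager_time = render_with(EagerBuffer.new)
lazy_html, lazy_time = render_with(LazyBuffer.new)
puts format("eager: %p in %.2fs", eager_html, eager_time)
puts format("lazy:  %p in %.2fs", lazy_html, lazy_time)
```

    Both buffers produce the same HTML. Only the lazy one lets the partial's sleep overlap with the parent's.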

  88. Measuring Again
    Completed 200 OK in 1026ms
    (Views: 1013.0ms | ActiveRecord:
    0.7ms)

  89. Measuring Again
    It works perfectly!
    Now it returns the result in 1
    second!

  90. BTW, If You're Looking for the Fastest
    Template Engine on the current
    String-based OutputBuffer
    There's an implementation that
    is faster than Erubi, or Haml, or
    any other existing template
    engine in the world
    The gem is called
    string_template

  91. GH/amatsuda/
    string_template
    It compiles the whole template
    in one single String literal with
    interpolations
    Which is of course significantly
    faster than string <<
    another_string <<
    another_string...
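
    A rough way to compare the two styles yourself. This sketch only contrasts chained << with one interpolated literal in plain Ruby. It says nothing about how string_template itself is implemented beyond what the slide states, and the numbers will vary by machine and Ruby version.

```ruby
require "benchmark"

name = "user 1"
n = 200_000

appended = nil
interpolated = nil

Benchmark.bm(14) do |x|
  x.report("<< appends") do
    n.times { appended = +"" << "<p>" << name << "</p>" << "\n" }
  end
  x.report("interpolation") do
    n.times { interpolated = "<p>#{name}</p>\n" }
  end
end

# Both styles build the same string; only the cost differs.
p appended == interpolated
```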

  92. Anyway, Now Let's See How the
    Array-based Version Scales!

  93. Extract the Repetition in
    index.html.erb to a Partial
    # app/views/users/index.html.erb

    <% @users.each do |user| %>
    -
    - <%= user.name %>
    - <%= link_to 'Show', user %>
    - <%= link_to 'Edit', edit_user_path(user) %>
    - <%= link_to 'Destroy', user, method: :delete, data: { confirm: 'Are you sure?' } %>
    -
    + <%= render partial: 'user', object: user, locals: {async: true} %>
    <% end %>

  94. With Some Random
    Slowness to the Partial
    <% sleep(rand(3) / 100.0) %>

  95. Register 10 Users
    % rails r '(1..10).each {|i| User.create! name: "user #{i}"}'

  96. The Result

  97. Or a 500 Error
    ActionView::Template::Error
    (Target thread must not be
    current thread)

  98. What the Hell Is
    Happening?

  99. It's Called a
    Race Condition

  100. Why Does This Code
    Cause a Race Condition?
    def _app_views_users__user_html_erb___590070358791478326_70218505010200(local_assigns, output_buffer)
    _old_virtual_path, @virtual_path = @virtual_path, "users/_user";_old_output_buffer = @output_buffer;user = local_assigns[:user]; user = user;;@output_buffer = output_buffer || ActionView::OutputBuffer.new;@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( user.name );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Show', user );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Edit', edit_user_path(user) );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Destroy', user, method: :delete, data: { confirm: 'Are you sure?' } );@output_buffer.safe_append='

    '.freeze; sleep(rand(3) / 100.0)
    @output_buffer.to_s
    ensure
    @virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
    end

  101. Because It Shares an Instance
    Variable @output_buffer Between
    Threads!

  102. We Need to Change the Buffer Object
    to Be a Local Variable or a Thread
    Local Variable
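
    A deterministic miniature of the problem and the fix. RacyView and interleave are hypothetical names for this illustration. The queues only force one particular interleaving (A pauses mid-render while B renders), so the symptom here is simpler than the real errors shown earlier.

```ruby
# Hypothetical view object; the ivar mimics a shared @output_buffer.
class RacyView
  # Uses an instance variable, so concurrent renders share one buffer slot.
  def render_shared(text, pause)
    @output_buffer = []
    pause.call # another thread may swap @output_buffer here
    @output_buffer << text
    @output_buffer.join
  end

  # Uses a local variable, so each call owns its buffer.
  def render_local(text, pause)
    output_buffer = []
    pause.call
    output_buffer << text
    output_buffer.join
  end
end

# Forces the interleaving: A pauses mid-render while B renders completely.
def interleave(view, meth)
  a_paused = Queue.new
  b_done = Queue.new
  a = Thread.new { view.public_send(meth, "A", -> { a_paused << true; b_done.pop }) }
  a_paused.pop
  b = Thread.new { view.public_send(meth, "B", -> {}) }
  b_result = b.value
  b_done << true
  [a.value, b_result]
end

shared = interleave(RacyView.new, :render_shared)
local = interleave(RacyView.new, :render_local)
p shared #=> ["BA", "B"]
p local  #=> ["A", "B"]
```

    With the ivar, A ends up appending into B's buffer. With the local variable, each render keeps its own output.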

  103. And in Order to Achieve This, We Need
    to Monkey-patch the Erubi Template
    Handler

  104. I'm Not Gonna Paste the Whole Patch
    Here, But It's Been Done Like This
    properties[:bufvar] = "output_buffer"
    # and so on...

  105. And So It Works Now!

  106. Now Let's Try to Render
    _form.html.erb Asynchronously
    # new.html.erb
    <%= render partial: 'form', locals: {user: @user, async:
    true} %>

  107. Then, It Renders
    Something Broken

  108. OMG

  109. This Happens Because of
    Action View's capture Helper
    Which is used to render the
    block content inside <%= ... do %>
    capture creates a new buffer,
    swaps @output_buffer ivar, then
    swaps it back at the end
    It's impossible to do such a thing
    for an lvar
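
    The ivar swap described above, reduced to a few lines. CapturingView is a hypothetical, heavily simplified class for this illustration and not Action View's actual code.

```ruby
# Hypothetical, heavily simplified view; not Action View's actual code.
class CapturingView
  attr_reader :output_buffer

  def initialize
    @output_buffer = +""
  end

  def concat(text)
    @output_buffer << text
  end

  # Swap in a fresh buffer, run the block, then swap the old one back.
  def capture
    old_buffer, @output_buffer = @output_buffer, +""
    yield
    @output_buffer
  ensure
    @output_buffer = old_buffer
  end
end

view = CapturingView.new
view.concat("before ")
inner = view.capture { view.concat("inside the block") }
view.concat("after")

p inner              #=> "inside the block"
p view.output_buffer #=> "before after"
```

    The swap works because the block and the helper both reach the buffer through the same ivar. A helper method has no way to rebind a local variable that lives in the compiled template method.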

  110. But I Could Emulate the Behavior in
    Another Way Somehow

  111. With This Patch, Rails Would Run
    Hundreds or Thousands of Threads
    at Once
    Which would make the whole
    response time rather slower

  112. We Need to Control the
    Number of Running Threads

  113. Introducing a Thread
    Pool
    Thread.new in Ruby is not
    cheap
    Running too many Threads at
    once incurs a non-negligible Thread
    switching cost

  114. Thread Pool
    Implementation
    We can create our own
    Or concurrent-ruby ships with a
    good one
    concurrent-ruby should
    already be bundled in your app
    through Active Support
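
    For the "create our own" option, a hand-rolled fixed-size pool could look roughly like this. TinyThreadPool is a hypothetical sketch using only Queue and Thread. It has no error handling, and concurrent-ruby's pools (e.g. Concurrent::FixedThreadPool) are the more robust choice.

```ruby
# Hand-rolled fixed-size pool for illustration; a sketch, not concurrent-ruby.
class TinyThreadPool
  def initialize(size)
    @jobs = Queue.new
    @workers = size.times.map do
      Thread.new do
        # Each worker pulls jobs until the queue is closed and drained.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Enqueue a block; returns a Queue the caller can pop to get the result.
  def post(&block)
    result = Queue.new
    @jobs << -> { result << block.call }
    result
  end

  def shutdown
    @jobs.close # pop returns nil once the queue is closed and empty
    @workers.each(&:join)
  end
end

pool = TinyThreadPool.new(3)
running = 0
peak = 0
lock = Mutex.new

pending = 10.times.map do |i|
  pool.post do
    lock.synchronize { running += 1; peak = [peak, running].max }
    sleep 0.02 # stand-in for rendering a partial
    lock.synchronize { running -= 1 }
    i * 2
  end
end
results = pending.map(&:pop)
pool.shutdown

p results
p peak
```

    Ten jobs are posted, but no more than three ever run at once, which is the point of the pool.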

  115. So, I Finally Finished Implementing
    an Async Partial Renderer!
    With a lot of monkey-patches
    But, this works only with Erubi so far
    We have so many other template
    engines, such as Erubis, Haml, Slim, etc.
    Monkey-patching Haml is
    especially tough
    (Even for the main maintainer of Haml...!)

  116. The Code

  117. GH/amatsuda/
    async_partial

  118. And, These Are All Template
    Engines for Rendering HTML Files
    What about .json renderers?

  119. Jbuilder
    The Default JSON Renderer
    Does not work at all
    Because Jbuilder is
    implemented very differently
    from other orthodox template
    engines

  120. I Suppose Many of You May Have
    Already Switched to a Fast and
    Elegant Alternative

  121. Called Jb

  122. Jb of Course Works Perfectly with This
    Array Buffer and Threaded Partials

  123. GH/amatsuda/jb

  124. Turbo Boosting Lazy
    Attributes

  125. So, Let's Move on to The View
    Code, and Find What's Slow There

  126. Now Let's Try to Make
    Something Heavy and Realistic

  127. Example

  128. Scaffolding
    % rails g scaffold post col1 col2 col3 col4 col5 col6 col7
    col8 col9 col10 col11 col12 col13 col14 col15 col16 col17
    col18 col19 col20 col21 col22 col23 col24 col25 col26
    col27 col28 col29 col30 col31 col32 col33 col34 col35
    col36 col37 col38 col39 col40 col41 col42 col43 col44
    col45 col46 col47 col48 col49 col50 col51 col52 col53
    col54 col55 col56 col57 col58 col59 col60 col61 col62
    col63 col64 col65 col66 col67 col68 col69 col70 col71
    col72 col73 col74 col75 col76 col77 col78 col79 col80
    col81 col82 col83 col84 col85 col86 col87 col88 col89
    col90 col91 col92 col93 col94 col95 col96 col97

  129. With the Data
    % rails r '(1..1000).each {|i| Post.create! col1: i, col2: i, col3:
    i, col4: i, col5: i, col6: i, col7: i, col8: i, col9: i, col10: i,
    col11: i, col12: i, col13: i, col14: i, col15: i, col16: i, col17:
    i, col18: i, col19: i, col20: i, col21: i, col22: i, col23: i,
    col24: i, col25: i, col26: i, col27: i, col28: i, col29: i, col30:
    i, col31: i, col32: i, col33: i, col34: i, col35: i, col36: i,
    col37: i, col38: i, col39: i, col40: i, col41: i, col42: i, col43:
    i, col44: i, col45: i, col46: i, col47: i, col48: i, col49: i,
    col50: i, col51: i, col52: i, col53: i, col54: i, col55: i, col56:
    i, col57: i, col58: i, col59: i, col60: i, col61: i, col62: i,
    col63: i, col64: i, col65: i, col66: i, col67: i, col68: i, col69:
    i, col70: i, col71: i, col72: i, col73: i, col74: i, col75: i,
    col76: i, col77: i, col78: i, col79: i, col80: i, col81: i, col82:
    i, col83: i, col84: i, col85: i, col86: i, col87: i, col88: i,
    col89: i, col90: i, col91: i, col92: i, col93: i, col94: i, col95:
    i, col96: i, col97: i }'

  130. Benchmark
    % curl http://localhost:3000/posts
    Run this several times, discard
    the fastest and slowest results

  131. Results
    Completed 200 OK in 1610ms (Views: 1568.9ms |
    ActiveRecord: 40.4ms)
    Completed 200 OK in 1693ms (Views: 1511.1ms |
    ActiveRecord: 43.3ms)
    Completed 200 OK in 1555ms (Views: 1484.5ms |
    ActiveRecord: 69.9ms)
    Completed 200 OK in 1668ms (Views: 1626.1ms |
    ActiveRecord: 41.9ms)
    Completed 200 OK in 1791ms (Views: 1737.3ms |
    ActiveRecord: 53.1ms)

  132. Let's See What Takes
    Time in Views

  133. What If We Changed the
    Attribute Accesses to Literals?
    - <%= post.col1 %>
    - ...
    - <%= post.col97 %>
    + <%= 'post.col1' %>
    + ...
    + <%= 'post.col97' %>

  134. Results
    Completed 200 OK in 803ms (Views: 747.5ms |
    ActiveRecord: 55.2ms)
    Completed 200 OK in 827ms (Views: 782.5ms |
    ActiveRecord: 44.2ms)
    Completed 200 OK in 820ms (Views: 775.9ms |
    ActiveRecord: 43.2ms)
    Completed 200 OK in 833ms (Views: 721.8ms |
    ActiveRecord: 110.3ms)
    Completed 200 OK in 834ms (Views: 781.1ms |
    ActiveRecord: 52.6ms)

  135. This Means,

  136. Half of the Response Time Was Spent
    on Reading Values from Already
    Selected AR Model Instances

  137. Why Does Just Accessing
    Attributes Take That Much Time?
    It should be just a method call,
    right?

  138. Let's Count The Number
    of Method Calls
    % rails r 'p = Post.first; (trace = TracePoint.new(:call) {|t| p "#{t.defined_class}##{t.method_id}"}).enable; p.col1; trace.disable'
    "#0x00007fbece82af70>#__temp__36f6c613"
    "ActiveRecord::AttributeMethods::Read#_read_attribute"
    "ActiveModel::AttributeSet#fetch_value"
    "ActiveModel::AttributeSet#[]"
    "ActiveModel::LazyAttributeHash#[]"
    "ActiveModel::LazyAttributeHash#assign_default_value"
    "##from_database"
    "ActiveModel::Attribute#initialize"
    "ActiveModel::Attribute#value"
    "ActiveModel::Attribute::FromDatabase#type_cast"
    "ActiveModel::Type::Value#deserialize"
    "ActiveModel::Type::Value#cast"
    "ActiveModel::Type::String#cast_value"

  139. 13 Method Calls per 1
    String Attribute Access!

  140. And 30 Method Calls per 1
    Timestamp Attribute Access!
    % rails r 'p = Post.first; (trace = TracePoint.new(:call) {|t| p "#{t.defined_class}##{t.method_id}"}).enable; p.created_at;
    trace.disable'
    "##__temp__36275616475646f51647"
    "ActiveRecord::AttributeMethods::Read#_read_attribute"
    "ActiveModel::AttributeSet#fetch_value"
    "ActiveModel::AttributeSet#[]"
    "ActiveModel::LazyAttributeHash#[]"
    "ActiveModel::LazyAttributeHash#assign_default_value"
    "##from_database"
    "ActiveModel::Attribute#initialize"
    "ActiveModel::Attribute#value"
    "ActiveModel::Attribute::FromDatabase#type_cast"
    "ActiveRecord::AttributeMethods::TimeZoneConversion::TimeZoneConverter#deserialize"
    "##deserialize"
    "##__getobj__"
    "ActiveModel::Type::Value#deserialize"
    "##cast"
    "ActiveModel::Type::Value#cast"
    "ActiveModel::Type::DateTime#cast_value"
    "ActiveModel::Type::Helpers::TimeValue#fast_string_to_time"
    "ActiveModel::Type::Helpers::TimeValue#new_time"
    "ActiveRecord::Type::Internal::Timezone#default_timezone"
    "##default_timezone"
    "ActiveRecord::AttributeMethods::TimeZoneConversion::TimeZoneConverter#convert_time_to_time_zone"
    "Object#acts_like?"
    "##zone"
    "DateAndTime::Zones#in_time_zone"
    "##find_zone!"
    "Object#acts_like?"
    "DateAndTime::Zones#time_with_zone"
    "ActiveSupport::TimeWithZone#initialize"
    "ActiveSupport::TimeWithZone#transfer_time_values_to_utc_constructor"

  141. So, for Looping 1000 Records
    and Accessing 100 Columns...
    Does Ruby make 13 * 100 * 1000
    = 1,300,000 method calls?

  142. Yes, It Really Does
    % rails r 'calls = 0; trace = TracePoint.new(:call) {|t| calls += 1 };
    Post.all.each {|p| trace.enable; p.id; p.col1; p.col2; p.col3; p.col4;
    p.col5; p.col6; p.col7; p.col8; p.col9; p.col10; p.col11; p.col12;
    p.col13; p.col14; p.col15; p.col16; p.col17; p.col18; p.col19; p.col20;
    p.col21; p.col22; p.col23; p.col24; p.col25; p.col26; p.col27; p.col28;
    p.col29; p.col30; p.col31; p.col32; p.col33; p.col34; p.col35; p.col36;
    p.col37; p.col38; p.col39; p.col40; p.col41; p.col42; p.col43; p.col44;
    p.col45; p.col46; p.col47; p.col48; p.col49; p.col50; p.col51; p.col52;
    p.col53; p.col54; p.col55; p.col56; p.col57; p.col58; p.col59; p.col60;
    p.col61; p.col62; p.col63; p.col64; p.col65; p.col66; p.col67; p.col68;
    p.col69; p.col70; p.col71; p.col72; p.col73; p.col74; p.col75; p.col76;
    p.col77; p.col78; p.col79; p.col80; p.col81; p.col82; p.col83; p.col84;
    p.col85; p.col86; p.col87; p.col88; p.col89; p.col90; p.col91; p.col92;
    p.col93; p.col94; p.col95; p.col96; p.col97; p.created_at;
    p.updated_at; trace.disable }; p calls'
    1335000
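
    The same counting technique works without Rails. In this sketch LayeredRecord is a hypothetical toy model whose reader passes through three Ruby-level methods, and the trace counts only calls to methods defined on that class.

```ruby
# Toy model whose reader goes through a few Ruby-level layers (hypothetical).
class LayeredRecord
  def initialize(raw)
    @raw = raw
  end

  def name
    read_attribute(:name)
  end

  def read_attribute(key)
    cast(@raw.fetch(key))
  end

  def cast(value)
    value.to_s
  end
end

records = Array.new(1000) { |i| LayeredRecord.new(name: "user #{i}") }

calls = 0
# :call fires for methods defined in Ruby; count only our own class's methods.
trace = TracePoint.new(:call) do |tp|
  calls += 1 if tp.defined_class == LayeredRecord
end
trace.enable { records.each(&:name) }

p calls #=> 3000
```

    Three layers per read times 1000 records gives 3000 calls, which is the same multiplication as in the slides, just with smaller numbers.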

  143. So, Active Record Is Slow
    Not because Ruby is slow
    But because the code is written
    to be slow

  144. Of Course, the Example I
    Showed Here Is a Silly UI
    We won't usually render 1,000
    records in a single page
    In such cases, we would use
    pagination

  145. kaminari/kaminari
    With this plugin

  146. But There Are Some Use Cases
    Where We Deal with Thousands of
    AR Model Instances, e.g.
    APIs
    Batches
    Fintech apps

  147. In Fact, We Actually Hit This
    Problem at Money Forward
    We had to render 2,500 models
    in one page, which was
    unbearably slow

  148. IMO Active Record Model is
    Designed to Do Too Much Work
    What we really need here in this
    situation is just a value object (something
    like "entity bean" in the Java world)
    An AR model is apparently overkill for
    this usage
    An AR object has too many features such as
    type casting, dirty tracking, serialization,
    validation, etc.

  149. AR Implements Two
    Different Roles in One Class
    Data transfer object that
    transfers readonly data between
    MVC layers
    Form object that accepts user
    inputs and safely saves them to
    the DB

  150. And What We Need in This Scenario Is
    Just a Lightweight Readonly Object

  151. Probably We Can Transfer the
    ResultSet into Some Kind of DTO
    (Data Transfer Object)?
    Which is simply based on Ruby
    Struct?
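
    What a Struct-based DTO might look like, as a sketch. The column names and rows are made up for this illustration, and a real result set would come from the DB driver.

```ruby
# Rows as a DB driver might return them; the data here is made up.
columns = %w[id name]
rows = [[1, "user 1"], [2, "user 2"]]

# Build a value class once per result set, then freeze each instance
# so that it is read-only.
user_dto = Struct.new(*columns.map(&:to_sym))
users = rows.map { |row| user_dto.new(*row).freeze }

users.each { |u| puts "#{u.id}: #{u.name}" }
```

    Reading an attribute here is a single Struct accessor call, with no type casting or dirty tracking in between.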

  152. It Should Kinda Work for a Simple Use
    Case Like the Example in These Slides
    But we don't want to do that in
    Ruby. Ruby is not Java.
    And we want to use associations,
    some other methods defined on
    the model class, etc.
    And it won't play nice with our
    favorite decorator plugin

  153. GH/amatsuda/
    active_decorator

  154. Instead, Why Don't We Just Store
    the Attributes as a Hash Instance?
    And just delegate the attribute
    accessors to the Hash instance?
    (Actually, AR used to be
    designed that way)
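
    The idea in plain Ruby, as a sketch. HashBackedModel, its columns macro and TinyPost are hypothetical names for this illustration and do not reflect how Active Record was or is implemented.

```ruby
# Hypothetical hash-backed model; not Active Record's actual design.
class HashBackedModel
  # Define one plain reader per column that reads straight from the Hash.
  def self.columns(*names)
    names.each do |name|
      define_method(name) { @attributes[name] }
    end
  end

  def initialize(attributes)
    @attributes = attributes
  end
end

class TinyPost < HashBackedModel
  columns :id, :col1, :col2
end

post = TinyPost.new(id: 1, col1: "a", col2: "b")
p post.col1 #=> "a"
p post.id   #=> 1
```

    Each read is one method call plus one Hash lookup. The cost is that values come back exactly as stored, with no type casting.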

  155. Problem
    AR attribute reader method is
    slow

  156. Solution

  157. Let’s Solve This Problem Not by
    Adding More Complexity but
    by Bringing Back the Simplicity

  158. Good Old Hash-based
    Attributes
    We need to monkey-patch AR
    internals

  159. Recent Versions of Active Record
    Implement the "Attribute API"

  160. Attribute API
    Highly extensible, elegantly
    customizable
    It's a great feature, indeed
    But... who actually uses this
    feature in production?

  161. Attribute API
    Implementation
    In order to implement this
    feature, AR holds an instance of
    LazyAttribute for each column
    of each model instance

  162. Can’t We Opt Out of This
    Rarely Used Feature?
    And let AR objects work
    speedily by default?
    It's great that AR has a lot of
    elegant features, but we want
    the model instances to perform
    as fast as possible by default

  163. Implementation

  164. If The Model Declares No Custom
    Attribute, Return a Good Old Simple
    Hash Based Model Instance
    I suppose this would speed up
    99.8% of AR models in the world

  165. Implementation

  166. An AttributeSet Alternative That
    Simply Delegates to a Given Hash
    Attributes
    module LightweightAttributes
      class AttributeSet
        delegate :each_value, :fetch, :except, :[], :[]=, :key?, :keys, to: :attributes

        def initialize(attributes)
          @attributes = attributes
        end

        def fetch_value(name)
          self[name]
        end
        ...
      end
    end

  167. An AttributeSet Builder that Builds the
    Lightweight AttributeSet when Building
    an Instance from DB Query Result
    module LightweightAttributes
      class AttributeSet
        class Builder
          ...
          def build_from_database(values = {}, _additional_types = {})
            LightweightAttributes::AttributeSet.new values
          end
        end
      end
    end

  168. Overriding AR::Base.attributes_builder
    to Return the Lightweight
    AttributeSet Builder
    module ARBaseClassMethods
      def attributes_builder
        # If the model has no custom attribute
        if attributes_to_define_after_schema_loads.empty?
          LightweightAttributes::AttributeSet::Builder.new(...)
        else
          super
        end
      end
    end
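
The "fast path unless customized, otherwise `super`" shape can be reproduced in plain Ruby. In this sketch every name (`Model`, `custom_attributes`, `LightweightSwitch`, the two builder classes) is hypothetical, and prepending onto the singleton class is just one way such an override could be wired in, not necessarily how the plugin does it.

```ruby
# Hypothetical names throughout; this mimics the shape of the override, not AR itself.
class DefaultBuilder; end
class LightweightBuilder; end

class Model
  # Per-class list of custom attribute declarations (empty by default).
  def self.custom_attributes
    @custom_attributes ||= []
  end

  def self.attributes_builder
    DefaultBuilder.new
  end
end

module LightweightSwitch
  def attributes_builder
    if custom_attributes.empty?
      LightweightBuilder.new # nothing custom declared: take the fast path
    else
      super                  # fall back to the original behavior
    end
  end
end

# Prepending puts the module in front of Model's own class methods.
Model.singleton_class.prepend(LightweightSwitch)

class Plain < Model; end

class Fancy < Model
  custom_attributes << :price
end

puts Plain.attributes_builder.class
puts Fancy.attributes_builder.class
```

Because the fallback is `super`, models that do declare custom attributes keep the stock behavior untouched.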

  169. Results (Before)
    Completed 200 OK in 1610ms (Views: 1568.9ms | ActiveRecord: 40.4ms)
    Completed 200 OK in 1693ms (Views: 1511.1ms | ActiveRecord: 43.3ms)
    Completed 200 OK in 1555ms (Views: 1484.5ms | ActiveRecord: 69.9ms)
    Completed 200 OK in 1668ms (Views: 1626.1ms | ActiveRecord: 41.9ms)
    Completed 200 OK in 1791ms (Views: 1737.3ms | ActiveRecord: 53.1ms)

  170. Results (After)
    Completed 200 OK in 971ms (Views: 926.5ms | ActiveRecord: 44.4ms)
    Completed 200 OK in 998ms (Views: 950.3ms | ActiveRecord: 46.8ms)
    Completed 200 OK in 1128ms (Views: 1073.2ms | ActiveRecord: 54.1ms)
    Completed 200 OK in 927ms (Views: 876.1ms | ActiveRecord: 50.1ms)
    Completed 200 OK in 963ms (Views: 919.3ms | ActiveRecord: 42.9ms)

  171. Results
    The whole scaffold app
    became 40% faster!!!
    Because of fewer method
    invocations and fewer object
    allocations
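
The 40% figure can be checked against the "Completed 200 OK" totals logged in the Before and After slides. This small sketch only averages those ten numbers; nothing here is new measurement data.

```ruby
# Total response times (ms) copied from the Before / After log lines.
before = [1610, 1693, 1555, 1668, 1791]
after  = [971, 998, 1128, 927, 963]

mean = ->(xs) { xs.sum.to_f / xs.size }

# Relative reduction in mean response time, in percent.
reduction = (1 - mean.call(after) / mean.call(before)) * 100

puts format("before: %.1fms, after: %.1fms, reduction: %.1f%%",
            mean.call(before), mean.call(after), reduction)
```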

  172. It's Still Not Production
    Ready Though
    % rails r 'p [(c = Post.first.created_at), c.class]'
    ["2018-04-16 21:13:21.667499", String]
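
The output above shows `created_at` coming back as a String instead of a time object. A likely explanation is that a plain Hash returns whatever raw value it was given, so the type-casting step is skipped. The sketch below illustrates that difference with `Time.parse` as a simplified stand-in for real casting; it ignores time zones and is not how Active Record casts values.

```ruby
require "time"

# A raw row, with the timestamp as a String like the one printed on the slide.
raw_row = { "created_at" => "2018-04-16 21:13:21.667499" }

# Hash-backed reader: hands back the raw value untouched.
uncast = raw_row["created_at"]

# What a casting layer would add (simplified; time zone handling is ignored).
cast = Time.parse(raw_row["created_at"])

p [uncast.class, cast.class]
```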

  173. Other Possible APIs
    Add a new method on
    AR::Relation that returns a
    lightweight Model collection, and
    don't change the default behavior
    Change Relation#readonly
    method to return a lightweight
    Model collection

  174. But I Basically Prefer Automagic
    APIs over Overly Explicit APIs

  175. The Code

  176. GH/amatsuda/
    lightweight_attributes

  177. Turbo Boosting Named
    Urls

  178. Now That the AR Attributes Are Fast
    Enough, What Is the Next Slow Thing
    in the View?

  179. What Is the Slowest Thing
    in the Scaffold View?

  180. The Answer Is, the Links

  181. If We Remove These 3 Links
    from the posts#index View
    # app/views/posts/index.html.erb
        <td><%= post.col95 %></td>
        <td><%= post.col96 %></td>
        <td><%= post.col97 %></td>
    -   <td><%= link_to 'Show', post %></td>
    -   <td><%= link_to 'Edit', edit_post_path(post) %></td>
    -   <td><%= link_to 'Destroy', post, method: :delete, data: { confirm: 'Are you sure?' } %></td>
      </tr>
    <% end %>

  182. Results (Before)
    Completed 200 OK in 971ms (Views: 926.5ms | ActiveRecord: 44.4ms)
    Completed 200 OK in 998ms (Views: 950.3ms | ActiveRecord: 46.8ms)
    Completed 200 OK in 1128ms (Views: 1073.2ms | ActiveRecord: 54.1ms)
    Completed 200 OK in 927ms (Views: 876.1ms | ActiveRecord: 50.1ms)
    Completed 200 OK in 963ms (Views: 919.3ms | ActiveRecord: 42.9ms)

  183. Results (After)
    Completed 200 OK in 661ms (Views: 608.2ms | ActiveRecord: 51.8ms)
    Completed 200 OK in 604ms (Views: 563.4ms | ActiveRecord: 40.0ms)
    Completed 200 OK in 574ms (Views: 533.2ms | ActiveRecord: 39.8ms)
    Completed 200 OK in 735ms (Views: 695.3ms | ActiveRecord: 38.9ms)
    Completed 200 OK in 698ms (Views: 657.7ms | ActiveRecord: 39.3ms)

  184. Results
    35% performance gain even
    with the 100-column view!
    For a typical model with
    10-ish columns, the gain is
    bigger, around 70%

  185. Problem
    named_url Is Slow

  186. Solution
    If the OutputBuffer is already
    Array based, there's a very
    simple solution
    We can futurize it

  187. Rendering the Links
    Asynchronously
    module FutureUrlHelper
      def link_to(name = nil, options = nil, html_options = nil, &block)
        if ((Hash === options) && options.delete(:async)) ||
           ((Hash === html_options) && html_options.delete(:async))
          FutureObject.new { super }
        else
          super
        end
      end
    end
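
To see why an Array-based buffer makes this possible, here is a minimal plain-Ruby sketch. `SimpleFuture` is a hypothetical stand-in for the talk's FutureObject, and the `sleep` merely simulates slow link generation. The buffer stores the future as-is and only waits for it when the final string is assembled.

```ruby
# SimpleFuture is a hypothetical stand-in for the talk's FutureObject.
class SimpleFuture
  def initialize(&block)
    @thread = Thread.new(&block) # start computing right away
  end

  # Thread#value joins the thread and returns the block's result.
  def value
    @thread.value
  end

  def to_s
    value.to_s
  end
end

# An Array-based output buffer can hold futures next to plain strings.
buffer = []
buffer << "<td>"
buffer << SimpleFuture.new { sleep 0.05; %(<a href="/posts/1">Show</a>) }
buffer << "</td>"

# Futures are resolved only here, when the page is finally assembled.
html = buffer.map(&:to_s).join
puts html
```

The earlier a future is pushed into the buffer, the more rendering work can overlap with it.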

  188. In This Particular Example, It Won't Be
    That Effective Because the Links Are
    Already at the Very Bottom of the Page

  189. Another Possible
    Solution
    Cache url_for results in memory
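
As a rough illustration of the caching idea, the sketch below memoizes generated URLs in a Hash keyed by route name and arguments. `UrlCache` and `slow_url_for` are hypothetical names and do not reflect any gem's API; cache invalidation and thread safety are deliberately left out.

```ruby
# Hypothetical helper; not the API of any gem.
class UrlCache
  attr_reader :misses

  def initialize(&generator)
    @generator = generator
    @cache = {}
    @misses = 0
  end

  # Generate the URL once per (name, args) combination, then reuse it.
  def fetch(name, *args)
    key = [name, *args]
    @cache.fetch(key) do
      @misses += 1
      @cache[key] = @generator.call(name, *args)
    end
  end
end

# Stand-in for an expensive URL generator.
slow_url_for = ->(name, id) { "/#{name}s/#{id}" }
cache = UrlCache.new(&slow_url_for)

urls = 3.times.map { cache.fetch("post", 1) }
puts urls.inspect
puts "generator calls: #{cache.misses}"
```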

  190. I Created This
    2.years.ago
    It may be helpful if your app
    heavily uses named urls

  191. GH/amatsuda/
    turbo_urls

  192. What We Learned

  193. What We Learned (1)
    If you have external API calls in your app, consider
    doing them in child threads
    You can run AR queries in Threads, but be careful not
    to use up all pooled connections
    ActionView::OutputBuffer can be Array based, for
    some future extensions
    Monkey-patching Haml is hard
    LazyAttribute is so lazy, and opting out of it may
    drastically boost the performance
    url_for is slow, and we need to fix it
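
The first two lessons (do slow calls in child threads, but do not exhaust the connection pool) boil down to bounding concurrency. Below is a minimal plain-Ruby sketch under stated assumptions: `run_bounded` is a hypothetical helper, `POOL_SIZE` is an assumed pool size, and the lambdas stand in for API calls or queries.

```ruby
POOL_SIZE = 2 # assumed connection pool size, for illustration

# Run every job in a worker thread, with never more than `limit` running at once.
def run_bounded(jobs, limit)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close # a closed, drained queue makes pop return nil

  results = Array.new(jobs.size)
  workers = Array.new(limit) do
    Thread.new do
      while (item = queue.pop)
        job, index = item
        results[index] = job.call
      end
    end
  end
  workers.each(&:join)
  results
end

# Stand-ins for slow external calls.
jobs = [1, 2, 3].map { |n| -> { sleep 0.01; n * n } }
results = run_bounded(jobs, POOL_SIZE)
p results
```

Capping the worker count at the pool size means no thread ever has to wait for a connection that is not there.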

  194. What We Learned (2)
    You can find what’s slow in your
    app
    And YOU can fix it
    If the problem lies inside the
    framework, just hack the framework
    It should be fun!

  195. What We Learned (3)
    Performance does not come for free
    There are certain trade-offs
    In Rails' case, we need to craft
    so many evil monkey-patches
    Maybe because the framework
    is not flexible enough

  196. What We Learned (4)
    Thread programming,
    especially debugging, is hard
    I don’t wanna do this anymore
    I'm really looking forward to
    the new Thread model planned
    to be introduced in Ruby 3

  197. Future Plans
    Finish implementing the plugins that I introduced today
    All these plugins are experimental. They basically have
    no tests, no documentation, and no comments at the
    moment
    Put them in actual production apps
    I'm sorry but the title of this talk was probably a little bit
    misleading
    Introduce more extensibility to the framework
    I realized that some things would be better changed on
    the framework side rather than in monkey-patch plugins

  198. end

  199. end
    name: Akira Matsuda
    GitHub: @amatsuda
    Twitter: @a_matsuda
