Turbo Boosting Real-world Applications

Slides for RailsConf 2018 talk "Turbo Boosting Real-world Applications" http://railsconf.com/program/sessions#session-596

Akira Matsuda

April 17, 2018

Transcript

  1. Turbo Boosting
    Real-world
    Applications
    Akira Matsuda

  2. Turbo Boosting

  3. Real-world Applications

  4. Is Your Application Fast
    Enough?

  5. My Answer
    No. My app is not.

  6. When We Started a Project with a
    Simple Scaffold, It Wasn't That Slow

  7. But Our Production App
    Today Is Slow
    I guess this applies to any and
    all Rails applications

  8. Is That Essentially
    Because Ruby Is Slow?
    I don't think so

  9. Ruby Is Already Doing
    Very Well
    Even if we completely disable
    Ruby GC, we don't actually get
    that much performance gain
    Freezing Strings in your
    application code may not solve
    the performance problem

  10. The Real Problem Lies in the
    Framework Architecture
    And some very slow
    components inside the
    framework

  11. Typical Performance Diagram

    (taken from https://www.skylight.io/)

  12. These Are All Serially
    Executed in the Main Thread
    For example, while querying
    the DB, Ruby is doing nothing.
    Just waiting.

  13. In Other Words, These Are
    All Blocking Operations

  14. What If We Can Perform Them
    Without Blocking the Main Thread?

  15. In Parallel?

  16. Non-blocking?

  17. Menu
    Turbo Boosting External API Calls
    Turbo Boosting DB Queries
    Turbo Boosting Partial Renderings
    Turbo Boosting Lazy Attributes
    Turbo Boosting Named Urls

  18. Turbo Boosting External
    API Calls

  19. Turbo Boosting External
    API Calls
    Let's start with the easiest one

  20. API Calls
    Typically via HTTP
    Actually call some outside APIs
    Or microservices

  21. "Microservices"
    Microservices will not solve
    your performance problem
    It can be a solution for your
    scalability problem
    It would rather add some extra
    network overhead on your app

  22. Problem
    Calling external APIs makes
    your application slow

  23. While Waiting for the
    HTTP Response
    The API call blocks the main
    thread
    The CPU does nothing while
    waiting for the response

  24. Can We Make This Non-
    blocking?
    By doing the work in the
    background thread?

  25. Example
    The client has to call a heavy
    API 3 times
    Each API call takes 1 second

  26. The API
    # Sleeps 1 second and says 'Hello'
    % rackup -b "run ->(e) { sleep 1; [200, {}, ['Hello']] }"

  27. The Client Code
    % ruby -rhttpclient -e "t = Time.now;
    3.times { p HTTPClient.new.get('http://localhost:9292/').content };
    p Time.now - t"

  28. Result
    % ruby -rhttpclient -e "t = Time.now;
    3.times { p HTTPClient.new.get('http://localhost:9292/').content };
    p Time.now - t"
    #=> This takes 3 seconds

  29. Using Threads
    % ruby -rhttpclient -e "t = Time.now;
    3.times.map { Thread.new { HTTPClient.new.get('http://localhost:9292/') } }.each {|t| p t.value.content };
    p Time.now - t"

  30. Using Threads
    % ruby -rhttpclient -e "t = Time.now;
    3.times.map { Thread.new { HTTPClient.new.get('http://localhost:9292/') } }.each {|t| p t.value.content };
    p Time.now - t"
    #=> This finishes in 1 second!
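
The same effect can be reproduced without a server. This is a minimal sketch that assumes `sleep` stands in for the blocking HTTP call; the `fake_api_call` helper is hypothetical and not part of the talk.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow HTTP request: blocks, then returns a body.
def fake_api_call(seconds = 0.2)
  sleep seconds
  'Hello'
end

# Serial: each call waits for the previous one to finish.
serial = Benchmark.realtime { 3.times.map { fake_api_call } }

threaded_results = nil
threaded = Benchmark.realtime do
  # Start all three calls first, then collect the values (Thread#value joins).
  threaded_results = 3.times.map { Thread.new { fake_api_call } }.map(&:value)
end

puts format('serial: %.2fs, threaded: %.2fs', serial, threaded)
```

Because the threads spend their time blocked rather than computing, the waits overlap and the threaded run should take roughly as long as one call.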

  31. "Future Pattern"
    Thread.new { (do something) }.value
    Thread#value waits for the block
    to finish (internally with
    Thread#join)
    You can do anything else in the
    main thread while other threads
    are running

  32. "Future Pattern"
    Usually we wrap this Thread
    with a "future object"

  33. "Future Pattern"
    future = Future.execute { some_background_tasks }
    do_some_heavy_tasks_in_the_main_thread
    value = future.value # join the background thread
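
The slides do not show the `Future` class itself. One possible shape, as a sketch only (real libraries such as concurrent-ruby ship richer implementations), is a thin wrapper around `Thread`:

```ruby
# A minimal Future: runs the block in a background Thread right away.
class Future
  def self.execute(&block)
    new(&block)
  end

  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Blocks until the background Thread finishes, then returns the block's
  # result. An exception raised in the block is re-raised by Thread#value.
  def value
    @thread.value
  end
end

future = Future.execute { sleep 0.1; 6 * 7 }
main_result = (1..1000).sum   # the main thread keeps working meanwhile
puts future.value + main_result
```

Calling `value` more than once is safe: after the thread has finished, `Thread#value` simply returns the stored result.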

  34. Turbo Boosting External
    API Calls Using Threads
    Push an I/O blocking task to a child Thread
    The main thread can do some other heavy tasks
    I know the reality is not that simple
    For example, in many cases, you will be
    caching some results in the client side. In such
    case, you need to synchronize the threads
    before caching
    But anyway, think about using threads. This is
    the basic idea
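
The synchronization mentioned above could look like the following sketch. `SynchronizedCache` is a hypothetical name, and holding the lock while the value is computed is just one simple policy among several.

```ruby
# Hypothetical in-memory cache shared by several background threads.
class SynchronizedCache
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  # Returns the cached value for `key`, computing it with the block on a miss.
  # The lock is held while the block runs, so it runs at most once per key.
  def fetch(key)
    @mutex.synchronize do
      @store.fetch(key) { @store[key] = yield }
    end
  end
end

cache = SynchronizedCache.new
computations = 0
threads = 5.times.map do
  Thread.new { cache.fetch(:greeting) { computations += 1; sleep 0.05; 'Hello' } }
end
values = threads.map(&:value)
puts "computed #{computations} time(s) for #{values.size} readers"
```

The trade-off is that a slow computation for one key also blocks readers of other keys; a per-key lock would avoid that at the cost of more code.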

  35. Turbo Boosting DB
    Queries

  36. DB Queries Are So
    Time Consuming
    Obviously, the most time-
    consuming tasks in most of the
    real-world Rails apps
    It's essentially just another kind
    of I/O blocking task

  37. While Active Record Is Waiting for the
    DB Server Response, the Ruby Process
    Is Doing Nothing!

  38. How AR Deals with
    Connections
    AR pools the DB connections
    Each HTTP request kicks one
    Ruby Thread (or Process) in the
    app server
    AR checks out a connection
    from the pool per each Thread

  39. So, One Request Uses Only One
    Connection, Although There Are So
    Many More Pooled Connections

  40. DB Query Blocks the
    Main Thread
    When you throw a query to the
    DB, you need to wait until you
    get the results back

  41. Querying in a Child
    Thread
    Maybe we can apply the same
    pattern as in the API case?

  42. A Very Heavy Finder
    Query
    class User < ApplicationRecord
      def self.heavy_find(id)
        select('*, sleep(id)').where(id: id).first
      end
    end

  43. Takes 3 Seconds for
    heavy_finding User 1 and 2
    % rails r "User.first;
    p Benchmark.realtime {
    p User.heavy_find(1).name, User.heavy_find(2).name
    }"
    "user 1"
    "user 2"
    3.129794000182301

  44. We Can Do This in 2
    Seconds Using Threads!
    % rails r "User.first;
    p Benchmark.realtime {
    t1 = Thread.new { User.heavy_find(1).name };
    t2 = Thread.new { User.heavy_find(2).name };
    p t1.value, t2.value
    }"
    "user 1"
    "user 2"
    2.0408139997161925

  45. This Is Great! Why Doesn't Active
    Record Do This by Default?

  46. Problem with This
    Approach
    Each Thread automatically
    establishes a new connection
    You'd better use with_connection
    to explicitly checkout and release
    a connection in a Thread
    User.connection.pool.with_connection { ... }
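
To illustrate what "explicitly check out and release" means, here is a plain-Ruby sketch. `TinyPool` is hypothetical and only imitates the checkout/checkin behaviour; it is not Active Record's connection pool.

```ruby
# Hypothetical miniature pool; connections are just labelled Strings here.
class TinyPool
  def initialize(size)
    @connections = Queue.new
    size.times { |i| @connections << "conn-#{i}" }
  end

  # Number of connections currently sitting idle in the pool.
  def available
    @connections.size
  end

  # Checks a connection out for the duration of the block and always returns
  # it to the pool, even if the block raises.
  def with_connection
    conn = @connections.pop   # blocks if every connection is in use
    yield conn
  ensure
    @connections << conn if conn
  end
end

pool = TinyPool.new(2)
used = 2.times.map do
  Thread.new { pool.with_connection { |c| sleep 0.05; c } }
end.map(&:value)
puts "used #{used.inspect}, #{pool.available} available again"
```

The point of the block form is that the connection goes back to the pool as soon as the thread's work is done, instead of staying attached to a thread that has already finished.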

  47. So This Checks Out 3
    Connections...
    % rails r "User.first;
    p Benchmark.realtime {
    t1 = Thread.new { User.connection.pool.with_connection { User.heavy_find(1).name } };
    t2 = Thread.new { User.connection.pool.with_connection { User.heavy_find(2).name } };
    p t1.value, t2.value; p User.connection.pool.stat
    }"
    "user 1"
    "user 2"
    {:size=>5, :connections=>3, :busy=>1, :dead=>2, :idle=>0, :waiting=>0, :checkout_timeout=>5}
    2.0807580002583563

  48. Implementation

  49. I Baked This into an
    Experimental Plugin

  50. With the Following
    2 APIs:
    # Fires the query in a background Thread. Joins at #records call
    AR::Relation#future
    # e.g. @posts = current_user.posts.future
    # Runs the block in a background Thread, checking out a new
    # AR connection and releasing it. Returns a Future object
    FutureRecords.future(&block)

  51. GH/amatsuda/
    future_records
    Very roughly implemented
    No tests, no documentation, no
    comments
    But it works
    Actually, it's already used in our
    production app at Money Forward
    Please be careful not to exhaust all the
    connections in the connection pool

  52. Future Improvements
    Introduce a thread pool instead
    of Thread.new for performance
    and safety
    I'll explain this later through
    another example

  53. Two Other Possible
    Approaches
    Don't checkout a new
    connection per Thread. Share
    the main connection
    Use asynchronous connection

  54. Sharing the Main
    Connection
    Mutex.synchronize { Pass the main
    connection to a child thread when
    querying }
    Cannot run queries in parallel.
    Less performance gain
    Maybe we can use Thread + Fiber

  55. Async Connection
    For DB adapters that have
    asynchronous query API
    e.g. mysql2, postgres

  56. Async Connection
    Example (mysql2)
    client.query(some_very_heavy_query, async: true) # This method immediately returns nil
    # and once the query finishes,
    result = client.async_result # This returns a normal ResultSet

  57. You Need to Create a Mechanism
    to Detect When the Query Is Done
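
One common way to detect completion is to wait for the connection's socket to become readable. The sketch below assumes an `IO.pipe` as a stand-in for the DB socket and a background thread as a stand-in for the DB server; how a real driver exposes its socket is not shown here.

```ruby
# IO.pipe stands in for the DB socket; the writer thread plays the DB server.
reader, writer = IO.pipe

Thread.new do
  sleep 0.2                 # the "query" is running on the server
  writer.write('result')
  writer.close
end

ticks = 0
# Poll with a short timeout; IO.select returns nil while nothing is readable.
until IO.select([reader], nil, nil, 0.05)
  ticks += 1                # do other useful work here instead of blocking
end

result = reader.read        # reads until the writer side is closed
puts "got #{result.inspect} after #{ticks} polls"
```

An event loop such as EventMachine does essentially this for many sockets at once, which is why the approach tends to pull an event loop into the application.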

  58. Async Connection + Active
    Record + EventMachine
    I could kind of make this work locally, but it required
    super crazy monkey-patches on AR::Relation,
    FinderMethods, connection adapters, etc.
    Also, maybe we need to create another connection
    pool instance that handles async connections
    There's an existing library for doing this. Check out
    em-synchrony project if you're interested in this
    approach
    I personally don't want my production Rails app to
    heavily depend on EM though

  59. Turbo Boosting Partial
    Renderings

  60. We Often Have Slow
    Partial Templates
    render_partial of course blocks
    the main thread
    And in most cases partials do
    not depend on each other
    So, we may be able to render
    them asynchronously

  61. With Ajax?
    Rails Ajax
    I guess everybody comes up
    with this idea and has
    implemented their own plugin

  62. And Here's My
    Implementation
    <%= render @users, remote: true %>

  63. GH/amatsuda/
    ljax_rails
    Actually I did this 5.years.ago
    And realized that this is not really a good
    approach
    Because the partial needs an extra
    route and a controller. It’s like creating a
    whole set of APIs for just a partial template
    It adds another huge overhead for Ajax
    roundtrip, especially on narrowband

  64. Instead, Let's Think About
    Simply Threading render_partial
    Future pattern again
    Doesn't this perfectly work if AR
    connections are not concerned?

  65. Initial Implementation
    module AsyncRenderer
      def render(context, options, block)
        if (options.delete(:async) || (options[:locals]&.delete(:async)))
          FuturePartial.new { super }
        else
          super
        end
      end
    end
    class FuturePartial
      def initialize(&block)
        @thread = Thread.new(&block)
      end
      def to_s
        @thread.value
      end
    end
    ActionView::PartialRenderer.prepend AsyncRenderer

  66. Let's Measure!
    Adding <% sleep 1 %> in a
    parent template and a partial,
    and see how the performance
    was changed

  67. Like This
    # routes.rb
    resources :users do
    collection do
    get :a
    end
    end
    # show.html.erb
    A
    <%= render 'b', locals: {async: true} %>
    <% sleep 1 %>
    # _b.html.erb
    B
    <% sleep 1 %>

  68. The Result
    This kinda works! Seems like it
    returns a correct HTML.
    But, NO performance gain. AT
    ALL.

  69. Let's See What’s Actually
    Executed at the Ruby Level

  70. Action View Compiles Each
    Template to a Ruby Method

  71. Let's Check the Compiled
    Template Source
    Maybe the easiest way to show
    the Ruby code is to add
    something like
    puts source; puts
    at the bottom of the bundled
    actionview gem's
    ActionView::Template#compile

  72. The Source
    def _app_views_users_a_html_erb__247788595159253739_70287467036600(local_assigns, output_buffer)
    _old_virtual_path, @virtual_path = @virtual_path, "users/a";_old_output_buffer = @output_buffer;;@output_buffer = output_buffer || ActionView::OutputBuffer.new;@output_buffer.safe_append='A
    '.freeze;@output_buffer.append=( render 'b', async: true );@output_buffer.safe_append='
    '.freeze; sleep 1
    @output_buffer.to_s
    ensure
    @virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
    end

  73. @output_buffer.append=
    @output_buffer.append=( render 'b', async: true );

  74. @output_buffer.append=
    Creates a future object via
    render async: true, then appends
    the future object to the buffer

  75. Implementation of
    @output_buffer.append=
    module ActionView
      class OutputBuffer < ActiveSupport::SafeBuffer #:nodoc:
        ...
        def <<(value)
          return self if value.nil?
          super(value.to_s)
        end
        alias :append= :<<
        ...

  76. Immediate to_s Call is
    Happening
    @output_buffer.append= calls
    to_s on the future object
    immediately after its creation
    Then it causes the background
    Thread's join

  77. But Why Do We Need to
    Call to_s There?
    Because ActionView::OutputBuffer
    < ActiveSupport::SafeBuffer < String
    You need to make sure that the
    value is_a String before <<
    Otherwise, it may cause an error,
    or an unexpected result

  78. Like This
    '' << :x
    #=> no implicit conversion of Symbol into String (TypeError)
    '' << 10
    #=> "\n"

  79. How Can We Make Future
    Partial Objects Live Longer?
    Immediate to_s call is
    inevitable so far as the buffer
    is_a String
    What if we store the view
    fragments in an Array, then
    concat them at the very last?

  80. The Array Buffer
    module ArrayBuffer
      def initialize(*)
        super
        @values = []
      end
      def <<(value)
        @values << value unless value.nil?
        self
      end
      alias :append= :<<
      def to_s
        @values.join # or something like that
      end
      ...
    end
    ActionView::OutputBuffer.prepend AsyncPartial::ArrayBuffer
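
The effect of deferring `to_s` can be tried outside Rails. In this sketch both classes are plain-Ruby imitations (the `ArrayBuffer` here is a standalone class, not the Action View patch), and `sleep` stands in for slow template code.

```ruby
require 'benchmark'

# Stand-in for the FuturePartial from the talk: renders in a background Thread.
class FuturePartial
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  def to_s
    @thread.value
  end
end

# Plain-Ruby imitation of the Array-based buffer.
class ArrayBuffer
  def initialize
    @values = []
  end

  def <<(value)
    @values << value unless value.nil?
    self
  end

  # Futures are joined only here, when Array#join calls to_s on each element.
  def to_s
    @values.join
  end
end

html = nil
elapsed = Benchmark.realtime do
  buffer = ArrayBuffer.new
  buffer << "A\n"
  buffer << FuturePartial.new { sleep 0.2; "B\n" }  # starts rendering, no wait
  sleep 0.2                                          # the parent's own work
  html = buffer.to_s
end
puts format('%.2fs', elapsed)
```

The parent's work and the partial's work overlap, so the total should be close to the longer of the two rather than their sum.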

  81. Measuring Again
    Completed 200 OK in 1026ms
    (Views: 1013.0ms | ActiveRecord:
    0.7ms)

  82. Measuring Again
    It works perfectly!
    Now it returns the result in 1
    second!

  83. BTW, If You're Looking for the Fastest
    Template Engine on the current
    String-based OutputBuffer
    There's an implementation that
    is faster than Erubi, or Haml, or
    any other existing template
    engine in the world
    The gem is called
    string_template

  84. GH/amatsuda/
    string_template
    It compiles the whole template
    in one single String literal with
    interpolations
    Which is of course significantly
    faster than string <<
    another_string <<
    another_string...
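
The difference between the two compilation styles can be pictured as follows. This is an illustration only, with made-up fragments; it is not the actual output of string_template or of any other engine.

```ruby
name = 'user 1'
path = '/users/1'

# What a String-buffer engine effectively does: one append per fragment.
appended = +''
appended << '<li>'
appended << name
appended << ' <a href="'
appended << path
appended << '">Show</a></li>'

# What a single-literal compilation looks like: one interpolated String.
interpolated = "<li>#{name} <a href=\"#{path}\">Show</a></li>"

puts appended == interpolated
```

Both produce the same String; the interpolated form simply replaces a chain of method calls with a single expression.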

  85. Anyway, Now Let's See How the
    Array-based Version Scales!

  86. Extract the Repetition in
    index.html.erb to a Partial
    # app/views/users/index.html.erb
    <% @users.each do |user| %>
    -
    - <%= user.name %>
    - <%= link_to 'Show', user %>
    - <%= link_to 'Edit', edit_user_path(user) %>
    - <%= link_to 'Destroy', user, method: :delete, data: { confirm: 'Are you sure?' } %>
    -
    + <%= render partial: 'user', object: user, locals: {async: true} %>
    <% end %>

  87. With Some Random
    Slowness to the Partial
    <% sleep(rand(3) / 100.0) %>

  88. Register 10 Users
    % rails r '(1..10).each {|i| User.create! name: "user #{i}"}'

  89. Or a 500 Error
    ActionView::Template::Error
    (Target thread must not be
    current thread)

  90. What the Hell Is
    Happening?

  91. It's Called
    Race Condition

  92. Why Does This Code
    Cause Race Condition?
    def _app_views_users__user_html_erb___590070358791478326_70218505010200(local_assigns, output_buffer)
    _old_virtual_path, @virtual_path = @virtual_path, "users/_user";_old_output_buffer = @output_buffer;user = local_assigns[:user]; user = user;;@output_buffer = output_buffer || ActionView::OutputBuffer.new;@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( user.name );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Show', user );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Edit', edit_user_path(user) );@output_buffer.safe_append='
    '.freeze;@output_buffer.append=( link_to 'Destroy', user, method: :delete, data: { confirm: 'Are you sure?' } );@output_buffer.safe_append='

    '.freeze; sleep(rand(3) / 100.0)
    @output_buffer.to_s
    ensure
    @virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
    end

  93. Because It Shares an Instance
    Variable @output_buffer Between
    Threads!

  94. We Need to Change the Buffer Object
    to Be a Local Variable or a Thread
    Local Variable
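
The difference between the two kinds of buffer can be sketched in plain Ruby. The `View` class and its method names are hypothetical; only the local-variable version is exercised, because the instance-variable version fails nondeterministically.

```ruby
# Hypothetical view object shared by every rendering thread.
class View
  # Unsafe: every thread reassigns the same @output_buffer, so fragments from
  # concurrent renders can end up in each other's buffer.
  def render_with_ivar(name)
    @output_buffer = +''
    @output_buffer << '<li>'
    sleep(rand(3) / 100.0)
    @output_buffer << name << '</li>'
    @output_buffer
  end

  # Safe: the buffer is a local variable, private to this call and thread.
  def render_with_lvar(name)
    output_buffer = +''
    output_buffer << '<li>'
    sleep(rand(3) / 100.0)
    output_buffer << name << '</li>'
    output_buffer
  end
end

view = View.new
names = (1..10).map { |i| "user #{i}" }
safe = names.map { |n| Thread.new { view.render_with_lvar(n) } }.map(&:value)
puts safe
```

Swapping `render_with_lvar` for `render_with_ivar` in the last lines is an easy way to see mixed-up fragments on some runs.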

  95. And in Order to Achieve This, We Need
    to Monkey-patch the Erubi Template
    Handler

  96. I'm Not Gonna Paste the Whole Patch
    Here, But It's Been Done Like This
    properties[:bufvar] = "output_buffer"
    # and so on...

  97. And So It Works Now!

  98. Now Let's Try to Render
    _form.html.erb Asynchronously
    # new.html.erb
    <%= render partial: 'form', locals: {user: @user, async: true} %>

  99. Then, It Renders
    Something Broken

  100. This Happens Because of
    Action View's capture Helper
    Which is used to render the
    block content inside <%= ... do %>
    capture creates a new buffer,
    swaps @output_buffer ivar, then
    swaps it back at the end
    It's impossible to do such a thing
    for an lvar
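
The buffer swap that `capture` performs can be imitated in a few lines. `TinyView` is a hypothetical, heavily simplified class; it only shows why the technique relies on an instance variable that the block can see.

```ruby
# Hypothetical, simplified view; it only imitates how capture swaps the ivar.
class TinyView
  attr_reader :output_buffer

  def initialize
    @output_buffer = +''
  end

  def concat(text)
    @output_buffer << text
  end

  # Runs the block against a fresh buffer, then restores the previous one.
  def capture
    old_buffer = @output_buffer
    @output_buffer = +''
    yield
    @output_buffer
  ensure
    @output_buffer = old_buffer
  end
end

view = TinyView.new
view.concat('<form>')
inner = view.capture { view.concat('<input>') }
view.concat(inner)
view.concat('</form>')
puts view.output_buffer
```

The block writes through `concat`, which looks up `@output_buffer` at call time. A local variable inside the compiled template method cannot be rebound from `capture`, which is why the swap does not carry over directly.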

  101. But I Could Emulate the Behavior in
    Another Way Somehow

  102. With This Patch, Rails Would Run
    Hundreds or Thousands of Threads
    at Once
    Which would make the whole
    response time rather slower

  103. We Need to Control the
    Number of Running Threads

  104. Introducing a Thread
    Pool
    Thread.new in Ruby is not
    cheap
    Running too many Threads at
    once incurs a non-negligible Thread
    switching cost

  105. Thread Pool
    Implementation
    We can create our own
    Or concurrent-ruby ships with a
    good one
    concurrent-ruby should
    already be bundled in your app
    through Active Support
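
For the "create our own" option, a fixed-size pool can be built on `Queue` from the standard library. `TinyThreadPool` is a hypothetical sketch and deliberately ignores error handling; it is not concurrent-ruby's API.

```ruby
# Hypothetical fixed-size pool: `size` workers pull jobs from a shared Queue.
class TinyThreadPool
  def initialize(size)
    @jobs = Queue.new
    @workers = size.times.map do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and drained.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Schedules the block and returns a Queue to pop the result from.
  def post(&block)
    result = Queue.new
    @jobs << -> { result << block.call }
    result
  end

  def shutdown
    @jobs.close
    @workers.each(&:join)
  end
end

pool = TinyThreadPool.new(3)
pending = (1..10).map { |i| pool.post { sleep 0.01; i * i } }
squares = pending.map(&:pop)
pool.shutdown
puts squares.inspect
```

Ten jobs are submitted, but never more than three run at once, which is the property the renderer needs. A job that raises would kill its worker and leave its caller waiting, so a production pool needs more care than this.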

  106. So, I Finally Finished Implementing
    an Async Partial Renderer!
    With a lot of monkey-patches
    But, this works only with Erubi so far
    We have so many other template
    engines, such as Erubis, Haml, Slim, etc.
    Especially, monkey-patching Haml is so
    tough
    (Even for the main maintainer of Haml...!)

  107. GH/amatsuda/
    async_partial

  108. And, These Are All Template
    Engines for Rendering HTML Files
    What about .json renderers?

  109. Jbuilder
    The Default JSON Renderer
    Completely not working
    Because Jbuilder is
    implemented very differently
    from other orthodox template
    engines

  110. I Suppose Many of You May Have
    Already Switched to a Fast and
    Elegant Alternative

  111. Jb of Course Works Perfectly with This
    Array Buffer and Threaded Partials

  112. GH/amatsuda/jb

  113. Turbo Boosting Lazy
    Attributes

  114. So, Let's Move on to The View
    Code, and Find What's Slow There

  115. Now Let's Try to Make
    Something Heavy and Realistic

  116. Scaffolding
    % rails g scaffold post col1 col2 col3 col4 col5 col6 col7
    col8 col9 col10 col11 col12 col13 col14 col15 col16 col17
    col18 col19 col20 col21 col22 col23 col24 col25 col26
    col27 col28 col29 col30 col31 col32 col33 col34 col35
    col36 col37 col38 col39 col40 col41 col42 col43 col44
    col45 col46 col47 col48 col49 col50 col51 col52 col53
    col54 col55 col56 col57 col58 col59 col60 col61 col62
    col63 col64 col65 col66 col67 col68 col69 col70 col71
    col72 col73 col74 col75 col76 col77 col78 col79 col80
    col81 col82 col83 col84 col85 col86 col87 col88 col89
    col90 col91 col92 col93 col94 col95 col96 col97

  117. With the Data
    % rails r '(1..1000).each {|i| Post.create! col1: i, col2: i, col3:
    i, col4: i, col5: i, col6: i, col7: i, col8: i, col9: i, col10: i,
    col11: i, col12: i, col13: i, col14: i, col15: i, col16: i, col17:
    i, col18: i, col19: i, col20: i, col21: i, col22: i, col23: i,
    col24: i, col25: i, col26: i, col27: i, col28: i, col29: i, col30:
    i, col31: i, col32: i, col33: i, col34: i, col35: i, col36: i,
    col37: i, col38: i, col39: i, col40: i, col41: i, col42: i, col43:
    i, col44: i, col45: i, col46: i, col47: i, col48: i, col49: i,
    col50: i, col51: i, col52: i, col53: i, col54: i, col55: i, col56:
    i, col57: i, col58: i, col59: i, col60: i, col61: i, col62: i,
    col63: i, col64: i, col65: i, col66: i, col67: i, col68: i, col69:
    i, col70: i, col71: i, col72: i, col73: i, col74: i, col75: i,
    col76: i, col77: i, col78: i, col79: i, col80: i, col81: i, col82:
    i, col83: i, col84: i, col85: i, col86: i, col87: i, col88: i,
    col89: i, col90: i, col91: i, col92: i, col93: i, col94: i, col95:
    i, col96: i, col97: i }'

  118. Benchmark
    % curl http://localhost:3000/posts
    Run this several times, discard
    the fastest and slowest results

  119. Results
    Completed 200 OK in 1610ms (Views: 1568.9ms |
    ActiveRecord: 40.4ms)
    Completed 200 OK in 1693ms (Views: 1511.1ms |
    ActiveRecord: 43.3ms)
    Completed 200 OK in 1555ms (Views: 1484.5ms |
    ActiveRecord: 69.9ms)
    Completed 200 OK in 1668ms (Views: 1626.1ms |
    ActiveRecord: 41.9ms)
    Completed 200 OK in 1791ms (Views: 1737.3ms |
    ActiveRecord: 53.1ms)

  120. Let's See What Takes
    Time in Views

  121. What If We Changed the
    Attribute Accesses to Literals?
    - <%= post.col1 %>
    - ...
    - <%= post.col97 %>
    + <%= 'post.col1' %>
    + ...
    + <%= 'post.col97' %>

  122. Results
    Completed 200 OK in 803ms (Views: 747.5ms |
    ActiveRecord: 55.2ms)
    Completed 200 OK in 827ms (Views: 782.5ms |
    ActiveRecord: 44.2ms)
    Completed 200 OK in 820ms (Views: 775.9ms |
    ActiveRecord: 43.2ms)
    Completed 200 OK in 833ms (Views: 721.8ms |
    ActiveRecord: 110.3ms)
    Completed 200 OK in 834ms (Views: 781.1ms |
    ActiveRecord: 52.6ms)

  123. Half of the Response Time Was Spent
    on Reading Values from Already
    Selected AR Model Instance

  124. Why Does Just Accessing
    Attributes Take That Much Time?
    It should be just a method call,
    right?

  125. Let's Count The Number
    of Method Calls
    % rails r 'p = Post.first; (trace = TracePoint.new(:call) {|t| p
    "#{t.defined_class}##{t.method_id}"}).enable; p.col1; trace.disable'
    "#0x00007fbece82af70>#__temp__36f6c613"
    "ActiveRecord::AttributeMethods::Read#_read_attribute"
    "ActiveModel::AttributeSet#fetch_value"
    "ActiveModel::AttributeSet#[]"
    "ActiveModel::LazyAttributeHash#[]"
    "ActiveModel::LazyAttributeHash#assign_default_value"
    "##from_database"
    "ActiveModel::Attribute#initialize"
    "ActiveModel::Attribute#value"
    "ActiveModel::Attribute::FromDatabase#type_cast"
    "ActiveModel::Type::Value#deserialize"
    "ActiveModel::Type::Value#cast"
    "ActiveModel::Type::String#cast_value"

  126. 13 Method Calls per 1
    String Attribute Access!

  127. And 30 Method Calls per 1
    Timestamp Attribute Access!
    % rails r 'p = Post.first; (trace = TracePoint.new(:call) {|t| p "#{t.defined_class}##{t.method_id}"}).enable; p.created_at;
    trace.disable'
    "##__temp__36275616475646f51647"
    "ActiveRecord::AttributeMethods::Read#_read_attribute"
    "ActiveModel::AttributeSet#fetch_value"
    "ActiveModel::AttributeSet#[]"
    "ActiveModel::LazyAttributeHash#[]"
    "ActiveModel::LazyAttributeHash#assign_default_value"
    "##from_database"
    "ActiveModel::Attribute#initialize"
    "ActiveModel::Attribute#value"
    "ActiveModel::Attribute::FromDatabase#type_cast"
    "ActiveRecord::AttributeMethods::TimeZoneConversion::TimeZoneConverter#deserialize"
    "##deserialize"
    "##__getobj__"
    "ActiveModel::Type::Value#deserialize"
    "##cast"
    "ActiveModel::Type::Value#cast"
    "ActiveModel::Type::DateTime#cast_value"
    "ActiveModel::Type::Helpers::TimeValue#fast_string_to_time"
    "ActiveModel::Type::Helpers::TimeValue#new_time"
    "ActiveRecord::Type::Internal::Timezone#default_timezone"
    "##default_timezone"
    "ActiveRecord::AttributeMethods::TimeZoneConversion::TimeZoneConverter#convert_time_to_time_zone"
    "Object#acts_like?"
    "##zone"
    "DateAndTime::Zones#in_time_zone"
    "##find_zone!"
    "Object#acts_like?"
    "DateAndTime::Zones#time_with_zone"
    "ActiveSupport::TimeWithZone#initialize"
    "ActiveSupport::TimeWithZone#transfer_time_values_to_utc_constructor"

  128. So, for Looping 1000 Records
    and Accessing 100 Columns...
    Does Ruby make 13 * 100 * 1000
    = 1,300,000 method calls?

  129. Yes, It Really Does
    % rails r 'calls = 0; trace = TracePoint.new(:call) {|t| calls += 1 };
    Post.all.each {|p| trace.enable; p.id; p.col1; p.col2; p.col3; p.col4;
    p.col5; p.col6; p.col7; p.col8; p.col9; p.col10; p.col11; p.col12;
    p.col13; p.col14; p.col15; p.col16; p.col17; p.col18; p.col19; p.col20;
    p.col21; p.col22; p.col23; p.col24; p.col25; p.col26; p.col27; p.col28;
    p.col29; p.col30; p.col31; p.col32; p.col33; p.col34; p.col35; p.col36;
    p.col37; p.col38; p.col39; p.col40; p.col41; p.col42; p.col43; p.col44;
    p.col45; p.col46; p.col47; p.col48; p.col49; p.col50; p.col51; p.col52;
    p.col53; p.col54; p.col55; p.col56; p.col57; p.col58; p.col59; p.col60;
    p.col61; p.col62; p.col63; p.col64; p.col65; p.col66; p.col67; p.col68;
    p.col69; p.col70; p.col71; p.col72; p.col73; p.col74; p.col75; p.col76;
    p.col77; p.col78; p.col79; p.col80; p.col81; p.col82; p.col83; p.col84;
    p.col85; p.col86; p.col87; p.col88; p.col89; p.col90; p.col91; p.col92;
    p.col93; p.col94; p.col95; p.col96; p.col97; p.created_at;
    p.updated_at; trace.disable }; p calls'
    1335000

  130. So, Active Record Is Slow
    Not because Ruby is slow
    But because the code is written
    to be slow

  131. Of Course, the Example I
    Showed Here Is a Silly UI
    We won't usually render 1,000
    records in a single page
    In such case, we would use
    pagination

  132. kaminari/kaminari
    With this plugin

  133. But There Are Some Use Cases
    Where We Deal with Thousands of
    AR Model Instances, e.g.
    APIs
    Batches
    Fintech apps

  134. In Fact, We Actually Hit This
    Problem at Money Forward
    We had to render 2,500 models
    in one page, which was
    unbearably slow

  135. IMO Active Record Model is
    Designed to Do Too Much Work
    What we really need here in this
    situation is just a value object (something
    like "entity bean" in the Java world)
    AR model is apparently an overkill for
    this usage
    AR object has too many features such as
    type casting, dirty tracking, serialization,
    validation, etc.

  136. AR Implements Two
    Different Roles in One Class
    Data transfer object that
    transfers readonly data between
    MVC layers
    Form object that accepts user
    inputs and safely saves them to
    the DB

  137. And What We Need in This Scenario Is
    Just a Lightweight Readonly Object

  138. Probably We Can Transfer the
    ResultSet into Some Kind of DTO
    (Data Transfer Object)?
    Which is simply based on Ruby
    Struct?
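
A Struct-based DTO of the kind suggested here might look like this sketch. `PostDTO` and the column names are illustrative, and the rows are hard-coded Hashes standing in for a DB result set.

```ruby
# Hypothetical read-only DTO; the column list is illustrative.
PostDTO = Struct.new(:id, :col1, :col2, keyword_init: true)

# Rows as a DB driver might return them: Hashes of column name => raw value.
rows = [
  { id: 1, col1: '1', col2: '1' },
  { id: 2, col1: '2', col2: '2' }
]

# Freezing each instance makes the generated setters raise on use.
posts = rows.map { |row| PostDTO.new(**row).freeze }
puts posts.map(&:col1).inspect
```

Reading `col1` is now a single generated accessor call, but associations, model methods, and decorators are all gone, which is the drawback discussed next.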

  139. It Should Kinda Work for a Simple Use
    Case Like the Example in These Slides
    But we don't want to do that in
    Ruby. Ruby is not Java.
    And we want to use associations,
    some other methods defined on
    the model class, etc.
    And it won't play nice with our
    favorite decorator plugin

  140. GH/amatsuda/
    active_decorator

  141. Instead, Why Don't We Just Store
    the Attributes as a Hash Instance?
    And just delegate the attribute
    accessors to the Hash instance?
    (Actually, AR used to be
    designed that way)
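
The Hash-backed idea, reduced to plain Ruby, could be sketched as follows. `HashBackedModel` and its `columns` macro are hypothetical names used for illustration; this is not how Active Record defines its readers.

```ruby
# Hypothetical model base class; it only illustrates Hash-backed readers.
class HashBackedModel
  # Defines one reader per column that reads straight from the Hash.
  def self.columns(*names)
    names.each do |name|
      define_method(name) { @attributes[name] }
    end
  end

  def initialize(attributes)
    @attributes = attributes
  end
end

class Post < HashBackedModel
  columns :id, :col1, :created_at
end

post = Post.new(id: 1, col1: '1', created_at: '2018-04-17 00:00:00')
puts post.col1
```

Unlike a Struct-based DTO this stays an ordinary model class, so other methods can still be defined on it. The cost is that values come back exactly as stored: `created_at` remains a String here because no type casting takes place.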

  142. Problem
    AR attribute reader method is
    slow

  143. Let’s Solve This Problem Not by
    Adding More Complexity but
    Retrieving Back the Simplicity

  144. Good Old Hash-based
    Attributes
    We need to monkey-patch AR
    internals

  145. Recent Versions of Active Record
    Implement the "Attribute API"

  146. Attribute API
    Highly extensible, elegantly
    customizable
    It's a great feature, indeed
    But... who actually uses this
    feature in production?

  147. Attribute API
    Implementation
    In order to implement this
    feature, AR holds an instance of
    LazyAttribute per each column
    per each model instance

  148. Can’t We Opt Out of This
    Rarely Used Feature?
    And let AR objects work
    speedily by default?
    It's great that AR has a lot of
    elegant features, but we want
    the model instances to perform
    as fast as possible by default

  149. Implementation

  150. If The Model Declares No Custom
    Attribute, Return a Good Old Simple
    Hash Based Model Instance
    I suppose this would speed up
    99.8% of AR models in the world

  151. Implementation

  152. An AttributeSet Alternative That
    Simply Delegates to a Given
    Attributes Hash
    module LightweightAttributes
      class AttributeSet
        delegate :each_value, :fetch, :except, :[], :[]=, :key?, :keys, to: :attributes

        def initialize(attributes)
          @attributes = attributes
        end

        def fetch_value(name)
          self[name]
        end
        ...
      end
    end

  153. An AttributeSet Builder that Builds the
    Lightweight AttributeSet when Building
    an Instance from DB Query Result
    module LightweightAttributes
      class AttributeSet
        class Builder
          ...
          def build_from_database(values = {}, _additional_types = {})
            LightweightAttributes::AttributeSet.new values
          end
        end
      end
    end

  154. Overriding AR::Base.attributes_builder
    to Return the Lightweight
    AttributeSet Builder
    module ARBaseClassMethods
      def attributes_builder
        # If the model has no custom attribute
        if attributes_to_define_after_schema_loads.empty?
          LightweightAttributes::AttributeSet::Builder.new(...)
        else
          super
        end
      end
    end
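
The override-with-fallback pattern can be tried outside Rails. The sketch below is plain Ruby under stated assumptions: `BaseModel`, `DefaultBuilder`, `LightweightBuilder` and `custom_attributes` are hypothetical stand-ins, and how the plugin actually hooks into Active Record is not shown on the slide.

```ruby
# Hypothetical stand-ins; none of these are Active Record classes.
class DefaultBuilder; end
class LightweightBuilder; end

class BaseModel
  # Names of custom attributes declared by a model (empty by default).
  def self.custom_attributes
    @custom_attributes ||= []
  end

  def self.attribute(name)
    custom_attributes << name
  end

  def self.attributes_builder
    DefaultBuilder.new
  end
end

module LightweightByDefault
  def attributes_builder
    # No custom attribute declared: take the lightweight path.
    if custom_attributes.empty?
      LightweightBuilder.new
    else
      super # fall back to the original implementation
    end
  end
end

# Prepending to the singleton class puts the override in front of the
# class method, so `super` reaches BaseModel.attributes_builder.
BaseModel.singleton_class.prepend(LightweightByDefault)

class Post < BaseModel; end

class Invoice < BaseModel
  attribute :total
end

puts Post.attributes_builder.class
puts Invoice.attributes_builder.class
```

Models that declare nothing special get the fast path automatically, while models with custom attributes keep the original behavior.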

  155. Results (Before)
    Completed 200 OK in 1610ms (Views: 1568.9ms | ActiveRecord: 40.4ms)
    Completed 200 OK in 1693ms (Views: 1511.1ms | ActiveRecord: 43.3ms)
    Completed 200 OK in 1555ms (Views: 1484.5ms | ActiveRecord: 69.9ms)
    Completed 200 OK in 1668ms (Views: 1626.1ms | ActiveRecord: 41.9ms)
    Completed 200 OK in 1791ms (Views: 1737.3ms | ActiveRecord: 53.1ms)

  156. Results (After)
    Completed 200 OK in 971ms (Views: 926.5ms | ActiveRecord: 44.4ms)
    Completed 200 OK in 998ms (Views: 950.3ms | ActiveRecord: 46.8ms)
    Completed 200 OK in 1128ms (Views: 1073.2ms | ActiveRecord: 54.1ms)
    Completed 200 OK in 927ms (Views: 876.1ms | ActiveRecord: 50.1ms)
    Completed 200 OK in 963ms (Views: 919.3ms | ActiveRecord: 42.9ms)

  157. Results
    The whole scaffold app
    became 40% faster!!!
    Because of fewer method
    invocations and fewer object
    creations
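
The 40% figure can be checked against the logged totals. This small calculation assumes that "faster" means the relative reduction in mean response time over the five requests shown on the two Results slides.

```ruby
# Total response times in ms, copied from the two Results slides.
before = [1610, 1693, 1555, 1668, 1791]
after  = [971, 998, 1128, 927, 963]

mean_before = before.sum.to_f / before.size
mean_after  = after.sum.to_f / after.size

# Relative reduction in mean response time, as a percentage.
reduction = (1 - mean_after / mean_before) * 100

puts format("before: %.1fms, after: %.1fms, reduction: %.1f%%",
            mean_before, mean_after, reduction)
```

With these numbers the mean drops from about 1663ms to about 997ms.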

  158. It's Still Not Production
    Ready Though
    % rails r 'p [(c = Post.first.created_at), c.class]'
    ["2018-04-16 21:13:21.667499", String]
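
The output above shows the raw String leaking through because nothing casts it. A minimal sketch of the missing step, assuming the driver hands back timestamps as Strings; the `CASTERS` table is a hypothetical illustration, and Active Record's real type system is far richer.

```ruby
require "time"

# Raw row as the lightweight path would expose it: the timestamp is a String.
raw = { "id" => 1, "created_at" => "2018-04-16 21:13:21.667499" }

# Hypothetical per-column casters; columns without one pass through as-is.
CASTERS = {
  "created_at" => ->(value) { Time.parse(value) }
}.freeze

cast = raw.map { |name, value|
  caster = CASTERS[name]
  [name, caster ? caster.call(value) : value]
}.to_h

p raw["created_at"].class
p cast["created_at"].class
```

Any production-ready version has to reintroduce this casting without bringing back the per-attribute object overhead.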

  159. Other Possible APIs
    Add a new method on
    AR::Relation that returns a
    lightweight Model collection, and
    don't change the default behavior
    Change Relation#readonly
    method to return a lightweight
    Model collection

  160. But I Basically Prefer Automagic
    APIs over Overly Explicit APIs

  161. GH/amatsuda/
    lightweight_attributes

  162. Turbo Boosting Named
    Urls

  163. Now That the AR Attributes Are Fast
    Enough, What in the View Is the Next
    Slow Thing?

  164. What Is the Slowest Thing
    in the Scaffold View?

  165. The Answer Is, the Links

  166. If We Remove these 3 Links
    from posts#index View
    # app/views/posts/index.html.erb
    <%= post.col95 %>
    <%= post.col96 %>
    <%= post.col97 %>
    - <%= link_to 'Show', post %>
    - <%= link_to 'Edit', edit_post_path(post) %>
    - <%= link_to 'Destroy', post, method: :delete, data: { confirm: 'Are you sure?' } %>
    <% end %>

  167. Results (Before)
    Completed 200 OK in 971ms (Views: 926.5ms | ActiveRecord: 44.4ms)
    Completed 200 OK in 998ms (Views: 950.3ms | ActiveRecord: 46.8ms)
    Completed 200 OK in 1128ms (Views: 1073.2ms | ActiveRecord: 54.1ms)
    Completed 200 OK in 927ms (Views: 876.1ms | ActiveRecord: 50.1ms)
    Completed 200 OK in 963ms (Views: 919.3ms | ActiveRecord: 42.9ms)

  168. Results (After)
    Completed 200 OK in 661ms (Views: 608.2ms | ActiveRecord: 51.8ms)
    Completed 200 OK in 604ms (Views: 563.4ms | ActiveRecord: 40.0ms)
    Completed 200 OK in 574ms (Views: 533.2ms | ActiveRecord: 39.8ms)
    Completed 200 OK in 735ms (Views: 695.3ms | ActiveRecord: 38.9ms)
    Completed 200 OK in 698ms (Views: 657.7ms | ActiveRecord: 39.3ms)

  169. Results
    35% performance gain even
    with the 100 columns view!
    For a typical model with
    10-ish columns, the gain is
    larger, around 70%

  170. Problem
    named_url Is Slow

  171. Solution
    If the OutputBuffer is already
    Array based, there's a very
    simple solution
    We can futurize it

  172. Rendering the Links
    Asynchronously
    module FutureUrlHelper
      def link_to(name = nil, options = nil, html_options = nil, &block)
        if ((Hash === options) && options.delete(:async)) ||
           ((Hash === html_options) && html_options.delete(:async))
          FutureObject.new { super }
        else
          super
        end
      end
    end
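
To make the futurizing idea concrete outside Rails, here is a self-contained sketch. It assumes a future is simply a thread whose value is read on `to_s`; this `FutureObject` and the `slow_link` helper are hypothetical stand-ins, not the talk's actual implementation.

```ruby
# Hypothetical future: runs the block in a thread and blocks only on #to_s.
class FutureObject
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  def to_s
    @thread.value.to_s
  end
end

# Stand-in for a slow helper such as link_to.
def slow_link(label)
  sleep 0.05
  %(<a href="/posts">#{label}</a>)
end

# Array-based buffer: fragments are collected first and joined at the end,
# so a future only has to be resolved when the page is finally assembled.
buffer = []
buffer << "<td>"
buffer << FutureObject.new { slow_link("Show") }
buffer << "</td>"

html = buffer.join
puts html
```

`Array#join` calls `to_s` on each non-String element, which is the point where the thread's result is awaited.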

  173. In This Particular Example, It Won't Be
    That Effective Because the Links Are
    Already at the Very Bottom of the Page

  174. Another Possible
    Solution
    Cache url_for results in memory
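
A minimal sketch of in-memory URL caching, assuming URLs depend only on the route name and its arguments. `UrlCache` and the generator block are hypothetical and do not describe how any particular plugin works.

```ruby
# Hypothetical cache keyed by route name and arguments.
class UrlCache
  def initialize(&generator)
    @generator = generator
    @cache = {}
  end

  def fetch(name, *args)
    key = [name, args]
    # Run the expensive generator only on a cache miss.
    @cache.fetch(key) { @cache[key] = @generator.call(name, *args) }
  end
end

calls = 0
# Stand-in for an expensive URL generator; counts how often it really runs.
cache = UrlCache.new do |name, id|
  calls += 1
  "/#{name}s/#{id}"
end

3.times { cache.fetch(:post, 1) }
puts cache.fetch(:post, 1)
puts calls
```

A real implementation would also need to decide when entries go stale and whether the Hash needs locking under threads.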

  175. I Created This
    2.years.ago
    It may be helpful if your app
    heavily uses named urls

  176. GH/amatsuda/
    turbo_urls

  177. What We Learned

  178. What We Learned (1)
    If you have external API calls in your app, consider
    doing them in child threads
    You can run AR queries in Threads, but be careful not
    to use up all pooled connections
    ActionView::OutputBuffer can be Array based, for
    some future extensions
    Monkey-patching Haml is hard
    LazyAttribute is so lazy, and opting out of it may
    drastically boost the performance
    url_for is slow, and we need to fix it
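
The first point can be sketched in plain Ruby. `fetch_remote` is a hypothetical stand-in for an external API call; for Active Record queries, each thread would also hold a pooled connection, so the thread count should stay below the pool size.

```ruby
# Stand-in for a slow external API call.
def fetch_remote(name)
  sleep 0.05
  "#{name}: ok"
end

names = %w[weather stocks news]

# Start every call in its own child thread...
threads = names.map { |name| Thread.new { fetch_remote(name) } }

# ...and block only when the results are actually needed.
results = threads.map(&:value)
puts results
```

Because the threads spend their time waiting on I/O, the three calls overlap instead of running back to back.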

  179. What We Learned (2)
    You can find what’s slow in your
    app
    And YOU can fix it
    If the problem lies inside the
    framework, just hack the framework
    It should be fun!

  180. What We Learned (3)
    Performance does not come for free
    There are certain trade-offs
    In Rails' case, we need to craft
    so many evil monkey-patches
    Maybe because the framework
    is not flexible enough

  181. What We Learned (4)
    Thread programming,
    especially debugging, is hard
    I don’t wanna do this anymore
    I'm really looking forward to
    the new Thread model planned
    to be introduced in Ruby 3

  182. Future Plans
    Finish implementing the plugins that I introduced today
    All these plugins are experimental. They basically have
    no tests, no documentation, no comments at the
    moment
    Put them in actual production apps
    I'm sorry but the title of this talk was probably a little bit
    misleading
    Introduce more extensibility to the framework
    I realized that some things would be better changed on
    the framework side than in monkey-patch plugins

  183. end
    name: Akira Matsuda
    GitHub: @amatsuda
    Twitter: @a_matsuda
