Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Database Performance for Rails Applications

Database Performance for Rails Applications

Tips on fixing database performance problems and a few war stories from fixing performance at Cookpad

Leonard Chin

March 21, 2018
Tweet

More Decks by Leonard Chin

Other Decks in Technology

Transcript

  1. Database Performance for
    Rails Applications
    @Great Western Rails
    March 20th, 2018

    View Slide

  2. 8IP"SF:PV
    -FPOBSE$IJO
    &OHJOFFSJOH.BOBHFS8FC%FWFMPQFS
    !$PPLQBE(MPCBM
    5PLZP0⒏DF


    lchin l15n

    View Slide

  3. About Cookpad
    • Web/iOS/Android
    • 22 languages
    • 68 countries
    • 30M monthly average
    users
    • 1.1 million+ recipes

    View Slide

  4. Big, but not Huge
    • 10-20 Rails web app servers
    • 5+ database servers (Aurora MySQL)
    • Memcached/Redis instances
    • CDN for assets (Fastly)
    • 1000s of req/s
    • Plenty of data

    View Slide

  5. Fast Enough, On Average
    * NewRelic

    View Slide

  6. On Average?

    View Slide

  7. Average is Easy
    • It’s easy (and feels good) just to look at
    the average
    • Unfortunately, it doesn’t tell you much.
    • Worse, looking at averages hides real
    performance problems.

    View Slide

  8. Percentiles
    Response Time (ms)
    45 54 44
    42 42 41
    51 40 250
    24 35 36
    32 32 49
    28 40 28
    980 18 50
    30 30 42
    50 38 39
    60 32 42
    • Average: 70ms
    • Median: 40ms
    • 90th Percentile: 54.6ms
    • 95th Percentile: 193ms
    • 99th Percentile: 812ms
    • Maximum: 980ms
    * This is illustrative example data

    View Slide

  9. The 99th Percentile
    • Slower than the other 99%
    • Often slow because they have more data
    • Probably your best (heaviest!) users
    • They’re your 1%

    View Slide

  10. The 1%

    View Slide

  11. The 1% at Cookpad
    • Users who publish lots of recipes
    • Users who bookmark and cook lots of
    recipes
    • Uses who follow lots of other users

    View Slide

  12. What’s the
    pattern here?

    View Slide

  13. Big (well, medium) Data
    • Tables with lots of records (e.g. millions)
    • Users with lots of records (10,000s)
    • Commonly used tables (lots of load)
    • i.e. database performance

    View Slide

  14. Finding Slow Endpoints
    If averages are useless, where do we look?
    • Use performance monitoring that can eind
    your 99th percentile slow transactions
    ‣ e.g. NewRelic, Skylight
    • Look at your database’s slow query log
    • Look at your error logs for timeouts
    • Listen to your best users

    View Slide

  15. DATABASE PERFORMANCE
    WAR STORIES
    Examples from Cookpad’s Ruby on Rails app

    View Slide

  16. (1)
    “My Bookmarks Don’t Load”

    View Slide

  17. My Bookmarks Don’t Load
    Rack::Timeout::RequestTimeoutException: Request ran for longer than 19000ms
    app/views/bookmarks/_list.html.erb:4:in `block in
    _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900'

    app/views/bookmarks/_list.html.erb:2:in `tap'

    app/views/bookmarks/_list.html.erb:2:in
    `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900'

    app/views/layouts/_bookmarks_all.html.erb:5:in
    `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_70283969077
    500'
    current_user.bookmarks.listing.recent, name: "recent_bookmarks" %>
    app/views/layouts/_sidebar.html.erb:87:in
    We were getting timeouts in our error logs on Sentry

    View Slide

  18. But… endpoint looks fast?

    View Slide

  19. Oh right,
    averages.

    View Slide

  20. 99th Percentile is terrible
    Average is 70ms, 99% is > 500ms!

    View Slide

  21. What’s the problem?
    • This is our top page
    • Usually, super fast
    • Unless….

    View Slide

  22. User-specieic content
    • There’s a list of user
    bookmarks in the menu
    • There are lots of
    bookmarks (100M records)
    • A user can have a LOT of
    bookmarks

    View Slide

  23. The implementation
    # In the view
    current_user.bookmarks.listing.recent
    # As SQL
    SELECT `bookmarks`.*
    FROM `bookmarks`
    WHERE `bookmarks`.`user_id` = ?
    ORDER BY visited_at DESC LIMIT ? OFFSET ?
    Looks simple enough?

    View Slide

  24. Let’s look at the slow query log
    This is not good

    View Slide

  25. You an EXPLAIN locally, too
    # In the view
    current_user.bookmarks.listing.recent.explain
    # As SQL
    EXPLAIN
    SELECT `bookmarks`.*
    FROM `bookmarks`
    WHERE `bookmarks`.`user_id` = ?;

    View Slide

  26. Production results are better
    • The amount of data you have is going to
    greatly affect your query performance
    • The load on the database’s CPU from
    other queries will also affect performance
    • You can get good test results locally, but
    nothing beats production results

    View Slide

  27. Understanding the EXPLAIN
    We use MySQL (Aurora), but the principles
    apply to most RDBMS
    • We’re using an index. This is good.
    • extras has using 2ilesort. This is bad.
    • rows_examined is quite large. This is bad,
    sometimes.

    View Slide

  28. Why?
    • Why is an index good to have an index?
    • Why is eilesort bad?
    • Why does rows_examined matter?

    View Slide

  29. A quick dive into
    Relational
    Databases
    * Mostly MySQL

    View Slide

  30. Rails App Database
    Database Client Database Server
    SQL Query
    Result Set
    Let’s find out what
    goes on in here

    View Slide

  31. Database Internals
    Indexes and tables are stored separately.
    • Table rows are stored on disk as…
    ‣ a B-tree in som databases MySQL (InnoDB)
    ‣ a Heap table in others (e.g. Postgres)
    • Indexes are (also) B-trees where the leaf
    nodes point to the table rows

    View Slide

  32. What’s a B-tree?
    • It’s sorted, tree data structure
    • It self-balances and can have lots of
    branches
    • It’s fast: Logarithmic time for search
    ‣ i.e. O(log n)
    ‣ Great for indexes

    View Slide

  33. Index as a B-Tree
    25 40
    1 6 13
    32 3 4
    28 35
    1 99
    110 120 123
    78 12 54
    65
    90
    75 78 89
    41 2 7
    Leaf nodes are primary keys (row ids) pointing to the record

    View Slide

  34. CS 101
    • A SELECT is a search
    • Searching requires sorted data
    • Sorting is slow
    • Searching is fast with a B-tree

    View Slide

  35. Back to the query
    # In the view
    current_user.bookmarks.listing.recent
    # As SQL
    SELECT `bookmarks`.*
    FROM `bookmarks`
    WHERE `bookmarks`.`user_id` = ?
    ORDER BY visited_at DESC LIMIT ? OFFSET ?
    Why is this slow?

    View Slide

  36. The index
    # index_bookmarks_on_user_id
    t.index [“user_id"]
    • This matches the WHERE perfectly!
    SELECT `bookmarks`.*
    FROM `bookmarks`
    WHERE `bookmarks`.`user_id` = ?
    ORDER BY visited_at DESC LIMIT ? OFFSET ?

    View Slide

  37. But…
    # index_bookmarks_on_user_id
    t.index [“user_id"]
    • We’re sorting the result with ORDER BY
    SELECT `bookmarks`.*
    FROM `bookmarks`
    WHERE `bookmarks`.`user_id` = ?
    ORDER BY visited_at DESC LIMIT ? OFFSET ?
    Hence, using filesort

    View Slide

  38. What does the database do?
    • Finds all records with given user_id using
    the index (fast)
    • Retrieves ALL records from the table
    ‣ despite the LIMIT/OFFSET
    • Sorts the records by visited_at
    • Returns LIMIT records at OFFSET

    View Slide

  39. Sorting is slow

    View Slide

  40. The solution
    • Use a sorted composite index
    t.index [“user_id”, “visited_at”]
    +----+-------------+-----------+------+---------+-------+------+-------------+
    | id | select_type | table | type | key_len | ref | rows | Extra |
    +----+-------------+-----------+------+---------+-------+------+-------------+
    | 1 | SIMPLE | bookmarks | ref | 5 | const | 2523 | Using where |
    +----+-------------+-----------+------+---------+-------+------+-------------
    • This eliminates the eilesort, and now its fast

    View Slide

  41. Composite Indexes
    25,
    Feb 1
    40,
    Jan 40
    1,
    Mar 1
    6,
    Feb 3
    6,
    Feb 5
    6,
    Mar 12
    13,
    Feb 22
    32 3 99 42 12
    Branch Node
    Leaf Node
    For an index like (user_id, visited_at), you
    can visualise the sorting on the second
    column as above

    View Slide

  42. Note on Composite Indexes
    • Often, single column indexes are a smell
    • As the application developer, if you know
    what queries you make, specifying
    composite indexes improves index usage
    • Composite indexes are also used by
    subsets of the key, counting from the left

    View Slide

  43. Reusing Composite Indexes
    1,
    22,
    Mar 1
    6,
    14
    Feb 3
    6,
    14
    Feb 5
    6,
    15
    Mar 12
    13,
    5
    Feb 22
    32 3 99 42 12
    Leaf Node
    • Given an index on (user_id, recipe_id, created_at)
    • Used for:
    ‣ user_id
    ‣ user_id, recipe_id
    ‣ user_id, recipe_id, created_at
    •Not Used for:
    ‣ recipe_id
    ‣ recipe_id, created_at
    ‣ created_at

    View Slide

  44. Caveats
    • More indexes will slow down insert,
    update and delete operations. It’s a
    tradeoff
    ‣ Therefore, only optimise when necessary
    • Applying indexes to a large table can lock
    the table
    ‣ You may need to use a tool like Percona Online
    Schema Change (pt-osc)

    View Slide

  45. (2)
    “Too many count queries”

    View Slide

  46. Slow Count Queries
    That’s a lot of rows
    After some reports
    of slow page loads, I
    discovered a
    surprising query in
    the slow query logs

    View Slide

  47. Looks normal?
    SELECT COUNT(*)
    FROM `users`
    INNER JOIN `follows` ON `users`.`id` =
    `follows`.`follower_id`
    WHERE `follows`.`followee_id` = ?
    • It’s counting the number of followers for a
    user

    View Slide

  48. Tables
    JE OBNF
    1 Mary
    2 Jenny
    3 Alice
    JE [email protected] [email protected]
    1 1 2
    2 1 3
    3 2 3
    Follows
    Users

    View Slide

  49. Self-referential
    has_many :through !
    JE OBNF
    1 Mary
    2 Jenny
    3 Alice
    JE [email protected] [email protected]
    1 1 2
    2 1 3
    3 2 3
    Follows
    Users

    View Slide

  50. Counter Cache?
    def follower_count
    object.followers.size
    end
    • You’d expect that we’d just use a counter
    cache to eliminate a query

    View Slide

  51. Yep, Counter Cache
    # follow.rb
    belongs_to :follower, foreign_key: :follower_id, class_name:
    "User", counter_cache: :followees_count
    belongs_to :followee, foreign_key: :followee_id, class_name:
    "User", counter_cache: :followers_count
    # user.rb
    has_many :incoming_follows, class_name: "Follow",
    foreign_key: :followee_id, dependent: :destroy
    has_many :outgoing_follows, class_name: "Follow",
    foreign_key: :follower_id, dependent: :destroy
    has_many :followers, through: :incoming_follows
    has_many :followees, through: :outgoing_follows

    View Slide

  52. But, does it work?
    [2] pry(main)> user.followers.size
    [slave/1] (3752.5ms) SELECT COUNT(*) FROM `users` INNER
    JOIN `follows` ON `users`.`id` = `follows`.`follower_id`
    WHERE `follows`.`followee_id` = 2008217
    => 1712
    [3] pry(main)> user.followees.size
    [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER
    JOIN `follows` ON `users`.`id` = `follows`.`followee_id`
    WHERE `follows`.`follower_id` = 2008217
    => 1087
    Nope.

    View Slide

  53. What’s the problem?
    • The counter cache column exists
    • The column is well-deeined on the Follow
    object
    • Can ActiveRecord eigure out the counter
    cache follow in this self-referential
    has_many :through?

    View Slide

  54. Hidden deep in the docs…
    4.3.2 Options for has_many
    4.3.2.4 :counter_cache
    This option can be used to configure a custom
    named :counter_cache. You only need this option when
    you customized the name of your :counter_cache on
    the belongs_to association.
    • Turns out, in cases like this where Rails
    can’t guess the inverse_relation, we can be
    explicit

    View Slide

  55. The solution
    has_many :followers, -> { order("follows.id DESC")
    }, through: :incoming_follows,
    counter_cache: :followers_count
    has_many :followees, -> { order("follows.id DESC")
    }, through: :outgoing_follows,
    counter_cache: :followees_count
    [2] pry(main)> user.followers.size
    => 1712
    • Now we’re using the cache!

    View Slide

  56. (3)
    “Big IN Clause”

    View Slide

  57. A big IN clause
    Mysql2::Error: MySQL server has gone away: SELECT COUNT(*) FROM `recipe_likes` WHERE
    `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093,
    10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640,
    10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074,
    11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037,
    11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042,
    11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035,
    11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024,
    11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008,
    11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988,
    11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945,
    11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958,
    11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910,
    11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918,
    11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882,
    11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902,
    11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868,
    11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862,
    11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836,
    11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803,
    11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793,
    11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804,
    11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791,
    11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775,
    11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760,
    11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733,

    View Slide

  58. Why is this happening?
    def liked_by_multiple_followees?
    recipe.

    recipe_likes.
    where(user: subscriber_followee_ids).
    count >= LIKES_REQUIRED_FOR_PUBLISH
    end
    • All we want is to eind out if a recipe has
    been liked by several followees…
    • But… what if there are lots of followees?

    View Slide

  59. A clearer look at the SQL
    SELECT `users`.id
    FROM `users`
    INNER JOIN `follows` ON `users`.`id` =
    `follows`.`followee_id`
    WHERE `follows`.`follower_id` = ?
    ORDER BY follows.id DESC
    SELECT COUNT(*) FROM `recipe_likes`
    WHERE `recipe_likes`.`recipe_id` = ?
    AND `recipe_likes`.`user_id` IN (?+);

    View Slide

  60. What does IN do?
    • You can think of the IN clause as being
    similar to an OR operation
    • Each element of the IN means a separate
    index lookup
    • A better index won’t help… we can only
    avoid using IN

    View Slide

  61. Do it in Ruby
    • All we really want is the intersection
    between followers and “likers”
    • It’s easy to add more app servers
    • It’s harder to add more database
    instances (even slaves)
    • In this case, even with (say) 1 million ids,
    doing it in Ruby is still fast

    View Slide

  62. The solution
    # Find the ids
    liker_ids = RecipeLike.where(recipe:
    recipe).pluck(:user_id)
    follower_ids = Follow.where(follower_id:
    subscriber).pluck(:followee_id)
    # Find the intersection
    liker_ids & subscriber_followee_ids
    • Using Ruby eliminates slow queries and is
    actually quite simple

    View Slide

  63. Cookpad’s approach to Perf
    • Time eixing performance problems is
    time not spent improving the product
    ‣ Have good hygiene (understand indexes, N+1s,
    etc)
    • Rails makes it easy to write clean code,
    but problems can be hard to detect
    ‣ But, clean code can make diagnosis easier
    • Prefer quick and easy solutions

    View Slide

  64. Takeaways
    • Understand how RDBMS execute queries:
    in particular, how indexes work
    • Don’t look at average performance, look
    at the 99th percentile
    • Use tools to eind the bottlenecks that
    matter

    View Slide

  65. Questions?

    View Slide