Database Performance for Rails Applications

Database Performance for Rails Applications

Tips on fixing database performance problems and a few war stories from fixing performance at Cookpad

70762059db0218c6f0ff48042ca0756a?s=128

Leonard Chin

March 21, 2018
Tweet

Transcript

  1. 3.

    About Cookpad • Web/iOS/Android • 22 languages • 68 countries

    • 30M monthly average users • 1.1 million+ recipes
  2. 4.

    Big, but not Huge • 10-20 Rails web app servers

    • 5+ database servers (Aurora MySQL) • Memcached/Redis instances • CDN for assets (Fastly) • 1000s of req/s • Plenty of data
  3. 7.

    Average is Easy • It’s easy (and feels good) just

    to look at the average • Unfortunately, it doesn’t tell you much. • Worse, looking at averages hides real performance problems.
  4. 8.

    Percentiles Response Time (ms) 45 54 44 42 42 41

    51 40 250 24 35 36 32 32 49 28 40 28 980 18 50 30 30 42 50 38 39 60 32 42 • Average: 70ms • Median: 40ms • 90th Percentile: 54.6ms • 95th Percentile: 193ms • 99th Percentile: 812ms • Maximum: 980ms * This is illustrative example data
  5. 9.

    The 99th Percentile • Slower than the other 99% •

    Often slow because they have more data • Probably your best (heaviest!) users • They’re your 1%
  6. 10.
  7. 11.

    The 1% at Cookpad • Users who publish lots of

    recipes • Users who bookmark and cook lots of recipes • Uses who follow lots of other users
  8. 13.

    Big (well, medium) Data • Tables with lots of records

    (e.g. millions) • Users with lots of records (10,000s) • Commonly used tables (lots of load) • i.e. database performance
  9. 14.

    Finding Slow Endpoints If averages are useless, where do we

    look? • Use performance monitoring that can eind your 99th percentile slow transactions ‣ e.g. NewRelic, Skylight • Look at your database’s slow query log • Look at your error logs for timeouts • Listen to your best users
  10. 17.

    My Bookmarks Don’t Load Rack::Timeout::RequestTimeoutException: Request ran for longer than

    19000ms app/views/bookmarks/_list.html.erb:4:in `block in _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <%= render partial: "bookmarks/bookmark", collection: bookmarks %> app/views/bookmarks/_list.html.erb:2:in `tap' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/bookmarks/_list.html.erb:2:in `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/layouts/_bookmarks_all.html.erb:5:in `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_70283969077 500' <%= render "bookmarks/list", bookmarks: current_user.bookmarks.listing.recent, name: "recent_bookmarks" %> app/views/layouts/_sidebar.html.erb:87:in We were getting timeouts in our error logs on Sentry
  11. 21.

    What’s the problem? • This is our top page •

    Usually, super fast • Unless….
  12. 22.

    User-specieic content • There’s a list of user bookmarks in

    the menu • There are lots of bookmarks (100M records) • A user can have a LOT of bookmarks
  13. 23.

    The implementation # In the view current_user.bookmarks.listing.recent # As SQL

    SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Looks simple enough?
  14. 25.

    You an EXPLAIN locally, too # In the view current_user.bookmarks.listing.recent.explain

    # As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?;
  15. 26.

    Production results are better • The amount of data you

    have is going to greatly affect your query performance • The load on the database’s CPU from other queries will also affect performance • You can get good test results locally, but nothing beats production results
  16. 27.

    Understanding the EXPLAIN We use MySQL (Aurora), but the principles

    apply to most RDBMS • We’re using an index. This is good. • extras has using 2ilesort. This is bad. • rows_examined is quite large. This is bad, sometimes.
  17. 28.

    Why? • Why is an index good to have an

    index? • Why is eilesort bad? • Why does rows_examined matter?
  18. 30.
  19. 31.

    Database Internals Indexes and tables are stored separately. • Table

    rows are stored on disk as… ‣ a B-tree in som databases MySQL (InnoDB) ‣ a Heap table in others (e.g. Postgres) • Indexes are (also) B-trees where the leaf nodes point to the table rows
  20. 32.

    What’s a B-tree? • It’s sorted, tree data structure •

    It self-balances and can have lots of branches • It’s fast: Logarithmic time for search ‣ i.e. O(log n) ‣ Great for indexes
  21. 33.

    Index as a B-Tree 25 40 1 6 13 32

    3 4 28 35 1 99 110 120 123 78 12 54 65 90 75 78 89 41 2 7 Leaf nodes are primary keys (row ids) pointing to the record
  22. 34.

    CS 101 • A SELECT is a search • Searching

    requires sorted data • Sorting is slow • Searching is fast with a B-tree
  23. 35.

    Back to the query # In the view current_user.bookmarks.listing.recent #

    As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Why is this slow?
  24. 36.

    The index # index_bookmarks_on_user_id t.index [“user_id"] • This matches the

    WHERE perfectly! SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?
  25. 37.

    But… # index_bookmarks_on_user_id t.index [“user_id"] • We’re sorting the result

    with ORDER BY SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Hence, using filesort
  26. 38.

    What does the database do? • Finds all records with

    given user_id using the index (fast) • Retrieves ALL records from the table ‣ despite the LIMIT/OFFSET • Sorts the records by visited_at • Returns LIMIT records at OFFSET
  27. 40.

    The solution • Use a sorted composite index t.index [“user_id”,

    “visited_at”] +----+-------------+-----------+------+---------+-------+------+-------------+ | id | select_type | table | type | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------+-------+------+-------------+ | 1 | SIMPLE | bookmarks | ref | 5 | const | 2523 | Using where | +----+-------------+-----------+------+---------+-------+------+------------- • This eliminates the eilesort, and now its fast
  28. 41.

    Composite Indexes 25, Feb 1 40, Jan 40 1, Mar

    1 6, Feb 3 6, Feb 5 6, Mar 12 13, Feb 22 32 3 99 42 12 Branch Node Leaf Node For an index like (user_id, visited_at), you can visualise the sorting on the second column as above
  29. 42.

    Note on Composite Indexes • Often, single column indexes are

    a smell • As the application developer, if you know what queries you make, specifying composite indexes improves index usage • Composite indexes are also used by subsets of the key, counting from the left
  30. 43.

    Reusing Composite Indexes 1, 22, Mar 1 6, 14 Feb

    3 6, 14 Feb 5 6, 15 Mar 12 13, 5 Feb 22 32 3 99 42 12 Leaf Node • Given an index on (user_id, recipe_id, created_at) • Used for: ‣ user_id ‣ user_id, recipe_id ‣ user_id, recipe_id, created_at •Not Used for: ‣ recipe_id ‣ recipe_id, created_at ‣ created_at
  31. 44.

    Caveats • More indexes will slow down insert, update and

    delete operations. It’s a tradeoff ‣ Therefore, only optimise when necessary • Applying indexes to a large table can lock the table ‣ You may need to use a tool like Percona Online Schema Change (pt-osc)
  32. 46.

    Slow Count Queries That’s a lot of rows After some

    reports of slow page loads, I discovered a surprising query in the slow query logs
  33. 47.

    Looks normal? SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON

    `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ? • It’s counting the number of followers for a user
  34. 48.

    Tables JE OBNF 1 Mary 2 Jenny 3 Alice JE

    GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users
  35. 49.

    Self-referential has_many :through ! JE OBNF 1 Mary 2 Jenny

    3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users
  36. 50.
  37. 51.

    Yep, Counter Cache # follow.rb belongs_to :follower, foreign_key: :follower_id, class_name:

    "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows
  38. 52.

    But, does it work? [2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT

    COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087 Nope.
  39. 53.

    What’s the problem? • The counter cache column exists •

    The column is well-deeined on the Follow object • Can ActiveRecord eigure out the counter cache follow in this self-referential has_many :through?
  40. 54.

    Hidden deep in the docs… 4.3.2 Options for has_many 4.3.2.4

    :counter_cache This option can be used to configure a custom named :counter_cache. You only need this option when you customized the name of your :counter_cache on the belongs_to association. • Turns out, in cases like this where Rails can’t guess the inverse_relation, we can be explicit
  41. 55.

    The solution has_many :followers, -> { order("follows.id DESC") }, through:

    :incoming_follows, counter_cache: :followers_count has_many :followees, -> { order("follows.id DESC") }, through: :outgoing_follows, counter_cache: :followees_count [2] pry(main)> user.followers.size => 1712 • Now we’re using the cache!
  42. 57.

    A big IN clause Mysql2::Error: MySQL server has gone away:

    SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733,
  43. 58.

    Why is this happening? def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids).

    count >= LIKES_REQUIRED_FOR_PUBLISH end • All we want is to eind out if a recipe has been liked by several followees… • But… what if there are lots of followees?
  44. 59.

    A clearer look at the SQL SELECT `users`.id FROM `users`

    INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = ? ORDER BY follows.id DESC SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = ? AND `recipe_likes`.`user_id` IN (?+);
  45. 60.

    What does IN do? • You can think of the

    IN clause as being similar to an OR operation • Each element of the IN means a separate index lookup • A better index won’t help… we can only avoid using IN
  46. 61.

    Do it in Ruby • All we really want is

    the intersection between followers and “likers” • It’s easy to add more app servers • It’s harder to add more database instances (even slaves) • In this case, even with (say) 1 million ids, doing it in Ruby is still fast
  47. 62.

    The solution # Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id)

    follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & subscriber_followee_ids • Using Ruby eliminates slow queries and is actually quite simple
  48. 63.

    Cookpad’s approach to Perf • Time eixing performance problems is

    time not spent improving the product ‣ Have good hygiene (understand indexes, N+1s, etc) • Rails makes it easy to write clean code, but problems can be hard to detect ‣ But, clean code can make diagnosis easier • Prefer quick and easy solutions
  49. 64.

    Takeaways • Understand how RDBMS execute queries: in particular, how

    indexes work • Don’t look at average performance, look at the 99th percentile • Use tools to eind the bottlenecks that matter