Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Database Performance for Rails Applications

Database Performance for Rails Applications

Tips on fixing database performance problems and a few war stories from fixing performance at Cookpad

Leonard Chin

March 21, 2018
Tweet

More Decks by Leonard Chin

Other Decks in Technology

Transcript

  1. About Cookpad • Web/iOS/Android • 22 languages • 68 countries

    • 30M monthly average users • 1.1 million+ recipes
  2. Big, but not Huge • 10-20 Rails web app servers

    • 5+ database servers (Aurora MySQL) • Memcached/Redis instances • CDN for assets (Fastly) • 1000s of req/s • Plenty of data
  3. Average is Easy • It’s easy (and feels good) just

    to look at the average • Unfortunately, it doesn’t tell you much. • Worse, looking at averages hides real performance problems.
  4. Percentiles Response Time (ms) 45 54 44 42 42 41

    51 40 250 24 35 36 32 32 49 28 40 28 980 18 50 30 30 42 50 38 39 60 32 42 • Average: 70ms • Median: 40ms • 90th Percentile: 54.6ms • 95th Percentile: 193ms • 99th Percentile: 812ms • Maximum: 980ms * This is illustrative example data
  5. The 99th Percentile • Slower than the other 99% •

    Often slow because they have more data • Probably your best (heaviest!) users • They’re your 1%
  6. The 1% at Cookpad • Users who publish lots of

    recipes • Users who bookmark and cook lots of recipes • Uses who follow lots of other users
  7. Big (well, medium) Data • Tables with lots of records

    (e.g. millions) • Users with lots of records (10,000s) • Commonly used tables (lots of load) • i.e. database performance
  8. Finding Slow Endpoints If averages are useless, where do we

    look? • Use performance monitoring that can eind your 99th percentile slow transactions ‣ e.g. NewRelic, Skylight • Look at your database’s slow query log • Look at your error logs for timeouts • Listen to your best users
  9. My Bookmarks Don’t Load Rack::Timeout::RequestTimeoutException: Request ran for longer than

    19000ms app/views/bookmarks/_list.html.erb:4:in `block in _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <%= render partial: "bookmarks/bookmark", collection: bookmarks %> app/views/bookmarks/_list.html.erb:2:in `tap' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/bookmarks/_list.html.erb:2:in `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/layouts/_bookmarks_all.html.erb:5:in `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_70283969077 500' <%= render "bookmarks/list", bookmarks: current_user.bookmarks.listing.recent, name: "recent_bookmarks" %> app/views/layouts/_sidebar.html.erb:87:in We were getting timeouts in our error logs on Sentry
  10. What’s the problem? • This is our top page •

    Usually, super fast • Unless….
  11. User-specieic content • There’s a list of user bookmarks in

    the menu • There are lots of bookmarks (100M records) • A user can have a LOT of bookmarks
  12. The implementation # In the view current_user.bookmarks.listing.recent # As SQL

    SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Looks simple enough?
  13. You an EXPLAIN locally, too # In the view current_user.bookmarks.listing.recent.explain

    # As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?;
  14. Production results are better • The amount of data you

    have is going to greatly affect your query performance • The load on the database’s CPU from other queries will also affect performance • You can get good test results locally, but nothing beats production results
  15. Understanding the EXPLAIN We use MySQL (Aurora), but the principles

    apply to most RDBMS • We’re using an index. This is good. • extras has using 2ilesort. This is bad. • rows_examined is quite large. This is bad, sometimes.
  16. Why? • Why is an index good to have an

    index? • Why is eilesort bad? • Why does rows_examined matter?
  17. Database Internals Indexes and tables are stored separately. • Table

    rows are stored on disk as… ‣ a B-tree in som databases MySQL (InnoDB) ‣ a Heap table in others (e.g. Postgres) • Indexes are (also) B-trees where the leaf nodes point to the table rows
  18. What’s a B-tree? • It’s sorted, tree data structure •

    It self-balances and can have lots of branches • It’s fast: Logarithmic time for search ‣ i.e. O(log n) ‣ Great for indexes
  19. Index as a B-Tree 25 40 1 6 13 32

    3 4 28 35 1 99 110 120 123 78 12 54 65 90 75 78 89 41 2 7 Leaf nodes are primary keys (row ids) pointing to the record
  20. CS 101 • A SELECT is a search • Searching

    requires sorted data • Sorting is slow • Searching is fast with a B-tree
  21. Back to the query # In the view current_user.bookmarks.listing.recent #

    As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Why is this slow?
  22. The index # index_bookmarks_on_user_id t.index [“user_id"] • This matches the

    WHERE perfectly! SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?
  23. But… # index_bookmarks_on_user_id t.index [“user_id"] • We’re sorting the result

    with ORDER BY SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Hence, using filesort
  24. What does the database do? • Finds all records with

    given user_id using the index (fast) • Retrieves ALL records from the table ‣ despite the LIMIT/OFFSET • Sorts the records by visited_at • Returns LIMIT records at OFFSET
  25. The solution • Use a sorted composite index t.index [“user_id”,

    “visited_at”] +----+-------------+-----------+------+---------+-------+------+-------------+ | id | select_type | table | type | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------+-------+------+-------------+ | 1 | SIMPLE | bookmarks | ref | 5 | const | 2523 | Using where | +----+-------------+-----------+------+---------+-------+------+------------- • This eliminates the eilesort, and now its fast
  26. Composite Indexes 25, Feb 1 40, Jan 40 1, Mar

    1 6, Feb 3 6, Feb 5 6, Mar 12 13, Feb 22 32 3 99 42 12 Branch Node Leaf Node For an index like (user_id, visited_at), you can visualise the sorting on the second column as above
  27. Note on Composite Indexes • Often, single column indexes are

    a smell • As the application developer, if you know what queries you make, specifying composite indexes improves index usage • Composite indexes are also used by subsets of the key, counting from the left
  28. Reusing Composite Indexes 1, 22, Mar 1 6, 14 Feb

    3 6, 14 Feb 5 6, 15 Mar 12 13, 5 Feb 22 32 3 99 42 12 Leaf Node • Given an index on (user_id, recipe_id, created_at) • Used for: ‣ user_id ‣ user_id, recipe_id ‣ user_id, recipe_id, created_at •Not Used for: ‣ recipe_id ‣ recipe_id, created_at ‣ created_at
  29. Caveats • More indexes will slow down insert, update and

    delete operations. It’s a tradeoff ‣ Therefore, only optimise when necessary • Applying indexes to a large table can lock the table ‣ You may need to use a tool like Percona Online Schema Change (pt-osc)
  30. Slow Count Queries That’s a lot of rows After some

    reports of slow page loads, I discovered a surprising query in the slow query logs
  31. Looks normal? SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON

    `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ? • It’s counting the number of followers for a user
  32. Tables JE OBNF 1 Mary 2 Jenny 3 Alice JE

    GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users
  33. Self-referential has_many :through ! JE OBNF 1 Mary 2 Jenny

    3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users
  34. Yep, Counter Cache # follow.rb belongs_to :follower, foreign_key: :follower_id, class_name:

    "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows
  35. But, does it work? [2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT

    COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087 Nope.
  36. What’s the problem? • The counter cache column exists •

    The column is well-deeined on the Follow object • Can ActiveRecord eigure out the counter cache follow in this self-referential has_many :through?
  37. Hidden deep in the docs… 4.3.2 Options for has_many 4.3.2.4

    :counter_cache This option can be used to configure a custom named :counter_cache. You only need this option when you customized the name of your :counter_cache on the belongs_to association. • Turns out, in cases like this where Rails can’t guess the inverse_relation, we can be explicit
  38. The solution has_many :followers, -> { order("follows.id DESC") }, through:

    :incoming_follows, counter_cache: :followers_count has_many :followees, -> { order("follows.id DESC") }, through: :outgoing_follows, counter_cache: :followees_count [2] pry(main)> user.followers.size => 1712 • Now we’re using the cache!
  39. A big IN clause Mysql2::Error: MySQL server has gone away:

    SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733,
  40. Why is this happening? def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids).

    count >= LIKES_REQUIRED_FOR_PUBLISH end • All we want is to eind out if a recipe has been liked by several followees… • But… what if there are lots of followees?
  41. A clearer look at the SQL SELECT `users`.id FROM `users`

    INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = ? ORDER BY follows.id DESC SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = ? AND `recipe_likes`.`user_id` IN (?+);
  42. What does IN do? • You can think of the

    IN clause as being similar to an OR operation • Each element of the IN means a separate index lookup • A better index won’t help… we can only avoid using IN
  43. Do it in Ruby • All we really want is

    the intersection between followers and “likers” • It’s easy to add more app servers • It’s harder to add more database instances (even slaves) • In this case, even with (say) 1 million ids, doing it in Ruby is still fast
  44. The solution # Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id)

    follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & subscriber_followee_ids • Using Ruby eliminates slow queries and is actually quite simple
  45. Cookpad’s approach to Perf • Time eixing performance problems is

    time not spent improving the product ‣ Have good hygiene (understand indexes, N+1s, etc) • Rails makes it easy to write clean code, but problems can be hard to detect ‣ But, clean code can make diagnosis easier • Prefer quick and easy solutions
  46. Takeaways • Understand how RDBMS execute queries: in particular, how

    indexes work • Don’t look at average performance, look at the 99th percentile • Use tools to eind the bottlenecks that matter