Database Performance for Rails Applications

Database Performance for Rails Applications @Great Western Rails March 20th,
2018

8IP"SF:PV -FPOBSE$IJO &OHJOFFSJOH.BOBHFS8FC%FWFMPQFS !$PPLQBE(MPCBM 5PLZP0⒏DF !ɹ lchin l15n

About Cookpad • Web/iOS/Android • 22 languages • 68 countries
• 30M monthly average users • 1.1 million+ recipes

Big, but not Huge • 10-20 Rails web app servers
• 5+ database servers (Aurora MySQL) • Memcached/Redis instances • CDN for assets (Fastly) • 1000s of req/s • Plenty of data

Fast Enough, On Average * NewRelic

On Average?

Average is Easy • It’s easy (and feels good) just
to look at the average • Unfortunately, it doesn’t tell you much. • Worse, looking at averages hides real performance problems.

Percentiles Response Time (ms) 45 54 44 42 42 41
51 40 250 24 35 36 32 32 49 28 40 28 980 18 50 30 30 42 50 38 39 60 32 42 • Average: 70ms • Median: 40ms • 90th Percentile: 54.6ms • 95th Percentile: 193ms • 99th Percentile: 812ms • Maximum: 980ms * This is illustrative example data

The 99th Percentile • Slower than the other 99% •
Often slow because they have more data • Probably your best (heaviest!) users • They’re your 1%

The 1%

The 1% at Cookpad • Users who publish lots of
recipes • Users who bookmark and cook lots of recipes • Uses who follow lots of other users

What’s the pattern here?

Big (well, medium) Data • Tables with lots of records
(e.g. millions) • Users with lots of records (10,000s) • Commonly used tables (lots of load) • i.e. database performance

Finding Slow Endpoints If averages are useless, where do we
look? • Use performance monitoring that can eind your 99th percentile slow transactions ‣ e.g. NewRelic, Skylight • Look at your database’s slow query log • Look at your error logs for timeouts • Listen to your best users

DATABASE PERFORMANCE WAR STORIES Examples from Cookpad’s Ruby on Rails
app

(1) “My Bookmarks Don’t Load”

My Bookmarks Don’t Load Rack::Timeout::RequestTimeoutException: Request ran for longer than
19000ms app/views/bookmarks/_list.html.erb:4:in `block in _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <%= render partial: "bookmarks/bookmark", collection: bookmarks %> app/views/bookmarks/_list.html.erb:2:in `tap' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/bookmarks/_list.html.erb:2:in `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/layouts/_bookmarks_all.html.erb:5:in `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_70283969077 500' <%= render "bookmarks/list", bookmarks: current_user.bookmarks.listing.recent, name: "recent_bookmarks" %> app/views/layouts/_sidebar.html.erb:87:in We were getting timeouts in our error logs on Sentry

But… endpoint looks fast?

Oh right, averages.

99th Percentile is terrible Average is 70ms, 99% is >
500ms!

What’s the problem? • This is our top page •
Usually, super fast • Unless….

User-specieic content • There’s a list of user bookmarks in
the menu • There are lots of bookmarks (100M records) • A user can have a LOT of bookmarks

The implementation # In the view current_user.bookmarks.listing.recent # As SQL
SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Looks simple enough?

Let’s look at the slow query log This is not
good

You an EXPLAIN locally, too # In the view current_user.bookmarks.listing.recent.explain
# As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?;

Production results are better • The amount of data you
have is going to greatly affect your query performance • The load on the database’s CPU from other queries will also affect performance • You can get good test results locally, but nothing beats production results

Understanding the EXPLAIN We use MySQL (Aurora), but the principles
apply to most RDBMS • We’re using an index. This is good. • extras has using 2ilesort. This is bad. • rows_examined is quite large. This is bad, sometimes.

Why? • Why is an index good to have an
index? • Why is eilesort bad? • Why does rows_examined matter?

A quick dive into Relational Databases * Mostly MySQL

Rails App Database Database Client Database Server SQL Query Result
Set Let’s find out what goes on in here

Database Internals Indexes and tables are stored separately. • Table
rows are stored on disk as… ‣ a B-tree in som databases MySQL (InnoDB) ‣ a Heap table in others (e.g. Postgres) • Indexes are (also) B-trees where the leaf nodes point to the table rows

What’s a B-tree? • It’s sorted, tree data structure •
It self-balances and can have lots of branches • It’s fast: Logarithmic time for search ‣ i.e. O(log n) ‣ Great for indexes

Index as a B-Tree 25 40 1 6 13 32
3 4 28 35 1 99 110 120 123 78 12 54 65 90 75 78 89 41 2 7 Leaf nodes are primary keys (row ids) pointing to the record

CS 101 • A SELECT is a search • Searching
requires sorted data • Sorting is slow • Searching is fast with a B-tree

Back to the query # In the view current_user.bookmarks.listing.recent #
As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Why is this slow?

The index # index_bookmarks_on_user_id t.index [“user_id"] • This matches the
WHERE perfectly! SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?

But… # index_bookmarks_on_user_id t.index [“user_id"] • We’re sorting the result
with ORDER BY SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Hence, using ﬁlesort

What does the database do? • Finds all records with
given user_id using the index (fast) • Retrieves ALL records from the table ‣ despite the LIMIT/OFFSET • Sorts the records by visited_at • Returns LIMIT records at OFFSET

Sorting is slow

Composite Indexes 25, Feb 1 40, Jan 40 1, Mar
1 6, Feb 3 6, Feb 5 6, Mar 12 13, Feb 22 32 3 99 42 12 Branch Node Leaf Node For an index like (user_id, visited_at), you can visualise the sorting on the second column as above

Note on Composite Indexes • Often, single column indexes are
a smell • As the application developer, if you know what queries you make, specifying composite indexes improves index usage • Composite indexes are also used by subsets of the key, counting from the left

Reusing Composite Indexes 1, 22, Mar 1 6, 14 Feb
3 6, 14 Feb 5 6, 15 Mar 12 13, 5 Feb 22 32 3 99 42 12 Leaf Node • Given an index on (user_id, recipe_id, created_at) • Used for: ‣ user_id ‣ user_id, recipe_id ‣ user_id, recipe_id, created_at •Not Used for: ‣ recipe_id ‣ recipe_id, created_at ‣ created_at

Caveats • More indexes will slow down insert, update and
delete operations. It’s a tradeoff ‣ Therefore, only optimise when necessary • Applying indexes to a large table can lock the table ‣ You may need to use a tool like Percona Online Schema Change (pt-osc)

(2) “Too many count queries”

Slow Count Queries That’s a lot of rows After some
reports of slow page loads, I discovered a surprising query in the slow query logs

Looks normal? SELECT COUNT(*) FROM ùsers` INNER JOIN `follows` ON
ùsers`.ìd` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ? • It’s counting the number of followers for a user

Tables JE OBNF 1 Mary 2 Jenny 3 Alice JE
GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Self-referential has_many :through ! JE OBNF 1 Mary 2 Jenny
3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Counter Cache? def follower_count object.followers.size end • You’d expect that
we’d just use a counter cache to eliminate a query

Yep, Counter Cache # follow.rb belongs_to :follower, foreign_key: :follower_id, class_name:
"User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows

But, does it work? [2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT
COUNT(*) FROM ùsers` INNER JOIN `follows` ON ùsers`.ìd` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM ùsers` INNER JOIN `follows` ON ùsers`.ìd` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087 Nope.

What’s the problem? • The counter cache column exists •
The column is well-deeined on the Follow object • Can ActiveRecord eigure out the counter cache follow in this self-referential has_many :through?

Hidden deep in the docs… 4.3.2 Options for has_many 4.3.2.4
:counter_cache This option can be used to configure a custom named :counter_cache. You only need this option when you customized the name of your :counter_cache on the belongs_to association. • Turns out, in cases like this where Rails can’t guess the inverse_relation, we can be explicit

The solution has_many :followers, -> { order("follows.id DESC") }, through:
:incoming_follows, counter_cache: :followers_count has_many :followees, -> { order("follows.id DESC") }, through: :outgoing_follows, counter_cache: :followees_count [2] pry(main)> user.followers.size => 1712 • Now we’re using the cache!

(3) “Big IN Clause”

A big IN clause Mysql2::Error: MySQL server has gone away:
SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733,

Why is this happening? def liked_by_multiple_followees? recipe.  recipe_likes. where(user: subscriber_followee_ids).
count >= LIKES_REQUIRED_FOR_PUBLISH end • All we want is to eind out if a recipe has been liked by several followees… • But… what if there are lots of followees?

A clearer look at the SQL SELECT ùsers`.id FROM ùsers`
INNER JOIN `follows` ON ùsers`.ìd` = `follows`.`followee_id` WHERE `follows`.`follower_id` = ? ORDER BY follows.id DESC SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = ? AND `recipe_likes`.ùser_id` IN (?+);

What does IN do? • You can think of the
IN clause as being similar to an OR operation • Each element of the IN means a separate index lookup • A better index won’t help… we can only avoid using IN

Do it in Ruby • All we really want is
the intersection between followers and “likers” • It’s easy to add more app servers • It’s harder to add more database instances (even slaves) • In this case, even with (say) 1 million ids, doing it in Ruby is still fast

The solution # Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id)
follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & subscriber_followee_ids • Using Ruby eliminates slow queries and is actually quite simple

Cookpad’s approach to Perf • Time eixing performance problems is
time not spent improving the product ‣ Have good hygiene (understand indexes, N+1s, etc) • Rails makes it easy to write clean code, but problems can be hard to detect ‣ But, clean code can make diagnosis easier • Prefer quick and easy solutions

Takeaways • Understand how RDBMS execute queries: in particular, how
indexes work • Don’t look at average performance, look at the 99th percentile • Use tools to eind the bottlenecks that matter

Questions?

Database Performance for Rails Applications

Database Performance for Rails Applications

More Decks by Leonard Chin

Other Decks in Technology

Featured

Transcript