Slide 1

Slide 1 text

Database Performance for Rails Applications @Great Western Rails March 20th, 2018

Slide 2

Slide 2 text

8IP"SF:PV -FPOBSE$IJO &OHJOFFSJOH.BOBHFS8FC%FWFMPQFS !$PPLQBE(MPCBM 5PLZP0⒏DF !ɹ lchin l15n

Slide 3

Slide 3 text

About Cookpad • Web/iOS/Android • 22 languages • 68 countries • 30M monthly average users • 1.1 million+ recipes

Slide 4

Slide 4 text

Big, but not Huge • 10-20 Rails web app servers • 5+ database servers (Aurora MySQL) • Memcached/Redis instances • CDN for assets (Fastly) • 1000s of req/s • Plenty of data

Slide 5

Slide 5 text

Fast Enough, On Average * NewRelic

Slide 6

Slide 6 text

On Average?

Slide 7

Slide 7 text

Average is Easy • It’s easy (and feels good) just to look at the average • Unfortunately, it doesn’t tell you much. • Worse, looking at averages hides real performance problems.

Slide 8

Slide 8 text

Percentiles Response Time (ms) 45 54 44 42 42 41 51 40 250 24 35 36 32 32 49 28 40 28 980 18 50 30 30 42 50 38 39 60 32 42 • Average: 70ms • Median: 40ms • 90th Percentile: 54.6ms • 95th Percentile: 193ms • 99th Percentile: 812ms • Maximum: 980ms * This is illustrative example data

Slide 9

Slide 9 text

The 99th Percentile • Slower than the other 99% • Often slow because they have more data • Probably your best (heaviest!) users • They’re your 1%

Slide 10

Slide 10 text

The 1%

Slide 11

Slide 11 text

The 1% at Cookpad • Users who publish lots of recipes • Users who bookmark and cook lots of recipes • Uses who follow lots of other users

Slide 12

Slide 12 text

What’s the pattern here?

Slide 13

Slide 13 text

Big (well, medium) Data • Tables with lots of records (e.g. millions) • Users with lots of records (10,000s) • Commonly used tables (lots of load) • i.e. database performance

Slide 14

Slide 14 text

Finding Slow Endpoints If averages are useless, where do we look? • Use performance monitoring that can eind your 99th percentile slow transactions ‣ e.g. NewRelic, Skylight • Look at your database’s slow query log • Look at your error logs for timeouts • Listen to your best users

Slide 15

Slide 15 text

DATABASE PERFORMANCE WAR STORIES Examples from Cookpad’s Ruby on Rails app

Slide 16

Slide 16 text

(1) “My Bookmarks Don’t Load”

Slide 17

Slide 17 text

My Bookmarks Don’t Load Rack::Timeout::RequestTimeoutException: Request ran for longer than 19000ms app/views/bookmarks/_list.html.erb:4:in `block in _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <%= render partial: "bookmarks/bookmark", collection: bookmarks %> app/views/bookmarks/_list.html.erb:2:in `tap' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/bookmarks/_list.html.erb:2:in `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/layouts/_bookmarks_all.html.erb:5:in `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_70283969077 500' <%= render "bookmarks/list", bookmarks: current_user.bookmarks.listing.recent, name: "recent_bookmarks" %> app/views/layouts/_sidebar.html.erb:87:in We were getting timeouts in our error logs on Sentry

Slide 18

Slide 18 text

But… endpoint looks fast?

Slide 19

Slide 19 text

Oh right, averages.

Slide 20

Slide 20 text

99th Percentile is terrible Average is 70ms, 99% is > 500ms!

Slide 21

Slide 21 text

What’s the problem? • This is our top page • Usually, super fast • Unless….

Slide 22

Slide 22 text

User-specieic content • There’s a list of user bookmarks in the menu • There are lots of bookmarks (100M records) • A user can have a LOT of bookmarks

Slide 23

Slide 23 text

The implementation # In the view current_user.bookmarks.listing.recent # As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Looks simple enough?

Slide 24

Slide 24 text

Let’s look at the slow query log This is not good

Slide 25

Slide 25 text

You an EXPLAIN locally, too # In the view current_user.bookmarks.listing.recent.explain # As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?;

Slide 26

Slide 26 text

Production results are better • The amount of data you have is going to greatly affect your query performance • The load on the database’s CPU from other queries will also affect performance • You can get good test results locally, but nothing beats production results

Slide 27

Slide 27 text

Understanding the EXPLAIN We use MySQL (Aurora), but the principles apply to most RDBMS • We’re using an index. This is good. • extras has using 2ilesort. This is bad. • rows_examined is quite large. This is bad, sometimes.

Slide 28

Slide 28 text

Why? • Why is an index good to have an index? • Why is eilesort bad? • Why does rows_examined matter?

Slide 29

Slide 29 text

A quick dive into Relational Databases * Mostly MySQL

Slide 30

Slide 30 text

Rails App Database Database Client Database Server SQL Query Result Set Let’s find out what goes on in here

Slide 31

Slide 31 text

Database Internals Indexes and tables are stored separately. • Table rows are stored on disk as… ‣ a B-tree in som databases MySQL (InnoDB) ‣ a Heap table in others (e.g. Postgres) • Indexes are (also) B-trees where the leaf nodes point to the table rows

Slide 32

Slide 32 text

What’s a B-tree? • It’s sorted, tree data structure • It self-balances and can have lots of branches • It’s fast: Logarithmic time for search ‣ i.e. O(log n) ‣ Great for indexes

Slide 33

Slide 33 text

Index as a B-Tree 25 40 1 6 13 32 3 4 28 35 1 99 110 120 123 78 12 54 65 90 75 78 89 41 2 7 Leaf nodes are primary keys (row ids) pointing to the record

Slide 34

Slide 34 text

CS 101 • A SELECT is a search • Searching requires sorted data • Sorting is slow • Searching is fast with a B-tree

Slide 35

Slide 35 text

Back to the query # In the view current_user.bookmarks.listing.recent # As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Why is this slow?

Slide 36

Slide 36 text

The index # index_bookmarks_on_user_id t.index [“user_id"] • This matches the WHERE perfectly! SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?

Slide 37

Slide 37 text

But… # index_bookmarks_on_user_id t.index [“user_id"] • We’re sorting the result with ORDER BY SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Hence, using filesort

Slide 38

Slide 38 text

What does the database do? • Finds all records with given user_id using the index (fast) • Retrieves ALL records from the table ‣ despite the LIMIT/OFFSET • Sorts the records by visited_at • Returns LIMIT records at OFFSET

Slide 39

Slide 39 text

Sorting is slow

Slide 40

Slide 40 text

The solution • Use a sorted composite index t.index [“user_id”, “visited_at”] +----+-------------+-----------+------+---------+-------+------+-------------+ | id | select_type | table | type | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------+-------+------+-------------+ | 1 | SIMPLE | bookmarks | ref | 5 | const | 2523 | Using where | +----+-------------+-----------+------+---------+-------+------+------------- • This eliminates the eilesort, and now its fast

Slide 41

Slide 41 text

Composite Indexes 25, Feb 1 40, Jan 40 1, Mar 1 6, Feb 3 6, Feb 5 6, Mar 12 13, Feb 22 32 3 99 42 12 Branch Node Leaf Node For an index like (user_id, visited_at), you can visualise the sorting on the second column as above

Slide 42

Slide 42 text

Note on Composite Indexes • Often, single column indexes are a smell • As the application developer, if you know what queries you make, specifying composite indexes improves index usage • Composite indexes are also used by subsets of the key, counting from the left

Slide 43

Slide 43 text

Reusing Composite Indexes 1, 22, Mar 1 6, 14 Feb 3 6, 14 Feb 5 6, 15 Mar 12 13, 5 Feb 22 32 3 99 42 12 Leaf Node • Given an index on (user_id, recipe_id, created_at) • Used for: ‣ user_id ‣ user_id, recipe_id ‣ user_id, recipe_id, created_at •Not Used for: ‣ recipe_id ‣ recipe_id, created_at ‣ created_at

Slide 44

Slide 44 text

Caveats • More indexes will slow down insert, update and delete operations. It’s a tradeoff ‣ Therefore, only optimise when necessary • Applying indexes to a large table can lock the table ‣ You may need to use a tool like Percona Online Schema Change (pt-osc)

Slide 45

Slide 45 text

(2) “Too many count queries”

Slide 46

Slide 46 text

Slow Count Queries That’s a lot of rows After some reports of slow page loads, I discovered a surprising query in the slow query logs

Slide 47

Slide 47 text

Looks normal? SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ? • It’s counting the number of followers for a user

Slide 48

Slide 48 text

Tables JE OBNF 1 Mary 2 Jenny 3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Slide 49

Slide 49 text

Self-referential has_many :through ! JE OBNF 1 Mary 2 Jenny 3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Slide 50

Slide 50 text

Counter Cache? def follower_count object.followers.size end • You’d expect that we’d just use a counter cache to eliminate a query

Slide 51

Slide 51 text

Yep, Counter Cache # follow.rb belongs_to :follower, foreign_key: :follower_id, class_name: "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows

Slide 52

Slide 52 text

But, does it work? [2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087 Nope.

Slide 53

Slide 53 text

What’s the problem? • The counter cache column exists • The column is well-deeined on the Follow object • Can ActiveRecord eigure out the counter cache follow in this self-referential has_many :through?

Slide 54

Slide 54 text

Hidden deep in the docs… 4.3.2 Options for has_many 4.3.2.4 :counter_cache This option can be used to configure a custom named :counter_cache. You only need this option when you customized the name of your :counter_cache on the belongs_to association. • Turns out, in cases like this where Rails can’t guess the inverse_relation, we can be explicit

Slide 55

Slide 55 text

The solution has_many :followers, -> { order("follows.id DESC") }, through: :incoming_follows, counter_cache: :followers_count has_many :followees, -> { order("follows.id DESC") }, through: :outgoing_follows, counter_cache: :followees_count [2] pry(main)> user.followers.size => 1712 • Now we’re using the cache!

Slide 56

Slide 56 text

(3) “Big IN Clause”

Slide 57

Slide 57 text

A big IN clause Mysql2::Error: MySQL server has gone away: SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733,

Slide 58

Slide 58 text

Why is this happening? def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids). count >= LIKES_REQUIRED_FOR_PUBLISH end • All we want is to eind out if a recipe has been liked by several followees… • But… what if there are lots of followees?

Slide 59

Slide 59 text

A clearer look at the SQL SELECT `users`.id FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = ? ORDER BY follows.id DESC SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = ? AND `recipe_likes`.`user_id` IN (?+);

Slide 60

Slide 60 text

What does IN do? • You can think of the IN clause as being similar to an OR operation • Each element of the IN means a separate index lookup • A better index won’t help… we can only avoid using IN

Slide 61

Slide 61 text

Do it in Ruby • All we really want is the intersection between followers and “likers” • It’s easy to add more app servers • It’s harder to add more database instances (even slaves) • In this case, even with (say) 1 million ids, doing it in Ruby is still fast

Slide 62

Slide 62 text

The solution # Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id) follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & subscriber_followee_ids • Using Ruby eliminates slow queries and is actually quite simple

Slide 63

Slide 63 text

Cookpad’s approach to Perf • Time eixing performance problems is time not spent improving the product ‣ Have good hygiene (understand indexes, N+1s, etc) • Rails makes it easy to write clean code, but problems can be hard to detect ‣ But, clean code can make diagnosis easier • Prefer quick and easy solutions

Slide 64

Slide 64 text

Takeaways • Understand how RDBMS execute queries: in particular, how indexes work • Don’t look at average performance, look at the 99th percentile • Use tools to eind the bottlenecks that matter

Slide 65

Slide 65 text

Questions?