Slide 1

Slide 1 text

DATABASE PERFORMANCE FOR RUBY ON RAILS APPLICATIONS Stories from real world applications

Slide 2

Slide 2 text

Hi, I’m Leonard! lchin l15n

Slide 3

Slide 3 text

Engineering@

Slide 4

Slide 4 text

A global cooking community platform

Slide 5

Slide 5 text

25 Languages 70+ Countries

Slide 6

Slide 6 text

Using Ruby and Rails since 2007

Slide 7

Slide 7 text

… we have a lot of Rails apps

Slide 8

Slide 8 text

Here’s an (imaginary) mapping:

Slide 9

Slide 9 text

Simple Complex 10 100 1k 10k 100k 1m 10m 100m 1b+ Total Users Ruby on Rails apps at Cookpad * for illustrative purposes only

Slide 10

Slide 10 text

Simple Complex 10 100 1k 10k 100k 1m 10m 100m 1b+ Total Users Ruby on Rails apps at Cookpad * for illustrative purposes only Internal admin / business apps Internal tools Supporting Services New and peripheral products

Slide 11

Slide 11 text

Simple Complex 10 100 1k 10k 100k 1m 10m 100m 1b+ Total Users Ruby on Rails apps at Cookpad * for illustrative purposes only Internal admin / business apps Internal tools Supporting Services New and peripheral products Two “Cookpads”

Slide 12

Slide 12 text

Two Cookpads?

Slide 13

Slide 13 text

#

Slide 14

Slide 14 text

Simple Complex 10 100 1k 10k 100k 1m 10m 100m 1b+ Total Users Ruby on Rails apps at Cookpad * for illustrative purposes only Two “Cookpads”

Slide 15

Slide 15 text

Complex, with lots of users

Slide 16

Slide 16 text

# 300 controllers 400+ models 40m users/month 300 controllers 1200+ models 60m users/month

Slide 17

Slide 17 text

Performance gets harder

Slide 18

Slide 18 text

More Servers, Faster Servers

Slide 19

Slide 19 text

More Servers, Faster Servers Only gets you so far

Slide 20

Slide 20 text

More Code? More errors

Slide 21

Slide 21 text

More Users? Harder to notice slow users

Slide 22

Slide 22 text

More Data? Harder to reproduce issues

Slide 23

Slide 23 text

Good tools are essential

Slide 24

Slide 24 text

Error tracking e.g. Sentry

Slide 25

Slide 25 text

Performance Monitoring e.g. New Relic

Slide 26

Slide 26 text

Indexed (Slow Query) Logs e.g. Kibana + ElasticSearch

Slide 27

Slide 27 text

Let’s have a look at some problems

Slide 28

Slide 28 text

DATABASE PERFORMANCE WAR STORIES Examples from the Cookpad Global Ruby on Rails app

Slide 29

Slide 29 text

(1) “MY BOOKMARKS DON’T LOAD!”

Slide 30

Slide 30 text

We started seeing a lot of weird timeout errors in Sentry

Slide 31

Slide 31 text

Rack::Timeout::RequestTimeoutException: Request ran for longer than 19000ms app/views/bookmarks/_list.html.erb:4:in `block in _app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <%= render partial: "bookmarks/bookmark", collection: bookmarks %> app/views/bookmarks/_list.html.erb:2:in `tap' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/bookmarks/_list.html.erb:2:in `_app_views_bookmarks__list_html_erb__1896250766765096406_70283982598900' <% bookmarks.page(params[page]).tap do |bookmarks| %> app/views/layouts/_bookmarks_all.html.erb:5:in `_app_views_layouts__bookmarks_all_html_erb__1020300051023946889_702839690775 00' <%= render "bookmarks/list", bookmarks: current_user.bookmarks.listing.recent, name: "recent_bookmarks" %> app/views/layouts/_sidebar.html.erb:87:in `_app_views_layouts__sidebar_html_erb___3139679681373029435_70284006194440' <%= render "layouts/bookmarks_all" %>

Slide 32

Slide 32 text

But… the endpoint looks fast on New Relic

Slide 33

Slide 33 text

Average response time: 62.2ms

Slide 34

Slide 34 text

average response time?

Slide 35

Slide 35 text

Averages hide the real people behind each request

Slide 36

Slide 36 text

Response Time (ms) 45 54 44 42 42 41 51 40 250 24 35 36 32 32 49 28 40 28 980 18 50 30 30 42 50 38 39 60 32 42 * This is illustrative example data • Average: 70ms • Median: 40ms • 90th Percentile: 54.6ms • 95th Percentile: 193ms • 99th Percentile: 812ms • Maximum: 980ms

Slide 37

Slide 37 text

Think in percentiles

Slide 38

Slide 38 text

the 99th percentile are slower than 99% of other users

Slide 39

Slide 39 text

Why is that?

Slide 40

Slide 40 text

Because they have more data

Slide 41

Slide 41 text

Their database queries are slow

Slide 42

Slide 42 text

At Cookpad, they’re the ones who publish the most recipes

Slide 43

Slide 43 text

At Cookpad, they’re the ones who bookmark the most recipes

Slide 44

Slide 44 text

They’re our most active users

Slide 45

Slide 45 text

They’re our most valuable users

Slide 46

Slide 46 text

And they’re thinking…

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Let’s have a look at the 99th percentile back on NewRelic

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

The 99%ile: 514ms

Slide 51

Slide 51 text

The 99%ile: 514ms 7x slower than theaverage

Slide 52

Slide 52 text

The problem, in context

Slide 53

Slide 53 text

Here’s our top page

Slide 54

Slide 54 text

It loads fast

Slide 55

Slide 55 text

It loads fast …most of the time.

Slide 56

Slide 56 text

Notice this user- specific content: Recipe bookmarks

Slide 57

Slide 57 text

Notice this user- specific content: Recipe bookmarks

Slide 58

Slide 58 text

We have a lot of Recipe Bookmarks

Slide 59

Slide 59 text

We have a lot of Recipe Bookmarks 100 million+

Slide 60

Slide 60 text

THE IMPLEMENTATION # In the view current_user.bookmarks.listing.recent # As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?

Slide 61

Slide 61 text

You could EXPLAIN locally… # In the view current_user.bookmarks.listing.recent.explain # As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?;

Slide 62

Slide 62 text

# In the view current_user.bookmarks.listing.recent.explain # As SQL EXPLAIN SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ?; JUST ADD EXPLAIN You could EXPLAIN locally…

Slide 63

Slide 63 text

Running EXPLAIN in production is more useful

Slide 64

Slide 64 text

Running EXPLAIN in production produces different results

Slide 65

Slide 65 text

Running EXPLAIN in production produces different results ✅ DATA CHANGES BEHAVIOUR

Slide 66

Slide 66 text

Running EXPLAIN in production produces different results ✅ DATA CHANGES BEHAVIOUR ✅ SYSTEM LOAD FROM OTHER QUERIES

Slide 67

Slide 67 text

Let’s have a look at the EXPLAIN from the production slow query log

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Note: we use MySQL (Aurora) but the same principles apply with other RDBMS

Slide 70

Slide 70 text

GREAT! WE’RE USING AN INDEX

Slide 71

Slide 71 text

GREAT! WE’RE USING AN INDEX FILESORT … THIS IS NOT GOOD

Slide 72

Slide 72 text

FILESORT … THIS IS NOT GOOD LOTS OF ROWS EXAMINED… ALSO NOT GOOD GREAT! WE’RE USING AN INDEX

Slide 73

Slide 73 text

Why is it good to use an index?

Slide 74

Slide 74 text

Why is it bad to filesort?

Slide 75

Slide 75 text

Why does rows_examined matter?

Slide 76

Slide 76 text

Let’s find out!

Slide 77

Slide 77 text

A Quick Dive Into Relational Databases * Mostly MySQL with InnoDB

Slide 78

Slide 78 text

Ruby on Rails App Database Database Client Database Server SQL Query Result Set

Slide 79

Slide 79 text

Ruby on Rails App Database Database Client Database Server SQL Query Result Set Let’s find out what goes on in here

Slide 80

Slide 80 text

Database Internals

Slide 81

Slide 81 text

Database Internals (the easy version)

Slide 82

Slide 82 text

Database Internals (the easy version) ✅ Your data is stored as rows in a table on disk

Slide 83

Slide 83 text

Database Internals (the easy version) ✅ Your data is stored as rows in a table on disk ✅ Indexes are stored separately as B-Trees, with leaf nodes pointing to table rows

Slide 84

Slide 84 text

B-Trees (the easy version)

Slide 85

Slide 85 text

B-Trees (the easy version) ✅ A sorted, tree data structure

Slide 86

Slide 86 text

B-Trees (the easy version) ✅ A sorted, tree data structure ✅ It self-balances on insert/delete

Slide 87

Slide 87 text

B-Trees (the easy version) ✅ A sorted, tree data structure ✅ It self-balances on insert/delete ✅ It’s fast: O(log n) for searching

Slide 88

Slide 88 text

Index as a B-Tree 25 40 1 6 13 32 3 4 28 35 1 99 110 120 123 78 12 54 65 90 75 78 89 41 2 7 Leaf nodes are primary keys (row ids) pointing to the table record

Slide 89

Slide 89 text

Why is this relevant?

Slide 90

Slide 90 text

A SELECT is really just a search

Slide 91

Slide 91 text

To search, you need sorted data

Slide 92

Slide 92 text

Sorting is slow

Slide 93

Slide 93 text

Sorting is slow at best O(n log n) at worst O(n^2)

Slide 94

Slide 94 text

Sorting is slow at best O(n log n) at worst O(n^2)

Slide 95

Slide 95 text

A B-Tree index lets you search fast without scanning & sorting all data

Slide 96

Slide 96 text

Back to the slow query

Slide 97

Slide 97 text

# In the view current_user.bookmarks.listing.recent # As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ?

Slide 98

Slide 98 text

# In the view current_user.bookmarks.listing.recent # As SQL SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Why is this slow?

Slide 99

Slide 99 text

The query was using an index # index_bookmarks_on_user_id t.index [“user_id"]

Slide 100

Slide 100 text

The query was using an index # index_bookmarks_on_user_id t.index [“user_id"] SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? It matches the query, doesn’t it?

Slide 101

Slide 101 text

# index_bookmarks_on_user_id t.index [“user_id"] SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? But…. look at this ORDER BY

Slide 102

Slide 102 text

Sorting is slow

Slide 103

Slide 103 text

filesort is slow

Slide 104

Slide 104 text

What is the database doing?

Slide 105

Slide 105 text

What is the database doing? ✅ Find all records for user_id from the index (fast)

Slide 106

Slide 106 text

What is the database doing? ✅ Find all records for user_id from the index (fast) ✅ Retrieves all these records from the table (despite the LIMIT)

Slide 107

Slide 107 text

What is the database doing? ✅ Find all records for user_id from the index (fast) ✅ Retrieves all these records from the table (despite the LIMIT) ✅ Sorts all these records by visited_at

Slide 108

Slide 108 text

What is the database doing? ✅ Find all records for user_id from the index (fast) ✅ Retrieves all these records from the table (despite the LIMIT) ✅ Sorts all these records by visited_at ✅ Returns LIMIT records at OFFSET

Slide 109

Slide 109 text

What should we do?

Slide 110

Slide 110 text

Use a composite index for the sort

Slide 111

Slide 111 text

t.index [“user_id”, “visited_at”] SELECT `bookmarks`.* FROM `bookmarks` WHERE `bookmarks`.`user_id` = ? ORDER BY visited_at DESC LIMIT ? OFFSET ? Index with the sorted column

Slide 112

Slide 112 text

25, Feb 1 40, Jan 40 1, Mar 1 6, Feb 3 6, Feb 5 6, Mar 12 13, Feb 22 32 3 99 42 12 Branch Node Leaf Node For an index like (user_id, visited_at), you can visualise the sorting on the second column as above Composite indexes as B-Trees

Slide 113

Slide 113 text

Bonus: Composite indexes are reusable

Slide 114

Slide 114 text

t.index [“user_id”, “recipe_id”, “created_at”] Given

Slide 115

Slide 115 text

t.index [“user_id”, “recipe_id”, “created_at”] Given 1, 22, Mar 1 6, 14 Feb 3 6, 14 Feb 5 6, 15 Mar 12 13, 5 Feb 22 32 3 99 42 12 Leaf Node

Slide 116

Slide 116 text

t.index [“user_id”, “recipe_id”, “created_at”] Given Partial matches from the left also work

Slide 117

Slide 117 text

t.index [“user_id”, “recipe_id”, “created_at”] Given ✅ user_id ✅ user_id, recipe_id

Slide 118

Slide 118 text

t.index [“user_id”, “recipe_id”, “created_at”] Given ✅ user_id ✅ user_id, recipe_id ❌ recipe_id ❌ created_at ❌ recipe_id, user_id ❌ user_id, created_at

Slide 119

Slide 119 text

UNDERSTANDING HOW INDEXES WORK CHANGES EVERYTHING!

Slide 120

Slide 120 text

(2) “COUNTING TOO MUCH”

Slide 121

Slide 121 text

Several popular users reported slow page loads…

Slide 122

Slide 122 text

No content

Slide 123

Slide 123 text

No content

Slide 124

Slide 124 text

2,753,220 rows examined!

Slide 125

Slide 125 text

The implementation def follower_count object.followers.size end

Slide 126

Slide 126 text

Triggers a query to the database

Slide 127

Slide 127 text

SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ?

Slide 128

Slide 128 text

SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = ? It’s counting the number of followers for a user

Slide 129

Slide 129 text

TABLES JE OBNF 1 Mary 2 Jenny 3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Slide 130

Slide 130 text

TABLES JE OBNF 1 Mary 2 Jenny 3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Slide 131

Slide 131 text

TABLES JE OBNF 1 Mary 2 Jenny 3 Alice JE GPMMPXFS@JE GPMMPXFF@JE 1 1 2 2 1 3 3 2 3 Follows Users

Slide 132

Slide 132 text

It’s actually self-referential

Slide 133

Slide 133 text

In Ruby on Rails parlance, a has_many :through relationship from User to User through Follows

Slide 134

Slide 134 text

A counter_cache column could be used to eliminate the query

Slide 135

Slide 135 text

# follow.rb belongs_to :follower, foreign_key: :follower_id, class_name: "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows

Slide 136

Slide 136 text

# follow.rb belongs_to :follower, foreign_key: :follower_id, class_name: "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows counter_cache is already configured!

Slide 137

Slide 137 text

Does it work?

Slide 138

Slide 138 text

[2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087

Slide 139

Slide 139 text

[2] pry(main)> user.followers.size [slave/1] (3752.5ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`follower_id` WHERE `follows`.`followee_id` = 2008217 => 1712 [3] pry(main)> user.followees.size [slave/1] (71.0ms) SELECT COUNT(*) FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = 2008217 => 1087 Nope

Slide 140

Slide 140 text

It still queries the database

Slide 141

Slide 141 text

Why doesn’t Rails know that a counter_cache column exists?

Slide 142

Slide 142 text

✅ The counter cache on Follow is set to followers_count

Slide 143

Slide 143 text

✅ The counter cache on Follow is set to followers_count ✅ The counter cache column exists in the database

Slide 144

Slide 144 text

✅ The counter cache on Follow is set to followers_count ✅ The counter cache column exists in the database ✅ The counter cache increments/decrements correctly

Slide 145

Slide 145 text

✅ The counter cache on Follow is set to followers_count ✅ The counter cache column exists in the database ✅ The counter cache increments/decrements correctly ✅ The counter cache name follows the convention of association_name_count

Slide 146

Slide 146 text

Let’s see how Rails does it

Slide 147

Slide 147 text

ActiveRecord::Associations::HasManyAssociation def count_records count = if reflection.has_cached_counter? owner. _read_attribute(reflection.counter_cache_column). to_i else scope.count(:all) end end

Slide 148

Slide 148 text

ActiveRecord::Associations::HasManyAssociation def count_records count = if reflection.has_cached_counter? owner. _read_attribute(reflection.counter_cache_column). to_i else scope.count(:all) end end

Slide 149

Slide 149 text

Let’s see if this works

Slide 150

Slide 150 text

pry(main)> followers = User. reflect_on_all_associations(:has_many). find { |r| r.name == :followers } => #, pry(main)> followers.has_cached_counter? => nil pry(main)> followers.inverse_of => nil pry(main)> followers.counter_cache_column => "followers_count"

Slide 151

Slide 151 text

pry(main)> followers = User. reflect_on_all_associations(:has_many). find { |r| r.name == :followers } => #, pry(main)> followers.has_cached_counter? => nil pry(main)> followers.inverse_of => nil pry(main)> followers.counter_cache_column => "followers_count" This is not what I expected

Slide 152

Slide 152 text

✅ The association has the correct counter_cache column ❌ The association does not believe it has a cached counter

Slide 153

Slide 153 text

ActiveRecord::Reflection # Returns whether a counter cache should be used for this association. # # The counter_cache option must be given on either the owner or inverse # association, and the column must be present on the owner. def has_cached_counter? options[:counter_cache] || inverse_which_updates_counter_cache && inverse_which_updates_counter_cache.options[:counter_cache ] && !!active_record.columns_hash[counter_cache_column] end

Slide 154

Slide 154 text

ActiveRecord::Reflection def inverse_which_updates_counter_cache return @inverse_which_updates_counter_cache if defined?(@inverse_which_updates_counter_cache) @inverse_which_updates_counter_cache = klass.reflect_on_all_associations(:belongs_to).find do | inverse| inverse.counter_cache_column == counter_cache_column end end

Slide 155

Slide 155 text

The association expects to be able to find a counterpart belongs_to on its inverse association

Slide 156

Slide 156 text

But, because we have a self-referential association

Slide 157

Slide 157 text

But, because we have a self-referential association the inverse belongs_to must also exist on User

Slide 158

Slide 158 text

# follow.rb belongs_to :follower, foreign_key: :follower_id, class_name: "User", counter_cache: :followees_count belongs_to :followee, foreign_key: :followee_id, class_name: "User", counter_cache: :followers_count # user.rb has_many :incoming_follows, class_name: "Follow", foreign_key: :followee_id, dependent: :destroy has_many :outgoing_follows, class_name: "Follow", foreign_key: :follower_id, dependent: :destroy has_many :followers, through: :incoming_follows has_many :followees, through: :outgoing_follows This is a self-referential has_many :through many-to-many association!

Slide 159

Slide 159 text

Therefore, there is no counterpart belongs_to

Slide 160

Slide 160 text

There is, however, an answer

Slide 161

Slide 161 text

ActiveRecord::Reflection # Returns whether a counter cache should be used for this association. # # The counter_cache option must be given on either the owner or inverse # association, and the column must be present on the owner. def has_cached_counter? options[:counter_cache] || inverse_which_updates_counter_cache && inverse_which_updates_counter_cache.options[:counter_cache ] && !!active_record.columns_hash[counter_cache_column] end

Slide 162

Slide 162 text

It’s also deep inside the Rails Guides

Slide 163

Slide 163 text

4.3.2 Options for has_many 4.3.2.4 :counter_cache This option can be used to configure a custom named :counter_cache. You only need this option when you customized the name of your :counter_cache on the belongs_to association.

Slide 164

Slide 164 text

The solution has_many :followers, -> { order("follows.id DESC") }, through: :incoming_follows, counter_cache: :followers_count has_many :followees, -> { order("follows.id DESC") }, through: :outgoing_follows, counter_cache: :followees_count

Slide 165

Slide 165 text

[2] pry(main)> user.followers.size => 1712

Slide 166

Slide 166 text

[2] pry(main)> user.followers.size => 1712 No query. It works!

Slide 167

Slide 167 text

READ THE DOCUMENTATION. BE BRAVE AND READ YOUR LIBRARIES!

Slide 168

Slide 168 text

(3) “OUT OF THE DATABASE”

Slide 169

Slide 169 text

While trying to improve the performance of our slowest, high traffic endpoint, I noticed an outlier

Slide 170

Slide 170 text

Mysql2::Error: MySQL server has gone away: SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733, 11597744, 11597730, 11597728, 11597715, 11597708, 11597742, 11597738, 11597723, 11597718, 11597713, 11597703, 11597749, 11597747, 11597741, 11597710, 11597699, 11597705, 11597732, 10231546, 11597709, 11597714, 11597727, 11597716, 11597711, 11597706, 11597694, 11597671, 11597704, 11597700, 11597698, 11597691, 11597690, 11597689, 11597687, 11597688, 11597684, 11597682, 11597676, 11597652, 11597685, 11597678, 11597681, 11597674, 11597672, 11597668, 11597666, 11597639, 11597680, 11597677, 11597644, 11597642, 11597638, 11597659, 11597657, 11597655, 11597620, 11597615, 11597656, 11597654, 11597649, 11597636, 11597640, 11597647, 11597648, 11597641, 11597637, 11597635, 11597631, 11597628, 11597626, 11597623, 11597621, 11597618, 11597592, 11597617, 11597614, 11597610, 11597607, 11597606, 11597579, 11597616, 11597613, 11597611, 11597609, 11597591, 11597581, 11597587, 11597602, 11597601, 11597594, 11597596, 11597598, 11597589, 11597599, 11597593, 11597557, 11597590, 11597553, 10225940, 11597582, 11597580, 11597586, 11597577, 11597578, 11597548, 11597545, 11597550, 11597549, 11597571, 11597574, 11597547, 11597544, 11597568, 11597564, 11597563, 11597561, 11597532, 11597567, 11597560, 10229535, 11597562, 11597559, 11597558, 11597529, 11597556, 11597554, 11597526, 11597522, 11597513, 11597518, 11597516, 11597503, 11597508, 11597542, 11597539, 11597534, 11597530, 11597533, 11597531, 11597491, 11597538, 11597504, 11597497, 11597499, 11597485, 11597490, 11597493, 11597528, 11597519, 11597523, 11597514, 11597512, 11597509, 11597510, 11597502, 11597506, 11597498, 11597488, 11597492, 11597483, 11597486, 11597505, 115...

Slide 171

Slide 171 text

Mysql2::Error: MySQL server has gone away: SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = 4612396 AND `recipe_likes`.`user_id` IN (11526355, 10243093, 10233195, 11675568, 10241820, 11343209, 10239509, 10227928, 10226294, 10230604, 10225640, 10499978, 10223459, 10230036, 10228518, 10228876, 9111946, 11598081, 11598079, 11598074, 11598083, 10225955, 11598080, 11598070, 11598072, 11598061, 11598066, 11598058, 11598037, 11598068, 11598062, 11598043, 11598041, 11598038, 11598034, 11598046, 11598044, 11598042, 11598014, 11598009, 11598040, 11598036, 11598006, 11598003, 11597999, 11597996, 11598035, 11598031, 11598039, 11598030, 10229548, 11598025, 11598023, 11597989, 11598029, 11598024, 11597991, 11598020, 11597990, 11597987, 11598021, 11598019, 11598018, 11598013, 11598008, 11598012, 11598015, 11598011, 11597997, 11598001, 11597994, 11597993, 11597992, 11597988, 11597971, 11597967, 11597966, 11597960, 11597959, 11597954, 11597951, 11597948, 11597945, 11597986, 11597985, 11597983, 11597935, 10229554, 11597978, 11597974, 11597953, 11597958, 11597949, 11597947, 11597936, 11597956, 11597950, 11597943, 11597939, 11597937, 11597910, 11597912, 11597933, 11597931, 11597927, 11597925, 11597917, 11597919, 11597920, 11597918, 11597913, 11597896, 11597892, 11597888, 11597877, 11597921, 11597886, 11597914, 11597882, 11597879, 11597911, 11597909, 11597908, 11597904, 10225953, 11597903, 11597905, 11597902, 11597901, 11597893, 11597898, 11597846, 11597874, 11597849, 11597872, 11597869, 11597868, 11597866, 11597860, 11597864, 11597828, 11597863, 11597858, 11597850, 11597819, 11597862, 11597857, 11597854, 11597852, 11597842, 11597853, 11597856, 11597848, 11597839, 11597836, 11597837, 11597835, 11597817, 11597834, 11597832, 11597829, 11597821, 11597807, 11597803, 11597831, 11597833, 11597820, 11597805, 11597798, 11597818, 11597801, 11597816, 11597793, 11597813, 11597781, 11597778, 11597811, 11597806, 11597761, 11597799, 11597766, 11597804, 11597797, 10225733, 11597794, 11597768, 11597762, 11597759, 11597788, 11597789, 11597791, 11597790, 11597785, 11597786, 11597756, 11597784, 11597779, 11597782, 11597780, 11597775, 11597773, 11597739, 11597777, 10225943, 11597774, 11597767, 11597763, 11597765, 11597760, 11597736, 11597757, 11597755, 11597753, 11597751, 11597750, 11597734, 11597735, 11597733, 11597744, 11597730, 11597728, 11597715, 11597708, 11597742, 11597738, 11597723, 11597718, 11597713, 11597703, 11597749, 11597747, 11597741, 11597710, 11597699, 11597705, 11597732, 10231546, 11597709, 11597714, 11597727, 11597716, 11597711, 11597706, 11597694, 11597671, 11597704, 11597700, 11597698, 11597691, 11597690, 11597689, 11597687, 11597688, 11597684, 11597682, 11597676, 11597652, 11597685, 11597678, 11597681, 11597674, 11597672, 11597668, 11597666, 11597639, 11597680, 11597677, 11597644, 11597642, 11597638, 11597659, 11597657, 11597655, 11597620, 11597615, 11597656, 11597654, 11597649, 11597636, 11597640, 11597647, 11597648, 11597641, 11597637, 11597635, 11597631, 11597628, 11597626, 11597623, 11597621, 11597618, 11597592, 11597617, 11597614, 11597610, 11597607, 11597606, 11597579, 11597616, 11597613, 11597611, 11597609, 11597591, 11597581, 11597587, 11597602, 11597601, 11597594, 11597596, 11597598, 11597589, 11597599, 11597593, 11597557, 11597590, 11597553, 10225940, 11597582, 11597580, 11597586, 11597577, 11597578, 11597548, 11597545, 11597550, 11597549, 11597571, 11597574, 11597547, 11597544, 11597568, 11597564, 11597563, 11597561, 11597532, 11597567, 11597560, 10229535, 11597562, 11597559, 11597558, 11597529, 11597556, 11597554, 11597526, 11597522, 11597513, 11597518, 11597516, 11597503, 11597508, 11597542, 11597539, 11597534, 11597530, 11597533, 11597531, 11597491, 11597538, 11597504, 11597497, 11597499, 11597485, 11597490, 11597493, 11597528, 11597519, 11597523, 11597514, 11597512, 11597509, 11597510, 11597502, 11597506, 11597498, 11597488, 11597492, 11597483, 11597486, 11597505, 115... A Huge IN Clause

Slide 172

Slide 172 text

Why is this happening

Slide 173

Slide 173 text

def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids). count >= LIKES_REQUIRED_FOR_PUBLISH end The source of the problem

Slide 174

Slide 174 text

def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids). count >= LIKES_REQUIRED_FOR_PUBLISH end The source of the problem This wants to check if a Recipe is “liked” by users followed by the current user

Slide 175

Slide 175 text

def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids). count >= LIKES_REQUIRED_FOR_PUBLISH end The source of the problem But… what if they follow lots of users?

Slide 176

Slide 176 text

Let’s see what kind of SQL this generates

Slide 177

Slide 177 text

SELECT `users`.id FROM `users` INNER JOIN `follows` ON `users`.`id` = `follows`.`followee_id` WHERE `follows`.`follower_id` = ? ORDER BY follows.id DESC SELECT COUNT(*) FROM `recipe_likes` WHERE `recipe_likes`.`recipe_id` = ? AND `recipe_likes`.`user_id` IN (?+);

Slide 178

Slide 178 text

What does the IN clause do? ✅ It’s kind of like an OR

Slide 179

Slide 179 text

What does the IN clause do? ✅ It’s kind of like an OR ✅ Every element is a separate index lookup

Slide 180

Slide 180 text

What does the IN clause do? ✅ It’s kind of like an OR ✅ Every element is a separate index lookup ❌ MySQL doesn't really have a limit on number of elements in IN

Slide 181

Slide 181 text

An index won’t help us

Slide 182

Slide 182 text

Is there another way?

Slide 183

Slide 183 text

def liked_by_multiple_followees? recipe.
 recipe_likes. where(user: subscriber_followee_ids). count >= LIKES_REQUIRED_FOR_PUBLISH end The source of the problem All we want is a count for a comparison.

Slide 184

Slide 184 text

Users who Liked Recipe who are also Users followed by current user

Slide 185

Slide 185 text

Users who Liked Recipe & Users followed by current user

Slide 186

Slide 186 text

This is an intersection

Slide 187

Slide 187 text

# Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id) follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id)

Slide 188

Slide 188 text

# Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id) follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & follower_ids

Slide 189

Slide 189 text

# Find the ids liker_ids = RecipeLike.where(recipe: recipe).pluck(:user_id) follower_ids = Follow.where(follower_id: subscriber).pluck(:followee_id) # Find the intersection liker_ids & follower_ids Now its two fast queries and an array operation

Slide 190

Slide 190 text

THINK OUTSIDE THE BOX: TRADEOFF BETWEEN THE DATABASE AND APPLICATION

Slide 191

Slide 191 text

Thanks for listening!

Slide 192

Slide 192 text

QUESTIONS?