CRUD! What to do When Active Record, MySQL, and Your Data Betray You

CRUD! What to do When Active Record, MySQL, and Your Data Betray You

Slideshow for talk at Mountain West Ruby Conference March 21st, 2014.

Related blog post: http://www.eileencodes.com/posts/crud-what-to-do-when-active-record-mysql-and-your-data-betray-you

The magic of ActiveRecord database interactions is easy to rely on and allows us assume it knows best. Without a solid understanding of how ActiveRecord translates into MySQL, however, significant issues can arise. This is particularly true with large data sets and complex model relationships. My talk explores an example for each CRUD function and shows how these queries can result in MySQL timeouts, memory issues or stack level too deep errors. The examples will examine the consequences of chaining large datasets, uses for Arel, and how to avoid encountering major problems and most importantly, how these queries can be rewritten to run more efficiently.

C44e1f7e22c3f23cff7bc130871047ef?s=128

Eileen M. Uchitelle

March 21, 2014
Tweet

Transcript

  1. CRUD! What to do When Active Record, MySQL, and Your

    Data Betray You
  2. eileencodes.com @eileencodes github.com/eileencodes Eileen M. Uchitelle

  3. You - Should - Tell Uchitelle

  4. You - Should - Tell Uchitelle X

  5. Senior Developer at PhishMe

  6. * always check where a link is going before clicking

    on it
  7. At PhishMe we have A LOT of data

  8. Active Record is a magical unicorn

  9. When MySQL crashed or my queries were slow I felt

    betrayed by Active Record.
  10. But I knew Active Record and I could work it

    out So I began studying my queries more closely...
  11. Learn from my mistakes... And avoid #firstworldproblems

  12. CREATE READ UPDATE DELETE

  13. CREATE create!, new, INSERT

  14. READ where, find_each, pluck, SELECT

  15. UPDATE update_attributes, edit, update_all, UPDATE

  16. DELETE destroy, delete, destroy_all, delete_all, DELETE

  17. Address Book Example ERD Users first_name string last_name string email

    string password_digest string ...[etc] Contacts first_name string last_name string birthday date user_id integer ...[etc] Categories name string description string user_id integer Categorizations contact_id integer category_id integer
  18. Address Book Example User has_many :contacts has_many :categories Contact belongs_to

    :user has_many :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category belongs_to :user has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  19. Address Book Example User has_many :contacts has_many :categories Contact belongs_to

    :user has_many :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category belongs_to :user has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  20. Address Book Example User has_many :contacts has_many :categories Contact belongs_to

    :user has_many :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category belongs_to :user has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  21. Address Book Example User has_many :contacts has_many :categories Contact belongs_to

    :user has_many :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category belongs_to :user has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  22. Address Book Example User has_many :contacts has_many :categories Contact belongs_to

    :user has_many :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category belongs_to :user has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  23. CREATE

  24. Let’s create 10,000 records the Active Record way using a

    CSV CSV.foreach("#{filepath}", headers: true) do |csv| Contact.create!({ first_name: csv[0], last_name: csv[1], birthday: csv[2], ... }) end
  25. Let’s create 10,000 records the Active Record way using a

    CSV CSV.foreach("#{filepath}", headers: true) do |csv| Contact.create!({ first_name: csv[0], last_name: csv[1], birthday: csv[2], ... }) end INSERT INTO `contacts` (`first_name`, `last_name`, `birthday`,...) VALUES ('John', 'Smith', '1987-02-01',...);
  26. MySQL Batch Insert, FTW.

  27. Don’t be afraid to write raw SQL

  28. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  29. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  30. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  31. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  32. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  33. Let’s create the same contacts with MySQL Batch Insert contact_values

    = [] CSV.foreach("#{filepath}", headers: true) do |csv| contact_values << "('#{csv[0]}','#{csv[1]}','#{csv[2]}'...)" end batch_size = 2000 while !contact_values.empty? contacts_shifted = contact_values.shift(batch_size) contacts_sql = "INSERT INTO contacts(first_name, last_name, birthday,...) VALUES#{contacts_shifted.join(", ")}" ActiveRecord::Base.connection.execute(contacts_sql) end
  34. Let’s create the same contacts with MySQL Batch Insert ActiveRecord::Base.connection.execute(contacts_sql)

    INSERT INTO contacts(first_name, last_name, birthday,...) VALUES('Lauretta','Senger','1987-02-01',...), ('Jane','Roob','1987-02-01',...), ('Blaze','Lakin','1987-02-01',...), ('Elton','Cormier','1987-02-01',...), ('John','Kohler','1987-02-01',...), ('Clementine','Marvin','1987-02-01',...), ('Ova','Aufderhar','1987-02-01',...) ...
  35. That’s a huge difference ActiveRecord Benchmark Data: user system total

    real => 44.740000 1.180000 45.920000 ( 51.095556)
  36. That’s a huge difference ActiveRecord Benchmark Data: user system total

    real => 44.740000 1.180000 45.920000 ( 51.095556) MySQL Batch Insert Benchmark Data: user system total real => 2.710000 0.050000 2.760000 ( 3.227031)
  37. READ

  38. Let’s read some data and output the first name of

    each contact Contact.where(user_id: 1, country: “USA”).each do |contact| puts contact.first_name end Contact.where(user_id: 1, country: “USA”).find_each do |contact| puts contact.first_name end Contact.where(user_id: 1, country: “USA”) .pluck(:first_name).each do |first_name| puts first_name end
  39. Let’s read some data and output the first name of

    each contact Contact.where(user_id: 1, country: “USA”).each do |contact| puts contact.first_name end SELECT `contacts`.* FROM `contacts` WHERE `contacts`.`user_id` = 1 AND `contacts`.`country` = 'USA' => Lauretta Shana Jason Jermain Blaze Jessy Bradly ...
  40. Let’s read some data and output the first name of

    each contact Contact.where(user_id: 1, country: “USA”).find_each do |contact| puts contact.first_name end SELECT `contacts`.* FROM `contacts` WHERE `contacts`.`user_id` = 1 AND `contacts`.`country` = 'USA' ORDER BY `contacts`.`id` ASC LIMIT 1000 => Lauretta Shana Jason ... SELECT `contacts`.* FROM `contacts` WHERE `contacts`.`user_id` = 1 AND `contacts`.`country` = 'USA' AND (`contacts`.`id` > 1001) ORDER BY `contacts`.`id` ASC LIMIT 1000 => ...
  41. Let’s read some data and output the first name of

    each contact f Contact.where(user_id: 1, country: “USA”) .pluck(:first_name).each do |first_name| puts first_name end SELECT `contacts`.`first_name` FROM `contacts` WHERE `contacts`.`user_id` = 1 AND `contacts`.`country` = 'USA' => ["Lauretta", "Shana", "Jason", "Jermain", "Blaze", "Jessy", "Bradly", "Emma", "Cruz", "Elton", "Dashawn", "Rosanna", "Ryan", "Leonel", "Ashly", "Mittie", "Tobin", "Antonio", "Chad", "Lauryn", "Sydnie", "Sebastian", "Johnpaul", "Yasmeen", "Junior", "Monroe", "Avery",...]
  42. Let’s read some data and output the first name of

    each contact Contact.where(user_id: 1, country: “USA”).each do |contact| puts contact.first_name end Contact.where(user_id: 1, country: “USA”).find_each do |contact| puts contact.first_name end Contact.where(user_id: 1, country: “USA”) .pluck(:first_name).each do |first_name| puts first_name end
  43. Which one was fastest? .each Benchmark Data: user system total

    real => 0.950000 0.060000 1.010000 ( 1.050266)
  44. Which one was fastest? .each Benchmark Data: user system total

    real => 0.950000 0.060000 1.010000 ( 1.050266) .find_each Benchmark Data: user system total real => 0.900000 0.040000 0.940000 ( 0.976979)
  45. Which one was fastest? .each Benchmark Data: user system total

    real => 0.950000 0.060000 1.010000 ( 1.050266) .find_each Benchmark Data: user system total real => 0.900000 0.040000 0.940000 ( 0.976979) .pluck Benchmark Data: user system total real => 0.080000 0.020000 0.100000 ( 0.126814)
  46. Which one was fastest? .each Benchmark Data: user system total

    real => 0.950000 0.060000 1.010000 ( 1.050266) .find_each Benchmark Data: user system total real => 0.900000 0.040000 0.940000 ( 0.976979) .pluck Benchmark Data: user system total real => 0.080000 0.020000 0.100000 ( 0.126814)
  47. Statistics for Read 0 3.8 7.5 11.3 15.0 0 10

    1000 10000 100000 .each .find_each .pluck
  48. Utilizing find_by_sql Contact.find_by_sql([ "SELECT first_name FROM contacts WHERE user_id=? AND

    country=? AND birthday > ?", 1, 'USA','1987-01-01' ]).each do |contact| puts contact.first_name end
  49. UPDATE

  50. Let’s update the category in each categorization category = Category.where(name:

    “Networking”).first Categorization.all.each do |categorization| categorization.update_attributes( category_id: category.id ) end
  51. Let’s update the category in each categorization category = Category.where(name:

    “Networking”).first Categorization.all.each do |categorization| categorization.update_attributes( category_id: category.id ) end UPDATE `categorizations` SET `category_id` = 1 WHERE `categorizations`.`id` = 1 UPDATE `categorizations` SET `category_id` = 1 WHERE `categorizations`.`id` = 2 UPDATE `categorizations` SET `category_id` = 1 WHERE `categorizations`.`id` = 3
  52. Let’s update the category in each categorization category = Category.where(name:

    “Networking”).first Categorization.update_all( category_id: category.id )
  53. Let’s update the category in each categorization category = Category.where(name:

    “Networking”).first Categorization.update_all( category_id: category.id ) UPDATE `categorizations` SET `categorizations`.`category_id` = 1
  54. Check out the difference update_attributes Benchmark Data: user system total

    real => 12.990000 0.870000 13.860000 ( 17.156265)
  55. Check out the difference update_attributes Benchmark Data: user system total

    real => 12.990000 0.870000 13.860000 ( 17.156265) update_all Benchmark Data: user system total real => 0.000000 0.000000 0.000000 ( 0.042140)
  56. DELETE

  57. destroy_all vs. delete_all with has_many dependencies Contact has_many :categorizations has_many

    :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations Categorizations belongs_to :contact belongs_to :category
  58. f f destroy_all vs. delete_all with has_many dependencies Contact has_many

    :categorizations, dependent: :destroy has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :destroy has_many :contacts, through: :categorizations Contact.destroy_all
  59. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations, dependent: :destroy has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :destroy has_many :contacts, through: :categorizations Contact.destroy_all SELECT `categorizations`.* FROM `categorizations` WHERE `categorizations`.`contact_id` = 1 DELETE FROM `categorizations` WHERE `categorizations`.`id` = 1 DELETE FROM `contacts` WHERE `contacts`.`id` = 1
  60. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Contact.delete_all
  61. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations Contact.delete_all DELETE FROM `contacts`
  62. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations has_many :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations Contact.delete_all Contact.destroy_all
  63. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations has_many :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations Contact.delete_all DELETE FROM `contacts` Contact.destroy_all
  64. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations has_many :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations Contact.delete_all Contact.destroy_all DELETE FROM `contacts` WHERE `contacts`.`id` = 1 DELETE FROM `contacts` WHERE `contacts`.`id` = 2 DELETE FROM `contacts` WHERE `contacts`.`id` = 3
  65. destroy_all vs. delete_all with has_many dependencies category = Category.where(name: “Networking”).first

    category.contacts.destroy_all
  66. destroy_all vs. delete_all with has_many dependencies category = Category.where(name: “Networking”).first

    category.contacts.destroy_all It looks like you’re trying to delete some records, do you need help with that?
  67. destroy_all vs. delete_all with has_many dependencies category = Category.where(name: “Networking”).first

    category.contacts.destroy_all category.categorizations.destroy_all It looks like you’re trying to delete some records, do you need help with that?
  68. destroy_all vs. delete_all with has_many dependencies category = Category.where(name: “Networking”).first

    category.contacts.destroy_all category.categorizations.destroy_all category.categorizations.delete_all It looks like you’re trying to delete some records, do you need help with that?
  69. CollectionProxy

  70. CollectionProxy Object that holds the association category

  71. CollectionProxy Object that holds the association category Associated object categorizations

  72. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations has_many :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations category.categorizations.delete_all
  73. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations has_many :categories, through: :categorizations Category has_many :categorizations has_many :contacts, through: :categorizations category.categorizations.delete_all UPDATE `categorizations` SET `categorizations`.`category_id` = NULL WHERE `categorizations`.`category_id` = 1
  74. destroy_all vs. delete_all with has_many dependencies Contact has_many :categorizations, dependent:

    :destroy has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :destroy has_many :contacts, through: :categorizations category.categorizations.delete_all f f
  75. destroy_all vs. delete_all with has_many dependencies Contact has_many :categorizations, dependent:

    :destroy has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :destroy has_many :contacts, through: :categorizations category.categorizations.delete_all DELETE FROM `categorizations` WHERE `categorizations`.`id` = 1 DELETE FROM `categorizations` WHERE `categorizations`.`id` = 2... f f
  76. destroy_all vs. delete_all with has_many dependencies f f Contact has_many

    :categorizations, dependent: :delete_all has_many :categories, through: :categorizations Category has_many :categorizations, dependent: :delete_all has_many :contacts, through: :categorizations category.categorizations.delete_all
  77. destroy_all vs. delete_all with has_many dependencies category.categorizations.delete_all user system total

    real => 130.080000 0.120000 130.200000 (130.308334)
  78. destroy_all vs. delete_all with has_many dependencies category.categorizations.delete_all user system total

    real => 130.080000 0.120000 130.200000 (130.308334) DELETE FROM `categorizations` WHERE `categorizations`.`category_id` = 1
  79. destroy_all vs. delete_all with has_many dependencies category.categorizations.delete_all user system total

    real => 130.080000 0.120000 130.200000 (130.308334) DELETE FROM `categorizations` WHERE `categorizations`.`category_id` = 1 AND `categorizations`.`id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46...10000)
  80. Let’s fix this. category = Category.where(name: “Networking”).first Categorization.where( category_id: category.id).delete_all

  81. Let’s fix this. category = Category.where(name: “Networking”).first Categorization.where( category_id: category.id).delete_all

    user system total real => 0.010000 0.000000 0.010000 ( 0.014286)
  82. Let’s fix this. category = Category.where(name: “Networking”).first Categorization.where( category_id: category.id).delete_all

    user system total real => 0.010000 0.000000 0.010000 ( 0.014286) DELETE FROM `categorizations` WHERE `categorizations`.`category_id` = 1
  83. In Conclusion

  84. In Conclusion ‣ Active Record is an awesome tool

  85. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences
  86. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences ‣ We can all be friends with Active Record
  87. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences ‣ We can all be friends with Active Record ‣ Check out these helpful tools
  88. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences ‣ We can all be friends with Active Record ‣ Check out these helpful tools New Relic
  89. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences ‣ We can all be friends with Active Record ‣ Check out these helpful tools New Relic MiniProfiler Gem
  90. In Conclusion ‣ Active Record is an awesome tool ‣

    Active Record queries can have major consequences ‣ We can all be friends with Active Record ‣ Check out these helpful tools New Relic MiniProfiler Gem Bullet Gem
  91. much thanks so database many Active Record very mwrc such

    data wow
  92. eileencodes.com @eileencodes speakerdeck.com/eileencodes github.com/eileencodes/crud_project Eileen M. Uchitelle