Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Active Record Query Quiz

pocke
October 27, 2023

Active Record Query Quiz

pocke

October 27, 2023
Tweet

More Decks by pocke

Other Decks in Programming

Transcript

  1. pp self • Masataka Pocke Kuwabara • Work for Money

    Forward, Inc. ◦ Rails engineer for クラウド会計Plus ◦ A maintainer of RBS • Live in Okayama Read it with the conference app→ (app.kaigionrails.org)
  2. Goal • You can choose the appropriate method for querying

    w/ AR BTW, this talk focuses on the Rails beginner. It may be boring for Rails experts.
  3. Quiz Rule • I'll show you two kinds of code

    ◦ using Active Record ◦ doing the same things • You choose the faster code • Rails 7.1 / mysql2 adapter and MySQL 8.0 / ruby 3.3.0-dev
  4. Q1. Finding articles of blog_id = 42 # A Article

    .all .select { _1.blog_id == 42 } # B Article .all .where(blog_id: 42) .to_a
  5. Answer. Finding articles of blog_id = 42 # A Article

    .all .select { _1.blog_id == 42 } # Article Load (2.4ms) SELECT `articles`.* FROM `articles` # B Article .all .where(blog_id: 42) .to_a # Article Load (1.1ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`blog_id` = 42 B is faster than A! 💯 󰢄
  6. Explanation: Finding articles of blog_id = 42 • Lazy evaluation

    ◦ `all` and `where` do not execute SQL, but `to_a` does. ◦ It is delayed until when the records are necessary • Basically, RDBMS should do it if RDBMS can.
  7. Counterexample: Finding articles of blog_id = 42 • `select` can

    be faster if it's called several times ◦ Because Active Record caches the result to `@records` # A articles = Article.all x = articles.select { _1.is_active } # Article Load (3.1ms) SELECT `articles`.* FROM `articles` y = articles.select { not _1.is_active } # nothing # B x = Article .where(is_active: true) .to_a # Article Load (1.5ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`is_active` = TRUE y = Article .where(is_active: false) .to_a # Article Load (1.9ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`is_active` = FALSE Note: Enumerable#partition is more appropriate.
  8. Pro tips To investigate what SQL is executed • Use

    the debug log • Use `AR::Relation#to_sql` method ◦ irb(main):001> Article.all.where(blog_id: 42).to_sql => "SELECT `articles`.* FROM `articles` WHERE `articles`.`blog_id` = 42"
  9. Answer: Checking existence # A Article.all .exists?(blog_id: 42) # Article

    Exists? (0.9ms) SELECT 1 AS one FROM `articles` WHERE `articles`.`blog_id` = 42 LIMIT 1 # B Article.all .any? { _1.blog_id == 42 } # Article Load (1.7ms) SELECT `articles`.* FROM `articles` 💯 󰢄 A is faster than B!
  10. Explanation: Checking existence • Lazy execution, same as Q1. •

    Active Record has specific methods such as exists?. ◦ Use them instead of Enumerable's method • Enumerable methods are useful, but it is not optimized for AR. ◦ You can easily distinguish by that the method takes a block yielding an AR object ◦ But…
  11. You can make any? faster by a hack… • `any?

    { … }` needs records to execute the block • On the other hand, it does not need records if it does not execute the block • We have `RubyVM::AbstractSyntaxTree.of` to solve this problem!
  12. Money Forward • Nov. 9: RubyWorld Conference 2023. https://2023.rubyworld-conf.org/ja/ ◦

    I will talk about RBS. ◦ At Matsue (registration closed😢) and Online • Nov. 16: 各社の技術広報が明かす「RubyKaigiスポンサーの裏話」運営ノウハウ やコミュニティへの想い https://moneyforward.connpass.com/event/298325/ ◦ Acha-san will talk about sponsoring RubyKaigi ◦ Online event • We have an X account @moneyforwardDev!
  13. Weekly event to watch RubyKaigi 2023 videos I organize an

    event to watch RubyKaigi 2023 videos. • In a zoom meeting. • At 12:10-12:40 on every Thursday • Accounced in #rubykaigi channel of ruby-jp slack. • We'll watch "RubyGems on the watch / Maciej Mensfeld" as the next session in 2nd Nov. • Feel free to join us!
  14. Q3. Querying exists? in loop # A current_user = User.find(42)

    can_update = current_user .permissions .exists?(action: :update) response = Article.all.map do { title: _1.title, can_update:, } end # B current_user = User.find(42) response = Article.all.map do can_update = current_user .permissions .exists?(action: :update) { title: _1.title, can_update:, } end
  15. Answer: Querying exists? in loop # A current_user = User.find(42)

    can_update = current_user .permissions .exists?(action: :update) # Permission Exists? (0.8ms) SELECT 1 AS one FROM `permissions` WHERE `permissions`.`user_id` = 42 AND `permissions`.`action` = 'update' response = Article.all.map do { title: _1.title, can_update:, } end # B current_user = User.find(42) response = Article.all.map do can_update = current_user .permissions .exists?(action: :update) # Permission Exists? (0.8ms) SELECT 1 AS one FROM `permissions` WHERE `permissions`.`user_id` = 42 AND `permissions`.`action` = 'update' { title: _1.title, can_update:, } end 💯 󰢄 A is faster than B!
  16. Counterexample: Reduce the difference Active Record query cache can reduce

    the difference. We can confirm it with `ActiveRecord.cache do -- end` It is enabled by default in controller actions, but disabled in other places, such as job, rails console.
  17. Counterexample: Reduce the difference Calculating ------------------------------------- outside loop - no

    cache 192.676 (±15.1%) i/s - 4.712k in 25.056327s inside loop - no cache 4.416 (±22.6%) i/s - 109.000 in 25.288121s outside loop - cache 160.870 (±13.1%) i/s - 3.948k in 25.012053s inside loop - cache 39.832 (±10.0%) i/s - 984.000 in 25.079316s Comparison: outside loop - no cache: 192.7 i/s outside loop - cache: 160.9 i/s - same-ish: difference falls within error inside loop - cache: 39.8 i/s - 4.84x slower inside loop - no cache: 4.4 i/s - 43.63x slower
  18. Q4. Querying SELECT in loop # A categories = Category.all.to_a

    response = Article.all.map do { title: _1.title, categories:, } end # B response = Article.all.map do categories = Category.all.to_a { title: _1.title, categories:, } end
  19. Answer. Querying SELECT in loop # A categories = Category.all.to_a

    response = Article.all.map do { title: _1.title, categories:, } end # B response = Article.all.map do categories = Category.all.to_a { title: _1.title, categories:, } end 💯 󰢄 A is faster than B!
  20. AR cache makes it faster…? Calculating ------------------------------------- outside loop -

    no cache 206.570 (±18.4%) i/s - 1.007k in 5.050579s inside loop - no cache 2.511 (± 0.0%) i/s - 13.000 in 5.220488s outside loop - cache 208.327 (±17.8%) i/s - 1.020k in 5.065721s inside loop - cache 19.188 (±10.4%) i/s - 96.000 in 5.040611s Comparison: outside loop - cache: 208.3 i/s outside loop - no cache: 206.6 i/s - same-ish: difference falls within error inside loop - cache: 19.2 i/s - 10.86x slower inside loop - no cache: 2.5 i/s - 82.98x slower
  21. AR cache makes it faster, but not faster enough •

    AR cache needs to cache boolean value for `exists?` • But it needs to cache AR model objects for query fetching objects ◦ Initialization AR model is heavy
  22. Conclusion • Imagine and confirm your code executes what SQL

    ◦ Use debug log and `#to_sql` • You can find the benchmark code from pocke/kaigi-on-rails-2023 GitHub repository. Thanks for your listening!