Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Active Record Query Quiz

Avatar for pocke pocke
October 27, 2023

Active Record Query Quiz

Avatar for pocke

pocke

October 27, 2023
Tweet

More Decks by pocke

Other Decks in Programming

Transcript

  1. pp self • Masataka Pocke Kuwabara • Work for Money

    Forward, Inc. ◦ Rails engineer for クラウド会計Plus ◦ A maintainer of RBS • Live in Okayama Read it with the conference app→ (app.kaigionrails.org)
  2. Goal • You can choose the appropriate method for querying

    w/ AR BTW, this talk focuses on the Rails beginner. It may be boring for Rails experts.
  3. Quiz Rule • I'll show you two kinds of code

    ◦ using Active Record ◦ doing the same things • You choose the faster code • Rails 7.1 / mysql2 adapter and MySQL 8.0 / ruby 3.3.0-dev
  4. Q1. Finding articles of blog_id = 42 # A Article

    .all .select { _1.blog_id == 42 } # B Article .all .where(blog_id: 42) .to_a
  5. Answer. Finding articles of blog_id = 42 # A Article

    .all .select { _1.blog_id == 42 } # Article Load (2.4ms) SELECT `articles`.* FROM `articles` # B Article .all .where(blog_id: 42) .to_a # Article Load (1.1ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`blog_id` = 42 B is faster than A! 💯 󰢄
  6. Explanation: Finding articles of blog_id = 42 • Lazy evaluation

    ◦ `all` and `where` do not execute SQL, but `to_a` does. ◦ It is delayed until when the records are necessary • Basically, RDBMS should do it if RDBMS can.
  7. Counterexample: Finding articles of blog_id = 42 • `select` can

    be faster if it's called several times ◦ Because Active Record caches the result to `@records` # A articles = Article.all x = articles.select { _1.is_active } # Article Load (3.1ms) SELECT `articles`.* FROM `articles` y = articles.select { not _1.is_active } # nothing # B x = Article .where(is_active: true) .to_a # Article Load (1.5ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`is_active` = TRUE y = Article .where(is_active: false) .to_a # Article Load (1.9ms) SELECT `articles`.* FROM `articles` WHERE `articles`.`is_active` = FALSE Note: Enumerable#partition is more appropriate.
  8. Pro tips To investigate what SQL is executed • Use

    the debug log • Use `AR::Relation#to_sql` method ◦ irb(main):001> Article.all.where(blog_id: 42).to_sql => "SELECT `articles`.* FROM `articles` WHERE `articles`.`blog_id` = 42"
  9. Answer: Checking existence # A Article.all .exists?(blog_id: 42) # Article

    Exists? (0.9ms) SELECT 1 AS one FROM `articles` WHERE `articles`.`blog_id` = 42 LIMIT 1 # B Article.all .any? { _1.blog_id == 42 } # Article Load (1.7ms) SELECT `articles`.* FROM `articles` 💯 󰢄 A is faster than B!
  10. Explanation: Checking existence • Lazy execution, same as Q1. •

    Active Record has specific methods such as exists?. ◦ Use them instead of Enumerable's method • Enumerable methods are useful, but it is not optimized for AR. ◦ You can easily distinguish by that the method takes a block yielding an AR object ◦ But…
  11. You can make any? faster by a hack… • `any?

    { … }` needs records to execute the block • On the other hand, it does not need records if it does not execute the block • We have `RubyVM::AbstractSyntaxTree.of` to solve this problem!
  12. Money Forward • Nov. 9: RubyWorld Conference 2023. https://2023.rubyworld-conf.org/ja/ ◦

    I will talk about RBS. ◦ At Matsue (registration closed😢) and Online • Nov. 16: 各社の技術広報が明かす「RubyKaigiスポンサーの裏話」運営ノウハウ やコミュニティへの想い https://moneyforward.connpass.com/event/298325/ ◦ Acha-san will talk about sponsoring RubyKaigi ◦ Online event • We have an X account @moneyforwardDev!
  13. Weekly event to watch RubyKaigi 2023 videos I organize an

    event to watch RubyKaigi 2023 videos. • In a zoom meeting. • At 12:10-12:40 on every Thursday • Accounced in #rubykaigi channel of ruby-jp slack. • We'll watch "RubyGems on the watch / Maciej Mensfeld" as the next session in 2nd Nov. • Feel free to join us!
  14. Q3. Querying exists? in loop # A current_user = User.find(42)

    can_update = current_user .permissions .exists?(action: :update) response = Article.all.map do { title: _1.title, can_update:, } end # B current_user = User.find(42) response = Article.all.map do can_update = current_user .permissions .exists?(action: :update) { title: _1.title, can_update:, } end
  15. Answer: Querying exists? in loop # A current_user = User.find(42)

    can_update = current_user .permissions .exists?(action: :update) # Permission Exists? (0.8ms) SELECT 1 AS one FROM `permissions` WHERE `permissions`.`user_id` = 42 AND `permissions`.`action` = 'update' response = Article.all.map do { title: _1.title, can_update:, } end # B current_user = User.find(42) response = Article.all.map do can_update = current_user .permissions .exists?(action: :update) # Permission Exists? (0.8ms) SELECT 1 AS one FROM `permissions` WHERE `permissions`.`user_id` = 42 AND `permissions`.`action` = 'update' { title: _1.title, can_update:, } end 💯 󰢄 A is faster than B!
  16. Counterexample: Reduce the difference Active Record query cache can reduce

    the difference. We can confirm it with `ActiveRecord.cache do -- end` It is enabled by default in controller actions, but disabled in other places, such as job, rails console.
  17. Counterexample: Reduce the difference Calculating ------------------------------------- outside loop - no

    cache 192.676 (±15.1%) i/s - 4.712k in 25.056327s inside loop - no cache 4.416 (±22.6%) i/s - 109.000 in 25.288121s outside loop - cache 160.870 (±13.1%) i/s - 3.948k in 25.012053s inside loop - cache 39.832 (±10.0%) i/s - 984.000 in 25.079316s Comparison: outside loop - no cache: 192.7 i/s outside loop - cache: 160.9 i/s - same-ish: difference falls within error inside loop - cache: 39.8 i/s - 4.84x slower inside loop - no cache: 4.4 i/s - 43.63x slower
  18. Q4. Querying SELECT in loop # A categories = Category.all.to_a

    response = Article.all.map do { title: _1.title, categories:, } end # B response = Article.all.map do categories = Category.all.to_a { title: _1.title, categories:, } end
  19. Answer. Querying SELECT in loop # A categories = Category.all.to_a

    response = Article.all.map do { title: _1.title, categories:, } end # B response = Article.all.map do categories = Category.all.to_a { title: _1.title, categories:, } end 💯 󰢄 A is faster than B!
  20. AR cache makes it faster…? Calculating ------------------------------------- outside loop -

    no cache 206.570 (±18.4%) i/s - 1.007k in 5.050579s inside loop - no cache 2.511 (± 0.0%) i/s - 13.000 in 5.220488s outside loop - cache 208.327 (±17.8%) i/s - 1.020k in 5.065721s inside loop - cache 19.188 (±10.4%) i/s - 96.000 in 5.040611s Comparison: outside loop - cache: 208.3 i/s outside loop - no cache: 206.6 i/s - same-ish: difference falls within error inside loop - cache: 19.2 i/s - 10.86x slower inside loop - no cache: 2.5 i/s - 82.98x slower
  21. AR cache makes it faster, but not faster enough •

    AR cache needs to cache boolean value for `exists?` • But it needs to cache AR model objects for query fetching objects ◦ Initialization AR model is heavy
  22. Conclusion • Imagine and confirm your code executes what SQL

    ◦ Use debug log and `#to_sql` • You can find the benchmark code from pocke/kaigi-on-rails-2023 GitHub repository. Thanks for your listening!