Save 37% off PRO during our Black Friday Sale! »

Batching – A powerful way to solve N+1 queries

Cb09696b034cce3cc79cab80a4bba4a3?s=47 exAspArk
August 08, 2017

Batching – A powerful way to solve N+1 queries

#Batching #N+1 #BatchLoader #Ruby #GraphQL #Dataloader #Haxl #Lazy #Monad #Promise

* Blog post: https://engineering.universe.com/batching-a-powerful-way-to-solve-n-1-queries-every-rubyist-should-know-24e20c6e7b94
* BatchLoader: https://github.com/exaspark/batch-loader

Cb09696b034cce3cc79cab80a4bba4a3?s=128

exAspArk

August 08, 2017
Tweet

Transcript

  1. Batching – A powerful way to solve N+1 queries exAspArk

    Evgeny Li
  2. None
  3. What are N+1 queries? posts = Post.where(id: [1, 2, 3])

    # SELECT * FROM posts WHERE id IN (1, 2, 3) posts.each { |post| post.user } # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3 1 + N
  4. How do we usually solve N+1 problem? • Eager loading

    data in models • Preloading data and passing it down explicitly
  5. Eager loading data in models posts = Post.where(id: [1, 2,

    3]).includes(:user) # SELECT * FROM posts WHERE id IN (1, 2, 3) # SELECT * FROM users WHERE id IN (1, 2, 3) posts.each { |post| post.user }
  6. Pros and Cons Pros: • Solves N+1 problem • Easy

    to use Cons: • Top-level model knows ahead which data to load • It does it every time, even if it's not necessary • ORM can’t always help (e.g. loading data from other sources)
  7. Preloading data and passing it down explicitly class Post <

    ApplicationRecord def rating(like_count, dislike_count) like_count * 2 - dislike_count end end posts = Post.where(id: [1, 2, 3]) # SELECT * FROM posts WHERE id IN (1, 2, 3) post_emoticons = Emoticon.where(post_id: posts.map(&:id)) like_count_by_post_id = post_emoticons.likes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'like' AND # post_id IN (1, 2, 3) GROUP BY post_id dislike_count_by_post_id = post_emoticons.dislikes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'dislike' AND # post_id IN (1, 2, 3) GROUP BY post_id posts.map do |post| post.rating( like_count_by_post_id[post.id], dislike_count_by_post_id[post.id] ) end
  8. Pros and Cons Pros: • Solves N+1 problem • Can

    solve the problem when ORM can’t help • Can be memory efficient (e.g. do not load all records to count) Cons: • On the top level we should know ahead what to preload • Do it every time, even if it's not necessary • Passing the data as arguments can be complicated, especially when there are several layers below (e.g. load Emoticons, pass through N Users to M Posts)
  9. There is another way to avoid N+1 queries!

  10. History of batching It’s not a new way to solve

    N+1 queries. Facebook released Haskel Haxl library in 2014, but used the mechanism even before https://code.facebook.com/posts/302060973291128/open-sourcing-haxl-a-library-for-haskell/
  11. Batching explained on one slide https://github.com/facebook/Haxl

  12. Haxl with Monads, Applicatives, Functors https://github.com/facebook/Haxl

  13. Other implementations JavaScript DataLoader was released in 2015, it uses

    similar ideas for batching and caching https://github.com/facebook/dataloader
  14. https://www.youtube.com/watch?v=OQTnXNCDywA DataLoader with Promises

  15. What is batching? General steps: 1. Passing an item to

    load in any part of the app 2. Loading values for the passed items in batch 3. Getting the loaded value where the item was passed
  16. Pros and Cons Pros: • Solves N+1 problem • Batching

    is isolated, load data in batch where and when it's needed Cons: • Need to remember how to get loaded value after batching
  17. Basic DataLoader example var batch = (userIds) => ...; var

    loader = new DataLoader(userIds => batch(userIds)); // “load” schedules a job to dispatch a queue with // Node.js “process.nextTick” and returns a promise loader.load(1).then(user1 => console.log(user1)); loader.load(2).then(user2 => console.log(user2)); loader.load(3).then(user3 => console.log(user3));
  18. What about Ruby? https://github.com/exAspArk/batch-loader

  19. BatchLoader https://github.com/exAspArk/batch-loader • Can help to avoid N+1 DB queries,

    HTTP requests, etc. • Works with any code, such as REST APIs or GraphQL • Automatically caches previously resolved values • Thread-safe • No need to share batching through instance variables or custom defined classes • No dependencies, no monkey-patches, no extra primitives such as Monad or Promise
  20. Basic BatchLoader example def user_lazy(post) BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids)

    end end posts = Post.where(id: [1, 2, 3]) users_lazy = posts.map { |post| user_lazy(post) } users = BatchLoader.sync!(users_lazy) # SELECT * FROM users WHERE id IN (1, 2, 3) https://github.com/exAspArk/batch-loader
  21. Don’t worry, be lazy BatchLoader doesn't try to mimic implementations

    in other programming languages which have an asynchronous nature. So, it doesn't load extra primitives such as Promise, which are not very popular in Ruby community (unless you’re using something like Eventmachine). Instead, it uses the idea of lazy objects, which are included in Ruby standard library (e.g. lazy arrays). They allow to manipulate with objects and then resolve them at the end when it's necessary.
  22. BatchLoader uses similar approach https://ruby-doc.org/core-2.4.1/Enumerator/Lazy.html range = (1..Float::INFINITY) values_lazy =

    range.lazy.map { |i| i * i }.take(10) values_lazy.force # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
  23. Matching passed items and loaded values JavaScript DataLoader and similar

    implementations do it implicitly and add these 2 constraints: • The array of loaded values must be the same length as the array of passed items • Each index in the array of loaded values must correspond to the same index in the array of passed items
  24. Which generally means Instead of: BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids)

    end You usually have to manually re-order the loaded values and add null values for missing ones: BatchLoader.for(post.user_id).batch do |user_ids| users = User.where(id: user_ids) user_by_id = users.each_with_object({}) { |u, m| m[u.id] = u } user_ids.map { |user_id| user_by_id[user_id] } end
  25. Let’s just do it explicitly BatchLoader.for(post.user_id).batch do |user_ids, batch_loader| User.where(id:

    user_ids).each { |u| batch_loader.load(u.id, u) } end
  26. Example: N+1 HTTP requests with REST API # app/models/post.rb class

    Post < ApplicationRecord def rating HttpClient.get(“https://example.com/ratings/#{id}") end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating} # N+1 HTTP requests end render json: serialized_posts end end
  27. Batching N+1 HTTP requests BatchLoader#load is thread-safe. So, if HttpClient

    is also thread-safe, then with Parallel gem we can execute all HTTP requests concurrently in threads. MRI releases Global Interpreter Lock when thread hits blocking I/O – HTTP request in our case. https://github.com/grosser/parallel
  28. Batching N+1 HTTP requests # app/models/post.rb class Post < ApplicationRecord

    def rating_lazy BatchLoader.for(post).batch do |posts, batch_loader| Parallel.each(posts, in_threads: 10) do |post| batch_loader.load(post, post.rating) end end end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating_lazy} end render json: BatchLoader.sync!(serialized_posts) end end
  29. Example: N+1 DB queries with GraphQL With GraphQL using batching

    is particularly useful. You can't use usual techniques such as preloading associations in advance to avoid N+1 queries. Since you don't know which fields user is going to ask in a query.
  30. Example: N+1 DB queries with GraphQL Schema = GraphQL::Schema.define do

    query QueryType end QueryType = GraphQL::ObjectType.define do name "Query" field :posts, !types[PostType] do resolve ->(obj, args, ctx) { Post.all } end end PostType = GraphQL::ObjectType.define do name "Post" field :user, !UserType do resolve ->(post, args, ctx) { post.user } # N+1 DB queries end end UserType = GraphQL::ObjectType.define do name "User" field :name, !types.String end
  31. Example: N+1 DB queries with GraphQL query = " {

    posts { user { name } } } " Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3
  32. Use BatchLoader in resolve lambda PostType = GraphQL::ObjectType.define do name

    "Post" field :user, !UserType, resolve: ->(post, args, ctx) do BatchLoader.for(post.user_id).batch do |user_ids, batch_loader| User.where(id: user_ids).each { |u| batch_loader.load(u.id, u) } end end end
  33. Setup GraphQL schema Schema = GraphQL::Schema.define do query QueryType lazy_resolve

    BatchLoader, :sync end Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id IN (1, 2, 3)
  34. Approved by the creator of graphql-ruby https://github.com/rmosolgo/graphql-ruby/pull/876

  35. graphql-ruby GraphQL lazy_resolve will basically call a resolve lambda of

    the field. If it returns an instance of a lazy BatchLoader, it’ll call sync later to get the actual loaded value automatically. https://github.com/rmosolgo/graphql-ruby/pull/386
  36. graphql-ruby I want to make graphql-ruby an easy choice for

    importing into existing projects, and part of that means keeping the dependencies [concurrent-ruby, ActiveSupport, Celluloid] down for people who take those issues seriously. – @rmosolgo, creator of graphql-ruby https://github.com/rmosolgo/graphql-ruby/pull/386
  37. Simplified implementation # Returns a BatchLoader instance batch_loader = BatchLoader.for(user_id)

    https://github.com/exAspArk/batch-loader
  38. Simplified implementation # The shared store between BatchLoaders is current

    thread Thread.current[:batch_loader] = {} https://github.com/exAspArk/batch-loader
  39. Simplified implementation # Saves a block which loads users in

    batch # Returns "self" – BatchLoader instance batch_loader.batch(&batch_users) # It also saves the passed “for" item store = Thread.current[:batch_loader] store[hash(&batch_users)] ||= {} store[hash(&batch_users)][user_id] = nil https://github.com/exAspArk/batch-loader
  40. Simplified implementation # Batch block should call "load" batch_users =

    ->(user_ids, batch_loader) do User.where(id: user_ids).each do |user| batch_loader.load(user.id, user) end end # “load” stores the loaded value store[hash(&batch_users)][user_id] = user # store: # # { # hash(&batch_users) => { # user_id1 => user1, # user_id2 => user2 # } # } https://github.com/exAspArk/batch-loader
  41. Simplified implementation # And finally batch_loader.sync # It'll call "batch_users"

    block, which calls “load” user_ids = store[hash(&batch_users)].keys batch_users.call(user_ids, self) # Return the stored value store[hash(&batch_users)][user_id] https://github.com/exAspArk/batch-loader
  42. Simplified implementation That’s it. Less than 150 lines of code

    KISS https://github.com/exAspArk/batch-loader
  43. How to find N+1 queries? • Bullet gem • Monitoring

    apps: NewRelic, Scout, etc. • Code review • Reading logs :) https://github.com/flyerhzm/bullet
  44. Takeaways Learn more by exploring other programming languages. But don’t

    blindly copy and paste the solutions