Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Batching – A powerful way to solve N+1 queries

exAspArk
August 08, 2017

Batching – A powerful way to solve N+1 queries

#Batching #N+1 #BatchLoader #Ruby #GraphQL #Dataloader #Haxl #Lazy #Monad #Promise

* Blog post: https://engineering.universe.com/batching-a-powerful-way-to-solve-n-1-queries-every-rubyist-should-know-24e20c6e7b94
* BatchLoader: https://github.com/exaspark/batch-loader

exAspArk

August 08, 2017
Tweet

More Decks by exAspArk

Other Decks in Programming

Transcript

  1. What are N+1 queries? posts = Post.where(id: [1, 2, 3])

    # SELECT * FROM posts WHERE id IN (1, 2, 3) posts.each { |post| post.user } # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3 1 + N
  2. How do we usually solve N+1 problem? • Eager loading

    data in models • Preloading data and passing it down explicitly
  3. Eager loading data in models posts = Post.where(id: [1, 2,

    3]).includes(:user) # SELECT * FROM posts WHERE id IN (1, 2, 3) # SELECT * FROM users WHERE id IN (1, 2, 3) posts.each { |post| post.user }
  4. Pros and Cons Pros: • Solves N+1 problem • Easy

    to use Cons: • Top-level model knows ahead which data to load • It does it every time, even if it's not necessary • ORM can’t always help (e.g. loading data from other sources)
  5. Preloading data and passing it down explicitly class Post <

    ApplicationRecord def rating(like_count, dislike_count) like_count * 2 - dislike_count end end posts = Post.where(id: [1, 2, 3]) # SELECT * FROM posts WHERE id IN (1, 2, 3) post_emoticons = Emoticon.where(post_id: posts.map(&:id)) like_count_by_post_id = post_emoticons.likes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'like' AND # post_id IN (1, 2, 3) GROUP BY post_id dislike_count_by_post_id = post_emoticons.dislikes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'dislike' AND # post_id IN (1, 2, 3) GROUP BY post_id posts.map do |post| post.rating( like_count_by_post_id[post.id], dislike_count_by_post_id[post.id] ) end
  6. Pros and Cons Pros: • Solves N+1 problem • Can

    solve the problem when ORM can’t help • Can be memory efficient (e.g. do not load all records to count) Cons: • On the top level we should know ahead what to preload • Do it every time, even if it's not necessary • Passing the data as arguments can be complicated, especially when there are several layers below (e.g. load Emoticons, pass through N Users to M Posts)
  7. History of batching It’s not a new way to solve

    N+1 queries. Facebook released Haskel Haxl library in 2014, but used the mechanism even before https://code.facebook.com/posts/302060973291128/open-sourcing-haxl-a-library-for-haskell/
  8. Other implementations JavaScript DataLoader was released in 2015, it uses

    similar ideas for batching and caching https://github.com/facebook/dataloader
  9. What is batching? General steps: 1. Passing an item to

    load in any part of the app 2. Loading values for the passed items in batch 3. Getting the loaded value where the item was passed
  10. Pros and Cons Pros: • Solves N+1 problem • Batching

    is isolated, load data in batch where and when it's needed Cons: • Need to remember how to get loaded value after batching
  11. Basic DataLoader example var batch = (userIds) => ...; var

    loader = new DataLoader(userIds => batch(userIds)); // “load” schedules a job to dispatch a queue with // Node.js “process.nextTick” and returns a promise loader.load(1).then(user1 => console.log(user1)); loader.load(2).then(user2 => console.log(user2)); loader.load(3).then(user3 => console.log(user3));
  12. BatchLoader https://github.com/exAspArk/batch-loader • Can help to avoid N+1 DB queries,

    HTTP requests, etc. • Works with any code, such as REST APIs or GraphQL • Automatically caches previously resolved values • Thread-safe • No need to share batching through instance variables or custom defined classes • No dependencies, no monkey-patches, no extra primitives such as Monad or Promise
  13. Basic BatchLoader example def user_lazy(post) BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids)

    end end posts = Post.where(id: [1, 2, 3]) users_lazy = posts.map { |post| user_lazy(post) } users = BatchLoader.sync!(users_lazy) # SELECT * FROM users WHERE id IN (1, 2, 3) https://github.com/exAspArk/batch-loader
  14. Don’t worry, be lazy BatchLoader doesn't try to mimic implementations

    in other programming languages which have an asynchronous nature. So, it doesn't load extra primitives such as Promise, which are not very popular in Ruby community (unless you’re using something like Eventmachine). Instead, it uses the idea of lazy objects, which are included in Ruby standard library (e.g. lazy arrays). They allow to manipulate with objects and then resolve them at the end when it's necessary.
  15. BatchLoader uses similar approach https://ruby-doc.org/core-2.4.1/Enumerator/Lazy.html range = (1..Float::INFINITY) values_lazy =

    range.lazy.map { |i| i * i }.take(10) values_lazy.force # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
  16. Matching passed items and loaded values JavaScript DataLoader and similar

    implementations do it implicitly and add these 2 constraints: • The array of loaded values must be the same length as the array of passed items • Each index in the array of loaded values must correspond to the same index in the array of passed items
  17. Which generally means Instead of: BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids)

    end You usually have to manually re-order the loaded values and add null values for missing ones: BatchLoader.for(post.user_id).batch do |user_ids| users = User.where(id: user_ids) user_by_id = users.each_with_object({}) { |u, m| m[u.id] = u } user_ids.map { |user_id| user_by_id[user_id] } end
  18. Example: N+1 HTTP requests with REST API # app/models/post.rb class

    Post < ApplicationRecord def rating HttpClient.get(“https://example.com/ratings/#{id}") end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating} # N+1 HTTP requests end render json: serialized_posts end end
  19. Batching N+1 HTTP requests BatchLoader#load is thread-safe. So, if HttpClient

    is also thread-safe, then with Parallel gem we can execute all HTTP requests concurrently in threads. MRI releases Global Interpreter Lock when thread hits blocking I/O – HTTP request in our case. https://github.com/grosser/parallel
  20. Batching N+1 HTTP requests # app/models/post.rb class Post < ApplicationRecord

    def rating_lazy BatchLoader.for(post).batch do |posts, batch_loader| Parallel.each(posts, in_threads: 10) do |post| batch_loader.load(post, post.rating) end end end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating_lazy} end render json: BatchLoader.sync!(serialized_posts) end end
  21. Example: N+1 DB queries with GraphQL With GraphQL using batching

    is particularly useful. You can't use usual techniques such as preloading associations in advance to avoid N+1 queries. Since you don't know which fields user is going to ask in a query.
  22. Example: N+1 DB queries with GraphQL Schema = GraphQL::Schema.define do

    query QueryType end QueryType = GraphQL::ObjectType.define do name "Query" field :posts, !types[PostType] do resolve ->(obj, args, ctx) { Post.all } end end PostType = GraphQL::ObjectType.define do name "Post" field :user, !UserType do resolve ->(post, args, ctx) { post.user } # N+1 DB queries end end UserType = GraphQL::ObjectType.define do name "User" field :name, !types.String end
  23. Example: N+1 DB queries with GraphQL query = " {

    posts { user { name } } } " Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3
  24. Use BatchLoader in resolve lambda PostType = GraphQL::ObjectType.define do name

    "Post" field :user, !UserType, resolve: ->(post, args, ctx) do BatchLoader.for(post.user_id).batch do |user_ids, batch_loader| User.where(id: user_ids).each { |u| batch_loader.load(u.id, u) } end end end
  25. Setup GraphQL schema Schema = GraphQL::Schema.define do query QueryType lazy_resolve

    BatchLoader, :sync end Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id IN (1, 2, 3)
  26. graphql-ruby GraphQL lazy_resolve will basically call a resolve lambda of

    the field. If it returns an instance of a lazy BatchLoader, it’ll call sync later to get the actual loaded value automatically. https://github.com/rmosolgo/graphql-ruby/pull/386
  27. graphql-ruby I want to make graphql-ruby an easy choice for

    importing into existing projects, and part of that means keeping the dependencies [concurrent-ruby, ActiveSupport, Celluloid] down for people who take those issues seriously. – @rmosolgo, creator of graphql-ruby https://github.com/rmosolgo/graphql-ruby/pull/386
  28. Simplified implementation # The shared store between BatchLoaders is current

    thread Thread.current[:batch_loader] = {} https://github.com/exAspArk/batch-loader
  29. Simplified implementation # Saves a block which loads users in

    batch # Returns "self" – BatchLoader instance batch_loader.batch(&batch_users) # It also saves the passed “for" item store = Thread.current[:batch_loader] store[hash(&batch_users)] ||= {} store[hash(&batch_users)][user_id] = nil https://github.com/exAspArk/batch-loader
  30. Simplified implementation # Batch block should call "load" batch_users =

    ->(user_ids, batch_loader) do User.where(id: user_ids).each do |user| batch_loader.load(user.id, user) end end # “load” stores the loaded value store[hash(&batch_users)][user_id] = user # store: # # { # hash(&batch_users) => { # user_id1 => user1, # user_id2 => user2 # } # } https://github.com/exAspArk/batch-loader
  31. Simplified implementation # And finally batch_loader.sync # It'll call "batch_users"

    block, which calls “load” user_ids = store[hash(&batch_users)].keys batch_users.call(user_ids, self) # Return the stored value store[hash(&batch_users)][user_id] https://github.com/exAspArk/batch-loader
  32. Simplified implementation That’s it. Less than 150 lines of code

    KISS https://github.com/exAspArk/batch-loader
  33. How to find N+1 queries? • Bullet gem • Monitoring

    apps: NewRelic, Scout, etc. • Code review • Reading logs :) https://github.com/flyerhzm/bullet