Slide 1

Slide 1 text

Batching – A powerful way to solve N+1 queries exAspArk Evgeny Li

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

What are N+1 queries? posts = Post.where(id: [1, 2, 3]) # SELECT * FROM posts WHERE id IN (1, 2, 3) posts.each { |post| post.user } # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3 1 + N

Slide 4

Slide 4 text

How do we usually solve N+1 problem? • Eager loading data in models • Preloading data and passing it down explicitly

Slide 5

Slide 5 text

Eager loading data in models posts = Post.where(id: [1, 2, 3]).includes(:user) # SELECT * FROM posts WHERE id IN (1, 2, 3) # SELECT * FROM users WHERE id IN (1, 2, 3) posts.each { |post| post.user }

Slide 6

Slide 6 text

Pros and Cons Pros: • Solves N+1 problem • Easy to use Cons: • Top-level model knows ahead which data to load • It does it every time, even if it's not necessary • ORM can’t always help (e.g. loading data from other sources)

Slide 7

Slide 7 text

Preloading data and passing it down explicitly class Post < ApplicationRecord def rating(like_count, dislike_count) like_count * 2 - dislike_count end end posts = Post.where(id: [1, 2, 3]) # SELECT * FROM posts WHERE id IN (1, 2, 3) post_emoticons = Emoticon.where(post_id: posts.map(&:id)) like_count_by_post_id = post_emoticons.likes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'like' AND # post_id IN (1, 2, 3) GROUP BY post_id dislike_count_by_post_id = post_emoticons.dislikes.group(:post_id).count # SELECT COUNT(*) FROM emoticons WHERE name = 'dislike' AND # post_id IN (1, 2, 3) GROUP BY post_id posts.map do |post| post.rating( like_count_by_post_id[post.id], dislike_count_by_post_id[post.id] ) end

Slide 8

Slide 8 text

Pros and Cons Pros: • Solves N+1 problem • Can solve the problem when ORM can’t help • Can be memory efficient (e.g. do not load all records to count) Cons: • On the top level we should know ahead what to preload • Do it every time, even if it's not necessary • Passing the data as arguments can be complicated, especially when there are several layers below (e.g. load Emoticons, pass through N Users to M Posts)

Slide 9

Slide 9 text

There is another way to avoid N+1 queries!

Slide 10

Slide 10 text

History of batching It’s not a new way to solve N+1 queries. Facebook released Haskel Haxl library in 2014, but used the mechanism even before https://code.facebook.com/posts/302060973291128/open-sourcing-haxl-a-library-for-haskell/

Slide 11

Slide 11 text

Batching explained on one slide https://github.com/facebook/Haxl

Slide 12

Slide 12 text

Haxl with Monads, Applicatives, Functors https://github.com/facebook/Haxl

Slide 13

Slide 13 text

Other implementations JavaScript DataLoader was released in 2015, it uses similar ideas for batching and caching https://github.com/facebook/dataloader

Slide 14

Slide 14 text

https://www.youtube.com/watch?v=OQTnXNCDywA DataLoader with Promises

Slide 15

Slide 15 text

What is batching? General steps: 1. Passing an item to load in any part of the app 2. Loading values for the passed items in batch 3. Getting the loaded value where the item was passed

Slide 16

Slide 16 text

Pros and Cons Pros: • Solves N+1 problem • Batching is isolated, load data in batch where and when it's needed Cons: • Need to remember how to get loaded value after batching

Slide 17

Slide 17 text

Basic DataLoader example var batch = (userIds) => ...; var loader = new DataLoader(userIds => batch(userIds)); // “load” schedules a job to dispatch a queue with // Node.js “process.nextTick” and returns a promise loader.load(1).then(user1 => console.log(user1)); loader.load(2).then(user2 => console.log(user2)); loader.load(3).then(user3 => console.log(user3));

Slide 18

Slide 18 text

What about Ruby? https://github.com/exAspArk/batch-loader

Slide 19

Slide 19 text

BatchLoader https://github.com/exAspArk/batch-loader • Can help to avoid N+1 DB queries, HTTP requests, etc. • Works with any code, such as REST APIs or GraphQL • Automatically caches previously resolved values • Thread-safe • No need to share batching through instance variables or custom defined classes • No dependencies, no monkey-patches, no extra primitives such as Monad or Promise

Slide 20

Slide 20 text

Basic BatchLoader example def user_lazy(post) BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids) end end posts = Post.where(id: [1, 2, 3]) users_lazy = posts.map { |post| user_lazy(post) } users = BatchLoader.sync!(users_lazy) # SELECT * FROM users WHERE id IN (1, 2, 3) https://github.com/exAspArk/batch-loader

Slide 21

Slide 21 text

Don’t worry, be lazy BatchLoader doesn't try to mimic implementations in other programming languages which have an asynchronous nature. So, it doesn't load extra primitives such as Promise, which are not very popular in Ruby community (unless you’re using something like Eventmachine). Instead, it uses the idea of lazy objects, which are included in Ruby standard library (e.g. lazy arrays). They allow to manipulate with objects and then resolve them at the end when it's necessary.

Slide 22

Slide 22 text

BatchLoader uses similar approach https://ruby-doc.org/core-2.4.1/Enumerator/Lazy.html range = (1..Float::INFINITY) values_lazy = range.lazy.map { |i| i * i }.take(10) values_lazy.force # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Slide 23

Slide 23 text

Matching passed items and loaded values JavaScript DataLoader and similar implementations do it implicitly and add these 2 constraints: • The array of loaded values must be the same length as the array of passed items • Each index in the array of loaded values must correspond to the same index in the array of passed items

Slide 24

Slide 24 text

Which generally means Instead of: BatchLoader.for(post.user_id).batch do |user_ids| User.where(id: user_ids) end You usually have to manually re-order the loaded values and add null values for missing ones: BatchLoader.for(post.user_id).batch do |user_ids| users = User.where(id: user_ids) user_by_id = users.each_with_object({}) { |u, m| m[u.id] = u } user_ids.map { |user_id| user_by_id[user_id] } end

Slide 25

Slide 25 text

Let’s just do it explicitly BatchLoader.for(post.user_id).batch do |user_ids, batch_loader| User.where(id: user_ids).each { |u| batch_loader.load(u.id, u) } end

Slide 26

Slide 26 text

Example: N+1 HTTP requests with REST API # app/models/post.rb class Post < ApplicationRecord def rating HttpClient.get(“https://example.com/ratings/#{id}") end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating} # N+1 HTTP requests end render json: serialized_posts end end

Slide 27

Slide 27 text

Batching N+1 HTTP requests BatchLoader#load is thread-safe. So, if HttpClient is also thread-safe, then with Parallel gem we can execute all HTTP requests concurrently in threads. MRI releases Global Interpreter Lock when thread hits blocking I/O – HTTP request in our case. https://github.com/grosser/parallel

Slide 28

Slide 28 text

Batching N+1 HTTP requests # app/models/post.rb class Post < ApplicationRecord def rating_lazy BatchLoader.for(post).batch do |posts, batch_loader| Parallel.each(posts, in_threads: 10) do |post| batch_loader.load(post, post.rating) end end end end # app/controllers/posts_controller.rb class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map do |post| {id: post.id, rating: post.rating_lazy} end render json: BatchLoader.sync!(serialized_posts) end end

Slide 29

Slide 29 text

Example: N+1 DB queries with GraphQL With GraphQL using batching is particularly useful. You can't use usual techniques such as preloading associations in advance to avoid N+1 queries. Since you don't know which fields user is going to ask in a query.

Slide 30

Slide 30 text

Example: N+1 DB queries with GraphQL Schema = GraphQL::Schema.define do query QueryType end QueryType = GraphQL::ObjectType.define do name "Query" field :posts, !types[PostType] do resolve ->(obj, args, ctx) { Post.all } end end PostType = GraphQL::ObjectType.define do name "Post" field :user, !UserType do resolve ->(post, args, ctx) { post.user } # N+1 DB queries end end UserType = GraphQL::ObjectType.define do name "User" field :name, !types.String end

Slide 31

Slide 31 text

Example: N+1 DB queries with GraphQL query = " { posts { user { name } } } " Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id = 1 # SELECT * FROM users WHERE id = 2 # SELECT * FROM users WHERE id = 3

Slide 32

Slide 32 text

Use BatchLoader in resolve lambda PostType = GraphQL::ObjectType.define do name "Post" field :user, !UserType, resolve: ->(post, args, ctx) do BatchLoader.for(post.user_id).batch do |user_ids, batch_loader| User.where(id: user_ids).each { |u| batch_loader.load(u.id, u) } end end end

Slide 33

Slide 33 text

Setup GraphQL schema Schema = GraphQL::Schema.define do query QueryType lazy_resolve BatchLoader, :sync end Schema.execute(query) # SELECT * FROM posts # SELECT * FROM users WHERE id IN (1, 2, 3)

Slide 34

Slide 34 text

Approved by the creator of graphql-ruby https://github.com/rmosolgo/graphql-ruby/pull/876

Slide 35

Slide 35 text

graphql-ruby GraphQL lazy_resolve will basically call a resolve lambda of the field. If it returns an instance of a lazy BatchLoader, it’ll call sync later to get the actual loaded value automatically. https://github.com/rmosolgo/graphql-ruby/pull/386

Slide 36

Slide 36 text

graphql-ruby I want to make graphql-ruby an easy choice for importing into existing projects, and part of that means keeping the dependencies [concurrent-ruby, ActiveSupport, Celluloid] down for people who take those issues seriously. – @rmosolgo, creator of graphql-ruby https://github.com/rmosolgo/graphql-ruby/pull/386

Slide 37

Slide 37 text

Simplified implementation # Returns a BatchLoader instance batch_loader = BatchLoader.for(user_id) https://github.com/exAspArk/batch-loader

Slide 38

Slide 38 text

Simplified implementation # The shared store between BatchLoaders is current thread Thread.current[:batch_loader] = {} https://github.com/exAspArk/batch-loader

Slide 39

Slide 39 text

Simplified implementation # Saves a block which loads users in batch # Returns "self" – BatchLoader instance batch_loader.batch(&batch_users) # It also saves the passed “for" item store = Thread.current[:batch_loader] store[hash(&batch_users)] ||= {} store[hash(&batch_users)][user_id] = nil https://github.com/exAspArk/batch-loader

Slide 40

Slide 40 text

Simplified implementation # Batch block should call "load" batch_users = ->(user_ids, batch_loader) do User.where(id: user_ids).each do |user| batch_loader.load(user.id, user) end end # “load” stores the loaded value store[hash(&batch_users)][user_id] = user # store: # # { # hash(&batch_users) => { # user_id1 => user1, # user_id2 => user2 # } # } https://github.com/exAspArk/batch-loader

Slide 41

Slide 41 text

Simplified implementation # And finally batch_loader.sync # It'll call "batch_users" block, which calls “load” user_ids = store[hash(&batch_users)].keys batch_users.call(user_ids, self) # Return the stored value store[hash(&batch_users)][user_id] https://github.com/exAspArk/batch-loader

Slide 42

Slide 42 text

Simplified implementation That’s it. Less than 150 lines of code KISS https://github.com/exAspArk/batch-loader

Slide 43

Slide 43 text

How to find N+1 queries? • Bullet gem • Monitoring apps: NewRelic, Scout, etc. • Code review • Reading logs :) https://github.com/flyerhzm/bullet

Slide 44

Slide 44 text

Takeaways Learn more by exploring other programming languages. But don’t blindly copy and paste the solutions