Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Fast Paging with MongoDB

Avatar for ScaleGrid ScaleGrid
December 20, 2018

Fast Paging with MongoDB

Paging through your data is one of the most common operations with MongoDB. A typical scenario is the need to display your results in chunks in your UI. If you’re batch processing your data, it’s also important to get your paging strategy correct so your data processing can scale.

Let’s walk through an example to see the different ways to page through data in MongoDB. In this example, we have a CRM database of user data that we need to page through and display 10 users at a time. So in effect, our page size is 10. Here is the structure of our user document:

{
_id,
name,
company,
state
}

Approach 1: Using skip() and limit()
MongoDB natively supports the paging operation using the skip() and limit() commands. The skip(n) directive tells MongoDB that it should skip ‘n’ results, and the limit(n) directive instructs MongoDB that it should limit the result length to ‘n’ results. Typically, you’ll be using the skip() and limit() directives with your cursor - but to illustrate the scenario, we provide console commands that would achieve the same results. Also, for brevity of code, the limits-checking code is also excluded:

//Page 1
db.users.find().limit (10)
//Page 2
db.users.find().skip(10).limit(10)
//Page 3
db.users.find().skip(20).limit(10)
........

You get the idea. In general, to retrieve page ‘n’ the code looks like this:

db.users.find().skip(pagesize*(n-1)).limit(pagesize)

However, as the size of your data increases, this approach has serious performance problems. The reason is that every time the query is executed, the full result set is built up, then the server has to walk from the beginning of the collection to the specified offset. As your offset increases, this process gets slower and slower. Also, this process does not make efficient use of the indexes. So typically the ‘skip()’ and ‘limit()’ approach is useful when you have small data sets, and if you’re working with large data sets, you’ll want to consider other approaches.

Approach 2: Using find() and limit()
The reason the previous approach does not scale very well is the skip() command, and the goal in this section is to implement paging without using the ‘skip()’ command. For this, we’re going to leverage the natural order in the stored data like a timestamp or an id stored in the document. In this example, we’re going to use the ‘_id’ stored in each document. ‘_id’ is a MongoDB ObjectID structure which is a 12-byte structure containing a timestamp, machined, processid, counter, etc. The overall idea is as follows:

1. Retrieve the _id of the last document in the current page
2. Retrieve documents greater than this “_id” in the next page

//Page 1
db.users.find().limit(pageSize);
//Find the id of the last document in this page
last_id = ...

//Page 2
users = db.users.find({'_id'> last_id}). limit(10);
//Update the last id with the id of the last document in this page
last_id = ...

This approach leverages the inherent order that exists in the “_id” field. Also, since the “_id” field is indexed by default, the performance of the find operation is very good. If the field you’re using is not indexed, your performance will suffer – so it’s important to make sure that field is indexed.

Additionally, if you’d like your data sorted in a particular order for your paging, then you can also use the sort() clause with the above technique. It’s important to ensure that the sort process is leveraging an index for best performance. You can use the .explain() suffix to your query to determine this:

users = db.users.find({'_id'> last_id}). sort(..).limit(10);
//Update the last id with the id of the last document in this page
last_id = ...

As always if you have any questions or comments please feel free to reach out to us at [email protected] or learn more about our MongoDB solutions at https://scalegrid.io/mongodb.html

Read the full article: https://scalegrid.io/blog/fast-paging-with-mongodb/

Avatar for ScaleGrid

ScaleGrid

December 20, 2018
Tweet

More Decks by ScaleGrid

Other Decks in Technology

Transcript

  1. Fast Paging with MongoDB How to perform one of the

    most common operations with MongoDB Read the Blog Post | Dharshan Rangegowda
  2. Paging through your data is one of the most common

    operations with MongoDB. Read the Fast Paging with MongoDB Blog Database-as-a-Service |
  3. A t ic c a is: The need to display

    your results in chunks in your UI. If you’re batch processing your data, it’s also important to get your paging strategy correct so your data processing can scale. Database-as-a-Service | Read the Fast Paging with MongoDB Blog
  4. Let’s a k ro n e p . We’ll cover

    the two different ways to page through data in MongoDB. For these examples, we have a CRM database of user data that we need to page through and display 10 users at a time. So in effect, our page size is 10. Here is the structure of our user document: Database-as-a-Service | 1 2 3 4 5 6 { _id, name, company, state } Read the Fast Paging with MongoDB Blog
  5. Approach 1: Using skip() and limit() Read the Fast Paging

    with MongoDB Blog Database-as-a-Service |
  6. Usi s () an m () MongoDB natively supports the

    paging operation using the command directives: skip(n) tells MongoDB that it should skip ‘n’ results limit(n) instructs MongoDB that it should limit the result length to ‘n’ results. Database-as-a-Service | 1 2 3 4 5 6 7 //Page 1 db.users.find().limit (10) //Page 2 db.users.find().skip(10).limit(10) //Page 3 db.users.find().skip(20).limit(10) ........ Typically, you’ll be using the skip() and limit() directives with your cursor - but to illustrate the scenario, we provide console commands that would achieve the same results. Also, for brevity of code, the limits-checking code is also excluded: Read the Fast Paging with MongoDB Blog
  7. Yo g he a. In general, to retrieve page ‘n’

    the code looks like this: Database-as-a-Service | 1 db.users.find().skip(pagesize*(n-1)).limit(pagesize) Read the Fast Paging with MongoDB Blog
  8. However, as the size of your data increases, this approach

    has serious performance problems. Read the Fast Paging with MongoDB Blog Database-as-a-Service |
  9. The on h : Every time the query is executed,

    the full result set is built up, then the server has to walk from the beginning of the collection to the specified offset. As your offset increases, this process gets slower and slower. Also, this process does not make efficient use of the indexes. So typically the ‘skip()’ and ‘limit()’ approach is useful when you have small data sets, and if you’re working with large data sets, you’ll want to consider other approaches. Database-as-a-Service | Read the Fast Paging with MongoDB Blog
  10. Approach 2: Using find() and limit() Read the Fast Paging

    with MongoDB Blog Database-as-a-Service |
  11. Usi f () an m () The reason the previous

    approach does not scale very well is the skip() command, and the goal in this section is to implement paging without using the ‘skip()’ command. For this, we’re going to leverage the natural order in the stored data like a timestamp or an id stored in the document. In this example, we’re going to use the ‘_id’ stored in each document. ‘_id’ is a MongoDB ObjectID structure which is a 12-byte structure containing a timestamp, machined, processid, counter, etc. Database-as-a-Service | Read the Fast Paging with MongoDB Blog
  12. The al a s ol s: Database-as-a-Service | skip(n) tells

    MongoDB that it should skip ‘n’ results. limit(n) instructs MongoDB that it should limit the result length to ‘n’ results. 1 2 3 4 5 6 7 8 9 //Page 1 db.users.find().limit(pageSize); //Find the id of the last document in this page last_id = ... //Page 2 users = db.users.find({'_id'> last_id}). limit(10); //Update the last id with the id of the last document in this page last_id = ... Read the Fast Paging with MongoDB Blog
  13. Thi p a h ra : The inherent order that

    exists in the “_id” field. Also, since the “_id” field is indexed by default, the performance of the find operation is very good. If the field you’re using is not indexed, your performance will suffer – so it’s important to make sure that field is indexed. Database-as-a-Service | Read the Fast Paging with MongoDB Blog
  14. Ad i na y, if you’d like your data sorted

    in a particular order for your paging, then you can also use the sort() clause with the above technique. It’s important to ensure that the sort process is leveraging an index for best performance. You can use the .explain() suffix to your query to determine this: Database-as-a-Service | 1 2 3 users = db.users.find({'_id'> last_id}). sort(..).limit(10); //Update the last id with the id of the last document in this page last_id = ... Read the Fast Paging with MongoDB Blog
  15. Tha s r a g Fast Paging with MongoDB You

    may also like: Cassandra vs. MongoDB Which is the Best MongoDB GUI? Using Redis with Node.js & Socket.IO Fully Managed MongoDB Hosting ScaleGrid’s fully managed Database-as-a-Service (DBaaS) solution is used by thousands of developers, startups and enterprise customers, including UPS, Dell and Adobe. Easily manage MongoDB and Redis in the cloud with AWS, Azure, and DigitalOcean, and automate your database operations at any scale so you can focus on product. scalegrid.io | help.scalegrid.io | @scalegridio