Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB DC 2012: Indexing and Query Optimization

mongodb
June 26, 2012
210

MongoDB DC 2012: Indexing and Query Optimization

Robert Stam, 10gen
MongoDB supports a wide range of indexing options to enable fast querying of your data. In this talk we’ll cover how indexing works, the various indexing options, and cover use cases where each might be useful.

mongodb

June 26, 2012
Tweet

Transcript

  1. What are indexes? Lists of values associated with documents Maintained

    in a tree structure, allowing fast lookup • •
  2. Creating an index index {_id:1} created automatically db.c.createIndex({x:1}) db.c.createIndex({x:-1}) db.c.createIndex({"x.a":1})

    db.c.createIndex({"x.a":1,"x.b":1}) db.c.createIndex({loc:"2d"}) db.c.createIndex({x:1},{unique:true}) db.c.createIndex({x:1},{background:true}) db.c.ensureIndex({x:1}) • • • • • • • • •
  3. Unique constraint db.c.createIndex({x:1},{unique:true}) don't allow {_id:1,x:2} and {_id:2,x:2} don't allow

    {_id:1} and {_id:2} both match {x:null} db.c.createIndex({x:1},{unique:true, dropDups:true}) keeps first duplicate in natural order and drops the rest • • • • • •
  4. Impact of creating indexes can create an index at any

    time even when a collection already has data creating an index will block MongoDB unless you specify {background:true} even background creation has an impact recommend you create indexes at off peak times (using a script instead of ensureIndex) • • • • • •
  5. Fast document lookup db.c.find({_id:2}), using index {_id:1} db.c.find({x:2}), using index

    {x:1} db.c.find({x:{$in:[2,3]}}), using index {x:1} db.c.find({"x.a":2}), using index {"x.a":1} matches {_id:1,x:{a:2}} and {_id:2,x:{a:2,b:3}} db.c.find({x:{a:2}}), using index {x:1} matches {_id:1,x:{a:2}}, but not {_id:2,x:{a:2,b:3}} What about: • • • • • • • •
  6. Fast document range scan db.c.find({x:{$gte:2}}), using index {x:1} db.c.find({x:{$gte:2,$lt:5}}), using

    index {x:1} db.c.find({x:/^abc/}), using index {x:1} What about: db.c.find({x:/abc/}) • • • • •
  7. Other operations db.c.count({x:2}), using index {x:1} db.c.distinct("x"), using index {x:1}

    db.c.distinct("y",{x:2}), using index {x:1} db.c.update({x:2},{$set:{x:3}}), using index {x:1} db.c.remove({x:2}), using index {x:1} • • • • •
  8. Fast result sorting db.c.find().sort({x:1}), using index {x:1} db.c.find().sort({x:-1}), using index

    {x:1} db.c.find({x:{$gte:4}}).sort({x:-1}), using index {x:1} db.c.find().sort({"x.a":1}), using index {"x.a":1} • • • •
  9. Missing fields db.c.find({x:null}), using index {x:1} matches {_id:5} and {_id:6,x:null}

    db.c.find({x:{$exists:false}}), using index {x:1} matches {_id:5} but not {_id:6,x:null} • • • •
  10. Array matching All of the following match {_id:6,x:[2,10]}, using index

    {x:1} db.c.find({x:2}) db.c.find({x:10}) db.c.find({x:{$gt:5}}) db.c.find({x:[2,10]}) db.c.find({x:{$in:[2,5]}}) What about: db.c.find({x:{$all:[2,10]}}) index will be used to lookup all x=2 • • • • • • • • •
  11. Compound indexes db.c.find({x:2,y:3}), using index {x:1,y:1} db.c.find({x:2,y:3}), using index {x:1,y:-1}

    db.c.find({x:{$in:[2,3]}}), using index {x:1,y:1} db.c.find().sort({x:1,y:1}), using index {x:1,y:1} db.c.find().sort({x:-1,y:1}), using index {x:1:y- 1} db.c.find({x:2}).sort({y:1}), using index {x:1,y:1} • • • • • •
  12. When indexes don't help db.c.find({x:{$ne:2}}) db.c.find({x:{$mod:[10,0]}}) will limit scan to

    numbers db.c.find({x:{$not:/a/}}) db.c.find({x:{$gte:0,$lte:10},y:5}), using index {x:1,y:1} will scan from {x:0,y:5} to {x:10,y:5} db.c.find({$where:"this.x == 5"}) • • • • • • •
  13. Geospatial indexes db.c.find({x:[50,50]}), using index {x:"2d"} db.c.find({x:{$near:[50,50]}}), using index {x:"2d"}

    results are sorted from nearest to farthest db.c.find({x:{$within:{$box:[[40,40], [60,60]]}}}), using index {x:"2d"} db.c.find({x:{$within:{$center:[[50, 50], 10]}}}), using index {x:"2d"} db.c.find({x:{$near:[50, 50]},y:2}), using index {x:"2d",y:1} • • • • • •
  14. Sparse indexes db.c.ensureIndex({x:1},{sparse:true}) only creates index entries for documents that

    have a value for the key currently key must be a single field can behave slightly differently sort only returns documents in the index {$exists:false} won't find any documents you can combine sparse with unique to create a unique constraint that ignores missing values • • • • • • •
  15. Covering indexes Any index can potentially be a covering index

    Whether it is a covering index or not depends on the query and the fields returned All the fields used in the query AND all the fields returned must be in the index Often that means you must exclude _id from the results (unless _id is in the index) Covering indexes are almost always compound indexes Multikey indexes cannot be covering indexes • • • • • •
  16. Limits and tradeoffs max 64 indexes per collection watch out

    for logically equivalent indexes {x:1} and {x:-1} {x:1} and {x:1,unique:true} indexes improve speed of queries but slow down inserts and updates more specific indexes can be more helpful than less specific ones {x:1,y:1} vs {x:1} • • • • • • •
  17. Query optimizer picks which index to use empirical vs cost

    based you can force an index to be used db.c.find({x:2,y:3}).hint({y:1}) uses {y:1} and ignores {x:1} you can force a full collection scan db.c.find({x:2}).hint({$natural:1}) don't rely on order of results unless you use sort • • • • • • • •
  18. Explain output (without index) > for (i = 0; i

    < 100000; i++) { db.test.save({i:i}); } > db.test.find({i:99999}).explain() { "cursor" : "BasicCursor", "nscanned" : 100000, "nscannedObjects" : 100000, "n" : 1, "millis" : 78, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } >
  19. Explain output (with index) > db.c.ensureIndex({i:1}) > db.c.find({i:99999}).explain() { "cursor"

    : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 99999, 99999 ] ] } }
  20. Explain output (with range query) > var query = {i:{$gte:50000,$lt:51000,$mod:[100,0]}}

    > db.c.find(query).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1000, "nscannedObjects" : 1000, "n" : 10, "millis" : 10, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 50000, 51000 ] ] } }
  21. Explain output (covering index) > db.c.find({i:50000},{_id:0,i:1}) { "i" : 50000

    } > db.c.find({i:50000},{_id:0,i:1}).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "i" : [ [ 50000, 50000 ] ] } }
  22. Profiling db.setProfilingLevel(level) 0: off, 1: slow, 2: all slow: >

    100 ms (configurable) records performance info in system.profile query system.profile like any other collection profiling impacts performance but not severely • • • • • •
  23. Profiling example > db.setProfilingLevel(2) { "was" : 0, "slowms" :

    100, "ok" : 1 } > db.c.find({i:99999}) { "_id" : ObjectId("4e8499f93f80d05429a407c3"), "i" : 99999 } > db.system.profile.find() ... { "ts" : ISODate("2011-09-29T16:32:06.971Z"), "op" : "query", "ns" : "test.c", "query" : { "i" : 99999 }, "nscanned" : 1, "nreturned" : 1, "responseLength" : 53, "millis" : 0, "client" : "127.0.0.1", "user" : "" } >
  24. Antipatterns lack of indexes too many indexes unused indexes redundant

    indexes indexes with low selectivity failure to consider compound indexes poor understanding of how compound indexes work • • • • • • •
  25. Summary Indexes can speed up queries Indexes can slow down

    inserts and updates Experiment: use explain and profiling Be careful creating indexes on existing large collections Be careful indexing array fields Consider using compound indexes • • • • • •