Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Indexing and Query Optimization - MongoDB Atlanta 2012 - Robert Stam

mongodb
May 08, 2012
290

Indexing and Query Optimization - MongoDB Atlanta 2012 - Robert Stam

mongodb

May 08, 2012
Tweet

More Decks by mongodb

Transcript

  1. What are indexes? • Lists of values associated with documents

    • Maintained in a tree structure, allowing fast lookup Tuesday, May 8, 12
  2. Why use indexes? • Faster queries • Required for geo

    queries • Enforce unique constraints Tuesday, May 8, 12
  3. Creating an index • index {_id:1} created automatically • db.c.createIndex({x:1})

    • db.c.createIndex({x:-1}) • db.c.createIndex({"x.a":1}) • db.c.createIndex({"x.a":1,"x.b":1}) • db.c.createIndex({loc:"2d"}) • db.c.createIndex({x:1},{unique:true}) • db.c.createIndex({x:1},{background:true}) • db.c.ensureIndex({x:1}) Tuesday, May 8, 12
  4. Unique constraint • db.c.createIndex({x:1},{unique:true}) • don't allow {_id:1,x:2} and {_id:2,x:2}

    • don't allow {_id:1} and {_id:2} • both match {x:null} • db.c.createIndex({x:1},{unique:true, dropDups:true}) • keeps first duplicate in natural order and drops the rest Tuesday, May 8, 12
  5. Impact of creating indexes • can create an index at

    any time • even when a collection already has data • creating an index will block MongoDB • unless you specify {background:true} • even background creation has an impact • recommend you create indexes at off peak times (using a script instead of ensureIndex) Tuesday, May 8, 12
  6. Fast document lookup • db.c.find({_id:2}), using index {_id:1} • db.c.find({x:2}),

    using index {x:1} • db.c.find({x:{$in:[2,3]}}), using index {x:1} • db.c.find({"x.a":2}), using index {"x.a":1} • matches {_id:1,x:{a:2}} and {_id:2,x:{a:2,b:3}} • db.c.find({x:{a.2}}), using index {x:1} • matches {_id:1,x:{a:2}}, but not {_id:2,x:{a:2,b:3}} • What about: • db.c.find({$where:"this.x == 2"}) Tuesday, May 8, 12
  7. Fast document range scan • db.c.find({x:{$gt:2}}), using index {x:1} •

    db.c.find({x:{$gt:2,$lt:5}}), using index {x:1} • db.c.find({x:/^abc/}), using index {x:1} • What about: • db.c.find({x:/abc/}) Tuesday, May 8, 12
  8. Other operations • db.c.count({x:2}), using index {x:1} • db.c.distinct({x:2}), using

    index {x:1} • db.c.update({x:2},{$set:{x:3}}), using index {x:1} • db.c.remove({x:2}), using index {x:1} Tuesday, May 8, 12
  9. Fast result sorting • db.c.find().sort({x:1}), using index {x:1} • db.c.find().sort({x:-1}),

    using index {x:1} • db.c.find({x:{$gt:4}}).sort({x:-1}), using index {x:1} • db.c.find().sort({"x.a":1}), using index {"x.a":1} Tuesday, May 8, 12
  10. Missing fields • db.c.find({x:null}), using index {x:1} • matches {_id:5}

    and {_id:6,x:null} • db.c.find({x:{$exists:false}}), using index {x:1} • matches {_id:5} but not {_id:6,x:null} Tuesday, May 8, 12
  11. Array matching • All of the following match {_id:6,x:[2,10]}, using

    index {x:1} • db.c.find({x:2}) • db.c.find({x:10}) • db.c.find({x:{$gt:5}}) • db.c.find({x:[2,10]}) • db.c.find({x:{$in:[2,5]}}) • What about: • db.c.find({x:{$all:[2,10]}}) • index will be used to lookup all x=2 Tuesday, May 8, 12
  12. Compound indexes • db.c.find({x:2,y:3}), using index {x:1,y:1} • db.c.find({x:2,y:3}), using

    index {x:1,y:-1} • db.c.find({x:{$in:[2,3]}}), using index {x:1,y:1} • db.c.find().sort({x:1,y:1}), using index {x:1,y:1} • db.c.find().sort({x:-1,y:1}), using index {x:1:y-1} • db.c.find({x:2}).sort({y:1}), using index {x:1,y:1} Tuesday, May 8, 12
  13. When indexes don't help • db.c.find({x:{$ne:2}}) • db.c.find({x:{$mod:[10,0]}}) • will

    limit scan to numbers • db.c.find({x:{$not:/a/}}) • db.c.find({x:{$gte:0,$lte:10},y:5}), using index {x:1,y:1} • will scan from {x:0,y:5} to {x:10,y:5} • db.c.find({$where:"this.x == 5"}) Tuesday, May 8, 12
  14. Geospatial indexes • db.c.find({x:[50,50]}), using index {x:"2d"} • db.c.find({x:{$near:[50,50]}}), using

    index {x:"2d"} • results are sorted from nearest to farthest • db.c.find({x:{$within:{$box:[[40,40], [60,60]]}}}), using index {x:"2d"} • db.c.find({x:{$within:{$center:[[50, 50], 10]}}}), using index {x:"2d"} • db.c.find({x:{$near:[50, 50]},y:2}), using index {x:"2d",y:1} Tuesday, May 8, 12
  15. Sparse indexes • db.c.ensureIndex({x:1},{sparse:true}) • only creates index entries for

    documents that have a value for the key • currently key must be a single field • can behave slightly differently • sort only returns documents in the index • {$exists:false} won't find any documents • you can combine sparse with unique to create a unique constraint that ignores missing values Tuesday, May 8, 12
  16. Covering indexes • Any index can potentially be a covering

    index • Whether it is a covering index or not depends on the query and the fields returned • All the fields used in the query AND all the fields returned must be in the index • Often that means you must exclude _id from the results (unless _id is in the index) • Covering indexes are almost always compound indexes • Multikey indexes cannot be covering indexes • Use explain to verify that the index is covering with respect to a query ({indexOnly:true}) Tuesday, May 8, 12
  17. Limits and tradeoffs • max 64 indexes per collection •

    watch out for logically equivalent indexes • {x:1} and {x:-1} • {x:1} and {x:1,unique:true} • indexes improve speed of queries but slow down inserts and updates • more specific indexes can be more helpful than less specific ones • {x:1,y:1} vs {x:1} Tuesday, May 8, 12
  18. Query optimizer • picks which index to use • empirical

    vs cost based • you can force an index to be used • db.c.find({x:2,y:3}).hint({y:1}) • uses {y:1} and ignores {x:1} • you can force a full collection scan • db.c.find({x:2}).hint({$natural:1}) • don't rely on order of results unless you use sort Tuesday, May 8, 12
  19. Explain output (without index) > for (i = 0; i

    < 100000; i++) { db.test.save({i:i}); } > db.test.find({i:99999}).explain() { "cursor" : "BasicCursor", "nscanned" : 100000, "nscannedObjects" : 100000, "n" : 1, "millis" : 78, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } > Tuesday, May 8, 12
  20. Explain output (with index) > db.c.ensureIndex({i:1}) > db.c.find({i:99999}).explain() { "cursor"

    : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 99999, 99999 ] ] } } Tuesday, May 8, 12
  21. Explain output (with range query) > var query = {i:{$gte:50000,$lt:51000,$mod:[100,0]}}

    > db.c.find(query).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1000, "nscannedObjects" : 1000, "n" : 10, "millis" : 10, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 50000, 51000 ] ] } } Tuesday, May 8, 12
  22. Explain output (covering index) > db.c.find({i:50000},{_id:0,i:1}) { "i" : 50000

    } > db.c.find({i:50000},{_id:0,i:1}).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "i" : [ [ 50000, 50000 ] ] } } Tuesday, May 8, 12
  23. Profiling • db.setProfilingLevel(level) • 0: off, 1: slow, 2: all

    • slow: > 100 ms (configurable) • records performance info in system.profile • query system.profile like any other collection • profiling impacts performance but not severely Tuesday, May 8, 12
  24. Profiling example > db.setProfilingLevel(2) { "was" : 0, "slowms" :

    100, "ok" : 1 } > db.c.find({i:99999}) { "_id" : ObjectId("4e8499f93f80d05429a407c3"), "i" : 99999 } > db.system.profile.find() ... { "ts" : ISODate("2011-09-29T16:32:06.971Z"), "op" : "query", "ns" : "test.c", "query" : { "i" : 99999 }, "nscanned" : 1, "nreturned" : 1, "responseLength" : 53, "millis" : 0, "client" : "127.0.0.1", "user" : "" } > Tuesday, May 8, 12
  25. Antipatterns • lack of indexes • too many indexes •

    unused indexes • redundant indexes • indexes with low selectivity • failure to consider compound indexes • poor understanding of how compound indexes work Tuesday, May 8, 12
  26. Summary • Indexes can speed up queries • Indexes can

    slow down inserts and updates • Experiment: use explain and profiling • Be careful creating indexes on existing large collections • Be careful indexing array fields • Consider using compound indexes Tuesday, May 8, 12
  27. @mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook  

           |            Twitter        |          LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! http://www.10gen.com/jobs Tuesday, May 8, 12