Slide 1

Slide 1 text

MongoDC 2012 Indexing and Query Optimization Robert Stam [email protected]

Slide 2

Slide 2 text

What are indexes? Lists of values associated with documents Maintained in a tree structure, allowing fast lookup • •

Slide 3

Slide 3 text

Why use indexes? Faster queries Required for geo queries Enforce unique constraints • • •

Slide 4

Slide 4 text

Creating an index index {_id:1} created automatically db.c.createIndex({x:1}) db.c.createIndex({x:-1}) db.c.createIndex({"x.a":1}) db.c.createIndex({"x.a":1,"x.b":1}) db.c.createIndex({loc:"2d"}) db.c.createIndex({x:1},{unique:true}) db.c.createIndex({x:1},{background:true}) db.c.ensureIndex({x:1}) • • • • • • • • •

Slide 5

Slide 5 text

Unique constraint db.c.createIndex({x:1},{unique:true}) don't allow {_id:1,x:2} and {_id:2,x:2} don't allow {_id:1} and {_id:2} both match {x:null} db.c.createIndex({x:1},{unique:true, dropDups:true}) keeps first duplicate in natural order and drops the rest • • • • • •

Slide 6

Slide 6 text

Impact of creating indexes can create an index at any time even when a collection already has data creating an index will block MongoDB unless you specify {background:true} even background creation has an impact recommend you create indexes at off peak times (using a script instead of ensureIndex) • • • • • •

Slide 7

Slide 7 text

Managing indexes db.c.getIndexes() db.c.dropIndex({x:1}) db.c.dropIndexes() db.c.reIndex() • • • •

Slide 8

Slide 8 text

Fast document lookup db.c.find({_id:2}), using index {_id:1} db.c.find({x:2}), using index {x:1} db.c.find({x:{$in:[2,3]}}), using index {x:1} db.c.find({"x.a":2}), using index {"x.a":1} matches {_id:1,x:{a:2}} and {_id:2,x:{a:2,b:3}} db.c.find({x:{a:2}}), using index {x:1} matches {_id:1,x:{a:2}}, but not {_id:2,x:{a:2,b:3}} What about: • • • • • • • •

Slide 9

Slide 9 text

Fast document range scan db.c.find({x:{$gte:2}}), using index {x:1} db.c.find({x:{$gte:2,$lt:5}}), using index {x:1} db.c.find({x:/^abc/}), using index {x:1} What about: db.c.find({x:/abc/}) • • • • •

Slide 10

Slide 10 text

Other operations db.c.count({x:2}), using index {x:1} db.c.distinct("x"), using index {x:1} db.c.distinct("y",{x:2}), using index {x:1} db.c.update({x:2},{$set:{x:3}}), using index {x:1} db.c.remove({x:2}), using index {x:1} • • • • •

Slide 11

Slide 11 text

Fast result sorting db.c.find().sort({x:1}), using index {x:1} db.c.find().sort({x:-1}), using index {x:1} db.c.find({x:{$gte:4}}).sort({x:-1}), using index {x:1} db.c.find().sort({"x.a":1}), using index {"x.a":1} • • • •

Slide 12

Slide 12 text

Missing fields db.c.find({x:null}), using index {x:1} matches {_id:5} and {_id:6,x:null} db.c.find({x:{$exists:false}}), using index {x:1} matches {_id:5} but not {_id:6,x:null} • • • •

Slide 13

Slide 13 text

Array matching All of the following match {_id:6,x:[2,10]}, using index {x:1} db.c.find({x:2}) db.c.find({x:10}) db.c.find({x:{$gt:5}}) db.c.find({x:[2,10]}) db.c.find({x:{$in:[2,5]}}) What about: db.c.find({x:{$all:[2,10]}}) index will be used to lookup all x=2 • • • • • • • • •

Slide 14

Slide 14 text

Compound indexes db.c.find({x:2,y:3}), using index {x:1,y:1} db.c.find({x:2,y:3}), using index {x:1,y:-1} db.c.find({x:{$in:[2,3]}}), using index {x:1,y:1} db.c.find().sort({x:1,y:1}), using index {x:1,y:1} db.c.find().sort({x:-1,y:1}), using index {x:1:y- 1} db.c.find({x:2}).sort({y:1}), using index {x:1,y:1} • • • • • •

Slide 15

Slide 15 text

When indexes don't help db.c.find({x:{$ne:2}}) db.c.find({x:{$mod:[10,0]}}) will limit scan to numbers db.c.find({x:{$not:/a/}}) db.c.find({x:{$gte:0,$lte:10},y:5}), using index {x:1,y:1} will scan from {x:0,y:5} to {x:10,y:5} db.c.find({$where:"this.x == 5"}) • • • • • • •

Slide 16

Slide 16 text

Geospatial indexes db.c.find({x:[50,50]}), using index {x:"2d"} db.c.find({x:{$near:[50,50]}}), using index {x:"2d"} results are sorted from nearest to farthest db.c.find({x:{$within:{$box:[[40,40], [60,60]]}}}), using index {x:"2d"} db.c.find({x:{$within:{$center:[[50, 50], 10]}}}), using index {x:"2d"} db.c.find({x:{$near:[50, 50]},y:2}), using index {x:"2d",y:1} • • • • • •

Slide 17

Slide 17 text

Sparse indexes db.c.ensureIndex({x:1},{sparse:true}) only creates index entries for documents that have a value for the key currently key must be a single field can behave slightly differently sort only returns documents in the index {$exists:false} won't find any documents you can combine sparse with unique to create a unique constraint that ignores missing values • • • • • • •

Slide 18

Slide 18 text

Covering indexes Any index can potentially be a covering index Whether it is a covering index or not depends on the query and the fields returned All the fields used in the query AND all the fields returned must be in the index Often that means you must exclude _id from the results (unless _id is in the index) Covering indexes are almost always compound indexes Multikey indexes cannot be covering indexes • • • • • •

Slide 19

Slide 19 text

Limits and tradeoffs max 64 indexes per collection watch out for logically equivalent indexes {x:1} and {x:-1} {x:1} and {x:1,unique:true} indexes improve speed of queries but slow down inserts and updates more specific indexes can be more helpful than less specific ones {x:1,y:1} vs {x:1} • • • • • • •

Slide 20

Slide 20 text

Query optimizer picks which index to use empirical vs cost based you can force an index to be used db.c.find({x:2,y:3}).hint({y:1}) uses {y:1} and ignores {x:1} you can force a full collection scan db.c.find({x:2}).hint({$natural:1}) don't rely on order of results unless you use sort • • • • • • • •

Slide 21

Slide 21 text

Explain output (without index) > for (i = 0; i < 100000; i++) { db.test.save({i:i}); } > db.test.find({i:99999}).explain() { "cursor" : "BasicCursor", "nscanned" : 100000, "nscannedObjects" : 100000, "n" : 1, "millis" : 78, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } >

Slide 22

Slide 22 text

Explain output (with index) > db.c.ensureIndex({i:1}) > db.c.find({i:99999}).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 99999, 99999 ] ] } }

Slide 23

Slide 23 text

Explain output (with range query) > var query = {i:{$gte:50000,$lt:51000,$mod:[100,0]}} > db.c.find(query).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1000, "nscannedObjects" : 1000, "n" : 10, "millis" : 10, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "i" : [ [ 50000, 51000 ] ] } }

Slide 24

Slide 24 text

Explain output (covering index) > db.c.find({i:50000},{_id:0,i:1}) { "i" : 50000 } > db.c.find({i:50000},{_id:0,i:1}).explain() { "cursor" : "BtreeCursor i_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "i" : [ [ 50000, 50000 ] ] } }

Slide 25

Slide 25 text

Profiling db.setProfilingLevel(level) 0: off, 1: slow, 2: all slow: > 100 ms (configurable) records performance info in system.profile query system.profile like any other collection profiling impacts performance but not severely • • • • • •

Slide 26

Slide 26 text

Profiling example > db.setProfilingLevel(2) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.c.find({i:99999}) { "_id" : ObjectId("4e8499f93f80d05429a407c3"), "i" : 99999 } > db.system.profile.find() ... { "ts" : ISODate("2011-09-29T16:32:06.971Z"), "op" : "query", "ns" : "test.c", "query" : { "i" : 99999 }, "nscanned" : 1, "nreturned" : 1, "responseLength" : 53, "millis" : 0, "client" : "127.0.0.1", "user" : "" } >

Slide 27

Slide 27 text

Antipatterns lack of indexes too many indexes unused indexes redundant indexes indexes with low selectivity failure to consider compound indexes poor understanding of how compound indexes work • • • • • • •

Slide 28

Slide 28 text

Summary Indexes can speed up queries Indexes can slow down inserts and updates Experiment: use explain and profiling Be careful creating indexes on existing large collections Be careful indexing array fields Consider using compound indexes • • • • • •

Slide 29

Slide 29 text

@mongodb conferences,.appearances,.and.meetups http://www.10gen.com/events http://bit.ly/mongofb Facebook.....|......Twitter....|.....LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! http://www.10gen.com/jobs