Slide 1

Slide 1 text

explain() BTreeCursor Locality

Slide 2

Slide 2 text

Indexing for Fun and Profit
Richard Kreuter
MongoNYC 2012

Slide 3

Slide 3 text

What’s in store
• Indexing is important!
• Understanding range-based indexing
• Working with indexes in MongoDB

Slide 4

Slide 4 text

Indexes are the single biggest tunable performance factor in MongoDB.

Slide 5

Slide 5 text

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.

Slide 6

Slide 6 text

So what problem do indexes solve?

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

How do you find a chicken recipe?
• An unindexed cookbook might be quite a page turner.
• Probably not what you want, though.

Slide 9

Slide 9 text

I know, I’ll use an index!

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Let’s imagine a simple index

ingredient   page
aardvark     790
...          ...
beef         190, 191, 205, ...
...          ...
chicken      182, 199, 200, ...
chorizo      497, ...
...          ...
zucchini     673, 986, ...
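The cookbook analogy can be sketched in a few lines of JavaScript. This is only an illustration: the recipe titles are invented, the page numbers reuse a few values from the table, and a `Map` stands in for the sorted index (key ordering only starts to matter once range queries come up).

```javascript
// Hypothetical cookbook: page numbers echo the slide, titles are invented.
const recipes = [
  { page: 182, title: "Grilled chicken", ingredients: ["chicken"] },
  { page: 190, title: "Beef stew", ingredients: ["beef"] },
  { page: 199, title: "Chicken soup", ingredients: ["chicken"] },
  { page: 497, title: "Chorizo tacos", ingredients: ["chorizo"] },
  { page: 673, title: "Zucchini bread", ingredients: ["zucchini"] },
];

// Without an index: examine every recipe (turn every page).
function findPagesByScan(ingredient) {
  let examined = 0;
  const pages = [];
  for (const r of recipes) {
    examined++;
    if (r.ingredients.includes(ingredient)) pages.push(r.page);
  }
  return { pages, examined };
}

// Build the index once: ingredient -> sorted list of pages.
function buildIndex(docs) {
  const index = new Map();
  for (const r of docs) {
    for (const ing of r.ingredients) {
      if (!index.has(ing)) index.set(ing, []);
      index.get(ing).push(r.page);
    }
  }
  for (const pages of index.values()) pages.sort((a, b) => a - b);
  return index;
}

const ingredientIndex = buildIndex(recipes);

// With the index: a single lookup, no page turning.
function findPagesByIndex(ingredient) {
  return ingredientIndex.get(ingredient) || [];
}

console.log(findPagesByScan("chicken"));
console.log(findPagesByIndex("chicken"));
```

The scan touches all five recipes to find two; the index lookup goes straight to the page list.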

Slide 12

Slide 12 text

How do you find a quick chicken recipe?

Slide 13

Slide 13 text

Let’s imagine a compound index

ingredient   cooking time   page
...          ...            ...
chicken      15 min         182, 200
chicken      25 min         199
chicken      30 min         289, 316, 320
chicken      45 min         290, 291, 354
...          ...            ...

Slide 14

Slide 14 text

Consider the ordering of index keys
Chicken, 15 min
Chicken, 25 min
Chicken, 30 min
Chicken, 45 min
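A minimal sketch of that ordering, assuming the compound key is compared field by field: ingredient first, cooking time only to break ties. The field names `ingredient`, `cookingTime` and `pages` are illustrative; the values are the chicken rows from the compound index table.

```javascript
// Entries of the (ingredient, cookingTime) index, deliberately out of order.
const compoundEntries = [
  { ingredient: "chicken", cookingTime: 45, pages: [290, 291, 354] },
  { ingredient: "chicken", cookingTime: 15, pages: [182, 200] },
  { ingredient: "chicken", cookingTime: 30, pages: [289, 316, 320] },
  { ingredient: "chicken", cookingTime: 25, pages: [199] },
];

// Compare by the first field; use the second field only to break ties.
function compareKeys(a, b) {
  if (a.ingredient !== b.ingredient) return a.ingredient < b.ingredient ? -1 : 1;
  return a.cookingTime - b.cookingTime;
}

const sortedEntries = [...compoundEntries].sort(compareKeys);

// "Quick chicken recipe": chicken cooked in maxMinutes or less.
// Because of the key order, the answer is one contiguous run of entries.
function quickChickenPages(maxMinutes) {
  const pages = [];
  for (const e of sortedEntries) {
    if (e.ingredient < "chicken") continue; // a real B-tree would seek past these
    if (e.ingredient > "chicken" || e.cookingTime > maxMinutes) break; // past the run
    pages.push(...e.pages);
  }
  return pages;
}

console.log(sortedEntries.map((e) => `${e.ingredient}, ${e.cookingTime} min`));
console.log(quickChickenPages(25));
```

With the ingredient fixed, the cooking-time range maps onto adjacent entries, so the scan can stop at the first entry that is too slow.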

Slide 15

Slide 15 text

How about a low-calorie chicken recipe?

Slide 16

Slide 16 text

Let’s imagine a 2nd compound index

ingredient   calories   page
...          ...        ...
chicken      250        199, 316
chicken      300        289, 291
chicken      425        320
...          ...        ...

Slide 17

Slide 17 text

How about a quick, low-calorie recipe?

Slide 18

Slide 18 text

Let’s imagine a last compound index

calories   cooking time   page
...        ...            ...
250        25 min         199
250        30 min         316
300        25 min         289
300        45 min         291
425        30 min         320
...        ...            ...

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?

Slide 19

Slide 19 text

Consider the ordering of index keys
250 cal, 25 min
250 cal, 30 min
300 cal, 25 min
300 cal, 45 min
425 cal, 30 min

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?
4 index entries will be scanned, but only 1 will match!
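The count of four scanned entries and one match can be reproduced with a small sketch. It assumes the scan is bounded only by the leading field (calories) and that cooking time is checked entry by entry inside that range; the field names are illustrative and the data is the five rows from the (calories, cooking time) table.

```javascript
// The (calories, cookingTime) index entries from the slide, in key order.
const calorieTimeIndex = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// Walk the calorie range in key order and filter on cooking time.
function rangeScan(calMin, calMax, minMin, minMax) {
  let scanned = 0;
  const matches = [];
  for (const e of calorieTimeIndex) {
    if (e.calories < calMin) continue; // a real B-tree would seek here directly
    if (e.calories > calMax) break;    // past the end of the calorie range
    scanned++;
    if (e.minutes >= minMin && e.minutes <= minMax) matches.push(e.page);
  }
  return { scanned, matches };
}

console.log(rangeScan(250, 300, 30, 40)); // { scanned: 4, matches: [ 316 ] }
```

Cooking times are sorted only within each calorie value, so a range on the first field leaves the second field effectively unordered across the scanned entries.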

Slide 20

Slide 20 text

Range queries using an index on A, B
• A is a range ✔
• A is constant, B is a range ✔
• A is constant, order by B ✔
• A is a range, B is constant/range ✖
• B is constant/range, A unspecified ✖✖

Slide 21

Slide 21 text

It’s really that straightforward.

Slide 22

Slide 22 text

On to MongoDB!

Slide 23

Slide 23 text

All this is relevant to MongoDB.
• MongoDB’s indexes are B-Trees, which are designed for range queries.
• Generally, the best index for your queries is going to be a compound index.

Slide 24

Slide 24 text

Key info about MongoDB’s indexes
• A collection may have at most 64 indexes.
• Almost all queries can use just 1 index ($and/$or queries are the exception).
• Every additional index slows down inserts & removes, and may slow updates.
• The maximum index key size is 1024 bytes.

Slide 25

Slide 25 text

When are indexes applicable?
An index on x gets used most places you’d expect:
• constant-value queries on x,
• range queries on x,
• $in expressions on x,
• count, distinct, update, findAndModify that select on x,
• regular expressions similar to /^abc.*/

Slide 26

Slide 26 text

But indexes aren’t used sometimes
• Most negations: $not, $nin, $ne
• A few other corner cases: $mod, $where
• Additionally, matching most regular expressions involves scanning all index keys (cf. /a/ or /foo/i).
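The difference between an anchored prefix pattern such as /^abc.*/ and an unanchored one such as /a/ can be sketched as follows. The idea shown here is that a prefix translates into a key range ("abc" up to, but not including, "abd"), while an unanchored pattern gives no bounds at all. The key values are invented, and the sketch assumes a non-empty prefix whose last character is not the maximum code unit; it does not claim to mirror mongod's internals.

```javascript
// Sorted index keys (hypothetical values for illustration).
const sortedKeys = ["abacus", "abc", "abcd", "abcz", "abd", "xabc", "zabc"];

// A prefix such as "abc" covers exactly the keys in ["abc", "abd").
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// Anchored prefix: only the keys inside the bounds are visited.
function prefixScan(prefix) {
  const { lower, upper } = prefixBounds(prefix);
  let scanned = 0;
  const matches = [];
  for (const k of sortedKeys) {
    if (k < lower) continue; // a real B-tree would seek to `lower`
    if (k >= upper) break;   // first key past the prefix range
    scanned++;
    matches.push(k);
  }
  return { scanned, matches };
}

// Unanchored pattern: no bounds, so every key has to be tested.
function fullRegexScan(re) {
  let scanned = 0;
  const matches = [];
  for (const k of sortedKeys) {
    scanned++;
    if (re.test(k)) matches.push(k);
  }
  return { scanned, matches };
}

console.log(prefixScan("abc"));
console.log(fullRegexScan(/abc/));
```

On these seven keys the prefix scan visits three entries, while the unanchored pattern visits all seven to find its five matches.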

Slide 27

Slide 27 text

Indexes do special things with arrays

Insert:
    { title : "Chicken and Rice", ingredients : [ "chicken", "rice" ] }

“MultiKey” index on ingredients: the array [ "chicken", "rice" ] on page 42 is indexed as one entry per element.

ingredients   page
chicken       42
...           ...
rice          42
...           ...
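A minimal sketch of the multikey idea: an array-valued field contributes one index entry per element, each pointing back to the same document. The function name `multikeyEntries` and the `key`/`page` entry shape are made up for this illustration.

```javascript
// One index entry per array element; scalar values yield a single entry.
function multikeyEntries(doc, field, page) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.map((v) => ({ key: v, page }));
}

const chickenAndRice = {
  title: "Chicken and Rice",
  ingredients: ["chicken", "rice"],
};

// Both entries point at the same document (page 42 on the slide).
const ingredientEntries = multikeyEntries(chickenAndRice, "ingredients", 42);
console.log(ingredientEntries);
```

A query for either "chicken" or "rice" can then find the document through an ordinary key lookup.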

Slide 28

Slide 28 text

Getting a query’s plan

    db.rec.find({t:{$lt: 40}}).explain()
    {
        "cursor" : "BtreeCursor t_1",
        ...
        "nscanned" : 42,
        ...
        "n" : 42,
        "millis" : 0,
        ...
    }

Pay attention to the ratio n/nscanned!
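One way to act on that advice is to compute the ratio from the explain document. The helper below is a sketch with a made-up name (`indexEfficiency`); it only reads the `cursor`, `n` and `nscanned` fields shown in the output above.

```javascript
// Ratio of returned documents to scanned index entries (1 is ideal).
function indexEfficiency(plan) {
  if (plan.nscanned === 0) return 1; // nothing scanned; treat as ideal
  return plan.n / plan.nscanned;
}

// In this explain format, a cursor named "BtreeCursor ..." means an index was used.
function usesIndex(plan) {
  return plan.cursor.startsWith("BtreeCursor");
}

// The plan from the slide: every scanned entry was returned.
const slidePlan = { cursor: "BtreeCursor t_1", nscanned: 42, n: 42, millis: 0 };
console.log(usesIndex(slidePlan), indexEfficiency(slidePlan));

// The calories/cooking-time query: 4 entries scanned, 1 returned.
console.log(indexEfficiency({ cursor: "BtreeCursor", nscanned: 4, n: 1 }));
```

A ratio well below 1 suggests the index is being scanned more widely than the query needs, which is the usual sign that a different compound index would fit better.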

Slide 29

Slide 29 text

Operational aspects of index builds
• Building indexes is easy!
    db.collection.ensureIndex({ ingredient : 1, cookingTime : 1 })
• Building indexes is hard!
• Read through all docs, sort all index keys, write out sorted tables... usually takes a while.
• You should schedule index builds carefully.

Slide 30

Slide 30 text

Rolling out an index build

    // Pseudocode: handle each secondary s in turn
    for (s in secondaries)
        s.restartAsStandalone()
        s.buildIndex()
        s.restartAsReplSetMember()
        s.waitForCatchup()

    // Then the primary p, once it has stepped down
    p.stepDown()
    p.restartAsStandalone()
    p.buildIndex()
    p.restartAsReplSetMember()

Slide 31

Slide 31 text

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem...
...so take some time and get your indexes right!

Slide 32

Slide 32 text

To the future!
• MongoDB will support “index intersection”.
• This might make mongod’s index selection a bit less predictable, however.
• It will still be important to construct the right indexes!

Slide 33

Slide 33 text

10gen is hiring! http://www.10gen.com/jobs