• Indexes – you create them; they don't just happen
• Each query uses at most one index – so index accordingly
• The query optimizer is empirical – every so often (~1k writes), MongoDB runs a race between candidate query plans; the first plan to complete wins
  – query plans are also re-run after certain changes to a collection (such as adding an index)
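For example (a minimal sketch – the adventurers collection and name field match the examples later in this deck):

  // explicitly create an ascending index on name – MongoDB won't do it for you
  db.adventurers.ensureIndex({ name: 1 })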
• Good indexing can improve query performance by 2 to 3 orders of magnitude – a 1000ms query drops to <1ms!
• Bad queries don't just get in their own way; they get in the way of other things too:
  – write locks, queued operations, page faults
• Bad indexing → Memory Apocalypse
  – without warning, large portions of your working data topple out of memory and must be page-faulted back in
• scanAndOrder is bad!
• If MongoDB is re-ordering documents to satisfy a sort clause, explain() includes: { scanAndOrder: true }
• MongoDB sorts those documents in memory – very expensive!
  – without an index, large result sets are rejected with an error
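A quick sketch of what this looks like – assuming a hypothetical level field that no index covers:

  // sorting on an un-indexed field forces an in-memory re-order
  db.adventurers.find({ class: "Wizard" }).sort({ level: -1 }).explain()
  // output includes: "scanAndOrder" : true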
• An index maps an ordered tuple of key values to a list of document pointers
[Diagram: entries in the compound index { "name": 1, "class": 1 } are ordered by name, then by class – ("Ben", "Fighter"), ("Ben", "Noble"), ("Eric", "Engineer"), ("Eric", "Wizard")]
• The order of the keys really matters!
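In the shell, that compound index is declared with the keys in the order you want them compared (a sketch, reusing the diagram's fields):

  // entries are sorted by name first; ties are broken by class
  db.adventurers.ensureIndex({ name: 1, class: 1 })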
This ordering of keys influences how:
• applicable an index is to a given query
  – a query that doesn't include the first field(s) in the index cannot use the index (see the sketch below)
• quickly the scope of possible results is pruned
  – here is where your data's cardinality weighs in
• documents are sorted in result sets
  – did I mention scanAndOrder was bad?
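A sketch of the first point, using the { name: 1, class: 1 } index from the previous slide:

  // CAN use the index – the query includes the leading field, name
  db.adventurers.find({ name: "Eric" })
  db.adventurers.find({ name: "Eric", class: "Wizard" })
  // CANNOT use the index – class alone skips the leading field
  db.adventurers.find({ class: "Wizard" })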
The order of fields in an index should be:
① fields on which you will query for exact values
② fields on which you will sort
③ fields on which you will query for a range of values ($in, $gt, $lt, etc.)
Article explaining this topic in detail: bit.ly/mongoindex
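A worked example of that recipe (the level field is invented here for illustration): a query that matches class exactly, sorts on name, and ranges over level suggests this key order:

  // ① exact value: class   ② sort: name   ③ range: level
  db.adventurers.find({ class: "Wizard", level: { $gte: 5 } }).sort({ name: 1 })
  db.adventurers.ensureIndex({ class: 1, name: 1, level: 1 })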
• The most expensive queries are not always the slowest queries
  – 50 queries × 20ms == 1s – that's 1 second other queries can't use!
• Profile your queries and check the <100ms range for a high volume of expensive but relatively fast queries
• Remember... bad queries don't just get in their own way!
• Use the database profiler
① Turn it on
  db.setProfilingLevel(1) – logs slow queries (any query over 100ms)
  db.setProfilingLevel(2) – logs all queries (helpful but noisy)
② Find the slow queries
  db.system.profile.find({ns: "mongoquest.adventurers"}).sort({millis: -1})
  db.system.profile.find({op: {$in: ["query", "update", "command"]}})
③ Clean up
  db.setProfilingLevel(0)
  db.system.profile.drop()
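Tying back to the previous slide – a sketch for spotting the high-volume, relatively fast queries (the 20ms floor is an arbitrary assumption):

  // individually fast queries that can still add up under volume
  db.system.profile.find({ millis: { $gt: 20, $lt: 100 } }).count()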
• The query optimizer might choose a suboptimal index
  – it's empirical, so it is vulnerable to poor conditions at query time, especially in high-page-fault environments
• Hint your queries to the better index
  – db.adventurers.find(…).hint({"myIndex": 1})
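For example, forcing the compound index from earlier (a sketch – hint() accepts the index's key pattern, or its name as a string):

  db.adventurers.find({ name: "Eric" }).hint({ name: 1, class: 1 })
  // or by the default auto-generated index name:
  db.adventurers.find({ name: "Eric" }).hint("name_1_class_1")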
① Dex reads its input (a log or profile collection)
② A LogParser or ProfileParser extracts queries from each line of input
③ Dex passes the query to a QueryAnalyzer
④ The QueryAnalyzer compares the query to existing indexes (from left to right)
⑤ If an index meeting Dex's criteria does not already exist, Dex suggests the best index for that query
> dex -f my/mongod/data/path/mongodb.log -n "*.collectionOne" mongodb://myUser:myPass@myHost:12345/admin
> dex -f my/mongod/data/path/mongodb.log -n "myFirstDb.*" -n "mySecondDb.*" mongodb://myUser:myPass@myHost:12345/admin

Note the auth to the admin db to run against more than one db! The namespace filter (-n) accepts wildcards and can be passed more than once.
• Query statistics (the Scoreboard)
  – min/max/average nscanned and nreturned
  – scanAndOrder true/false
• Improved index recommendations
  – set-wise optimization of index fields: minimize the number of indexes required to cover all of your queries
  – order-wise optimization of index fields: measure cardinality for key ordering
• More command-line options
  – filter by time range
• Support for geospatial queries, map/reduce cond clauses, and the aggregation framework