Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Dex the Index Bot

Eric Sedor
September 12, 2012

Dex the Index Bot

A well-indexed query improves performance by several orders of magnitude. The trick is to identify an ideal set of indexes for a particular use case. Even for experts, hand-crawling MongoDB log for slow queries is a laborious process. Introducing Dex: an open-source automated tool for analyzing the slow query log or system.profile collection. Dex's primary author Eric Sedor demonstrates Dex's usage and elaborates on indexing topics from the basic to the advanced. Includes how to pick indexes in an elegant, practical way. You learn how Dex categorizes slow queries and recommends indexes to help keep your application running smoothly. Eric is an engineer at MongoLab, cloud-hoster of MongoDB, where Dex is used daily to optimize customer indexes.

Eric Sedor

September 12, 2012
Tweet

Other Decks in Technology

Transcript

  1. 3 Agenda •  MongoDB index basics •  Indexing tips and

    tricks •  Dex automation •  Dex details and demo •  Extras
  2. 4 Some notable MongoDB fundamentals •  Good performance starts with

    indexes –  you create them; they don’t just happen •  Each query uses at most one index –  so index accordingly •  The query optimizer is empirical –  every so often (~1k writes) MongoDB runs a race between query plans. The first query plan to complete wins.
  3. 5 Proper indexing is critical •  Indexes can improve query

    performance by 2 to 3 orders of magnitude –  1000ms query down to 0ms! •  Bad queries don’t just get in their own way, they get in the way of other things too: –  write lock, queued operations, page faults •  Bad indexing → Memory Apocalypse –  without warning, large portions of your working data topple out of memory and must be page-faulted back
  4. 6 Five key commands db.adventurers.find( {"name" : "Eric","class": "Wizard"}).explain() db.adventurers.getIndexKeys()

    db.adventurers.getIndexes() db.adventurers.ensureIndex({"name": 1, "class": 1}, {"background": true}) db.adventurers.dropIndex({"name": 1, "class": 1})
  5. 7 explain()will reveal a scanAndOrder •  scanAndOrder is almost always

    bad! •  If MongoDB is re-ordering to satisfy a sort clause, explain() includes: { scanAndOrder: true } •  MongoDB sorts documents in-memory! (very expensive!) –  without an index, large result sets are rejected with an error
  6. 9 An index is a b-tree that maps a sequence

    of key values to a list of document pointers * “Ben” “Fighter” “Noble” “Eric” “Engineer” “Wizard” { "name": 1, "class": 1 } name-> class-> the order of the keys really matters!
  7. 10 Index key order determines how the b-tree is constructed

    This ordering of keys influences how: •  applicable an index is to a given query –  a query that doesn't include the first field(s) in the index cannot use the index •  quickly the scope of possible results is pruned –  here is where your data's cardinality weighs in •  documents are sorted in result sets –  did I mention scanAndOrder was bad?
  8. 11 Ordering is tricky and especially important with range operators

    The order of fields in an index should be the: ①  fields on which you will query for exact values ②  fields on which you will sort ③  fields on which you will query for a range of values ($in, $gt, $lt, etc.) Article explaining this topic in detail: bit.ly/mongoindex
  9. 12 Put the range field value last in your index

    carsOwned country carsOwned country document visitation order unsorted 1 2 1 2 MongoDB performs a full, in-memory scanAndOrder { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Charlie”, ... “A”, ... 2 } { ...“Holger”, ... “G”, ... 1 } { ...“Dietmar”, ... “G”, ... 2 } sorted Visitation order satisfies sort; no scanAndOrder needed { ...“Charlie”, ... “A”, ... 2 } { ...“Dietmar”, ... “G”, ... 2 } A G { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Holger”, ... “G”, ... 1 } 1 2 1 A G G 1 2 A optimal index: {carsOwned: 1, country: 1} document visitation order suboptimal index: {country: 1, carsOwned: 1} db.drivers.find({country: {$in: [”A”, “G”]}).sort({carsOwned: 1}) diagram at bit.ly/mongoindex
  10. 13 Put the range field value last in your index

    carsOwned country carsOwned country document visitation order unsorted 1 2 1 2 MongoDB performs a full, in-memory scanAndOrder { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Charlie”, ... “A”, ... 2 } { ...“Holger”, ... “G”, ... 1 } { ...“Dietmar”, ... “G”, ... 2 } sorted Visitation order satisfies sort; no scanAndOrder needed { ...“Charlie”, ... “A”, ... 2 } { ...“Dietmar”, ... “G”, ... 2 } A G { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Holger”, ... “G”, ... 1 } 1 2 1 A G G 1 2 A optimal index: {carsOwned: 1, country: 1} document visitation order suboptimal index: {country: 1, carsOwned: 1} db.drivers.find({country: {$in: [”A”, “G”]}).sort({carsOwned: 1}) diagram at bit.ly/mongoindex
  11. 14 Put the range field value last in your index

    carsOwned country carsOwned country document visitation order unsorted 1 2 1 2 MongoDB performs a full, in-memory scanAndOrder { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Charlie”, ... “A”, ... 2 } { ...“Holger”, ... “G”, ... 1 } { ...“Dietmar”, ... “G”, ... 2 } sorted Visitation order satisfies sort; no scanAndOrder needed { ...“Charlie”, ... “A”, ... 2 } { ...“Dietmar”, ... “G”, ... 2 } A G { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Holger”, ... “G”, ... 1 } 1 2 1 A G G 1 2 A optimal index: {carsOwned: 1, country: 1} document visitation order suboptimal index: {country: 1, carsOwned: 1} db.drivers.find({country: {$in: [”A”, “G”]}).sort({carsOwned: 1}) diagram at bit.ly/mongoindex
  12. 15 Put the range field value last in your index

    carsOwned country carsOwned country document visitation order unsorted 1 2 1 2 MongoDB performs a full, in-memory scanAndOrder { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Charlie”, ... “A”, ... 2 } { ...“Holger”, ... “G”, ... 1 } { ...“Dietmar”, ... “G”, ... 2 } sorted Visitation order satisfies sort; no scanAndOrder needed { ...“Charlie”, ... “A”, ... 2 } { ...“Dietmar”, ... “G”, ... 2 } A G { name: “Bob”, country: “A”, carsOwned: 1 } { ...“Holger”, ... “G”, ... 1 } 1 2 1 A G G 1 2 A optimal index: {carsOwned: 1, country: 1} document visitation order suboptimal index: {country: 1, carsOwned: 1} db.drivers.find({country: {$in: [”A”, “G”]}).sort({carsOwned: 1}) diagram at bit.ly/mongoindex
  13. 17 Be warned if you... •  Use a variety of

    query patterns •  Give the app user control over queries •  Use MongoDB like a relational database •  Have many indexes in each collection
  14. 18 Don’t die the death of a thousand cuts • 

    The most expensive queries are not always the slowest queries. –  50 queries * 20 ms == 1 s That’s 1 second other queries can't use! •  Profile your queries and check the <100ms range for a high volume of expensive but relatively fast queries •  Remember... bad queries don't just get into their own way!
  15. 19 Identify the problematic queries •  Search the log file

    –  logs any query over 100ms •  Use the database profiler ①  Turn it on db.setProfilingLevel(1)logs slow queries db.setProfilingLevel(2)logs all queries (helpful but noisy) ②  Find the slow queries .sort({millis: -1}) .find({ns: "mongoquest.adventurers"}) .find({op: {$in: ["query", "update", "command"]}) ③  Cleanup db.setProfilingLevel(0) db.system.profile.dropCollection()
  16. 20 Here’s a hint() if you have too many indexes

    •  The query optimizer might choose a suboptimal index –  It’s empirical, so it is vulnerable to poor conditions at query time, especially in high-page-fault environments •  Hint your queries to the better index –  db.adventurers.find(…).hint({“myIndex”: 1})
  17. 22 How Dex Works ①  Dex iterates over the input

    (log or profile collection) ②  A LogParser or ProfileParser extracts queries from each line of input. ③  Dex passes the query to a QueryAnalyzer. ④  The QueryAnalyzer compares the query to existing indexes (from left to right) ⑤  If an index meeting Dex's criteria does not already exist, Dex suggests the best index for that query
  18. The Heart of Dex 23 Dex understands that order of

    fields in an index should be: ①  Equivalency checks {a:1} ②  Sorts .sort({b: 1}) ③  Range checks {c: {$in: [1, 2]}}
  19. 24 Using Dex is easy Install using pip: > sudo

    pip install dex Usage: dex [<options>] uri > dex –f my/mongod/data/path/mongodb.log mongodb://myUser:myPass@myHost:12345/myDb > dex –p mongodb://myUser:myPass@myHost:12345/myDb
  20. 26 ... { "index": "{'name': 1, 'class': 1}", "namespace": "mongoquest.adventurers"

    "shellCommand": "db.adventurers.ensureIndex({'name': 1, 'class': 1}, {'background':true})" } ... Total lines read: 7 Understood query lines: 7 Unique recommendations: 5 Lines impacted by recommendations: 5 Example of Dex's output (he's nice enough to include cut-and-paste commands!)
  21. 27 > dex -f my/mongod/data/path/mongodb.log -n "myFirstDb.collectionOne" mongodb:// myUser:myPass@myHost:12345/myFirstDb >

    dex -f my/mongod/data/path/mongodb.log -n "*.collectionOne" mongodb:// myUser:myPass@myHost:12345/admin > dex -f my/mongod/data/path/mongodb.log -n "myFirstDb.*" -n "mySecondDb.*" mongodb://myUser:myPass@myHost:12345/admin Note the auth to the admin db to run against more than one db! The namespace filter (-n)
  22. 28 For when you want current results, not prior results.

    > dex –w -f my/mongod/data/path/mongodb.log mongodb://myUser:myPass@myHost:12345/ myFirstDb > dex –w –p –n "dbname.*" mongodb:// myUser:myPass@myHost:12345/admin Watch mode (-w)
  23. 29 {parsed: ..., namespace: db.adventurers, queryAnalysis: {analyzedFields: [{fieldName: name, fieldType:

    EQUIV}, {fieldName: class, fieldType: EQUIV}, fieldCount: N, supported: true|false}, indexAnalysis: {fullIndexes: [], partialIndexes: [{name: 1}] needsRecommendation: true|false }, recommendation: {namespace: mongoquest.adventurers index: {name: 1, class: 1} shellCommand: db.ensureIndex... } } Dex's guts
  24. 30 Future plans for Dex •  Provide query statistics (aka

    Scoreboard) –  count –  min/max/average nscanned and nreturned –  mix/max/average time consumed –  scanAndOrder true/false •  Improved index recommendations –  set-wise optimization of index fields •  minimize the number of indexes required to cover all of your queries –  order-wise optimization of index fields •  measure cardinality for key ordering •  More command-line options –  filter by time-range •  Support for geospatial queries
  25. 31 Questions? Thank you and good luck out there! [email protected]

    www.github.com/mongolab/dex http://mongolab.org http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/ http://blog.mongolab.com/2012/07/remote-dex/ http://blog.mongolab.com/2012/06/cardinal-ins/