Full-Text Search in MongoDB

Taking a look at the inner workings of full-text search in MongoDB 3.2

Philipp Krenn

October 03, 2016

Transcript

  1. { title : "Moby-Dick" , author : "Herman Melville" , published : 1851 , ISBN : 0451526996 , topics : [ "whaling" , "allegory" , "revenge" , "American" , "novel" , "nautical" , "voyage" , "Cape Cod" ] }
    db.volumes.createIndex({ topics: 1 })
    db.volumes.findOne({ topics : "voyage" }, { title: 1 })
  2. FTS in MongoDB
    Beta since 2.4
    Stable since 3.0
    80% solution; for more, use Elasticsearch
  3. FTS in MongoDB
    In Latin alphabets
    Case insensitive (default in 3.2)
    [A-z] other characters removed (default in 3.2)
  4. $text
    Updated version in MongoDB 3.2
    { $text: { $search: "<string>", $language: "<string>", $caseSensitive: <boolean>, $diacriticSensitive: <boolean> } }
  5. Let's try it
    > db.starwars.ensureIndex({ quote: "text" })
    { "createdCollectionAutomatically": true, "numIndexesBefore": 1, "numIndexesAfter": 2, "ok": 1 }
  6. Let's try it
    > db.starwars.getIndices()
    [
      { "v": 1, "key": { "_id": 1 }, "name": "_id_", "ns": "starwars.starwars" },
      { "v": 1, "key": { "_fts": "text", "_ftsx": 1 }, "name": "quote_text", "ns": "starwars.starwars", "weights": { "quote": 1 }, "default_language": "english", "language_override": "language", "textIndexVersion": 3 }
    ]
  7. Let's try it
    > db.starwars.insert({ quote: "These are not the droids you are looking for." })
    Inserted 1 record(s) in 39ms
    WriteResult({ "nInserted": 1 })
  8. Let's try it
    > db.starwars.find({ $text: { $search: "droid" }})
    { "_id": ObjectId("574c50c3920246255ce2ad81"), "quote": "These are not the droids you are looking for." }
    Fetched 1 record(s) in 35ms
    > db.starwars.find({ $text: { $search: "look" }})
    { "_id": ObjectId("574c50c3920246255ce2ad81"), "quote": "These are not the droids you are looking for." }
    Fetched 1 record(s) in 4ms
    > db.starwars.find({ $text: { $search: "you" }})
    Fetched 0 record(s) in 1ms
  9. Let's try it
    > db.starwars.find({ $text: { $search: "look" }}).explain()
    { "queryPlanner": { "plannerVersion": 1, "namespace": "starwars.starwars", "indexFilterSet": false, "parsedQuery": { "$text": { "$search": "look", "$language": "english", "$caseSensitive": false, "$diacriticSensitive": false } }, ...
  10. Let's try it
    ... "parsedTextQuery": { "terms": [ "look" ], "negatedTerms": [ ], "phrases": [ ], "negatedPhrases": [ ] }, ...
  11. Let's try it
    > db.starwars.find({ $text: { $search: "-look" } }).explain()
    ... "parsedTextQuery": { "terms": [ ], "negatedTerms": [ "look" ], "phrases": [ ], "negatedPhrases": [ ] }, ...
  12. Let's try it
    > db.starwars.find({ $text: { $search: "look" }}, {score: {$meta: "textScore"}})
    { "_id": ObjectId("574c50c3920246255ce2ad81"), "quote": "These are not the droids you are looking for.", "score": 0.75 }
    Fetched 1 record(s) in 8ms
    > db.starwars.find({ $text: { $search: "looks" }}, {score: {$meta: "textScore"}}).sort({ score: { $meta: "textScore" } }).limit(1)
    { "_id": ObjectId("574c50c3920246255ce2ad81"), "quote": "These are not the droids you are looking for.", "score": 0.75 }
    Fetched 1 record(s) in 5ms
  13. Indexing
    String or array of strings only
    Optional language or translations
    Optional weighting if multiple fields indexed
  14. Queries
    // OR
    > db.starwars.find({ $text: { $search: "look droid" } })
    // AND but without input stemming
    > db.starwars.find({ $text: { $search: "\"look\" \"droid\"" } })
    // Negation
    > db.starwars.find({ $text: { $search: "look -droid" } })
    // Phrase
    > db.starwars.find({ $text: { $search: "\"look droid\"" } })
    // Translation
    > db.starwars.find({ $text: { $search: "buscar", $language: "es" } })
  15. Score
    > db.starwars.find({ $text: { $search: "father look" } }, { score: { $meta: "textScore" } })
    { "_id": ObjectId("574c83c9920246255ce2ad82"), "quote": "These are not the droids you are looking for", "score": 0.75 }
    { "_id": ObjectId("574c8712920246255ce2ad83"), "quote": "I am your father", "score": 1 }
    { "_id": ObjectId("574c8763920246255ce2ad84"), "quote": "Look at me father", "score": 1.5 }
  16. Score
    father look
    "These are not the droids you are looking for"
    droid look == 1 match, 2 tokens
    coeff:
  17. Score
    father look
    "Look at me father"
    look father == 1 match, 2 tokens
    look father == 1 match, 2 tokens
    coeff:
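
The transcript cuts off at "coeff:", so the actual formula from the slides is not available here. The following is a minimal sketch under an assumed formula, `coeff = 0.5 * count / numTokens + 0.5`, with each repeated occurrence of a term contributing half as much as the previous one. The helper name `textScore` and the pre-tokenized inputs are hypothetical. The tokens for the first and third quote are taken from slides 16 and 17; that "I am your father" reduces to the single token "father" is an assumption. Under these assumptions the sketch reproduces the scores 0.75, 1 and 1.5 from slide 15.

```javascript
// Hypothetical helper, not a MongoDB API. It scores a single indexed field
// from tokens that are assumed to be already stemmed and stop-word free.
function textScore(fieldTokens, queryTerms, weight = 1) {
  const numTokens = fieldTokens.length;
  const stats = new Map(); // term -> { count, freq }

  for (const token of fieldTokens) {
    const s = stats.get(token) || { count: 0, freq: 0 };
    // Assumed decay: the 1st occurrence adds 1, the 2nd 0.5, the 3rd 0.25, ...
    s.freq += 1 / Math.pow(2, s.count);
    s.count += 1;
    stats.set(token, s);
  }

  let score = 0;
  for (const term of new Set(queryTerms)) {
    const s = stats.get(term);
    if (!s) continue; // query term does not occur in this field
    // Assumed coefficient: relates the term count to the field length.
    const coeff = 0.5 * (s.count / numTokens) + 0.5;
    score += weight * s.freq * coeff;
  }
  return score;
}

// Query "father look" against the three quotes from slide 15.
const query = ["father", "look"];
const droidsScore = textScore(["droid", "look"], query);   // "These are not the droids you are looking for"
const fatherScore = textScore(["father"], query);          // "I am your father" (token list assumed)
const lookScore = textScore(["look", "father"], query);    // "Look at me father"
console.log(droidsScore, fatherScore, lookScore);
```

With one match among two tokens the assumed coefficient is 0.75, which is why the third quote, matching both query terms, ends up at twice the score of the first.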
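
The `parsedTextQuery` output in the `explain()` slides splits a `$search` string into terms, negated terms, phrases and negated phrases. The sketch below imitates that split for the query strings used in the deck. It is a simplification and not MongoDB's parser: `parseSearchString` is a hypothetical name, there is no stemming, stop-word removal or language handling, and how the server treats the individual words inside a quoted phrase is not modeled.

```javascript
// Simplified, hypothetical splitter for a $search string.
// Bare words become terms, "-word" becomes a negated term,
// "quoted text" becomes a phrase, -"quoted text" a negated phrase.
function parseSearchString(search) {
  const result = { terms: [], negatedTerms: [], phrases: [], negatedPhrases: [] };
  // Alternative 1: optionally negated quoted phrase (groups 1 and 2).
  // Alternative 2: optionally negated bare word (groups 3 and 4).
  const pattern = /(-?)"([^"]*)"|(-?)([^\s"]+)/g;
  let match;
  while ((match = pattern.exec(search)) !== null) {
    if (match[2] !== undefined) {
      const phrase = match[2].trim().toLowerCase();
      if (phrase === "") continue; // ignore empty quotes
      (match[1] === "-" ? result.negatedPhrases : result.phrases).push(phrase);
    } else {
      const word = match[4].toLowerCase();
      (match[3] === "-" ? result.negatedTerms : result.terms).push(word);
    }
  }
  return result;
}

// The query strings from the "Let's try it" and "Queries" slides.
const plain = parseSearchString("look");
const negated = parseSearchString("-look");
const mixed = parseSearchString("look -droid");
const phrase = parseSearchString("\"look droid\"");
console.log(JSON.stringify({ plain, negated, mixed, phrase }, null, 2));
```

Because no stemming is applied, this sketch would keep "looks" as-is, whereas the deck shows the server matching "looks" against "looking".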