Full-Text Search: MongoDB vs Elasticsearch

Today’s applications are expected to provide powerful full-text search. But how does that work in general and how do I implement it on my site or in my application?

Actually, this is not as hard as it sounds at first. This talk covers:
* How full-text search works in general and how it differs from querying a database.
* How the relevance of documents is calculated.
* How search works in MongoDB and Elasticsearch, and how the two systems differ.

Philipp Krenn

October 04, 2016
Transcript

  1. { title: "Moby-Dick", author: "Herman Melville", published: 1851, ISBN: "0451526996",
      topics: [ "whaling", "allegory", "revenge", "American", "novel", "nautical", "voyage", "Cape Cod" ] }
    db.volumes.createIndex({ topics: 1 })
    db.volumes.findOne({ topics: "voyage" }, { title: 1 })
  2. FTS in MongoDB
    Beta since 2.4
    Stable since 3.0
    "80% solution"; for more, use Elasticsearch
  3. FTS in MongoDB
    In Latin alphabets
    Case insensitive (default in 3.2)
    [A-z] other characters removed (default in 3.2)
  4. Indexing
    String or array of strings
    Optional language or translations
    Optional weight if multiple fields indexed
  5. $text
    Updated version in MongoDB 3.2
    {
      $text: {
        $search: "<string>",
        $language: "<string>",
        $caseSensitive: <boolean>,
        $diacriticSensitive: <boolean>
      }
    }
  6. > db.starwars.getIndices()
    [
      ...
      {
        "v": 1,
        "key": { "_fts": "text", "_ftsx": 1 },
        "name": "quote_text",
        "ns": "starwars.starwars",
        "weights": { "quote": 1 },
        "default_language": "english",
        "language_override": "language",
        "textIndexVersion": 3
      }
    ]
  7. > db.starwars.insert( { quote: "These are not the droids you are looking for." } )
    Inserted 1 record(s) in 39ms
    WriteResult({ "nInserted": 1 })
  8. GET /_analyze
    {
      "char_filter": [ "html_strip" ],
      "tokenizer": "standard",
      "filter": [ "lowercase", "stop", "snowball" ],
      "text": "These are <em>not</em> the droids you are looking for."
    }
  9. {
      "tokens": [
        { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
        { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
        ...
      ]
    }
  10. PUT /my_index
    { "settings": { "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms": [ "word1,synonym", "word2,synonym" ]
          }
        },
  11.   "analyzer": {
          "my_analyzer": {
            "char_filter": [ "html_strip" ],
            "tokenizer": "standard",
            "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ]
          }
        }
    } },
  12. PUT /starwars/quotes/1
    { "quote": "These are <em>not</em> the droids you are looking for." }
    PUT /starwars/quotes/2
    { "quote": "Obi-Wan never told you what happened to your father." }
    PUT /starwars/quotes/3
    { "quote": "<b>No</b>. I am your father." }
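Before the quotes above are indexed, each one passes through an analysis chain like the one in the `_analyze` request: a character filter, a tokenizer, and token filters. The following is a minimal sketch of that idea in plain JavaScript, not Elasticsearch's implementation. The `analyze` and `naiveStem` functions are hypothetical names, the stop word list is assumed to match Lucene's default English list, and the stemmer is a crude suffix stripper standing in for Snowball. Synonyms are left out.

```javascript
// Assumed to match Lucene's default English stop word list.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
  "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
  "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
]);

// Hypothetical stand-in for the Snowball stemmer: strips a few common suffixes only.
function naiveStem(token) {
  if (token.length > 5 && token.endsWith("ing")) return token.slice(0, -3);
  if (token.length > 4 && token.endsWith("ed")) return token.slice(0, -2);
  if (token.length > 3 && token.endsWith("s") && !token.endsWith("ss")) return token.slice(0, -1);
  return token;
}

function analyze(text) {
  // Character filter: drop HTML tags (a rough equivalent of html_strip).
  const stripped = text.replace(/<[^>]*>/g, "");
  // Tokenizer: split on anything that is not a letter or digit.
  const words = stripped.match(/[A-Za-z0-9]+/g) || [];
  return words
    // Lowercase filter; remember each token's position in the original text.
    .map((word, position) => ({ token: word.toLowerCase(), position }))
    // Stop filter: removed words leave gaps in the position sequence.
    .filter(({ token }) => !STOP_WORDS.has(token))
    // Stemming filter (simplified).
    .map(({ token, position }) => ({ token: naiveStem(token), position }));
}

console.log(analyze("These are <em>not</em> the droids you are looking for."));
// [ { token: 'droid', position: 4 }, { token: 'you', position: 5 }, { token: 'look', position: 7 } ]
```

The positions 4, 5 and 7 agree with the `_analyze` response in the transcript. Character offsets are not tracked in this sketch.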
  13. Inverted Index
    Term     ID 1   ID 2   ID 3
    am       0      0      1[2]
    droid    1[4]   0      0
    father   0      1[9]   1[4]
    happen   0      1[6]   0
    i        0      0      1[1]
    look     1[7]   0      0
    never    0      1[2]   0
    obi      0      1[0]   0
    told     0      1[3]   0
    wan      0      1[1]   0
    what     0      1[5]   0
    you      1[5]   1[4]   0
    your     0      1[8]   1[3]
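The inverted index on this slide maps each term to the documents that contain it, along with the term's positions. A small sketch of how such a structure could be built and queried is shown here. The token lists are copied from the slide; `buildInvertedIndex` and `lookup` are hypothetical helper names, and real search engines store postings far more compactly.

```javascript
// Pre-analyzed [term, position] pairs per document ID, copied from the slide.
const analyzedDocs = {
  1: [["droid", 4], ["you", 5], ["look", 7]],
  2: [["obi", 0], ["wan", 1], ["never", 2], ["told", 3], ["you", 4],
      ["what", 5], ["happen", 6], ["your", 8], ["father", 9]],
  3: [["i", 1], ["am", 2], ["your", 3], ["father", 4]],
};

// Build a Map of term -> Map of docId -> list of positions.
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [docId, tokens] of Object.entries(docs)) {
    for (const [term, position] of tokens) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      if (!postings.has(docId)) postings.set(docId, []);
      postings.get(docId).push(position);
    }
  }
  return index;
}

// Return the IDs of all documents containing an already-analyzed term.
function lookup(index, term) {
  const postings = index.get(term);
  return postings ? [...postings.keys()] : [];
}

const index = buildInvertedIndex(analyzedDocs);
console.log(lookup(index, "father")); // [ '2', '3' ]
console.log(lookup(index, "droid"));  // [ '1' ]
```

A lookup is a single map access per term, which is why the query string has to go through the same analysis as the documents: searching for "droids" only works once it is reduced to "droid".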
  14. > db.starwars.insert( { quote: "Obi-Wan never told you what happened to your father." } )
    > db.starwars.insert( { quote: "No. I am your father." } )
  15. > db.starwars.find({ $text: { $search: "droid" }})
    {
      "_id": ObjectId("57f2d54de814412463c3adef"),
      "quote": "These are not the droids you are looking for."
    }
    Fetched 1 record(s) in 35ms
  16. > db.starwars.find({ $text: { $search: "father" }})
    {
      "_id": ObjectId("57f2d56fe814412463c3adf0"),
      "quote": "Obi-Wan never told you what happened to your father."
    }
    {
      "_id": ObjectId("57f2d581e814412463c3adf1"),
      "quote": "No. I am your father."
    }
    Fetched 2 record(s) in 3ms
  17. > db.starwars.find({ $text: { $search: "droid" }}).explain()
    {
      "queryPlanner": {
        ...
          "$text": {
            "$search": "droid",
            "$language": "english",
            "$caseSensitive": false,
            "$diacriticSensitive": false
          }
        },
  18.   "winningPlan": {
          "stage": "TEXT",
          "indexPrefix": { },
          "indexName": "quote_text",
          "parsedTextQuery": {
            "terms": [ "droid" ],
            "negatedTerms": [ ],
            "phrases": [ ],
            "negatedPhrases": [ ]
          },
          ...
  19. > db.starwars.find({ $text: { $search: "father -obi" }})
    {
      "_id": ObjectId("57f2d581e814412463c3adf1"),
      "quote": "No. I am your father."
    }
    Fetched 1 record(s) in 4ms
  20. > db.starwars.find({ $text: { $search: "father -obi" }}).explain()
    ...
    "parsedTextQuery": {
      "terms": [ "father" ],
      "negatedTerms": [ "obi" ],
      "phrases": [ ],
      "negatedPhrases": [ ]
    },
    ...
  21. Queries
    // OR
    > db.starwars.find({ $text: { $search: "look droid" } })
    // AND but without input stemming
    > db.starwars.find({ $text: { $search: "\"look\" \"droid\"" } })
    // Negation
    > db.starwars.find({ $text: { $search: "look -droid" } })
    // Phrase
    > db.starwars.find({ $text: { $search: "\"look droid\"" } })
    // Translation
    > db.starwars.find({ $text: { $search: "suchen", $language: "de" } })
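The query forms on this slide all live inside a single search string, which has to be split into terms, negated terms and phrases, as the `parsedTextQuery` output in the earlier `explain()` slides shows. The sketch below illustrates that split with a hypothetical `parseSearchString` function. It is an assumption-laden simplification, not MongoDB's parser: it only handles whitespace-separated words, does no stemming or stop word removal, and keeps phrase words out of `terms`, which MongoDB's own parsed query may not do.

```javascript
// Hypothetical, simplified parser for a $text-style search string.
function parseSearchString(search) {
  const parsed = { terms: [], negatedTerms: [], phrases: [], negatedPhrases: [] };
  // Either an optionally negated "quoted phrase" or an optionally negated bare word.
  const pattern = /(-?)"([^"]*)"|(-?)([^\s"]+)/g;
  for (const match of search.toLowerCase().matchAll(pattern)) {
    const [, phraseNegation, phrase, wordNegation, word] = match;
    if (phrase !== undefined) {
      // Quoted text: must appear verbatim (or must not, when prefixed with "-").
      (phraseNegation ? parsed.negatedPhrases : parsed.phrases).push(phrase);
    } else {
      // Bare word: an ordinary term, or an excluded one when prefixed with "-".
      (wordNegation ? parsed.negatedTerms : parsed.terms).push(word);
    }
  }
  return parsed;
}

console.log(parseSearchString("father -obi"));
// { terms: [ 'father' ], negatedTerms: [ 'obi' ], phrases: [], negatedPhrases: [] }
console.log(parseSearchString('"look droid"'));
// { terms: [], negatedTerms: [], phrases: [ 'look droid' ], negatedPhrases: [] }
```

For `father -obi` the result has the same shape as the `parsedTextQuery` shown in the transcript.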
  22. {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.39556286,
        "hits": [
          {
            "_index": "starwars",
            "_type": "quotes",
            "_id": "1",
            "_score": 0.39556286,
            "_source": { "quote": "These are <em>not</em> the droids you are looking for." }
          }
        ]
      }
    }
  23. {
      "took": 14,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.18155496,
        "hits": [
          {
            "_index": "starwars",
            "_type": "quotes",
            "_id": "2",
            "_score": 0.18155496,
            "_source": { "quote": "Obi-Wan never told you what happened to your father." }
          }
        ]
      }
    }
  24. > db.starwars.find({ $text: { $search: "droid" }}, {score: {$meta: "textScore"}})
    {
      "_id": ObjectId("57f2d54de814412463c3adef"),
      "quote": "These are not the droids you are looking for.",
      "score": 0.75
    }
    Fetched 1 record(s) in 14ms
  25. Search for droid
    "These are not the droids you are looking for."
    droid look == 1 match, 2 tokens
    coeff:
  26. Search for father
    "Obi-Wan never told you what happened to your father."
    obi wan never told happen father == 1 match, 6 tokens
    coeff:
  27. > db.starwars.find({ $text: { $search: "obi-wan" }}, {score: {$meta: "textScore"}})
    {
      "_id": ObjectId("57f2d56fe814412463c3adf0"),
      "quote": "Obi-Wan never told you what happened to your father.",
      "score": 1.1666666666666667
    }
    Fetched 1 record(s) in 6ms
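The `textScore` values 0.75 and 1.1666666666666667 can be reproduced from the match and token counts on the preceding slides. The sketch below uses a scoring rule that is an assumption on my part, since the transcript does not preserve the formula: each matching term contributes `freq * coeff * weight`, with `coeff = 0.5 + 0.5 * matches / tokenCount`, and repeated occurrences of a term count half as much each time. The function name `textScore` and the token arrays are illustrative.

```javascript
// Assumed scoring rule (not preserved in the transcript):
//   coeff = 0.5 + 0.5 * matches / tokenCount
//   score = sum over query terms of freq * coeff * weight
function textScore(queryTerms, docTokens, weight = 1) {
  let score = 0;
  for (const term of queryTerms) {
    const matches = docTokens.filter((token) => token === term).length;
    if (matches === 0) continue;
    // Diminishing returns for repeated occurrences: 1 + 1/2 + 1/4 + ...
    let freq = 0;
    for (let i = 0; i < matches; i++) freq += 1 / 2 ** i;
    const coeff = 0.5 + (0.5 * matches) / docTokens.length;
    score += freq * coeff * weight;
  }
  return score;
}

// Tokens left after stemming and stop word removal, as listed on the slides.
const droidsQuote = ["droid", "look"];
const fatherQuote = ["obi", "wan", "never", "told", "happen", "father"];

console.log(textScore(["droid"], droidsQuote));      // 0.75
console.log(textScore(["obi", "wan"], fatherQuote)); // approximately 7/6
```

Under this rule, a shorter document scores higher for the same single match, and the query "obi-wan" scores above 1 because it is split into two terms that each match once.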
  28. Putting it Together
    score(q,d) = queryNorm(q) · coord(q,d) · ∑_(t in q) ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
  29. "_explanation": {
      "value": 0.2876821,
      "description": "weight(quote:father in 0) [PerFieldSimilarity], result of:",
      "details": [
        {
          "value": 0.2876821,
          "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
          "details": [
            {
              "value": 0.2876821,
              "description": "idf(docFreq=1, docCount=1)",
              "details": []
            },
            ...
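The `idf(docFreq=1, docCount=1)` value of 0.2876821 in this explanation is consistent with the BM25 form of inverse document frequency, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The sketch below checks that arithmetic. The function names are illustrative, `k1 = 1.2` and `b = 0.75` are the usual BM25 defaults rather than values shown in the transcript, and the field length of 4 is a made-up figure chosen to equal the average.

```javascript
// BM25 inverse document frequency: rarer terms get a higher weight.
function bm25Idf(docFreq, docCount) {
  return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
}

// BM25 term frequency normalization; k1 and b are assumed defaults.
function bm25TfNorm(freq, fieldLength, avgFieldLength, k1 = 1.2, b = 0.75) {
  return (freq * (k1 + 1)) / (freq + k1 * (1 - b + (b * fieldLength) / avgFieldLength));
}

const idfValue = bm25Idf(1, 1);
// Hypothetical field length equal to the average, so the length penalty is neutral.
const tfNormValue = bm25TfNorm(1, 4, 4);

console.log(idfValue.toFixed(7));                 // 0.2876821
console.log((idfValue * tfNormValue).toFixed(7)); // 0.2876821
```

With a single occurrence in a field of average length, the term frequency factor is 1, so the term's score equals its idf, which matches the identical values at both levels of the explanation.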
  30. Image Credit
    → Schnitzel https://flic.kr/p/9m27wm
    → Architecture https://flic.kr/p/6dwCAe
    → Conchita https://flic.kr/p/nBqSHT
    → Black and grey http://hdimagelib.com/zedge+quote+wallpapers