Slide 1

Slide 1 text

Full-Text Search Internals Philipp Krenn @xeraa

Slide 2

Slide 2 text

Who is using databases?

Slide 3

Slide 3 text

Who is using search?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Ceci n'est pas David Pilato.

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Developer

Slide 9

Slide 9 text

Store

Slide 10

Slide 10 text

Apache Lucene Elasticsearch

Slide 11

Slide 11 text

https://cloud.elastic.co

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

---
version: '2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:$ELASTIC_VERSION
    links:
      - elasticsearch
    ports:
      - 5601:5601
volumes:
  esdata1:
    driver: local

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Example These are not the droids you are looking for.

Slide 16

Slide 16 text

html_strip Char Filter These are not the droids you are looking for.

Slide 17

Slide 17 text

standard Tokenizer These are not the droids you are looking for

Slide 18

Slide 18 text

lowercase Token Filter these are not the droids you are looking for

Slide 19

Slide 19 text

stop Token Filter droids you looking

Slide 20

Slide 20 text

snowball Token Filter droid you look
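The chain on the last few slides (char filter, tokenizer, lowercase, stop, stemmer) can be imitated in a few lines of Python. This is only a rough sketch, not how Lucene implements it: the regex-based `html_strip` and `tokenize` are approximations, `toy_stem` is a hypothetical suffix stripper that stands in for Snowball, and the `<em>` tags in the test input are an assumed example.

```python
import re
from html import unescape

# Stop list as shown on the "Stop Words" slide of this deck.
STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def html_strip(text):
    """Char filter: drop tags and decode entities (crude stand-in for html_strip)."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def tokenize(text):
    """Tokenizer: keep runs of word characters (approximation of the standard tokenizer)."""
    return re.findall(r"\w+", text)


def toy_stem(token):
    """Hypothetical suffix stripper; this is NOT the Snowball algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the filters in the same order as the slides."""
    tokens = tokenize(html_strip(text))
    tokens = [t.lower() for t in tokens]                  # lowercase token filter
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop token filter
    return [toy_stem(t) for t in tokens]                  # stemmer token filter


if __name__ == "__main__":
    print(analyze("These are <em>not</em> the droids you are looking for."))
```

The order matters: lowercasing has to happen before the stop filter, otherwise "These" would not be recognized as the stop word "these".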

Slide 21

Slide 21 text

Analyze

Slide 22

Slide 22 text

GET /_analyze { "analyzer": "english", "text": "These are not the droids you are looking for." }

Slide 23

Slide 23 text

{ "tokens": [ { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 25, "end_offset": 28, "type": "<ALPHANUM>", "position": 5 }, ... ] }

Slide 24

Slide 24 text

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball" ], "text": "These are not the droids you are looking for." }

Slide 25

Slide 25 text

{ "tokens": [ { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 }, ... ] }

Slide 26

Slide 26 text

Stop Words
a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L44-L50

Slide 27

Slide 27 text

Always Use Stop Words?

Slide 28

Slide 28 text

To be, or not to be.

Slide 29

Slide 29 text

French Ce ne sont pas ces droïdes là que vous recherchez.

Slide 30

Slide 30 text

French droïd là recherchez

Slide 31

Slide 31 text

French with the English Analyzer ce ne sont pa ce droïd là que vou recherchez

Slide 32

Slide 32 text

French Stop Words https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt

Slide 33

Slide 33 text

Languages Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, CJK, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Latvian, Lithuanian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Turkish, Thai

Slide 34

Slide 34 text

More Language Plugins
Core: ICU (Asian languages), Kuromoji (advanced Japanese), Phonetic, SmartCN, Stempel (better Polish stemming), Ukrainian (stemming)
Community: Hebrew, Vietnamese, Network Address Analysis, String2Integer, ...

Slide 35

Slide 35 text

Language Rules
English: Philipp's → philipp
French: l'église → eglis
German: äußerst → ausserst

Slide 36

Slide 36 text

Another Example Obi-Wan never told you what happened to your father.

Slide 37

Slide 37 text

Another Example obi wan never told you what happen your father

Slide 38

Slide 38 text

Another Example No. I am your father.

Slide 39

Slide 39 text

Another Example i am your father

Slide 40

Slide 40 text

Inverted Index
Term     ID 1    ID 2    ID 3
am       0       0       1[2]
droid    1[4]    0       0
father   0       1[9]    1[4]
happen   0       1[6]    0
i        0       0       1[1]
look     1[7]    0       0
never    0       1[2]    0
obi      0       1[0]    0
told     0       1[3]    0
wan      0       1[1]    0
what     0       1[5]    0
you      1[5]    1[4]    0
your     0       1[8]    1[3]
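The table above can be rebuilt with a small positional inverted index. The following is a sketch under stated assumptions: the `STEMS` lookup table is a hypothetical stand-in for the real stemmer, and the regex tokenizer only approximates the standard tokenizer. Removed stop words keep their slot, which is why "droid" sits at position 4 and not 0.

```python
import re
from collections import defaultdict

STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)

# Hypothetical lookup table standing in for a real stemmer.
STEMS = {"droids": "droid", "looking": "look", "happened": "happen"}

DOCS = {
    1: "These are not the droids you are looking for.",
    2: "Obi-Wan never told you what happened to your father.",
    3: "No. I am your father.",
}


def analyze_with_positions(text):
    """Return (position, term) pairs; stop words are dropped but still count as a position."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, STEMS.get(w, w)) for pos, w in enumerate(words) if w not in STOP_WORDS]


def build_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in analyze_with_positions(text):
            index[term].setdefault(doc_id, []).append(pos)
    return index


INDEX = build_index(DOCS)

if __name__ == "__main__":
    for term in sorted(INDEX):
        print(term, INDEX[term])
```

A term lookup is then a single dictionary access, and the stored positions are what phrase queries rely on later in the deck.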

Slide 41

Slide 41 text

To / The Index

Slide 42

Slide 42 text

PUT /starwars { "settings": { "analysis": { "filter": { "my_synonym_filter": { "type": "synonym", "synonyms": [ "father,dad", "droid => droid,machine" ] } },

Slide 43

Slide 43 text

"analyzer": { "my_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ] } } } },

Slide 44

Slide 44 text

"mappings": { "properties": { "quote": { "type": "text", "analyzer": "my_analyzer" } } } }

Slide 45

Slide 45 text

PUT /starwars/_doc/1
{ "quote": "These are not the droids you are looking for." }
PUT /starwars/_doc/2
{ "quote": "Obi-Wan never told you what happened to your father." }
PUT /starwars/_doc/3
{ "quote": "No. I am your father." }
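The two synonym rules in the index settings behave differently: `father,dad` makes both terms interchangeable, while `droid => droid,machine` only rewrites the left-hand side. A minimal Python sketch of that difference follows; `parse_synonyms` and `apply_synonyms` are hypothetical helper names, and the real token filter additionally places synonyms at the same position as the original token, which this sketch ignores.

```python
def parse_synonyms(rules):
    """Turn Solr-style synonym rules into a term -> replacement-list mapping."""
    mapping = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: only the left-hand terms are rewritten.
            left, right = rule.split("=>")
            targets = [t.strip() for t in right.split(",")]
            for source in left.split(","):
                mapping[source.strip()] = targets
        else:
            # Equivalent synonyms: every term expands to the whole group.
            group = [t.strip() for t in rule.split(",")]
            for term in group:
                mapping[term] = group
    return mapping


def apply_synonyms(tokens, mapping):
    """Expand each token; tokens without a rule pass through unchanged."""
    expanded = []
    for token in tokens:
        expanded.extend(mapping.get(token, [token]))
    return expanded


RULES = ["father,dad", "droid => droid,machine"]
MAPPING = parse_synonyms(RULES)

if __name__ == "__main__":
    print(apply_synonyms(["i", "am", "your", "father"], MAPPING))
    print(apply_synonyms(["droid", "you", "look"], MAPPING))
```

Because the analyzer runs at index time, a document containing "droids" is also indexed under "machine", so a later search for "machine" can find it even though the word never appears in the quote.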

Slide 46

Slide 46 text

GET /starwars/_doc/1
GET /starwars/_source/1

Slide 47

Slide 47 text

Search

Slide 48

Slide 48 text

POST /starwars/_search { "query": { "match_all": { } } }

Slide 49

Slide 49 text

GET vs POST

Slide 50

Slide 50 text

{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, ...

Slide 51

Slide 51 text

POST /starwars/_search { "query": { "match": { "quote": "Droid" } } }

Slide 52

Slide 52 text

{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.39556286, "_source": { "quote": "These are not the droids you are looking for." } } ] } }

Slide 53

Slide 53 text

POST /starwars/_search { "query": { "match": { "quote": "dad" } } }

Slide 54

Slide 54 text

... "hits": { "total": 2, "max_score": 0.41913947, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.41913947, "_source": { "quote": "No. I am your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.39291072, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 55

Slide 55 text

POST /starwars/_explain/0 { "query": { "match": { "quote": "dad" } } }

Slide 56

Slide 56 text

{ "_index": "starwars", "_type": "_doc", "_id": "0", "matched": false }

Slide 57

Slide 57 text

POST /starwars/_explain/1 { "query": { "match": { "quote": "dad" } } }

Slide 58

Slide 58 text

{ "_index": "starwars", "_type": "_doc", "_id": "1", "matched": false, "explanation": { "value": 0, "description": "no matching term", "details": [] } }

Slide 59

Slide 59 text

POST /starwars/_explain/2 { "query": { "match": { "quote": "dad" } } }

Slide 60

Slide 60 text

{ "_index": "starwars", "_type": "_doc", "_id": "2", "matched": true, "explanation": { ...

Slide 61

Slide 61 text

POST /starwars/_search { "query": { "match": { "quote": "machine" } } }

Slide 62

Slide 62 text

{ "took": 2, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1.2499592, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 1.2499592, "_source": { "quote": "These are not the droids you are looking for." } } ] } }

Slide 63

Slide 63 text

POST /starwars/_search { "query": { "match_phrase": { "quote": "I am your father" } } }

Slide 64

Slide 64 text

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.5665855, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 1.5665855, "_source": { "quote": "No. I am your father." } } ] } }

Slide 65

Slide 65 text

POST /starwars/_search { "query": { "match_phrase": { "quote": { "query": "I am father", "slop": 1 } } } }

Slide 66

Slide 66 text

{ "took": 16, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.8327639, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.8327639, "_source": { "quote": "No. I am your father." } } ] } }

Slide 67

Slide 67 text

POST /starwars/_search { "query": { "match_phrase": { "quote": { "query": "I am not your father", "slop": 1 } } } }

Slide 68

Slide 68 text

{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.0409548, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 1.0409548, "_source": { "quote": "No. I am your father." } } ] } }
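Why both sloppy phrases match document 3 becomes clearer with the positions from the inverted index. The sketch below is a simplified model of sloppy phrase matching, not Lucene's actual scorer: each document position is shifted by the term's position in the query, and the phrase matches if the spread of those shifted values is at most `slop`. The function name `phrase_match` is an assumption for illustration.

```python
from itertools import product


def phrase_match(query_terms, doc_positions, slop=0):
    """query_terms: [(query_position, term)]; doc_positions: {term: [positions]}."""
    shifted = []
    for query_pos, term in query_terms:
        positions = doc_positions.get(term)
        if not positions:
            return False  # every phrase term must occur in the document
        shifted.append([p - query_pos for p in positions])
    # Try every combination of occurrences; fine for short quotes like these.
    return any(max(combo) - min(combo) <= slop for combo in product(*shifted))


# Positions of "No. I am your father." after analysis ("no" is a stop word).
DOC3 = {"i": [1], "am": [2], "your": [3], "father": [4]}

EXACT = [(0, "i"), (1, "am"), (2, "your"), (3, "father")]
MISSING_YOUR = [(0, "i"), (1, "am"), (2, "father")]
# "not" is removed by the stop filter but still occupies query position 2.
EXTRA_NOT = [(0, "i"), (1, "am"), (3, "your"), (4, "father")]

if __name__ == "__main__":
    print(phrase_match(EXACT, DOC3))
    print(phrase_match(MISSING_YOUR, DOC3, slop=1))
    print(phrase_match(EXTRA_NOT, DOC3, slop=1))
```

In this model a slop of 1 allows the terms to be one position closer together or further apart than in the query, which covers both the missing "your" and the extra "not".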

Slide 69

Slide 69 text

POST /starwars/_search { "query": { "match": { "quote": { "query": "van", "fuzziness": "AUTO" } } } }

Slide 70

Slide 70 text

{ "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.18155496, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.18155496, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 71

Slide 71 text

POST /starwars/_search { "query": { "match": { "quote": { "query": "ovi-van", "fuzziness": 1 } } } }

Slide 72

Slide 72 text

{ "took": 109, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.3798467, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.3798467, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 73

Slide 73 text

FuzzyQuery History http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html Before: Brute force Now: Levenshtein Automaton
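The "brute force" approach amounts to computing an edit distance between the query term and every candidate term. A minimal sketch in Python follows, using plain Levenshtein distance (transpositions are not treated specially here) together with the length thresholds that `"fuzziness": "AUTO"` applies by default. Both function names are chosen for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Allowed edits for AUTO: 0 for 1-2 characters, 1 for 3-5, 2 for longer terms."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


if __name__ == "__main__":
    for query, indexed in [("van", "wan"), ("ovi", "obi")]:
        print(query, indexed, levenshtein(query, indexed), auto_fuzziness(query))
```

Comparing the query against every term in the index this way is what made the old FuzzyQuery slow; the Levenshtein automaton avoids visiting most terms at all.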

Slide 74

Slide 74 text

http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata

Slide 75

Slide 75 text

SELECT * FROM starwars WHERE quote LIKE "?an" OR quote LIKE "V?n" OR quote LIKE "Va?"

Slide 76

Slide 76 text

Scoring

Slide 77

Slide 77 text

Term Frequency / Inverse Document Frequency (TF/IDF) Search one term

Slide 78

Slide 78 text

BM25 Default in Elasticsearch 5.0 https://speakerdeck.com/elastic/improved-text-scoring-with-bm25

Slide 79

Slide 79 text

Term Frequency

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

Inverse Document Frequency

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

Field-Length Norm
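The three factors can be written down as small functions. The formulas below follow the classic Lucene TF/IDF similarity as commonly documented (square root of the frequency, a logarithmic IDF, inverse square root of the field length); treat them as a sketch of that older model and not of the BM25 default, and note that the function names are chosen for illustration.

```python
import math


def term_frequency(freq):
    """More occurrences in the field raise the score, with diminishing returns."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Terms that appear in many documents carry less weight."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """A match in a short field counts more than a match in a long one."""
    return 1 / math.sqrt(num_terms)


if __name__ == "__main__":
    # "father" occurs in 2 of the 3 quotes; the analyzed quotes have 4 and 9 terms.
    idf = inverse_document_frequency(num_docs=3, doc_freq=2)
    for name, length in [("doc 3", 4), ("doc 2", 9)]:
        print(name, term_frequency(1) * idf * field_length_norm(length))
```

With the same term frequency and IDF, the shorter quote wins purely through the field-length norm, which is the same ordering the deck shows for the "father" search.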

Slide 84

Slide 84 text

POST /starwars/_search?explain=true { "query": { "match": { "quote": "father" } } }

Slide 85

Slide 85 text

... "_explanation": { "value": 0.41913947, "description": "weight(Synonym(quote:dad quote:father) in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 0.41913947, "description": "score(doc=0,freq=2.0 = termFreq=2.0\n), product of:", "details": [ { "value": 0.2876821, "description": "idf(docFreq=1, docCount=1)", "details": [] }, { "value": 1.4569536, "description": "tfNorm, computed from:", "details": [ { "value": 2, "description": "termFreq=2.0", "details": [] }, ...
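The `idf` value in this explain output can be reproduced by hand. The sketch below uses the BM25 formulas as Lucene reported them in explain output of that era, with the default parameters k1 = 1.2 and b = 0.75. The field lengths are elided on the slide, so the `tfNorm` function is shown without claiming the slide's inputs; function names are chosen for illustration.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf(docFreq, docCount) as printed in the explain output."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """Saturating term-frequency part, normalized by relative field length."""
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))


if __name__ == "__main__":
    # docFreq=1 and docCount=1 are the per-shard numbers from the slide.
    print(bm25_idf(doc_freq=1, doc_count=1))
    # Term frequency saturates: going from 1 to 2 helps more than from 9 to 10.
    print(bm25_tf_norm(2, 10, 10) - bm25_tf_norm(1, 10, 10))
    print(bm25_tf_norm(10, 10, 10) - bm25_tf_norm(9, 10, 10))
```

Note that `termFreq=2.0` in the output comes from the synonym filter: "father" and "dad" are indexed at the same position and counted together as `Synonym(quote:dad quote:father)`.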

Slide 86

Slide 86 text

Score
0.41913947: i am your father
0.39291072: obi wan never told you what happen your father

Slide 87

Slide 87 text

Vector Space Model Search multiple terms

Slide 88

Slide 88 text

Search your father

Slide 89

Slide 89 text

No content

Slide 90

Slide 90 text

Coordination Factor Reward multiple terms

Slide 91

Slide 91 text

Search for 3 terms 1 term: 2 terms: 3 terms:
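The numbers for this slide did not survive extraction, so here is a worked example with made-up values. It assumes a hypothetical weight of 1.5 per matching term and applies the coordination factor as the share of query terms that matched; `coord_score` is a name chosen for illustration.

```python
def coord_score(term_scores, query_term_count):
    """Sum the scores of the matching terms, then scale by matched / total query terms."""
    matched = len(term_scores)
    return sum(term_scores) * matched / query_term_count


if __name__ == "__main__":
    weight = 1.5  # hypothetical per-term score
    for matched in (1, 2, 3):
        print(matched, "of 3 terms:", coord_score([weight] * matched, 3))
```

Without the factor the three cases would score 1.5, 3.0 and 4.5; with it, a document matching all terms pulls further ahead of one matching only a single term.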

Slide 92

Slide 92 text

Practical Scoring Function Putting it all together

Slide 93

Slide 93 text

score(q,d) = queryNorm(q) · coord(q,d) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Slide 94

Slide 94 text

Function Score Script, weight, random, field value, decay (geo or date)

Slide 95

Slide 95 text

POST /starwars/_search { "query": { "function_score": { "query": { "match": { "quote": "father" } }, "random_score": {} } } }

Slide 96

Slide 96 text

Compare Scores "100% perfect" vs a "50%" match

Slide 97

Slide 97 text

Don't do this. Seriously. Stop trying to think about your problem this way, it's not going to end well.
https://wiki.apache.org/lucene-java/ScoresAsPercentages

Slide 98

Slide 98 text

GET /starwars/_analyze { "analyzer" : "my_analyzer", "text": "These are my father's machines." }

Slide 99

Slide 99 text

{ "tokens": [ { "token": "my", "start_offset": 10, "end_offset": 12, "type": "<ALPHANUM>", "position": 2 }, { "token": "father", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 3 }, { "token": "dad", "start_offset": 13, "end_offset": 21, "type": "SYNONYM", "position": 3 }, { "token": "machin", "start_offset": 22, "end_offset": 30, "type": "<ALPHANUM>", "position": 4 } ] }

Slide 100

Slide 100 text

PUT /starwars/_doc/4 { "quote": "These are my father's machines." }

Slide 101

Slide 101 text

POST /starwars/_search { "query": { "match": { "quote": "my father machine" } } }

Slide 102

Slide 102 text

"hits": { "total": 4, "max_score": 2.92523, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "4", "_score": 2.92523, "_source": { "quote": "These are my father's machines." } }, { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.8617505, "_source": { "quote": "These are not the droids you are looking for." } }, ...

Slide 103

Slide 103 text

2.92523 == 100%

Slide 104

Slide 104 text

DELETE /starwars/_doc/4
POST /starwars/_search
{ "query": { "match": { "quote": "my father machine" } } }

Slide 105

Slide 105 text

"hits": { "total": 3, "max_score": 1.2499592, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 1.2499592, "_source": { "quote": "These are not the droids you are looking for." } }, ...

Slide 106

Slide 106 text

1.2499592 == 43% or 100%?

Slide 107

Slide 107 text

PUT /starwars/_doc/4
{ "quote": "These droids are my father's father's machines." }
POST /starwars/_search
{ "query": { "match": { "quote": "my father machine" } } }

Slide 108

Slide 108 text

"hits": { "total": 4, "max_score": 3.0068164, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "4", "_score": 3.0068164, "_source": { "quote": "These droids are my father's father's machines." } }, { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.89701396, "_source": { "quote": "These are not the droids you are looking for." } }, ...

Slide 109

Slide 109 text

3.0068164 == 103%?
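The percentages on the last few slides are simple ratios against whichever score was declared to be "100%". A short Python check of that arithmetic, using only the scores shown on the slides (`as_percentage` is a helper name chosen for illustration):

```python
def as_percentage(score, reference):
    """Express a score relative to a reference score, rounded to whole percent."""
    return round(score / reference * 100)


FIRST_TOP = 2.92523     # best hit while "These are my father's machines." was indexed
AFTER_DELETE = 1.2499592  # best hit for the same query after deleting that document
AFTER_REINDEX = 3.0068164  # best hit after indexing the "father's father's" variant

if __name__ == "__main__":
    print(as_percentage(AFTER_DELETE, FIRST_TOP))      # relative to the old maximum
    print(as_percentage(AFTER_DELETE, AFTER_DELETE))   # relative to the current maximum
    print(as_percentage(AFTER_REINDEX, FIRST_TOP))     # relative to the old maximum
```

The same query against the same field yields a different "100%" every time the index content changes, which is why scores are only comparable within a single result list.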

Slide 110

Slide 110 text

No content

Slide 111

Slide 111 text

Performance

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

Conclusion

Slide 115

Slide 115 text

Indexing Formatting Tokenize Lowercase, Stop Words, Stemming Synonyms

Slide 116

Slide 116 text

Scoring Term Frequency Inverse Document Frequency Field-Length Norm Vector Space Model

Slide 117

Slide 117 text

No content

Slide 118

Slide 118 text

No content

Slide 119

Slide 119 text

No content

Slide 120

Slide 120 text

Thank You! Questions? Philipp Krenn @xeraa PS: Stickers

Slide 121

Slide 121 text

The End

Slide 122

Slide 122 text

More

Slide 123

Slide 123 text

POST /starwars/_search { "query": { "match": { "quote": "father" } }, "highlight": { "type": "unified", "pre_tags": [ "" ], "post_tags": [ "" ], "fields": { "quote": {} } } }

Slide 124

Slide 124 text

... "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.41913947, "_source": { "quote": "No. I am your father." }, "highlight": { "quote": [ "No. I am your father." ] } }, ...

Slide 125

Slide 125 text

Boolean Queries must must_not should filter

Slide 126

Slide 126 text

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": "obi" } } ] } } }

Slide 127

Slide 127 text

... "hits": { "total": 2, "max_score": 0.96268076, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.96268076, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.73245656, "_source": { "quote": "No. I am your father." } } ] } }

Slide 128

Slide 128 text

POST /starwars/_search { "query": { "bool": { "filter": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": "obi" } } ] } } }

Slide 129

Slide 129 text

... "hits": { "total": 2, "max_score": 0.56977004, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.56977004, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.31331712, "_source": { "quote": "No. I am your father." } } ] } }

Slide 130

Slide 130 text

Named Queries & minimum_should_match

Slide 131

Slide 131 text

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": { "query": "your", "_name": "quote-your" } } }, { "match": { "quote": { "query": "obi", "_name": "quote-obi" } } }, { "match": { "quote": { "query": "droid", "_name": "quote-droid" } } } ], "minimum_should_match": 2 } } }

Slide 132

Slide 132 text

... "hits": { "total": 1, "max_score": 1.8154771, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1.8154771, "_source": { "quote": "Obi-Wan never told you what happened to your father." }, "matched_queries": [ "quote-obi", "quote-your" ] } ] } }

Slide 133

Slide 133 text

Boosting >1 increase, <1 decrease, <0 punish

Slide 134

Slide 134 text

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": { "query": "obi", "boost": 3 } } } ] } } }

Slide 135

Slide 135 text

... "hits": { "total": 2, "max_score": 1.5324509, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1.5324509, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.73245656, "_source": { "quote": "No. I am your father." } } ] } }

Slide 136

Slide 136 text

Suggestion Suggest a similar text _search end point _suggest deprecated since 5.0

Slide 137

Slide 137 text

POST /starwars/_search { "query": { "match": { "quote": "drui" } }, "suggest": { "my_suggestion" : { "text" : "drui", "term" : { "field" : "quote" } } } }

Slide 138

Slide 138 text

... "hits": { "total": 0, "max_score": null, "hits": [] }, "suggest": { "my_suggestion": [ { "text": "drui", "offset": 0, "length": 4, "options": [ { "text": "droid", "score": 0.5, "freq": 1 } ] } ] } }

Slide 139

Slide 139 text

NGram Partial matches Trigram & Edge Gram search_as_you_type

Slide 140

Slide 140 text

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": { "type": "ngram", "min_gram": "3", "max_gram": "3", "token_chars": [ "letter" ] }, "filter": [ "lowercase" ], "text": "These are not the droids you are looking for." }

Slide 141

Slide 141 text

{ "tokens": [ { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 }, { "token": "hes", "start_offset": 1, "end_offset": 4, "type": "word", "position": 1 }, { "token": "ese", "start_offset": 2, "end_offset": 5, "type": "word", "position": 2 }, { "token": "are", "start_offset": 6, "end_offset": 9, "type": "word", "position": 3 }, ...

Slide 142

Slide 142 text

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": { "type": "edge_ngram", "min_gram": "1", "max_gram": "3", "token_chars": [ "letter" ] }, "filter": [ "lowercase" ], "text": "These are not the droids you are looking for." }

Slide 143

Slide 143 text

{ "tokens": [ { "token": "t", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 }, { "token": "th", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1 }, { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 2 }, { "token": "a", "start_offset": 6, "end_offset": 7, "type": "word", "position": 3 }, { "token": "ar", "start_offset": 6, "end_offset": 8, "type": "word", "position": 4 }, ...
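Both tokenizers can be approximated per word in a couple of lines. This sketch works on a single, already lowercased word and ignores offsets and `token_chars`; the function names are chosen for illustration.

```python
def ngrams(word, min_gram, max_gram):
    """All substrings with min_gram..max_gram characters, ordered by start position."""
    return [
        word[start:start + size]
        for start in range(len(word))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(word)
    ]


def edge_ngrams(word, min_gram, max_gram):
    """Only the prefixes with min_gram..max_gram characters."""
    longest = min(max_gram, len(word))
    return [word[:size] for size in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(ngrams("these", 3, 3))
    print(edge_ngrams("these", 1, 3))
    print(edge_ngrams("are", 1, 3))
```

Trigrams cover every part of a word and suit partial matches anywhere in it, while edge n-grams only cover prefixes and are the usual choice for search-as-you-type.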

Slide 144

Slide 144 text

7.2: search_as_you_type

Slide 145

Slide 145 text

Combining Analyzers Reindex Store multiple times Combine scores

Slide 146

Slide 146 text

PUT /starwars_v42 { "settings": { "analysis": { "filter": { "my_synonym_filter": { "type": "synonym", "synonyms": [ "droid,machine", "father,dad" ] }, "my_ngram_filter": { "type": "ngram", "min_gram": "3", "max_gram": "3", "token_chars": [ "letter" ] } },

Slide 147

Slide 147 text

"analyzer": { "my_lowercase_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "whitespace", "filter": [ "lowercase" ] }, "my_full_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ] },

Slide 148

Slide 148 text

"my_ngram_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "whitespace", "filter": [ "lowercase", "stop", "my_ngram_filter" ] } } } },

Slide 149

Slide 149 text

"mappings": { "properties": { "quote": { "type": "text", "fields": { "lowercase": { "type": "text", "analyzer": "my_lowercase_analyzer" }, "full": { "type": "text", "analyzer": "my_full_analyzer" }, "ngram": { "type": "text", "analyzer": "my_ngram_analyzer" } } } } } }

Slide 150

Slide 150 text

POST /_reindex { "source": { "index": "starwars" }, "dest": { "index": "starwars_v42" } }

Slide 151

Slide 151 text

PUT _alias { "actions": [ { "add": { "index": "starwars_v42", "alias": "starwars_extended" } } ] }

Slide 152

Slide 152 text

Aliases Atomic remove and add Point to multiple indices (read-only)

Slide 153

Slide 153 text

POST /starwars_extended/_search?explain=true { "query": { "multi_match": { "query": "obiwan", "fields": [ "quote", "quote.lowercase", "quote.full", "quote.ngram" ], "type": "most_fields" } } }

Slide 154

Slide 154 text

... "hits": { "total": 1, "max_score": 0.4912064, "hits": [ { "_shard": "[starwars_v42][2]", "_node": "BCDwzJ4WSw2dyoGLTzwlqw", "_index": "starwars_v42", "_type": "_doc", "_id": "2", "_score": 0.4912064, "_source": { "quote": "Obi-Wan never told you what happened to your father." }, ...

Slide 155

Slide 155 text

Whitespace Tokenizer "weight( Synonym(quote.ngram:biw quote.ngram:iwa quote.ngram:obi quote.ngram:wan) in 0) [PerFieldSimilarity], result of:"

Slide 156

Slide 156 text

POST /starwars_extended/_search { "query": { "multi_match": { "query": "you", "fields": [ "quote", "quote.lowercase", "quote.full^5", "quote.ngram" ], "type": "best_fields" } } }

Slide 157

Slide 157 text

"hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "1", "_score": 1.6022799, "_source": { "quote": "These are not the droids you are looking for." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "2", "_score": 1.4997643, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "3", "_score": 0.38650417, "_source": { "quote": "No. I am your father." } } ]

Slide 158

Slide 158 text

Multi Match Type
best_fields: Score of the best field (default)
cross_fields: All terms in at least one field
most_fields: Score sum of all fields
phrase

Slide 159

Slide 159 text

Different Analyzers for Indexing and Searching Per query In the mapping

Slide 160

Slide 160 text

POST /starwars_extended/_search { "query": { "match": { "quote.ngram": { "query": "the", "analyzer": "standard" } } } }

Slide 161

Slide 161 text

... "hits": [ { "_index": "starwars_extended", "_type": "_doc", "_id": "2", "_score": 0.38254172, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars_extended", "_type": "_doc", "_id": "3", "_score": 0.36165747, "_source": { "quote": "No. I am your father." } } ] ...

Slide 162

Slide 162 text

Edge Gram vs Trigram Extending a mapping Testing a custom mapping

Slide 163

Slide 163 text

POST /starwars_extended/_close
PUT /starwars_extended/_settings
{ "analysis": { "filter": { "my_edgegram_filter": { "type": "edge_ngram", "min_gram": 3, "max_gram": 10 } }, "analyzer": { "my_edgegram_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "my_edgegram_filter" ] } } } }
POST /starwars_extended/_open

Slide 164

Slide 164 text

GET starwars_extended/_analyze { "text": "Father", "analyzer": "my_edgegram_analyzer" }

Slide 165

Slide 165 text

{ "tokens": [ { "token": "fat", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "fath", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "fathe", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "father", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 } ] }

Slide 166

Slide 166 text

PUT /starwars_extended/_mapping { "properties": { "quote": { "type": "text", "fields": { "edgegram": { "type": "text", "analyzer": "my_edgegram_analyzer", "search_analyzer": "standard" } } } } }

Slide 167

Slide 167 text

PUT /starwars_extended/_doc/4
{ "quote": "I find your lack of faith disturbing." }
PUT /starwars_extended/_doc/5
{ "quote": "That... is your failure." }

Slide 168

Slide 168 text

GET /starwars_extended/_termvectors/4 { "fields": [ "quote.edgegram" ], "offsets": true, "payloads": true, "positions": true, "term_statistics": true, "field_statistics": true }

Slide 169

Slide 169 text

{ "_index": "starwars_v42", "_type": "_doc", "_id": "4", "_version": 1, "found": true, "took": 3, "term_vectors": { "quote.edgegram": { "field_statistics": { "sum_doc_freq": 26, "doc_count": 2, "sum_ttf": 26 }, "terms": { "dis": { "doc_freq": 1, "ttf": 1, "term_freq": 1, "tokens": [ { "position": 6, "start_offset": 26, "end_offset": 36 } ] }, "dist": { "doc_freq": 1, "ttf": 1, ...

Slide 170

Slide 170 text

POST /starwars_extended/_search { "query": { "match": { "quote": "fail" } } }

Slide 171

Slide 171 text

POST /starwars_extended/_search { "query": { "match": { "quote.lowercase": "fail" } } }

Slide 172

Slide 172 text

POST /starwars_extended/_search { "query": { "match": { "quote.full": "fail" } } }

Slide 173

Slide 173 text

POST /starwars_extended/_search { "query": { "match": { "quote.ngram": "fail" } } }

Slide 174

Slide 174 text

... "hits": { "total": 2, "max_score": 1.0135446, "hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "4", "_score": 1.0135446, "_source": { "quote": "I find your lack of faith disturbing." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "5", "_score": 0.50476736, "_source": { "quote": "That... is your failure." } } ] ...

Slide 175

Slide 175 text

POST /starwars_extended/_search { "query": { "match": { "quote.edgegram": "fail" } } }

Slide 176

Slide 176 text

... "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "5", "_score": 0.39556286, "_source": { "quote": "That... is your failure." } } ] ...
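The difference between the `quote.ngram` and `quote.edgegram` results for "fail" comes down to which fragments the query shares with each word. A rough Python sketch follows, assuming trigrams on both the index and the query side for the n-gram field, and edge n-grams of 3 to 10 characters on the index side only, because the mapping sets `search_analyzer` to `standard`. The helper names are chosen for illustration.

```python
def trigrams(word):
    """Set of all 3-character substrings."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def edge_grams(word, min_gram=3, max_gram=10):
    """Set of prefixes with min_gram..max_gram characters."""
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}


QUERY = "fail"

if __name__ == "__main__":
    for word in ("faith", "failure"):
        shared = sorted(trigrams(QUERY) & trigrams(word))
        print(word, "shared trigrams:", shared, "edge-gram hit:", QUERY in edge_grams(word))
```

"faith" shares the trigram "fai" with the query, so the n-gram field returns it as well, whereas the unanalyzed query term "fail" is only found among the prefixes of "failure".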