information retrieval and NLP components.
Are there any action movies to see this weekend?
I’d like to reserve a table for dinner.
I forgot my password.
index for full-text search (MySQL, PostgreSQL)
  ◦ SQL join
  ◦ ACID
• MongoDB full-text search
• Elasticsearch
  ◦ RESTful HTTP API
  ◦ JSON document (nested structure)
  ◦ Designed for full-text search
  ◦ Various text processing and ranking plugins
is fixed, but the number of replica shards can be changed at any time.
• Segments are immutable, so there is no need for locking. When a document is deleted or updated, the old version of the document is only marked as deleted.
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/inside-a-shard.html
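The "mark as deleted" behaviour can be sketched with a toy model. This is only an illustration, not how Lucene is implemented: the `Segment` and `ToyShard` classes are hypothetical, and each indexed document gets its own segment here, whereas a real shard buffers many documents per segment.

```python
from types import MappingProxyType


class Segment:
    """Toy stand-in for an immutable segment plus its deletion markers."""

    def __init__(self, docs):
        # Read-only view: stored documents never change after creation.
        self.docs = MappingProxyType(dict(docs))
        # Deletion markers are kept beside the segment, not inside it.
        self.deleted = set()


class ToyShard:
    """Hypothetical shard that only ever appends new segments."""

    def __init__(self):
        self.segments = []

    def delete(self, doc_id):
        # Mark every stored version as deleted; nothing is rewritten.
        for segment in self.segments:
            if doc_id in segment.docs:
                segment.deleted.add(doc_id)

    def index(self, doc_id, body):
        # An update is "mark the old version deleted" plus "append a new segment".
        self.delete(doc_id)
        self.segments.append(Segment({doc_id: body}))

    def get(self, doc_id):
        # Newest segment first; skip versions marked as deleted.
        for segment in reversed(self.segments):
            if doc_id in segment.docs and doc_id not in segment.deleted:
                return segment.docs[doc_id]
        return None


shard = ToyShard()
shard.index("1", {"title": "old"})
shard.index("1", {"title": "new"})
print(shard.get("1"))
print(shard.segments[0].deleted)
```

After the update, the old version is still physically present in the first segment; it is just filtered out at read time.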
has no concept of inner objects, so Elasticsearch will flatten it into multi-value fields. The association between alice and white is lost. Use Nested Object for arrays of objects.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
* Nested documents are indexed as separate documents.
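A minimal sketch of why the association is lost. The `flatten` helper is a hypothetical toy model of the flattening step, and the second user (john smith) is an assumed example value added so that a cross-object match can be shown.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted multi-value fields (toy model)."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Merge the inner object's fields into the parent's field space.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def matches_flat(flat, first, last):
    # Each condition is checked against an independent bag of values.
    return first in flat.get("user.first", []) and last in flat.get("user.last", [])


def matches_nested(doc, first, last):
    # Both conditions must hold within the same inner object.
    return any(u["first"] == first and u["last"] == last for u in doc["user"])


doc = {
    "group": "fans",
    "user": [
        {"first": "john", "last": "smith"},   # assumed example value
        {"first": "alice", "last": "white"},
    ],
}

flat = flatten(doc)
print(flat)
print(matches_flat(flat, "alice", "smith"))
print(matches_nested(doc, "alice", "smith"))
```

In the flattened form a query for first name alice and last name smith matches, even though no such user exists; keeping each inner object separate, as nested objects do, avoids the false match.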
TF/IDF, and the vector space model, and combines their scores using a formula called the practical scoring function.
• Vector space model:
  ◦ We have three documents and the query “happy hippopotamus”
    ▪ I am happy in summer.
    ▪ After Christmas I’m a hippopotamus.
    ▪ The happy hippopotamus helped Harry.
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
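The vector space comparison for these three documents can be sketched as follows. The term weights (2 for "happy", 5 for "hippopotamus") are illustrative assumptions standing in for TF/IDF weights, and plain cosine similarity is used here rather than the practical scoring function itself.

```python
import math
import re

# Assumed weights: the rarer term "hippopotamus" counts for more than "happy".
TERM_WEIGHTS = {"happy": 2.0, "hippopotamus": 5.0}
TERMS = list(TERM_WEIGHTS)

DOCUMENTS = [
    "I am happy in summer.",
    "After Christmas I’m a hippopotamus.",
    "The happy hippopotamus helped Harry.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text):
    """One dimension per query term: its weight if present, else 0."""
    tokens = set(tokenize(text))
    return [TERM_WEIGHTS[t] if t in tokens else 0.0 for t in TERMS]


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query_vector = to_vector("happy hippopotamus")
ranked = sorted(
    ((doc, cosine(query_vector, to_vector(doc))) for doc in DOCUMENTS),
    key=lambda pair: pair[1],
    reverse=True,
)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

Under these assumed weights, the document containing both terms points in the same direction as the query and ranks first, followed by the one containing only the rarer term.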
termvectors or multi termvectors to get TF/IDF
• The information is retrieved only for the shard on which the requested document resides.
Reference: https://docs.google.com/presentation/d/1mzotBUwq55Dwio3XipFlZ2DpTEORX_BJZlw9mzCm5O4
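A hedged sketch of what such requests might look like. It only builds the path and JSON body and sends nothing. The index name, document IDs, and field name are hypothetical, and the path shape shown is the typeless form used by recent Elasticsearch versions; older versions also include the document type in the path.

```python
import json


def termvectors_request(index, doc_id, fields):
    """Build the path and body for a single-document term vectors request."""
    path = f"/{index}/_termvectors/{doc_id}"
    body = {
        "fields": list(fields),
        "term_statistics": True,   # adds per-term document frequency
        "field_statistics": True,  # adds per-field document count
    }
    return path, body


def mtermvectors_request(index, doc_ids, fields):
    """Build the path and body for a multi term vectors request."""
    path = f"/{index}/_mtermvectors"
    body = {
        "ids": list(doc_ids),
        "parameters": {"fields": list(fields), "term_statistics": True},
    }
    return path, body


# Hypothetical index, IDs, and field, for illustration only.
single_path, single_body = termvectors_request("my-index", "1", ["text"])
multi_path, multi_body = mtermvectors_request("my-index", ["1", "2"], ["text"])

print("GET", single_path)
print(json.dumps(single_body, indent=2))
print("POST", multi_path)
print(json.dumps(multi_body, indent=2))
```

The response reports a term frequency for each term in the requested fields. With `term_statistics` enabled it also includes document frequencies, from which an IDF value can be derived. Because those statistics come from a single shard, they may differ from index-wide values.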