defined as JSON
• An Elasticsearch document is always converted to a Lucene document. A Lucene document is just key-value pairs.
• Nested and parent-child support are both just tools for document design.
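To make the "just key-value pairs" point concrete, here is a minimal Python sketch of how a JSON document could be viewed as a flat list of field/value pairs. It is a simplified model only: the `flatten` helper is hypothetical, and the real conversion also involves mappings and analysis that are ignored here.

```python
def flatten(doc, prefix=""):
    """Flatten a JSON-like dict into a list of (field, value) pairs.

    Simplified model of a key-value document; not Elasticsearch's actual code.
    """
    pairs = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner objects become dotted field names.
            pairs.extend(flatten(value, prefix=path + "."))
        elif isinstance(value, list):
            # Multi-valued fields become repeated pairs with the same name.
            for item in value:
                if isinstance(item, dict):
                    pairs.extend(flatten(item, prefix=path + "."))
                else:
                    pairs.append((path, item))
        else:
            pairs.append((path, value))
    return pairs


book = {
    "title": "Elasticsearch",
    "num_pages": 289,
    "author": ["Clinton Gormley", "Zachary Tong"],
}
pairs = flatten(book)
for field, value in pairs:
    print(field, "=", value)
```

Note that the array of authors turns into two pairs with the same field name; the flat view has no notion of structure beyond that.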
"Elasticsearch" "summary" : "The definitive guide for Elasticsearch ..." "published_year" : 2014, "num_pages" : 289, "author" : ["Clinton Gormley", "Zachary Tong"], "categories" : ["programming", "information retrieval"], }' • But how to add data to it that is related? Like chapter or page data.
• Let's add a book with chapter data, with each chapter as a separate document that repeats the book data.

Document 1:
for Elasticsearch ...",
"book_num_pages" : 289,
"chapter_title" : "Introduction",
"chapter_text" : "Short introduction about Elasticsearch’s features ...",
"chapter_num_pages" : 12
}'

Document 2:
{
"book_title" : "Elasticsearch",
"book_summary" : "The definitive guide for Elasticsearch ...",
"book_num_pages" : 289,
"chapter_title" : "Data in, Data out",
"chapter_text" : "How to manage your data with Elasticsearch ...",
"chapter_num_pages" : 39
}'
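The denormalized layout above can be produced mechanically from a book and its chapters. The following Python sketch shows one way to do it; the `denormalize` helper and the `book_` / `chapter_` prefixing scheme are assumptions chosen to mirror the field names on the slide, not part of any Elasticsearch API.

```python
def denormalize(book, chapters):
    """Build one document per chapter, copying the book fields into each."""
    docs = []
    for chapter in chapters:
        # Book data is repeated in every chapter document.
        doc = {f"book_{key}": value for key, value in book.items()}
        doc.update({f"chapter_{key}": value for key, value in chapter.items()})
        docs.append(doc)
    return docs


book = {
    "title": "Elasticsearch",
    "summary": "The definitive guide for Elasticsearch ...",
    "num_pages": 289,
}
chapters = [
    {"title": "Introduction", "num_pages": 12},
    {"title": "Data in, Data out", "num_pages": 39},
]

docs = denormalize(book, chapters)
for doc in docs:
    print(doc)
```

The trade-off is visible in the output: every chapter document carries a full copy of the book fields, so a change to the book means rewriting all of its chapter documents.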
• Let's add a book with chapter data, with the book and the chapters as separate documents, and do a query-time join.

Document 1:
for Elasticsearch ...",
"num_pages" : 289
}'

Document 2:
{
"title" : "Introduction",
"text" : "Short introduction about Elasticsearch’s features ...",
"number_of_pages" : 12
}

Document 3:
{
"chapter_title" : "Data in, Data out",
"chapter_text" : "How to manage your data with Elasticsearch ...",
"chapter_num_pages" : 39
}'
join between different document types in the same index.
• Parent and child documents are stored as separate documents in the same index.
• A child document can point to only one parent.
• A parent document can be referred to by multiple child documents.
• A parent document can also be a child of a different parent.
are routed into the same shard. The parent id is used as the routing value.
• Combined with an in-memory data structure of parent ids, this makes the parent-child join fast.
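The routing rule can be illustrated with a small Python sketch. It assumes a generic "hash the routing value, modulo the number of shards" scheme; `zlib.crc32` is only a stand-in for whatever hash Elasticsearch uses internally, and `shard_for`, `route`, and the extra price ids are hypothetical names for illustration.

```python
import zlib


def shard_for(routing_value, num_shards):
    """Pick a shard from a routing value (stand-in hash, not Elasticsearch's)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def route(doc_id, num_shards, parent_id=None):
    """Route by parent id when one is given, otherwise by the document's own id."""
    routing = parent_id if parent_id is not None else doc_id
    return shard_for(routing, num_shards)


NUM_SHARDS = 5

# The parent (hotel 2345) is routed by its own id.
hotel_shard = route("2345", NUM_SHARDS)

# Its children (prices) are routed by the parent id, not by their own ids.
price_shards = [route(pid, NUM_SHARDS, parent_id="2345") for pid in ("12", "13", "14")]

print(hotel_shard, price_shards)
```

Because every child uses the parent id as its routing value, the whole family lands on one shard and the join never has to cross shard boundaries.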
to exist at time of indexing.

curl -XPUT 'localhost:9200/hotels' -d '{
  "mappings" : {
    "price" : {
      "_parent" : { "type" : "hotel" }
    }
  }
}'

A price document is a child of a hotel document.

Then, when indexing a price, specify which hotel it points to:

curl -XPUT 'localhost:9200/hotels/price/12?parent=2345' -d '{
  "valid_from" : "2015-05-01",
  "valid_to" : "2015-10-01",
  "price" : 125
}'
nested and parent/child queries.
• Deals with the limitation that a has_child query can only return parent documents.
• Inner hits are cheap: they run only in the fetch phase and only cover the top N hits being returned.
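The behaviour described above can be mimicked in plain Python to show what inner hits add. This is an in-memory simulation under stated assumptions, not the Elasticsearch implementation: the `has_child` function, the sample hotels and prices, and sorting by id as a stand-in for relevance ranking are all hypothetical.

```python
def has_child(parents, children, predicate, size=10, inner_hits=False):
    """Return parents that have at least one child matching the predicate."""
    # "Query phase": find parent ids with a matching child.
    matching_parent_ids = {c["parent"] for c in children if predicate(c)}
    # Sorting by id stands in for relevance ranking; keep only the top N.
    top_ids = sorted(pid for pid in parents if pid in matching_parent_ids)[:size]

    hits = []
    for pid in top_ids:
        hit = {"_id": pid, "_source": parents[pid]}
        if inner_hits:
            # "Fetch phase": collect matching children only for returned hits.
            hit["inner_hits"] = [
                c for c in children if c["parent"] == pid and predicate(c)
            ]
        hits.append(hit)
    return hits


hotels = {"2345": {"name": "Hotel A"}, "6789": {"name": "Hotel B"}}
prices = [
    {"_id": "12", "parent": "2345", "price": 125},
    {"_id": "13", "parent": "2345", "price": 210},
    {"_id": "14", "parent": "6789", "price": 300},
]


def is_cheap(price_doc):
    return price_doc["price"] < 150


plain = has_child(hotels, prices, is_cheap)
with_hits = has_child(hotels, prices, is_cheap, inner_hits=True)
print(plain)
print(with_hits)
```

Without inner hits, the result only says that hotel 2345 matched; with inner hits, it also shows which price caused the match, and that extra work is done only for the hits actually returned.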
"categories" : ["programming", "information retrieval"], "published_year" : 2014, "summary" : "The definitive guide for Elasticsearch ...", "chapters" : [ { "title" : "Introduction", "summary" : "Short introduction about Elasticsearch’s features ...", "number_of_pages" : 12 }, { "title" : "Data in, Data out", "summary" : "How to manage your data with Elasticsearch ...", "number_of_pages" : 39 }, ... ] } • JSON allows complex nesting of objects. • But how does this get indexed?
"...", "chapters.number_of_pages" : 12}, {"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39}, ... { "title" : "Elasticsearch", ... } Lucene Documents Structure: • Inlining the inner objects as separate Lucene documents right before the root document. • The root document and its nested documents always remain in the same block.