Inner hits

A talk about the nested and parent/child features in Elasticsearch and the new inner hits feature: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-request-inner-hits.html#search-request-inner-hits

Martijn van Groningen

March 06, 2015

Transcript

  1. Background
    • Elasticsearch is a document-based system. Documents are defined as JSON.
    • An Elasticsearch document is always converted to a Lucene document. A Lucene document is just a set of key-value pairs.
    • Both the nested and the parent/child support are just tools for document design.
  2. Background
    • Let's design a book document.
        {
          "title" : "Elasticsearch",
          "summary" : "The definitive guide for Elasticsearch ...",
          "published_year" : 2014,
          "num_pages" : 289,
          "author" : ["Clinton Gormley", "Zachary Tong"],
          "categories" : ["programming", "information retrieval"]
        }
    • But how do we add related data to it, such as chapter or page data?
  3. Background
    • Let's add a book with chapter data. Each chapter is a separate document that also carries the book data.
    Document 1:
        {
          "book_title" : "Elasticsearch",
          "book_summary" : "The definitive guide for Elasticsearch ...",
          "book_num_pages" : 289,
          "chapter_title" : "Introduction",
          "chapter_text" : "Short introduction about Elasticsearch’s features ...",
          "chapter_num_pages" : 12
        }
    Document 2:
        {
          "book_title" : "Elasticsearch",
          "book_summary" : "The definitive guide for Elasticsearch ...",
          "book_num_pages" : 289,
          "chapter_title" : "Data in, Data out",
          "chapter_text" : "How to manage your data with Elasticsearch ...",
          "chapter_num_pages" : 39
        }
  4. Background
    • Let's add a book with chapter data. Each chapter is an inner object.
        {
          "title" : "Elasticsearch",
          "author" : ["Clinton Gormley", "Zachary Tong"],
          "categories" : ["programming", "information retrieval"],
          "published_year" : 2014,
          "summary" : "The definitive guide for Elasticsearch ...",
          "chapters" : [
            {
              "title" : "Introduction",
              "summary" : "Short introduction about Elasticsearch’s features ...",
              "number_of_pages" : 12
            },
            {
              "title" : "Data in, Data out",
              "summary" : "How to manage your data with Elasticsearch ...",
              "number_of_pages" : 39
            },
            ...
          ]
        }
  5. Background
    • Let's add a book with chapter data. The book and each chapter are separate documents, joined at query time.
    Document 1:
        {
          "title" : "Elasticsearch",
          "summary" : "The definitive guide for Elasticsearch ...",
          "num_pages" : 289
        }
    Document 2:
        {
          "title" : "Introduction",
          "text" : "Short introduction about Elasticsearch’s features ...",
          "number_of_pages" : 12
        }
    Document 3:
        {
          "chapter_title" : "Data in, Data out",
          "chapter_text" : "How to manage your data with Elasticsearch ...",
          "chapter_num_pages" : 39
        }
  6. Background
    • Document granularity (DG) is the unit of your data.
    • DG differs per application. The right DG depends on how your data is used.
  7. Parent child
    • Parent / child is a query-time join between different document types in the same index.
    • Parent and child documents are stored as separate documents in the same index.
    • A child document can point to only one parent.
    • A parent document can be referred to by multiple child documents.
    • A parent document can also be a child document of a different parent.
  8. Parent child
    • A parent document and its child documents are routed to the same shard. The parent id is used as the routing value.
    • Combined with an in-memory data structure of parent ids, this makes the parent-child join fast.
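The routing rule on this slide can be sketched in Python. This is a minimal illustration, assuming a simple `hash(routing) % number_of_shards` scheme; `zlib.crc32` is a stand-in hash, not the function Elasticsearch uses, and `shard_for` and `route_document` are hypothetical helper names.

```python
import zlib
from typing import Optional


def shard_for(routing: str, num_shards: int) -> int:
    """Pick a shard from a routing value (stand-in hash, not Elasticsearch's own)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def route_document(doc_id: str, num_shards: int, parent_id: Optional[str] = None) -> int:
    """A child is routed by its parent's id; a parent is routed by its own id."""
    routing = parent_id if parent_id is not None else doc_id
    return shard_for(routing, num_shards)


# The hotel (parent) and all of its prices (children) end up on one shard.
hotel_shard = route_document("2345", num_shards=5)
price_shards = {route_document(str(i), 5, parent_id="2345") for i in range(10, 20)}
```

Because every child reuses the parent id as its routing value, the join never has to cross shard boundaries.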
  9. Parent child - Indexing
    • The parent document doesn’t need to exist at the time of indexing.
    The mapping declares that a hotel document is the parent of a price document:
        curl -XPUT 'localhost:9200/hotels' -d '{
          "mappings" : {
            "price" : {
              "_parent" : { "type" : "hotel" }
            }
          }
        }'
    Then, when indexing, specify which hotel a price points to:
        curl -XPUT 'localhost:9200/hotels/price/12?parent=2345' -d '{
          "valid_from" : "2015-05-01",
          "valid_to" : "2015-10-01",
          "price" : 125
        }'
  10. Parent child - Querying
    • The has_child query returns parent documents based on matches in their child documents. The optional “score_mode” defines how the scores of matching child documents are mapped onto their parent document.
        curl -XGET 'localhost:9200/hotels/_search' -d '{
          "query" : {
            "has_child" : {
              "type" : "price",
              "query" : {
                "range" : {
                  "price" : { "lte" : 50 }
                }
              }
            }
          }
        }'
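How a score mode folds child scores into one parent score can be sketched as follows. This is an illustration of the idea only, not Elasticsearch's implementation: `score_parents` is a hypothetical helper, the mode names `max`, `sum`, `avg` and `none` are assumed from the 1.x documentation, and the constant `1.0` returned for `none` is an assumption.

```python
from collections import defaultdict


def score_parents(child_hits, score_mode="none"):
    """Reduce (parent_id, child_score) pairs to one score per parent."""
    grouped = defaultdict(list)
    for parent_id, score in child_hits:
        grouped[parent_id].append(score)

    reducers = {
        "max": max,
        "sum": sum,
        "avg": lambda scores: sum(scores) / len(scores),
        "none": lambda scores: 1.0,  # assumption: child scores are ignored
    }
    reduce_scores = reducers[score_mode]
    return {pid: reduce_scores(scores) for pid, scores in grouped.items()}


# Two matching prices for hotel-1, one for hotel-2.
hits = [("hotel-1", 0.5), ("hotel-1", 1.5), ("hotel-2", 2.0)]
by_max = score_parents(hits, "max")
by_avg = score_parents(hits, "avg")
```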
  11. Parent child - Querying
    • The has_parent query returns child documents based on matches in their parent documents.
        curl -XGET 'localhost:9200/hotels/_search' -d '{
          "query" : {
            "has_parent" : {
              "parent_type" : "hotel",
              "query" : {
                "match" : { "facilities" : "pool" }
              }
            }
          }
        }'
  12. Inner hits
    • Inner hits include the sub-results of nested and parent / child queries.
    • They address the limitation that a has_child query can only return parent documents.
    • Inner hits are cheap: they run only in the fetch phase and are computed only for the top N hits being returned.
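To make the sub-results concrete, here is a small Python sketch that pulls the matching children out of a search response. It assumes each top-level hit carries an `inner_hits` object keyed by a name (assumed here to default to the child type, `price`) whose value has the usual `hits.hits` shape. `child_hits_per_parent` is a hypothetical helper, and `sample_response` is hand-written for illustration, not real server output.

```python
def child_hits_per_parent(response, name):
    """Map each top-level hit id to the _source of its inner hits under `name`."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        inner_list = inner.get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"] for h in inner_list]
    return result


# Hand-written example of the assumed response shape.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "2345",
                "_type": "hotel",
                "inner_hits": {
                    "price": {
                        "hits": {
                            "hits": [
                                {"_id": "12", "_type": "price", "_source": {"price": 45}}
                            ]
                        }
                    }
                },
            },
            {"_id": "999", "_type": "hotel"},  # a hit without inner hits
        ]
    }
}

prices = child_hits_per_parent(sample_response, "price")
```

Only the top hits that are actually returned carry inner hits, which is why the extra work stays in the fetch phase.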
  13. Inner hits
        curl -XGET 'localhost:9200/hotels/_search' -d '{
          "query" : {
            "has_child" : {
              "type" : "price",
              "inner_hits" : {},
              "query" : {
                "range" : {
                  "price" : { "lte" : 50 }
                }
              }
            }
          }
        }'
  14. Nested objects
    • In many cases domain models have the same write / update lifecycle.
        • Books & chapters.
        • Movies & actors.
    • De-normalizing results in the fastest queries, compared to using parent/child queries.
    • Nested objects allow smart de-normalization.
  15. Nested objects
        {
          "title" : "Elasticsearch",
          "author" : "Clinton Gormley",
          "categories" : ["programming", "information retrieval"],
          "published_year" : 2014,
          "summary" : "The definitive guide for Elasticsearch ...",
          "chapters" : [
            {
              "title" : "Introduction",
              "summary" : "Short introduction about Elasticsearch’s features ...",
              "number_of_pages" : 12
            },
            {
              "title" : "Data in, Data out",
              "summary" : "How to manage your data with Elasticsearch ...",
              "number_of_pages" : 39
            },
            ...
          ]
        }
    • JSON allows complex nesting of objects.
    • But how does this get indexed?
  16. Nested objects
    Original JSON document:
        {
          "title" : "Elasticsearch",
          ...
          "chapters" : [
            {"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12},
            {"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39},
            ...
          ]
        }
    Lucene document structure:
        {
          "title" : "Elasticsearch",
          ...
          "chapters.title" : ["Data in, Data out", "Introduction"],
          "chapters.summary" : ["How to ...", "Short ..."],
          "chapters.number_of_pages" : [12, 39]
        }
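The flattening shown on this slide, and the problem it causes, can be reproduced with a short Python sketch. It is a simplification under stated assumptions: `flatten` and `matches_all` are hypothetical helpers, values are compared exactly instead of being analyzed as Lucene would, and values are kept in document order.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted field names with multi-valued fields."""
    fields = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


def matches_all(flat_doc, conditions):
    """True if every field contains its required value somewhere."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}
flat = flatten(book)

# No single chapter is titled "Introduction" AND has 39 pages, yet this matches,
# because the link between a title and its page count was lost in flattening.
cross_match = matches_all(
    flat, {"chapters.title": "Introduction", "chapters.number_of_pages": 39}
)
```

This false positive across chapters is the motivation for the nested type.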
  17. Nested objects - Mapping
    • The nested type triggers Lucene’s block indexing.
    • Multiple levels of inner objects are possible.
    In the mapping below, “book” is the document type and “chapters” gets the field type ‘nested’:
        curl -XPUT 'localhost:9200/books' -d '{
          "mappings" : {
            "book" : {
              "properties" : {
                "chapters" : { "type" : "nested" }
              }
            }
          }
        }'
  18. Nested objects - Block indexing
    Lucene document structure:
        {"chapters.title" : "Intro...", "chapters.summary" : "...", "chapters.number_of_pages" : 12},
        {"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39},
        ...
        { "title" : "Elasticsearch", ... }
    • The inner objects are inlined as separate Lucene documents right before the root document.
    • The root document and its nested documents always remain in the same block.
  19. Nested objects - Nested query
    • The nested query returns the complete “book” (the root document) as the hit.
        curl -XGET 'localhost:9200/books/book/_search' -d '{
          "query" : {
            "nested" : {
              "path" : "chapters",
              "score_mode" : "avg",
              "query" : {
                "match" : {
                  "chapters.summary" : { "query" : "indexing data" }
                }
              }
            }
          }
        }'
    • “path” specifies the nested level.
    • “score_mode” sets the score mode.
    • The inner “query” is the chapter-level query.
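The combination of block indexing and the nested query can be pictured with a small Python sketch. It is a conceptual model only, assuming exact-value matching and no scoring; `to_block` and `nested_query` are hypothetical helpers, not Lucene or Elasticsearch APIs.

```python
def to_block(doc, nested_path):
    """Nested objects become separate docs placed right before the root doc."""
    root = {k: v for k, v in doc.items() if k != nested_path}
    children = [
        {f"{nested_path}.{k}": v for k, v in child.items()}
        for child in doc.get(nested_path, [])
    ]
    return children + [root]


def nested_query(block, conditions):
    """Return the root doc if any single nested doc satisfies all conditions."""
    *children, root = block
    for child in children:
        if all(child.get(field) == value for field, value in conditions.items()):
            return root
    return None


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}
block = to_block(book, "chapters")

# Both conditions hold within one chapter: the root document is the hit.
hit = nested_query(
    block, {"chapters.title": "Introduction", "chapters.number_of_pages": 12}
)
# The conditions are spread over two chapters: no hit.
miss = nested_query(
    block, {"chapters.title": "Introduction", "chapters.number_of_pages": 39}
)
```

Because each chapter is matched as its own document, conditions can no longer be satisfied by values from different chapters, while the hit is still the whole book.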