Join Data Type in Elasticsearch

Luca Gennari, Elasticsearch EMEA, January 28, 2020
Original author: Luca Gennari - Education Engineer EMEA
Did someone say JOIN? Join Data Type in Elasticsearch

2 A SQL Join walked into an Elastic bar. A little while later it walked out... because it couldn't find a table!
Did someone say JOIN?

3 Before talking about JDTs, let's see Denormalization
• Denormalizing data refers to “flattening” data
  ‒ storing redundant copies of data in each document, instead of using some type of relationship
  ‒ _source is compressed, which reduces the disk "waste"
• Denormalization provides the best performance out of Elasticsearch
  ‒ It is the standard way to index documents in a search engine and in many NoSQL databases
Source: elastic.co

4 A simple example of Denormalization
Source: elastic.co

Blog
id  title                publish_date
1   Beats 5.0.0 Release  2016-10-26
2   Build your own Beat  2016-07-14

Blog/author link
blog  author
1     1
1     3
2     2
2     3

Author
id  name             company
1   Tudor Gulobenco  1
2   Jongmin Kim      1
3   Monica Sarbu     1
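
To make the idea of flattening concrete, here is a minimal Python sketch that turns the three tables of this slide into one self-contained document per blog post. The table contents come from the slide; the field name publish_date, the pairing of rows in the blog/author link table and the helper name denormalize are assumptions made for illustration only.

```python
import json

# Normalized tables, as read from the slide (row pairing is an assumption).
blogs = [
    {"id": 1, "title": "Beats 5.0.0 Release", "publish_date": "2016-10-26"},
    {"id": 2, "title": "Build your own Beat", "publish_date": "2016-07-14"},
]
authors = {
    1: {"name": "Tudor Gulobenco", "company": 1},
    2: {"name": "Jongmin Kim", "company": 1},
    3: {"name": "Monica Sarbu", "company": 1},
}
blog_author = [(1, 1), (1, 3), (2, 2), (2, 3)]  # (blog id, author id)


def denormalize(blogs, authors, blog_author):
    """Return one document per blog with the author data copied into it."""
    docs = []
    for blog in blogs:
        author_ids = [a for b, a in blog_author if b == blog["id"]]
        doc = dict(blog)
        # Redundant copies: the same author may appear in several documents.
        doc["authors"] = [dict(authors[a]) for a in author_ids]
        docs.append(doc)
    return docs


denormalized_docs = denormalize(blogs, authors, blog_author)
print(json.dumps(denormalized_docs[0], indent=2))
```

Each resulting document can be indexed on its own, so a search never has to look up a second table. The price is that author data is repeated in every blog post the author contributed to.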

5 Join Data Type in Elasticsearch
A way to keep relations between documents stored in the same index. Available from version 6, join fields are useful in many cases, such as classification systems, categorising documents or products, and documents that need frequent updates. Let's have a look in depth...

6 FREQUENT UPDATES
• JDTs can help to redesign a document structure if just a small part of it must be frequently updated.
• They allow you to split data into multiple documents while maintaining a relationship between them.

7 CATEGORIZATION
• Sort documents according to their specifications and characteristics.
• Documents can be categorised on multiple levels.
• New categories are easy to add when required.

8 ANALYSIS
• Mitigate the number of false positives.
• Increase precision on aggregations and full-text queries.
• Particularly suitable for analysing scientific or research documents.

9 The logic behind JDT - Parent & Child
1. The definition must be done at mapping level
   • custom mapping only: join fields can't be mapped dynamically
2. More than one relation is allowed per join field
   • list every parent:child pair that is needed
3. Relationships can be hierarchical
   • a child can also be declared as a parent
4. More than one parent document can be created per relationship
   • parent documents can be created independently and they will be at the same level
Source: elastic.co

10 Where JDTs can help: e-commerce, Machine Learning, Data Science
• Product/category relationship
• Customer level management
• Promotions and 'Hot Products'
• Better precision during the analysis phase
• More consistent structures for DataFrames
• Categorisation
• Structuring the DataSets

11 Three ways to map a JDT

12 JDT mapping - 1
• Define the name of the join field
  • in this example 'fruits_relations' ... (ok, it's definitely not the best name in the world, but we have a demo later!)
• Then define the type as 'type: join'
• The join type requires at least one parent/child relationship.
  • in our example we have two parents:
    • category_fruits
    • category_citrus
  • and two children:
    • fruits
    • citrus
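
The mapping described on this slide can be sketched as a request body built in Python. This is a hedged illustration, not the deck's demo code: it assumes the typeless mapping layout of Elasticsearch 7.x, assumes that category_fruits is the parent of fruits and category_citrus is the parent of citrus, and uses a hypothetical helper named join_mapping.

```python
import json


def join_mapping(field_name, relations):
    """Build an index-creation body that declares a single join field."""
    if not relations:
        # The join type requires at least one parent/child relationship.
        raise ValueError("a join field needs at least one parent/child relation")
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "join", "relations": relations}
            }
        }
    }


# Assumed pairing: each category_* parent owns the child of the same kind.
mapping_body = join_mapping(
    "fruits_relations",
    {"category_fruits": "fruits", "category_citrus": "citrus"},
)
print(json.dumps(mapping_body, indent=2))
```

The printed JSON is what would be sent as the body of an index-creation request such as PUT my_index.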

13 JDT mapping - 2
• More than one child can be defined per parent
  • by passing a list of child names as the value.
• In this case the relationship structure is of the following type:
[Diagram: one parent with several children]
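
A relation with several children per parent could look like the sketch below. The slide does not name the relations, so the names reuse the fruit example and are assumptions; children_of is a hypothetical helper that only normalises the two value shapes (a single name or a list of names).

```python
import json

# Assumed names: one parent relation with two child relations.
relations = {"category_fruits": ["fruits", "citrus"]}

join_field = {"type": "join", "relations": relations}


def children_of(relations, parent):
    """Return the child relation names of a parent, always as a list."""
    value = relations.get(parent, [])
    return [value] if isinstance(value, str) else list(value)


print(json.dumps(join_field, indent=2))
print(children_of(relations, "category_fruits"))
```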

14 JDT mapping - 3
• It is also possible to define a child of one relation as the parent of a second-level relation
• In this case the relationship structure is of the following type:
[Diagram: a child that is also the parent of a second-level child]
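
A multi-level relation can be sketched by letting a child name appear again as a key. The relation names below are assumptions ('citrus_variety' in particular is hypothetical), and render_tree is an illustrative helper that prints the hierarchy implied by the relations, not part of any Elasticsearch API.

```python
# Assumed names: 'citrus' is a child of 'category_fruits' and also a parent.
relations = {
    "category_fruits": ["fruits", "citrus"],
    "citrus": "citrus_variety",
}


def as_list(value):
    """Accept either a single relation name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def render_tree(relations):
    """Return the relation hierarchy as indented lines, roots first."""
    children = {parent: as_list(kids) for parent, kids in relations.items()}
    all_children = {kid for kids in children.values() for kid in kids}
    roots = [parent for parent in children if parent not in all_children]
    lines = []

    def walk(name, depth):
        lines.append("  " * depth + name)
        for kid in children.get(name, []):
            walk(kid, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


tree_lines = render_tree(relations)
print("\n".join(tree_lines))
```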

15 To put it simply...
Simple Relationship
• Parent/child relationship 1:1
• Single set of associated documents for each parent
• Multiple documents can be defined independently for each parent
Multiple Child
• Parent/child relationship 1:n
• Different document sets for each parent
• All child documents are grouped by the parent they belong to
Multiple levels
• Parent/child relationship n:1
• Different document sets for each parent
• Multiple classifications for each set of documents

16 Characteristics of a Join document

17 Parent Document
PUT my_index/_doc/apple
• Name of the join field defined in the mapping
• The 'name' field is required. It is not part of the join, but without it the documents can't be found
• The rest of the document
• A 'name' field that associates the document with a parent relation (the relation must exist in the mapping)
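
A parent document for the request shown on this slide might be assembled as follows. This is a sketch under assumptions: the join field is 'fruits_relations' with the parent relations from the mapping slide, the 'description' field is a hypothetical stand-in for the rest of the document, and parent_request is an illustrative helper rather than a client call.

```python
import json

JOIN_FIELD = "fruits_relations"
# Parent relation names assumed from the mapping example.
PARENT_RELATIONS = {"category_fruits", "category_citrus"}


def parent_request(index, doc_id, relation_name, **other_fields):
    """Return the request line and JSON body for a parent document."""
    if relation_name not in PARENT_RELATIONS:
        raise ValueError(f"'{relation_name}' is not a parent relation in the mapping")
    body = dict(other_fields)
    # A parent only declares which relation it belongs to.
    body[JOIN_FIELD] = {"name": relation_name}
    return f"PUT {index}/_doc/{doc_id}", body


request_line, parent_body = parent_request(
    "my_index", "apple", "category_fruits",
    description="hypothetical extra field",
)
print(request_line)
print(json.dumps(parent_body, indent=2))
```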

18 Child Document
• Name of the join field defined in the mapping
• The 'name' field is required, as it was in the parent's document
• The rest of the document
• The 'name' field associates the document with a child relation
• The 'parent' field associates it with the _id of the parent
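
A child document could be built along these lines. The sketch assumes the 'fruits_relations' join field and child relations from the mapping slide, a parent with _id 'apple', and a hypothetical child id 'granny_smith'; child_request is an illustrative helper. The routing value is set to the parent id so that the child lands on the same shard as its parent.

```python
import json
from urllib.parse import quote

JOIN_FIELD = "fruits_relations"
# Child relation names assumed from the mapping example.
CHILD_RELATIONS = {"fruits", "citrus"}


def child_request(index, doc_id, parent_id, relation_name, **other_fields):
    """Return the request line and JSON body for a child document."""
    if relation_name not in CHILD_RELATIONS:
        raise ValueError(f"'{relation_name}' is not a child relation in the mapping")
    body = dict(other_fields)
    # 'name' picks the child relation, 'parent' holds the parent's _id.
    body[JOIN_FIELD] = {"name": relation_name, "parent": parent_id}
    # Route by parent id so parent and child share a shard.
    line = f"PUT {index}/_doc/{quote(doc_id)}?routing={quote(parent_id)}"
    return line, body


child_line, child_body = child_request(
    "my_index", "granny_smith", "apple", "fruits",
    color="green",  # hypothetical extra field
)
print(child_line)
print(json.dumps(child_body, indent=2))
```

In a hierarchy with more than two levels, the routing value would be the id of the top-level ancestor rather than the direct parent, so that the whole lineage stays on one shard.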

19 DEMO TIME

20 JDT Limitations
• System performance may degrade for large quantities of documents: queries and aggregations on join fields cost Elasticsearch more work than usual.
• An index that maps a join field can also contain documents that have no relationship, but a parent and all of its children must be indexed on the same shard. This makes a routing value mandatory whenever a child document is indexed, retrieved, updated or deleted.
• Only one join field is allowed per index.
• A parent can have any number of children, but a child can refer to only one parent.
Source: elastic.co
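
The last limitation also constrains how relations can be declared. The check below is a local sanity test one might run before creating an index, not an Elasticsearch API; the function name single_parent_index and the sample relation names are assumptions.

```python
def as_list(value):
    """Accept either a single relation name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def single_parent_index(relations):
    """Map each child relation to its parent, failing on a second parent."""
    parent_of = {}
    for parent, children in relations.items():
        for child in as_list(children):
            if child in parent_of:
                raise ValueError(
                    f"'{child}' is declared under both "
                    f"'{parent_of[child]}' and '{parent}'"
                )
            parent_of[child] = parent
    return parent_of


valid = {"category_fruits": ["fruits", "citrus"], "citrus": "citrus_variety"}
parent_of = single_parent_index(valid)
print(parent_of)

invalid = {"category_fruits": "fruits", "category_citrus": "fruits"}
try:
    single_parent_index(invalid)
except ValueError as error:
    print("rejected:", error)
```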

21 Q&A
