Slide 1

Slide 1 text

Elasticsearch: Diagram time!
Nik Everett, Software Engineer, 2017-06-09

Slide 2

Slide 2 text

Elasticsearch is a distributed search and analytics engine.

Slide 3

Slide 3 text

Indexing a document

    curl \
      -XPUT \
      -H "Content-Type: Application/json" \
      localhost:9200/enwiki/doc/1 \
      -d \
      '{
        "title": "Cat",
        "text": "The domestic cat (Latin: Felis catus) is a small, typically furry, carnivorous mammal."
      }'

Slide 4

Slide 4 text

Indexing a document (CONSOLE syntax)

    PUT /enwiki/doc/1
    {
      "title": "Cat",
      "text": "The domestic cat (Latin: Felis catus) is a small, typically furry, carnivorous mammal.",
      "popularity_score": 2.990724241035e-5
    }

Slide 5

Slide 5 text

Indexing a document

    PUT /enwiki/doc/1

Diagram (shard copies on each node):

    Node 0: enwiki 0, enwiki 1
    Node 1: enwiki 0
    Node 2: enwiki 1
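The layout in this diagram can be modeled in a few lines. This is a minimal sketch, not Elasticsearch's implementation: it assumes the document id is hashed and taken modulo the number of primary shards, it uses `zlib.crc32` as a stand-in hash, and the names `SHARD_LOCATIONS`, `pick_shard`, and `nodes_for` are invented for illustration.

```python
import zlib

# Shard copies per node, as drawn on the slide: index "enwiki" has shards 0 and 1.
SHARD_LOCATIONS = {
    "Node 0": [0, 1],
    "Node 1": [0],
    "Node 2": [1],
}
NUM_SHARDS = 2


def pick_shard(doc_id, num_shards=NUM_SHARDS):
    """Map a document id to a shard number (crc32 is a stand-in hash, an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def nodes_for(shard):
    """List the nodes that hold a copy of the given shard."""
    return [node for node, shards in SHARD_LOCATIONS.items() if shard in shards]


shard = pick_shard("1")
print(f"PUT /enwiki/doc/1 -> shard {shard}, held by {nodes_for(shard)}")
```

Whichever node receives the request can compute the same shard number, which is why any node can forward the document to a node that holds the right shard.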

Slide 6

Slide 6 text

Getting a document

    GET /enwiki/doc/1

Diagram (shard copies on each node):

    Node 0: enwiki 0, enwiki 1
    Node 1: enwiki 0
    Node 2: enwiki 1

Slide 7

Slide 7 text

Searching
A simple search

    POST /enwiki/_search
    {
      "query": {
        "match": {
          "text": "small domestic"
        }
      }
    }

Slide 8

Slide 8 text

Searching

    POST /enwiki/_search

Diagram (shard copies on each node):

    Node 0: enwiki 0, enwiki 1
    Node 1: enwiki 0
    Node 2: enwiki 1

Slide 9

Slide 9 text

Searching

    POST /enwiki/_search

Diagram (Node 0, Node 1, Node 2):

    1. Query Phase
    2. Fetch Phase
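The two phases can be sketched as a scatter-gather. This is a simplified model under stated assumptions, not the actual implementation: the shard contents, scores, and titles for documents other than "Cat" are invented, and `query_phase`, `fetch_phase`, and `search` are hypothetical names.

```python
import heapq

# Hypothetical per-shard data: doc id -> (score for this query, stored fields).
SHARDS = {
    0: {1: (2.0, {"title": "Cat"}), 4: (1.2, {"title": "Doc 4"}), 8: (0.4, {"title": "Doc 8"})},
    1: {9: (1.5, {"title": "Doc 9"}), 10: (0.3, {"title": "Doc 10"})},
}


def query_phase(shard_id, size):
    """Each shard returns only (score, doc id, shard id) for its local top hits."""
    hits = [(score, doc_id) for doc_id, (score, _) in SHARDS[shard_id].items()]
    return [(score, doc_id, shard_id) for score, doc_id in heapq.nlargest(size, hits)]


def fetch_phase(winners):
    """Load stored fields only for the documents that made the global top list."""
    return [
        {"_id": doc_id, "_score": score, "_source": SHARDS[shard_id][doc_id][1]}
        for score, doc_id, shard_id in winners
    ]


def search(size):
    """Coordinate: gather candidates from every shard, merge, then fetch the winners."""
    candidates = []
    for shard_id in SHARDS:
        candidates.extend(query_phase(shard_id, size))
    winners = heapq.nlargest(size, candidates)
    return fetch_phase(winners)


print([hit["_id"] for hit in search(size=2)])  # [1, 9]
```

Splitting the work this way means the query phase moves only ids and scores between nodes, and the heavier stored fields are read just for the final hits.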

Slide 10

Slide 10 text

Searching
Query Phase: Finding

    Term          Document Ids
    a             1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    cat           1, 4, 9, 10
    carnivorous   1, 6, 10
    domestic      1, 4, 8, 9
    furry         1, 8, 10

[Diagram: letter nodes c, a, r, t, n, i, …, s, d]
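The term table above is an inverted index, and finding documents amounts to combining its postings lists. Below is a minimal sketch using the exact lists from the slide; the function names are invented, and real postings are stored in compressed form rather than as Python lists.

```python
# Postings lists copied from the slide: term -> sorted document ids.
POSTINGS = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "cat": [1, 4, 9, 10],
    "carnivorous": [1, 6, 10],
    "domestic": [1, 4, 8, 9],
    "furry": [1, 8, 10],
}


def intersect(left, right):
    """Walk two sorted postings lists in step, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def docs_matching_all(terms):
    """AND semantics: start from the shortest list to keep intermediate results small."""
    lists = sorted((POSTINGS.get(term, []) for term in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


def docs_matching_any(terms):
    """OR semantics: union of the postings lists."""
    found = set()
    for term in terms:
        found.update(POSTINGS.get(term, []))
    return sorted(found)


print(docs_matching_all(["cat", "domestic"]))       # [1, 4, 9]
print(docs_matching_any(["furry", "carnivorous"]))  # [1, 6, 8, 10]
```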

Slide 11

Slide 11 text

Searching
Fetch Phase: Stored fields

    Docs    Stored Fields
    1-3     id: 1, type: doc, _source: {"title": "Cat", "text": "The domestic cat…"}
            id: 2, type: doc, _source: {"title": "Dog", "text": "The domestic dog…"}
            id: 3, type: doc, _source: {"title": "Bird", "text": "Birds (aves), a subgroup of reptiles…"}
    4-10    …

Slide 12

Slide 12 text

Searching
Aggregations

    POST /enwiki/_search
    {
      "query": {"match": {"text": "small domestic"}},
      "aggs": {
        "score": {
          "extended_stats": {
            "field": "popularity_score"
          }
        }
      }
    }

Slide 13

Slide 13 text

Searching
Doc values

• stored fields: doc id → field → values
• doc values: field → doc id → values
• Go watch:
  • Amusing Algorithms and Details on Data Structures
  • All About Elasticsearch Algorithms and Data Structures
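The two orientations can be illustrated with nested dictionaries. This is only a sketch of the access pattern, not the on-disk format: the `popularity_score` values for documents 2 and 3 are invented, and `to_doc_values` and `simple_stats` are hypothetical helpers that compute a small subset of what an `extended_stats` aggregation reports.

```python
# Stored-field orientation: doc id -> field -> value.
# Only doc 1's popularity_score comes from the slides; the others are made up.
stored_fields = {
    1: {"title": "Cat", "popularity_score": 2.990724241035e-5},
    2: {"title": "Dog", "popularity_score": 4.0e-5},
    3: {"title": "Bird", "popularity_score": 1.0e-5},
}


def to_doc_values(stored):
    """Transpose to the doc-values orientation: field -> doc id -> value."""
    columns = {}
    for doc_id, fields in stored.items():
        for field, value in fields.items():
            columns.setdefault(field, {})[doc_id] = value
    return columns


doc_values = to_doc_values(stored_fields)


def simple_stats(field, doc_ids):
    """Aggregate one field over matching docs by reading a single column."""
    column = doc_values[field]
    values = [column[doc_id] for doc_id in doc_ids if doc_id in column]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


print(simple_stats("popularity_score", [1, 2, 3]))
```

Fetching a hit wants everything about one document, which suits stored fields; an aggregation wants one field across many documents, which suits doc values.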

Slide 14

Slide 14 text

Searching
Segments

Slide 15

Slide 15 text

Analysis
Example

    The domestic cat (Latin: Felis catus) is a small, typically furry, carnivorous mammal.
    The domestic cat (Latin: Felis catus) is a small, typically furry, carnivorous mammal.
    the domestic cat (latin: felis catus) is a small, typically furry, carnivorous mammal.
    the domestic cat latin felis catus is a small typically furry carnivorous mammal
    the domest cat latin feli catu is a small typic furri carnivor mammal

    POST /enwiki/_search
    {
      "query": {
        "term": {"text": "carnivore"}
      }
    }

    POST /enwiki/_search
    {
      "query": {
        "match": {"text": "carnivore"}
      }
    }
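The difference between the two queries can be sketched with a toy analysis chain. This is an illustration under loud assumptions: `toy_stem` is a made-up two-rule suffix stripper, not the stemmer Elasticsearch uses, so it does not reproduce every stem shown above. It only shows that a `term` query looks up the text as written, while a `match` query runs it through the same analysis as the indexed text.

```python
import re

DOC = ("The domestic cat (Latin: Felis catus) is a small, "
       "typically furry, carnivorous mammal.")


def toy_stem(token):
    """Toy suffix stripper (assumption): handles only 'ous' and a trailing 'e'."""
    for suffix in ("ous", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split into letter-only tokens, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(token) for token in tokens]


# The index holds analyzed terms, not the original words.
index_terms = set(analyze(DOC))


def term_query(term):
    """Look the term up exactly as given, with no analysis."""
    return term in index_terms


def match_query(text):
    """Analyze the query text first, then look up the resulting terms."""
    return any(term in index_terms for term in analyze(text))


print(term_query("carnivore"))   # False: the index holds "carnivor"
print(match_query("carnivore"))  # True: the query is stemmed to "carnivor"
```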

Slide 16

Slide 16 text

Analysis
Boost precise matches

    the domestic cat latin felis catus is a small typically furry carnivorous mammal
    the domest cat latin feli catu is a small typic furri carnivor mammal

    POST /enwiki/_search
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"text.precise": {"query": "carnivore", "boost": 5}}},
            {"match": {"text.stemmed": {"query": "carnivore", "boost": 1}}}
          ]
        }
      }
    }
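A rough sketch of why this query ranks exact wording first: each `should` clause that matches adds to the score, weighted by its boost. The model below is deliberately simplified and rests on assumptions: every match is worth exactly its boost (real relevance scores also depend on term statistics), the two documents are invented, and the query term is assumed to be analyzed to `carnivore` for `text.precise` and `carnivor` for `text.stemmed`.

```python
# Hypothetical documents, each indexed two ways: unstemmed and stemmed terms.
DOCS = {
    "exact": {
        "text.precise": {"a", "carnivore", "eats", "meat"},
        "text.stemmed": {"a", "carnivor", "eat", "meat"},
    },
    "related": {
        "text.precise": {"carnivorous", "mammal"},
        "text.stemmed": {"carnivor", "mammal"},
    },
}

# (field, analyzed query term, boost), mirroring the two should clauses.
CLAUSES = [
    ("text.precise", "carnivore", 5.0),
    ("text.stemmed", "carnivor", 1.0),
]


def score(doc, clauses):
    """Sum the boosts of the clauses whose term appears in the given field."""
    total = 0.0
    for field, term, boost in clauses:
        if term in doc[field]:
            total += boost
    return total


ranked = sorted(DOCS, key=lambda name: score(DOCS[name], CLAUSES), reverse=True)
for name in ranked:
    print(name, score(DOCS[name], CLAUSES))
# exact 6.0
# related 1.0
```

Both documents are found through the stemmed field, but only the one containing the literal word also collects the larger boost from the precise field.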

Slide 17

Slide 17 text

Summary

• Requests bounce from node to node asynchronously
• Indexes logically hold documents
• Shards physically hold documents
• Data is written to disk multiple times in different ways to optimize access patterns
• Each data structure is immutable and optimized by a background process
• Analyze text for finding things better and faster

Slide 18

Slide 18 text

Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/
Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.