Slide 1

Slide 1 text

elasticsearch: the missing intro
Part 1: Indexing & Querying
by Erik Rose

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

what it's good for

- Full-text search
- Big data
- Faceting
- Geographical queries

Slide 4

Slide 4 text

one (insanely productive) man

Slide 5

Slide 5 text

the rest of us?

Slide 6

Slide 6 text

data structures

Slide 7

Slide 7 text

JSON over HTTP, on port 9200

Slide 8

Slide 8 text

index doctype another doctype {…}

Slide 9

Slide 9 text

IDs

6a8ca01c-7896-48e9-81cc-9f70661fcb32
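To make the index / doctype / ID layout concrete, here is a minimal Python sketch that builds (but does not send) the HTTP request for storing one JSON document. The index name `blog`, the doctype `post`, the document fields, and the helper name `build_index_request` are all hypothetical. The `/index/doctype/id` URL layout is the one used by Elasticsearch releases from the era of this talk; newer releases may differ.

```python
import json
import urllib.request


def build_index_request(index, doctype, doc_id, document,
                        host="http://localhost:9200"):
    """Build, without sending, a PUT request that would index one document.

    The URL layout /index/doctype/id is an assumption based on the
    Elasticsearch version contemporary with these slides.
    """
    url = "{}/{}/{}/{}".format(host, index, doctype, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index, doctype and document; the ID is the one on the slide.
request = build_index_request(
    "blog",
    "post",
    "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
    {"title": "hello", "category": "rants"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, which only makes sense with a server listening on port 9200.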

Slide 10

Slide 10 text

diplodocus …………………………… 333
duodenum …………………………… 201
dwaal …………………………… 500, 119

Slide 11

Slide 11 text

Documents:
0: row row row your boat
1: row the row boat
2: chicken chicken chicken
3: the front row

Inverted index:
row → 0,1,3
boat → 0,1
chicken → 2
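The term-to-documents mapping on this slide can be reproduced with a few lines of Python. This is only a sketch of the idea of an inverted index: it splits on whitespace and does no analysis, and the names `DOCS` and `build_inverted_index` are made up for illustration. It also indexes the words the slide leaves out ("your", "the", "front").

```python
from collections import defaultdict

# The four documents from the slide; the list position is the document ID.
DOCS = [
    "row row row your boat",
    "row the row boat",
    "chicken chicken chicken",
    "the front row",
]


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
print(index["row"])      # [0, 1, 3]
print(index["boat"])     # [0, 1]
print(index["chicken"])  # [2]
```

Looking up a term is then a single dictionary access, regardless of how many documents exist.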

Slide 16

Slide 16 text

Documents:
0: row row row your boat
1: row the row boat
2: chicken chicken chicken
3: the front row

Inverted index with positions (doc [positions]):
row → 0 [0,1,2], 1 [0,2], 3 [2]
boat → 0 [4], 1 [3]
chicken → 2 [0,1,2]
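Storing positions is what makes phrase matching possible: a phrase matches when its terms sit at consecutive positions in the same document. The sketch below rebuilds the positional index from the slide and adds a naive phrase lookup. The function names are hypothetical, and this is a simplification for illustration, not how Lucene implements it.

```python
from collections import defaultdict

DOCS = [
    "row row row your boat",
    "row the row boat",
    "chicken chicken chicken",
    "the front row",
]


def build_positional_index(docs):
    """Map term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(text.split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_match(index, phrase):
    """Return IDs of documents where the phrase terms appear consecutively."""
    terms = phrase.split()
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents containing every term can match the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


positional = build_positional_index(DOCS)
print(positional["row"])                     # {0: [0, 1, 2], 1: [0, 2], 3: [2]}
print(phrase_match(positional, "row boat"))  # [1]
```

Documents 0 and 1 both contain "row" and "boat", but only document 1 has them next to each other, which is exactly what the position lists reveal.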

Slide 17

Slide 17 text

Documents:
0: row row row your boat
1: row the row boat
2: chicken chicken chicken
3: the front row

Inverted index with positions (doc [positions]):
row → 0 [0,1,2], 1 [0,2], 3 [2]
boat → 0 [4], 1 [3]
chicken → 2 [0,1,2]

Slide 18

Slide 18 text

Documents:
0: row row row your boat
1: row the row boat
2: chicken chicken chicken
3: the front row

Inverted index with positions (doc [positions]):
row → 0 [0,1,2], 1 [0,2], 3 [2]
boat → 0 [4], 1 [3]
chicken → 2 [0,1,2]

?

Slide 20

Slide 20 text

analysis

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

stock analyzers

original: Red-orange gerbils live at #43A Franklin St.
whitespace: Red-orange, gerbils, live, at, #43A, Franklin, St.
standard: red, orange, gerbils, live, 43a, franklin, st
simple: red, orange, gerbils, live, at, a, franklin, st
stop: red, orange, gerbils, live, franklin, st
snowball: red, orang, gerbil, live, 43a, franklin, st

- stopwords
- stemming
- punctuation
- case-folding
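Three of the analyzers above are simple enough to imitate in Python, which helps show what each one does to the sample sentence. These are rough approximations written for this note, not Elasticsearch's implementations, and the stopword list is a tiny made-up one.

```python
import re

# Assumption: a tiny illustrative stopword list, not the real one.
STOPWORDS = {"a", "an", "at", "the", "of", "in"}

TEXT = "Red-orange gerbils live at #43A Franklin St."


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Lowercase, then keep runs of letters; digits and punctuation split."""
    return re.findall(r"[a-z]+", text.lower())


def stop_analyzer(text):
    """Like simple_analyzer, but drop stopwords."""
    return [term for term in simple_analyzer(text) if term not in STOPWORDS]


print(whitespace_analyzer(TEXT))
print(simple_analyzer(TEXT))
print(stop_analyzer(TEXT))
```

On the sample sentence these three functions produce the same terms as the whitespace, simple, and stop rows of the slide. Stemming (the snowball row) needs a real stemmer and is left out.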

Slide 23

Slide 23 text

curl -XGET -s 'http://localhost:9200/_analyze?analyzer=whitespace&pretty=true' -d 'Red-orange gerbils live at #43A Franklin St.'

{
  "tokens" : [ {
    "token" : "Red-orange",
    "start_offset" : 0,
    "end_offset" : 10,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "gerbils",
    "start_offset" : 11,
    "end_offset" : 18,
    "type" : "word",
    "position" : 2
  }, ...
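The fields in that response are easy to misread, so here is a small Python sketch that produces token records of the same shape for a whitespace tokenizer. It runs locally and does not call Elasticsearch; the function name `whitespace_tokens` is hypothetical, and positions are numbered from 1 only to mirror the output shown on the slide.

```python
import re


def whitespace_tokens(text):
    """Return one record per whitespace-separated token.

    start_offset and end_offset are character offsets into the original
    text (end is exclusive); position is the token's ordinal, from 1.
    """
    return [
        {
            "token": match.group(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": "word",
            "position": position,
        }
        for position, match in enumerate(re.finditer(r"\S+", text), start=1)
    ]


tokens = whitespace_tokens("Red-orange gerbils live at #43A Franklin St.")
for record in tokens[:2]:
    print(record)
```

Offsets point back into the original string, which is what features such as highlighting rely on, while positions count tokens and are what phrase matching uses.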

Slide 24

Slide 24 text

'address': {'type': 'string', 'analyzer': 'address_analyzer'}

address_analyzer: CharFilter → Tokenizer → Token Filter → terms

Slide 25

Slide 25 text

'analysis': {
    'analyzer': {
        'name_analyzer': {
            'type': 'custom',
            'tokenizer': 'name_tokenizer',
            'filter': ['lowercase']
        }
    },
    'tokenizer': {
        'name_tokenizer': {
            'type': 'pattern',
            'pattern': "[^a-zA-Z']+"
        }
    }
}

name_analyzer: CharFilter → Tokenizer → Token Filter → terms
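What this custom analyzer does can be mimicked in Python: the pattern tokenizer splits the text wherever the regular expression matches, and the lowercase filter then folds case. The sketch below is an approximation for illustration only; the function names simply echo the names in the config above, and the sample name is made up.

```python
import re

# Same pattern as in the config: any run of characters that are neither
# letters nor apostrophes acts as a separator.
NAME_PATTERN = re.compile(r"[^a-zA-Z']+")


def name_tokenizer(text):
    """Split on separator runs and discard empty pieces."""
    return [piece for piece in NAME_PATTERN.split(text) if piece]


def lowercase_filter(tokens):
    """Token filter: fold every token to lowercase."""
    return [token.lower() for token in tokens]


def name_analyzer(text):
    """Tokenizer followed by the token filter, as in the pipeline diagram."""
    return lowercase_filter(name_tokenizer(text))


print(name_analyzer("Erik O'Neil-Rose, Jr."))  # ['erik', "o'neil", 'rose', 'jr']
```

Keeping the apostrophe out of the separator set is what lets a name like O'Neil survive as a single term.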

Slide 26

Slide 26 text

synonyms

"filter": {
  "synonym": {
    "type": "synonym",
    "synonyms": [
      "albert => albert, al",
      "allan => allan, al"
    ]
  }
}

original query: Allan Smith
after synonyms: [allan, al] smith

original query: Albert Smith
after synonyms: [albert, al] smith
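The effect of those two rules can be sketched in Python: each `a => b, c` rule replaces the term on the left with the alternatives on the right, all occupying the same slot in the query. The helper names `parse_rules` and `expand_synonyms` are hypothetical, and the sketch assumes the text was already lowercased and split into terms.

```python
RULES = [
    "albert => albert, al",
    "allan => allan, al",
]


def parse_rules(rules):
    """Turn 'left => a, b' strings into a {left: [a, b]} lookup table."""
    table = {}
    for rule in rules:
        left, right = rule.split("=>")
        table[left.strip()] = [term.strip() for term in right.split(",")]
    return table


def expand_synonyms(tokens, table):
    """Replace each token with its list of alternatives (or itself)."""
    return [table.get(token, [token]) for token in tokens]


table = parse_rules(RULES)
print(expand_synonyms("allan smith".split(), table))   # [['allan', 'al'], ['smith']]
print(expand_synonyms("albert smith".split(), table))  # [['albert', 'al'], ['smith']]
```

Because both names expand to include "al", a search for either one can also find documents that only mention "Al Smith".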

Slide 27

Slide 27 text

querying

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

{
  "bool" : {
    "must" : {
      "term" : { "user" : "fred" }
    },
    "must_not" : {
      "range" : {
        "age" : { "from" : 12, "to" : 21 }
      }
    },
    "should" : [
      { "term" : { "tag" : "crunchy" } },
      { "term" : { "tag" : "elasticsearch" } }
    ],
    "minimum_number_should_match" : 1,
    "boost" : 1.0
  }
}
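Read as plain boolean logic, this query says: the user must be fred, the age must not fall in the 12 to 21 range, and at least one of the two tags should be present. The Python sketch below applies that logic to documents held as dictionaries. It covers only matching, not scoring or boost; treating both range bounds as inclusive is an assumption, and the helper names are hypothetical.

```python
def term_matches(doc, field, value):
    """True if the field equals the value, or contains it when it is a list."""
    actual = doc.get(field)
    if isinstance(actual, (list, tuple, set)):
        return value in actual
    return actual == value


def bool_matches(doc):
    """Apply the must / must_not / should clauses of the query above."""
    must = term_matches(doc, "user", "fred")
    age = doc.get("age")
    # Assumption: both ends of the from/to range are inclusive.
    must_not = age is not None and 12 <= age <= 21
    should_hits = sum(
        term_matches(doc, "tag", tag) for tag in ("crunchy", "elasticsearch")
    )
    # minimum_number_should_match is 1 in the query.
    return must and not must_not and should_hits >= 1


print(bool_matches({"user": "fred", "age": 30, "tag": ["crunchy"]}))  # True
print(bool_matches({"user": "fred", "age": 15, "tag": ["crunchy"]}))  # False
```

In the real engine, each matching should clause also raises the document's score; this sketch only answers whether a document matches at all.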

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

filters            queries
- Boolean          - Fuzzy, scoring
- Fast             - Slower
- Cacheable        - Not cacheable

Slide 32

Slide 32 text

curl -XGET -s 'http://localhost:9200/blog/_search?pretty=true' -d \
'{
  "query": {
    "filtered": {
      "filter": {
        "term": { "category": "rants" }
      },
      "query": {
        "bool": {
          "should": [
            { "match_phrase": { "body": "fix your little red wagon" } },
            { "match": { "body": "fix your little red wagon" } }
          ]
        }
      }
    }
  }
}'
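The division of labour in a filtered query can be illustrated with a toy Python model: the filter is a yes/no test that narrows the candidate set, and the query clauses then rank whatever is left. The scoring here is invented for illustration (count of shared terms, plus a bonus for an exact phrase) and is not Lucene's; the sample posts and every function name are hypothetical.

```python
def tokens(text):
    return text.lower().split()


def match_score(body, query):
    """Toy 'match': number of distinct query terms present in the body."""
    return len(set(tokens(query)) & set(tokens(body)))


def phrase_score(body, query):
    """Toy 'match_phrase': 1 if the query terms appear consecutively."""
    b, q = tokens(body), tokens(query)
    return int(any(b[i:i + len(q)] == q for i in range(len(b) - len(q) + 1)))


def search(posts, category, query):
    # Filter step: a yes/no test with no score attached.
    candidates = [post for post in posts if post["category"] == category]
    # Query step: each 'should' clause that matches adds to the score.
    scored = [
        (match_score(p["body"], query) + phrase_score(p["body"], query), p)
        for p in candidates
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [post for score, post in ranked if score > 0]


POSTS = [
    {"id": 1, "category": "rants", "body": "I will fix your little red wagon"},
    {"id": 2, "category": "rants", "body": "my red wagon needs a fix"},
    {"id": 3, "category": "recipes", "body": "fix your little red wagon"},
    {"id": 4, "category": "rants", "body": "nothing relevant here"},
]

results = search(POSTS, "rants", "fix your little red wagon")
print([post["id"] for post in results])  # [1, 2]
```

Pairing `match_phrase` with `match` in the should list has the effect modelled here: loose matches are still returned, but documents containing the exact phrase rank first.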

Slide 33

Slide 33 text

libraries

- pyes
- pyelasticsearch

  query = {'query': {
      'filtered': {
          'query': {
              'match': {'name': 'test tester'}
          },
          'filter': {
              'range': {
                  'age': {'from': 27, 'to': 37}
              }}}}}
  es.search(query, index='people')

- elasticutils

  print([item['title'] for item in
         searcher.query(title__text='cookie')
                 .filter(topics='websites')])

Slide 34

Slide 34 text

thank you

twitter: ErikRose

Background image by Tim and Julie Wilson: https://secure.flickr.com/photos/secondtree/. This presentation is noncommercial sharealike in accordance with that image's license.

Part 2: Sunday at 1:10pm