Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

elasticsearch

Slide 3

Slide 3 text

me
Dmitry Zhlobo
Developer at Twinslash
email: [email protected]
skype: dima.zhlobo
github: proghat
twitter: @proghat

Slide 4

Slide 4 text

search is hard ● speed vs. relevancy

Slide 5

Slide 5 text

search is hard ● speed vs. relevancy ● real time

Slide 6

Slide 6 text

search is hard ● speed vs. relevancy ● real time ● different kinds of data

Slide 7

Slide 7 text

usual approach ● SELECT * FROM posts WHERE `body` LIKE '%query%'

Slide 8

Slide 8 text

usual approach ● SELECT * FROM posts WHERE `body` LIKE '%query%' ● gem 'thinking-sphinx' … Article.search(params[:q])
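The `LIKE '%query%'` approach can be tried out with an in-memory SQLite database. This is only a minimal sketch: the `posts` table and `body` column come from the slide, while the rows and the `like_search` helper are invented for illustration.

```python
import sqlite3

# In-memory database with the `posts` table from the slide; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (body) VALUES (?)",
    [
        ("Distributed real-time search engine",),
        ("Searching is hard",),
        ("Notes on data mining",),
    ],
)


def like_search(query):
    """Substring match; there is no ranking and no word analysis."""
    pattern = "%" + query + "%"
    rows = conn.execute(
        "SELECT body FROM posts WHERE body LIKE ? ORDER BY id", (pattern,)
    )
    return [body for (body,) in rows]


print(like_search("search"))    # both rows containing "search" as a substring
print(like_search("searches"))  # [] because nothing contains that exact substring
```

The second call shows the limitation: a substring match knows nothing about word forms, so "searches" does not find "search".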

Slide 10

Slide 10 text

how search works? ● document 1: flexible and powerful open source, distributed real-time search and analytics engine for the cloud... ● document 2: Apache Mahout has implementations of a wide range of machine learning and data mining... ● document 3: Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the MapReduce...

Slide 11

Slide 11 text

how search works? data mapreduce learning classification recommenders analysis

Slide 12

Slide 12 text

how search works? [diagram: the terms data, mapreduce, learning, classification, recommenders and analysis, each linked to the numbers of the documents (1, 2, 3) that contain it]
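The term-to-document lookup on these slides is usually called an inverted index. Below is a minimal sketch of one, using the three document snippets quoted earlier (they are truncated in the slides and are kept truncated here). The `tokenize`, `build_index` and `search` helpers are illustrative names, not part of any library.

```python
import re
from collections import defaultdict

# Document texts as quoted on the earlier slide (still truncated there).
documents = {
    1: "flexible and powerful open source, distributed real-time search "
       "and analytics engine for the cloud",
    2: "Apache Mahout has implementations of a wide range of machine "
       "learning and data mining",
    3: "Our core algorithms for clustering, classification and batch based "
       "collaborative filtering are implemented on top of Apache Hadoop "
       "using the MapReduce",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every term of the query."""
    postings = [set(index.get(term, ())) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(documents)
print(index["apache"])                 # [2, 3]
print(search(index, "apache hadoop"))  # [3]
```

A query never scans the documents themselves; it only intersects the short ID lists stored per term.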

Slide 16

Slide 16 text

elasticsearch

Slide 17

Slide 17 text

elasticsearch ● full text search ● real time data ● open source ● RESTful API ● distributed ● schema free & document oriented

Slide 18

Slide 18 text

analysis Flexible and powerful search engine

Slide 19

Slide 19 text

analysis ● text: Flexible and powerful search engine ● char filters: Mapping, HTML Strip, Pattern Replace

Slide 20

Slide 20 text

analysis char filters Flexible and powerful search engine

Slide 21

Slide 21 text

analysis ● char filters ● text: Flexible and powerful search engine ● tokenizer: Path Hierarchy, Keyword, Letter, Lowercase, NGram, Standard, Whitespace, Pattern, Edge NGram

Slide 22

Slide 22 text

analysis char filters tokenizer Flexible and powerful search engine

Slide 23

Slide 23 text

analysis ● char filters ● tokenizer ● text: Flexible and powerful search engine ● token filters: Stop, Lowercase, Snowball, Synonym, Trim, Unique, Normalization, Stemmer, Shingle, Truncate, Reverse

Slide 24

Slide 24 text

analysis ● char filters ● tokenizer ● token filters ● result: flexible, powerful, search, engine
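The three analysis stages can be imitated in a few lines. This is a rough sketch, not how Elasticsearch implements them: the function names, the regular expressions and the tiny stop list are all assumptions chosen so that the slide's sample sentence produces the slide's result.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of"}  # tiny illustrative stop list


def html_strip(text):
    """Char filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def standard_tokenizer(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def stop(tokens):
    """Token filter: remove stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text):
    """Run char filters, then the tokenizer, then token filters, in order."""
    text = html_strip(text)
    tokens = standard_tokenizer(text)
    for token_filter in (lowercase, stop):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<b>Flexible</b> and powerful search engine"))
# ['flexible', 'powerful', 'search', 'engine']
```

The order matters: the stop filter only recognizes "and" because the lowercase filter runs before it.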

Slide 25

Slide 25 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” (“He gave an excellent talk about the best search engine”)

Slide 26

Slide 26 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный

Slide 27

Slide 27 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный рассказать

Slide 28

Slide 28 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный рассказать хороший

Slide 29

Slide 29 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный рассказать хороший поисковый

Slide 30

Slide 30 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный рассказать хороший движок поисковый

Slide 32

Slide 32 text

russian morphology “Он отлично рассказал о лучшем поисковом движке” отлично отличный рассказать хороший движок “хороший поисковой движок” хороший движок поисковый поисковый

Slide 33

Slide 33 text

russian morphology ● text: “Он отлично рассказал о лучшем поисковом движке” ● lemmas: отлично, отличный, рассказать, хороший, поисковый, движок ● query: “хороший поисковой движок” (“good search engine”) ● query lemmas: хороший, поисковый, движок
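The matching shown on these slides can be sketched with a lookup table from word forms to dictionary forms. The table below is hypothetical and only covers the words on the slide; a real morphology analyzer would derive the forms from a full dictionary. Treating "он" and "о" as stop words is also an assumption, based on their absence from the slide's output.

```python
# Hypothetical lookup table: word form -> dictionary forms (lemmas).
# The pairs follow the slide; a real analyzer derives them from a dictionary.
LEMMAS = {
    "отлично": {"отлично", "отличный"},
    "рассказал": {"рассказать"},
    "лучшем": {"хороший"},
    "поисковом": {"поисковый"},
    "поисковой": {"поисковый"},
    "движке": {"движок"},
}
STOP_WORDS = {"он", "о"}  # assumed to be dropped as stop words


def lemmatize(text):
    """Return the set of lemmas for all non-stop words in the text."""
    lemmas = set()
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        # Unknown words are kept as they are.
        lemmas |= LEMMAS.get(word, {word})
    return lemmas


document = lemmatize("Он отлично рассказал о лучшем поисковом движке")
query = lemmatize("хороший поисковой движок")

print(query <= document)  # True: every query lemma occurs in the document
```

None of the three query words appears literally in the sentence, yet all of them match once both sides are reduced to lemmas.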

Slide 34

Slide 34 text

phonetic analysis

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

phonetic analysis Eyjafjallajökull

Slide 37

Slide 37 text

phonetic analysis Eyjafjallajökull Eiyafyalayokul iofiolDkul
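The idea behind phonetic analysis is to index a code for how a word sounds rather than how it is spelled. The sketch below uses classic American Soundex, a much cruder encoder than whatever produced the code on the slide, and it is swapped in purely to show the principle on ASCII names. The `soundex` function is written for this example.

```python
# Classic American Soundex, used here only to illustrate the principle.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Encode an ASCII name as one letter followed by three digits."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        if c not in "hw":  # h and w do not separate equal codes
            previous = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two spellings that sound alike end up with the same code, so a search for one finds the other.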

Slide 38

Slide 38 text

analysis char filters tokenizer token filters

Slide 39

Slide 39 text

analysis char filters tokenizer token filters analyzer

Slide 40

Slide 40 text

analysis char filters tokenizer token filters analyzer ● has name

Slide 41

Slide 41 text

analysis char filters tokenizer token filters analyzer ● has name ● reusable

Slide 42

Slide 42 text

analysis char filters tokenizer token filters
analysis:
  analyzer:
    rus_morphology:
      type: "custom"
      char_filter: ["html_strip"]
      tokenizer: "standard"
      filter: ["lowercase", "russian_morphology", "stopwords"]
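The same analyzer definition can be held as a plain data structure and serialized to JSON. The sketch below copies the names from the slide; whether `russian_morphology` and `stopwords` are built in or come from a plugin or custom filter definitions is not stated there. The `analyzer_stages` helper is hypothetical and only lists the stages in the order they would run.

```python
import json

# Analyzer definition from the slide, expressed as a Python dict.
settings = {
    "analysis": {
        "analyzer": {
            "rus_morphology": {
                "type": "custom",
                "char_filter": ["html_strip"],
                "tokenizer": "standard",
                "filter": ["lowercase", "russian_morphology", "stopwords"],
            }
        }
    }
}


def analyzer_stages(settings, name):
    """List an analyzer's stages in the order they are applied."""
    analyzer = settings["analysis"]["analyzer"][name]
    return (
        [("char_filter", f) for f in analyzer.get("char_filter", [])]
        + [("tokenizer", analyzer["tokenizer"])]
        + [("filter", f) for f in analyzer.get("filter", [])]
    )


for stage, component in analyzer_stages(settings, "rus_morphology"):
    print(stage, component)

print(json.dumps(settings, indent=2))
```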

Slide 43

Slide 43 text

getting started
# Add document to index
curl -XPOST "localhost:9200/posts/post/1" -d '{ title: "Search" }'

Slide 47

Slide 47 text

getting started
# Add document to index
curl -XPOST "localhost:9200/posts/post/1" -d '{ title: "Search" }'
curl -XPOST "localhost:9200/posts/post/whatever" -d '{ title: "ES" }'

Slide 48

Slide 48 text

getting started
# Add document to index
curl -XPOST "localhost:9200/posts/post/1" -d '{ title: "Search" }'
curl -XPOST "localhost:9200/posts/post/whatever" -d '{ title: "ES" }'
# Search documents
curl -XGET "localhost:9200/posts/_search?q=data"
curl -XGET "localhost:9200/posts/_search?q=title:elasticsearch"

Slide 49

Slide 49 text

getting started
# Add document to index
curl -XPOST "localhost:9200/posts/post/1" -d '{ title: "Search" }'
curl -XPOST "localhost:9200/posts/post/whatever" -d '{ title: "ES" }'
# Search documents
curl -XGET "localhost:9200/posts/_search?q=data"
curl -XGET "localhost:9200/posts/_search?q=title:elasticsearch"
# Update and delete documents
curl -XPUT "localhost:9200/posts/post/1" -d '{ title: "Data" }'
curl -XDELETE "localhost:9200/posts/post/whatever"
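The same REST calls can be prepared from Python with the standard library. The sketch below only builds the requests and does not send them, so it runs without a node. The `build_request` helper is a made-up name, and the `Content-Type` header is an addition not shown on the slides, included on the assumption that newer Elasticsearch versions expect it.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # host and port as on the slide


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


examples = [
    build_request("POST", "/posts/post/1", {"title": "Search"}),
    build_request("GET", "/posts/_search?q=title:elasticsearch"),
    build_request("PUT", "/posts/post/1", {"title": "Data"}),
    build_request("DELETE", "/posts/post/whatever"),
]

for request in examples:
    print(request.get_method(), request.full_url, request.data)

# To actually send one: urllib.request.urlopen(request), with a node running.
```

Each operation is just an HTTP verb plus a path of the form index/type/id, which is what makes the API easy to drive from any language.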

Slide 50

Slide 50 text

mapping
{
  repository: "elasticsearch/elasticsearch",
  description: "Open Source, Distributed, RESTful Search Engine",
  maintainer: { name: "Shay Banon", email: "[email protected]" },
  languages: ["Java", "Shell"],
  created_at: "2010-02-08"
}

Slide 51

Slide 51 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } repository: { type: "string" }

Slide 52

Slide 52 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } repository: { type: "string", boost: 5 }

Slide 53

Slide 53 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } repository: { type: "string", boost: 5, analyzer: "repo_name" } repo_name: { tokenizer: "letter", filter: ["lowercase","phonetic"] }

Slide 54

Slide 54 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } description: { type: "string" }

Slide 55

Slide 55 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } description: { type: "string", analyzer: "english_text" } english_text: { tokenizer: "standard", filter: ["lowercase", "stopwords", "snowball"] }

Slide 56

Slide 56 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } maintainer: { properties: { } }

Slide 57

Slide 57 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } maintainer: { properties: { name: { type: "string" } } }

Slide 58

Slide 58 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } maintainer: { properties: { name: { type: "string", analyzer: "phonetic" } } } phonetic: { tokenizer: "standard", filter: ["lowercase", "stopwords", "beidermorse"] }

Slide 59

Slide 59 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } maintainer: { properties: { email: { type: "string" } } }

Slide 60

Slide 60 text

mapping { repository: "elasticsearch/elasticsearch", description: "Open Source, Distributed, RESTful Search Engine", maintainer: { name: "Shay Banon", email: "[email protected]" }, languages: ["Java", "Shell"], created_at: "2010-02-08" } maintainer: { properties: { email: { type: "string", index: "not_analyzed" } } }

Slide 61

Slide 61 text

mapping
{
  repository: "elasticsearch/elasticsearch",
  description: "Open Source, Distributed, RESTful Search Engine",
  maintainer: { name: "Shay Banon", email: "[email protected]" },
  languages: ["Java", "Shell"],
  created_at: "2010-02-08"
}
languages: { type: "string" }

Slide 62

Slide 62 text

mapping
{
  repository: "elasticsearch/elasticsearch",
  description: "Open Source, Distributed, RESTful Search Engine",
  maintainer: { name: "Shay Banon", email: "[email protected]" },
  languages: ["Java", "Shell"],
  created_at: "2010-02-08"
}
languages: { type: "string", analyzer: "programming_lang" }
programming_lang: { tokenizer: "keyword", filter: ["lowercase"] }

Slide 63

Slide 63 text

mapping
{
  repository: "elasticsearch/elasticsearch",
  description: "Open Source, Distributed, RESTful Search Engine",
  maintainer: { name: "Shay Banon", email: "[email protected]" },
  languages: ["Java", "Shell"],
  created_at: "2010-02-08"
}
created_at: { type: "date", format: "yyyy-MM-dd" }
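Put together, the per-field fragments from the mapping slides form one `properties` object. The sketch below assembles them and checks that every field of the sample document has a mapping entry. The `unmapped_fields` helper is hypothetical. Two details are choices made here: the key `languages` is spelled to match the document's field name, and the date pattern is written in lowercase Joda-style letters, on the assumption that uppercase `D` would mean day of year.

```python
# Sample document from the slides (the e-mail address is redacted there).
document = {
    "repository": "elasticsearch/elasticsearch",
    "description": "Open Source, Distributed, RESTful Search Engine",
    "maintainer": {"name": "Shay Banon", "email": "[email protected]"},
    "languages": ["Java", "Shell"],
    "created_at": "2010-02-08",
}

# Field mappings collected from the individual slides.
properties = {
    "repository": {"type": "string", "boost": 5, "analyzer": "repo_name"},
    "description": {"type": "string", "analyzer": "english_text"},
    "maintainer": {
        "properties": {
            "name": {"type": "string", "analyzer": "phonetic"},
            "email": {"type": "string", "index": "not_analyzed"},
        }
    },
    "languages": {"type": "string", "analyzer": "programming_lang"},
    "created_at": {"type": "date", "format": "yyyy-MM-dd"},
}


def unmapped_fields(doc, props, prefix=""):
    """Return dotted paths of document fields that have no mapping entry."""
    missing = []
    for field, value in doc.items():
        path = prefix + field
        if field not in props:
            missing.append(path)
        elif isinstance(value, dict):
            nested = props[field].get("properties", {})
            missing += unmapped_fields(value, nested, path + ".")
    return missing


print(unmapped_fields(document, properties))  # []
```

A check like this catches a mapping key that differs from the document's field name, in which case the field would silently fall back to a default mapping.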

Slide 64

Slide 64 text

mapping
curl -XPOST "localhost:9200/repositories" -d '
{
  settings: {
    analysis: {
      analyzer: { ... },
      filter: { ... }
    }
  },
  mappings: {
    repository: {
      properties: { ... }
    }
  }
}'

Slide 65

Slide 65 text

mapping
curl -XPOST "localhost:9200/repositories" -d '...'
curl -XPOST "localhost:9200/repositories/repository" -d '
{
  repository: "elasticsearch/elasticsearch",
  description: "Open Source, Distributed, RESTful Search Engine",
  maintainer: { name: "Shay Banon", email: "[email protected]" },
  languages: ["Java", "Shell"],
  created_at: "2010-02-08"
}'

Slide 66

Slide 66 text

search curl -XGET "localhost:9200/repositories/repository/_search?q=engine"

Slide 67

Slide 67 text

search
curl -XPOST "localhost:9200/repositories/repository/_search" -d '
{
  query: {
    match: { description: "search" }
  }
}'

Slide 68

Slide 68 text

search
"hits" : {
  "total" : 3,
  "hits" : [
    { "_score" : 0.22295055, "_source" : { repository: "elasticsearch/elasticsearch" } },
    { "_score" : 0.22295055, "_source" : { repository: "ankane/searchkick" } },
    { "_score" : 0.095891505, "_source" : { repository: "karmi/tire" } }
  ]
}

Slide 69

Slide 69 text

search
curl -XPOST "localhost:9200/repositories/repository/_search" -d '
{
  query: {
    match: { _all: "elasticsearch" }
  }
}'

Slide 70

Slide 70 text

search
"hits" : {
  "total" : 2,
  "hits" : [
    { "_score" : 5.46875, "_source" : { repository: "elasticsearch/elasticsearch" } },
    { "_score" : 0.04746387, "_source" : { repository: "karmi/tire" } }
  ]
}
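The large gap between the two scores is consistent with the `boost: 5` set on the `repository` field in the mapping. The toy scorer below shows the effect of a field boost in isolation. It is not Lucene's scoring formula, and the description of the second repository is invented for the example.

```python
import re

BOOSTS = {"repository": 5.0}  # boost value taken from the mapping slides

# The second description is invented for the example.
docs = [
    {"repository": "elasticsearch/elasticsearch",
     "description": "Open Source, Distributed, RESTful Search Engine"},
    {"repository": "karmi/tire",
     "description": "Ruby client for elasticsearch"},
]


def tokens(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_score(doc, term):
    """Sum term occurrences per field, weighted by the field's boost."""
    return sum(
        tokens(text).count(term) * BOOSTS.get(field, 1.0)
        for field, text in doc.items()
    )


ranked = sorted(docs, key=lambda d: toy_score(d, "elasticsearch"), reverse=True)
for doc in ranked:
    print(doc["repository"], toy_score(doc, "elasticsearch"))
```

A hit in a boosted field outweighs the same hit in an ordinary field, which is why a match on the repository name ranks first.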

Slide 71

Slide 71 text

facets
curl -XPOST "localhost:9200/repositories/repository/_search" -d '
{
  query: {
    match: { _all: "search" }
  },
  facets: {
    language: {
      terms: { field: "languages" }
    }
  }
}'

Slide 72

Slide 72 text

facets
"hits" : {
  "total" : 2,
  "hits" : [ ... ]
},
"facets" : {
  "language" : {
    "terms" : [
      { "term" : "ruby", "count" : 2 },
      { "term" : "shell", "count" : 1 },
      { "term" : "java", "count" : 1 }
    ]
  }
}
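A terms facet is essentially a count of how many matching documents carry each term. The sketch below reproduces that counting in plain Python. The languages of the first repository come from the slides; the languages listed for the other two repositories are assumptions made for the example, and `terms_facet` is a made-up helper.

```python
from collections import Counter

# Languages for the first repository come from the slides; the other two
# entries are assumptions made for the example.
repositories = [
    {"repository": "elasticsearch/elasticsearch", "languages": ["Java", "Shell"]},
    {"repository": "karmi/tire", "languages": ["Ruby"]},
    {"repository": "ankane/searchkick", "languages": ["Ruby"]},
]


def terms_facet(hits, field):
    """Count how many hits contain each (lowercased) term in `field`."""
    counts = Counter()
    for hit in hits:
        # A set, so one document counts at most once per term.
        counts.update({value.lower() for value in hit[field]})
    return [{"term": t, "count": c} for t, c in counts.most_common()]


print(terms_facet(repositories, "languages"))
```

The terms come back lowercased because the counting works on analyzed tokens, not on the original field values.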

Slide 73

Slide 73 text

filters
curl -XPOST "localhost:9200/repositories/repository/_search" -d '
{
  query: {
    filtered: {
      query: {
        match: { _all: "search" }
      },
      filter: {
        term: { "languages": "java" }
      }
    }
  }
}'

Slide 74

Slide 74 text

filters
"hits" : {
  "total" : 1,
  "hits" : [
    { "_score" : 5.46875, "_source" : { repository: "elasticsearch/elasticsearch" } }
  ]
}
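A filtered query combines a yes/no filter with a scored query. The sketch below imitates that combination on two in-memory documents. It is a simplification under stated assumptions: the second document's description and languages are invented, the `matches` function is only a substring check standing in for a real match query, and all helper names are hypothetical.

```python
docs = [
    {"repository": "elasticsearch/elasticsearch",
     "description": "Open Source, Distributed, RESTful Search Engine",
     "languages": ["Java", "Shell"]},
    # Description and languages below are invented for the example.
    {"repository": "karmi/tire",
     "description": "Ruby client for a search engine",
     "languages": ["Ruby"]},
]


def term_filter(doc, field, term):
    """Yes/no check: does the field contain the exact (lowercased) term?"""
    return term in [value.lower() for value in doc[field]]


def matches(doc, text):
    """Very rough stand-in for a match query on the description."""
    return text.lower() in doc["description"].lower()


def filtered_search(docs, text, field, term):
    """Keep documents that pass the filter and also match the query."""
    return [
        doc["repository"]
        for doc in docs
        if term_filter(doc, field, term) and matches(doc, text)
    ]


print(filtered_search(docs, "search", "languages", "java"))
# ['elasticsearch/elasticsearch']
```

Both documents match the word "search", but only one passes the language filter, which mirrors the single hit on the slide.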

Slide 75

Slide 75 text

performance and scaling

Slide 76

Slide 76 text

performance and scaling elasticsearch is web scale

Slide 77

Slide 77 text

random facts ● bulk operations ● real time ● highlights ● geo types and geo distance facets ● attachments ● “did you mean?” and completions ● common terms ● filters and caching ● river

Slide 78

Slide 78 text

You know. For search.