Slide 1

Slide 1 text

Beyond the basics with Elasticsearch Honza Král @honzakral

Slide 2

Slide 2 text

Search

Slide 3

Slide 3 text

Bible concordance
A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.

Slide 4

Slide 4 text

Inverted index python file_1.txt file_2.txt file_3.txt web file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt
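The idea on this slide can be sketched in a few lines of Python: map every term to the set of files that contain it, then answer a query for several terms by intersecting those sets. This is only an illustrative toy, not how Lucene stores its index; the file contents and the helper names `build_index` and `search_and` are made up for the example.

```python
from collections import defaultdict

# Hypothetical documents: names and contents are illustrative only.
docs = {
    "file_1.txt": "python tutorial",
    "file_2.txt": "python web django",
    "file_3.txt": "python web flask",
    "file_4.txt": "django jazz",
}


def build_index(documents):
    """Map each lowercased term to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in text.lower().split():
            index[term].add(name)
    return index


def search_and(index, *terms):
    """Return documents containing every term (intersection of posting lists)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(docs)
print(search_and(index, "Python", "Django"))
```

A query for both "Python" and "Django" never scans the files themselves; it only looks up two posting lists and intersects them.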

Slide 5

Slide 5 text

"Python" and "Django" python file_1.txt file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt web file_2.txt file_3.txt

Slide 6

Slide 6 text

Percolator
Reversed search: index queries, run documents
Alerting: threshold reached
Classification: language, geo

Slide 7

Slide 7 text

Geo Classification
curl -XPUT 'localhost:9200/events/.percolator/city-warsaw' -d '{
  "query": {"filtered": {"filter": {
    "geo_shape": {
      "location": {
        "indexed_shape": {
          "index": "shapes",
          "type": "city",
          "id": "warsaw",
          "path": "area"
        }
      }
    }
  }}}
}'

Slide 8

Slide 8 text

Language Detection
curl -XPUT 'localhost:9200/events/.percolator/lang-polish' -d '{
  "query": {
    "match": {
      "description": {
        "query": "cześć chrząszcz Żubrówka Wyborowa kur...",
        "type": "boolean",
        "minimum_should_match": 4
      }
    }
  }
}'

Slide 9

Slide 9 text

Classification
curl -XPUT 'localhost:9200/events/conf/_percolate' -d '{
  "doc": {
    "title": "pywaw summit",
    "description": "konferencja zorganizowana dla wszystkich zainteresowanych w technologii związanej z Pythona...",
    "location" : "40.73,-74.1"
  }
}'

{
  ...
  "matches" : [
    {"_index" : "events", "_id" : "city-warsaw"},
    {"_index" : "events", "_id" : "lang-polish"},
    {"_index" : "events", "_id" : "topic-python"}
  ]
}
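The reversed-search idea behind the percolator can be sketched without Elasticsearch at all: register the queries first, then run each incoming document against them and collect the ids that match. The sketch below is a toy under loud assumptions: the predicates are simplified stand-ins for the real queries, the Polish word list is invented for the example, and `percolate` is a hypothetical helper, not the Elasticsearch API.

```python
# Hypothetical word list standing in for the "lang-polish" match query.
POLISH_WORDS = {"cześć", "konferencja", "dla", "wszystkich", "w", "technologii"}


def lang_polish(doc, minimum_should_match=4):
    """Match when at least `minimum_should_match` known Polish words occur."""
    words = set(doc.get("description", "").lower().split())
    return len(words & POLISH_WORDS) >= minimum_should_match


def topic_python(doc):
    """Simplified topic query: look for Python-related keywords."""
    text = (doc.get("title", "") + " " + doc.get("description", "")).lower()
    return "python" in text or "pywaw" in text


# Registered queries, keyed by id (ids mirror the ones used on the slides).
registered = {"lang-polish": lang_polish, "topic-python": topic_python}


def percolate(doc):
    """Run one document against every registered query; return matching ids."""
    return sorted(qid for qid, query in registered.items() if query(doc))


doc = {
    "title": "pywaw summit",
    "description": "konferencja zorganizowana dla wszystkich zainteresowanych w technologii",
}
print(percolate(doc))
```

The matching ids act as labels for the document, which is how the same mechanism serves both alerting and classification.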

Slide 10

Slide 10 text

Relevancy

Slide 11

Slide 11 text

Relevancy
Standard is TF-IDF: good for text
Lucene adds factors on top: text length etc.
Sometimes not good enough: user-contributed "quality", time/space decay, ...
[Figure: decay functions "gauss", "exp" and "lin"; score (0 to 1) plotted against age (20 to 60), annotated with reference, scale, offset and decay]
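The three decay curves can be sketched in plain Python. The formulas below follow the shapes described in the Elasticsearch function score documentation as I understand them, so treat this as a sketch rather than the exact implementation. The parameter names come from the figure (`reference`, `scale`, `offset`, `decay`); the numbers in the usage lines are hypothetical.

```python
import math


def _distance(value, reference, offset):
    """Distance from the reference point; anything within `offset` counts as 0."""
    return max(0.0, abs(value - reference) - offset)


def gauss(value, reference, scale, offset=0.0, decay=0.5):
    """Bell-shaped decay: the score equals `decay` at offset + scale."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, reference, offset) ** 2 / (2.0 * sigma_sq))


def exp(value, reference, scale, offset=0.0, decay=0.5):
    """Exponential decay: the score equals `decay` at offset + scale."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, reference, offset))


def linear(value, reference, scale, offset=0.0, decay=0.5):
    """Linear decay: equals `decay` at offset + scale, then reaches 0."""
    s = scale / (1.0 - decay)
    return max((s - _distance(value, reference, offset)) / s, 0.0)


# Hypothetical settings: full score for ages 35 to 45, half score 5 further out.
for age in (40, 45, 50, 60):
    print(age, [round(f(age, reference=40, scale=5, offset=5), 3)
                for f in (gauss, exp, linear)])
```

All three functions return 1 inside the offset window and `decay` at a distance of offset plus scale; they differ only in how quickly the score falls off beyond that point.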

Slide 12

Slide 12 text

Hotel Search
"function_score": {
  "query": {"match": {"name": "grand hotel"}},
  "functions": [
    {
      "filter": {"terms": {"facilities": ["balcony"]}},
      "boost_factor": 2
    },
    {
      "gauss": {
        "field": "location",
        "scale": "1km",
        "reference": [51,0]
      }
    },
    {"field_value_factor": {"field": "popularity"}},
    {"random_score": {}}
  ]
}

Slide 13

Slide 13 text

Aggregations

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Recommendations

Slide 16

Slide 16 text

Example: recommendations
Users represented as documents
Products represented as terms
[Diagram: purchase histories of user A and user B linked to Product A, Product B, Product C and Product D]

Slide 17

Slide 17 text

Dumb approach to recommendation (no relevance)
Popular != Relevant
{
  "query": {
    "terms": {"artists": user_likes}
  },
  "aggs": {
    "popular": {
      "terms": {
        "field": "artists",
        "exclude": user_likes
      }
    }
  }
}
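What this query computes can be imitated in plain Python: select the users who share at least one liked artist, then count which other artists those users like most often. The data and the `popular` helper are hypothetical and only illustrate the counting; they are not the Elasticsearch implementation.

```python
from collections import Counter

# Hypothetical user documents: each user is a list of liked artists.
users = [
    ["Beatles", "Coldplay", "John Lennon"],
    ["Beatles", "Coldplay"],
    ["Beatles", "Coldplay", "Ringo Starr"],
    ["Metallica", "Coldplay"],
    ["Metallica", "Coldplay"],
]


def popular(users, user_likes, size=3):
    """Count artists among users who share a like, excluding the likes themselves."""
    likes = set(user_likes)
    counts = Counter()
    for artists in users:
        artists = set(artists)
        if artists & likes:  # like the "terms" query: any shared artist matches
            counts.update(artists - likes)  # like "exclude": skip known likes
    return counts.most_common(size)


print(popular(users, ["Beatles"]))
```

In this made-up data Coldplay comes out on top for a Beatles fan simply because everybody likes Coldplay, which is the "Popular != Relevant" problem.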

Slide 18

Slide 18 text

Strong recommendations based on relevance
Use the score!
Compare to background
{
  "query": {
    "terms": {"artists": user_likes}
  },
  "aggs": {
    "significant": {
      "significant_terms": {
        "field": "artists",
        "exclude": user_likes
      }
    }
  }
}
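"Compare to background" can be sketched as follows: for each candidate artist, compare how often it appears among users matching the query (the foreground) with how often it appears among all users (the background). The score used here, absolute change times relative change, is my understanding of the JLH heuristic that `significant_terms` uses by default; the data and the `significant` helper are hypothetical.

```python
from collections import Counter

# Hypothetical user documents: each user is a list of liked artists.
users = [
    ["Beatles", "Coldplay", "John Lennon"],
    ["Beatles", "Coldplay", "John Lennon"],
    ["Beatles", "Coldplay"],
    ["Metallica", "Coldplay"],
    ["Metallica", "Coldplay"],
    ["Metallica", "Coldplay"],
]


def significant(users, user_likes):
    """Rank artists that are over-represented among users sharing a like."""
    likes = set(user_likes)
    foreground = [set(u) for u in users if set(u) & likes]
    fg_counts = Counter(a for u in foreground for a in u - likes)
    bg_counts = Counter(a for u in users for a in set(u))
    scores = {}
    for artist, fg in fg_counts.items():
        fg_pct = fg / len(foreground)
        bg_pct = bg_counts[artist] / len(users)
        if fg_pct > bg_pct:  # keep only artists more common than in the background
            scores[artist] = (fg_pct - bg_pct) * (fg_pct / bg_pct)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(significant(users, ["Beatles"]))
```

Coldplay drops out because it is just as common in the background as among Beatles fans, while John Lennon, who is rare overall but frequent in the foreground, rises to the top.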

Slide 19

Slide 19 text

www.elastic.co
Super-connected nodes in graphs
We just figured out the way to surf only the meaningful connections in a graph!
[Diagram: Concept A, Concept B and Concept C, with connections labeled "useful" and "useless"]

Slide 20

Slide 20 text

Meaningful Connections

Slide 21

Slide 21 text

Example: Wikipedia
Wikipedia articles are documents
Links and page titles are terms
[Diagram: the John Lennon article and the Beatles article, with the terms Yoko Ono, John Lennon, United States and Ringo Starr]

Slide 22

Slide 22 text

Aggregation + Relevancy is how we look at the world

Slide 23

Slide 23 text

Thanks! Honza Král @honzakral