Beyond the basics with Elasticsearch

Elasticsearch Inc

June 08, 2015

Transcript

  1. Beyond the basics with Elasticsearch. Honza Král, @honzakral

  2. Search

  3. Bible concordance: A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.
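  4. Inverted index. [Diagram: each term (python, web, django, flask, jazz) maps to the list of files, among file_1.txt to file_4.txt, that contain it; for example, python maps to file_1.txt, file_2.txt, file_3.txt, web maps to file_2.txt, file_3.txt, and jazz maps to file_4.txt.]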
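
The idea of an inverted index can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene stores its index; the corpus and the `build_inverted_index` helper are hypothetical and are not the files shown on the slide.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document names containing it."""
    postings = defaultdict(set)
    for name, text in docs.items():
        for term in text.lower().split():
            postings[term].add(name)
    return {term: sorted(names) for term, names in postings.items()}


# Hypothetical corpus, invented for this sketch.
docs = {
    "file_1.txt": "Python tutorial",
    "file_2.txt": "Python web framework Django",
    "file_3.txt": "Flask web framework",
}

index = build_inverted_index(docs)
print(index["python"])
print(index["web"])
```

Looking up a term is then a single dictionary access, which is what makes term queries cheap compared with scanning every document.
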
  5. "Python" and "Django". [Diagram: the inverted index from the previous slide (terms python, web, django, flask, jazz and their files), used to answer a query for both terms by combining the posting lists of python and django.]
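
Answering a query for two terms at once comes down to intersecting their posting lists. The sketch below shows the classic merge-style intersection of two sorted lists; the `postings` data is hypothetical and chosen only for illustration.

```python
def intersect(a, b):
    """Intersect two sorted posting lists in O(len(a) + len(b)) steps."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, skip it
        else:
            j += 1  # b[j] cannot appear later in a, skip it
    return out


# Hypothetical posting lists, kept sorted by document name.
postings = {
    "python": ["file_1.txt", "file_2.txt", "file_3.txt"],
    "django": ["file_2.txt", "file_3.txt", "file_4.txt"],
}

both = intersect(postings["python"], postings["django"])
print(both)
```

Because both lists are sorted, each one is walked only once, so the cost grows with the length of the lists rather than with the size of the corpus.
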
  6. Percolator. Reversed search: index queries, run documents. Alerting: threshold reached. Classification: language, geo.
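
The "index queries, run documents" idea can be sketched as a toy in Python. This is only a conceptual illustration, not Elasticsearch's implementation: the `ToyPercolator` class is hypothetical, and a "query" here is simply a set of terms that must all occur in the document.

```python
class ToyPercolator:
    """Store queries up front, then report which of them match an incoming document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # A stored "query" is the set of terms a document must contain.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, text):
        tokens = set(text.lower().split())
        return sorted(
            query_id
            for query_id, terms in self.queries.items()
            if terms <= tokens  # every required term is present
        )


percolator = ToyPercolator()
percolator.register("topic-python", ["python"])
percolator.register("topic-search", ["search", "index"])

matches = percolator.percolate("Python meetup about search")
print(matches)
```

The same shape covers both use cases on the slide: an alert is a stored query that fires when a new document matches it, and a classifier is a set of stored queries whose ids act as labels.
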
  7. Geo Classification:
    curl -XPUT 'localhost:9200/events/.percolator/city-warsaw' -d '{
      "query": {"filtered": {"filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "index": "shapes", "type": "city", "id": "warsaw", "path": "area"
            }
          }
        }
      }}}
    }'
  8. Language Detection:
    curl -XPUT 'localhost:9200/events/.percolator/lang-polish' -d '{
      "query": {
        "match": {
          "description": {
            "query": "cześć chrząszcz Żubrówka Wyborowa kur...",
            "type": "boolean",
            "minimum_should_match": 4
          }
        }
      }
    }'
  9. Classification:
    curl -XGET 'localhost:9200/events/conf/_percolate' -d '{
      "doc": {
        "title": "pywaw summit",
        "description": "konferencja zorganizowana dla wszystkich zainteresowanych w technologii związanej z Pythona...",
        "location": "40.73,-74.1"
      }
    }'
    {
      ...
      "matches": [
        {"_index": "events", "_id": "city-warsaw"},
        {"_index": "events", "_id": "lang-polish"},
        {"_index": "events", "_id": "topic-python"}
      ]
    }
  10. Relevancy

  11. Relevancy. The standard is TF-IDF, which is good for text; Lucene adds more factors on top (text length etc.). Sometimes that is not good enough: user-contributed "quality", time/space decay, ... [Figure: score (0 to 1) plotted against age (20 to 60) for the "gauss", "exp", and "lin" decay functions, annotated with reference, scale, offset, and decay.]
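
The three decay curves can be sketched in Python. The formulas below follow the decay functions as documented for Elasticsearch's function score query, to the best of my understanding; the function names and the `_distance` helper are hypothetical, and `reference` is the slide's name for the point where the score is 1.

```python
import math


def _distance(value, reference, offset):
    """Distance from the reference point, ignoring anything within the offset."""
    return max(0.0, abs(value - reference) - offset)


def gauss_decay(value, reference, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    d = _distance(value, reference, offset)
    return math.exp(-d ** 2 / (2.0 * sigma_sq))


def exp_decay(value, reference, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, reference, offset))


def linear_decay(value, reference, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max((s - _distance(value, reference, offset)) / s, 0.0)


# All three curves score 1.0 at the reference and `decay` at distance `scale`.
for fn in (gauss_decay, exp_decay, linear_decay):
    print(fn.__name__, fn(40, reference=40, scale=10), fn(50, reference=40, scale=10))
```

The curves differ in how quickly they fall between and beyond those two fixed points: the linear one reaches zero, while the other two only approach it.
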
  12. Hotel Search:
    "function_score": {
      "query": {"match": {"name": "grand hotel"}},
      "functions": [
        { "filter": {"terms": {"facilities": ["balcony"]}}, "boost_factor": 2 },
        { "gauss": { "field": "location", "scale": "1km", "reference": [51,0] } },
        { "field_value_factor": {"field": "popularity"} },
        { "random_score": {} }
      ]
    }
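
How several scoring functions combine with the query score can be sketched as follows. This is a toy model under stated assumptions: the `function_score` helper and the hotel document are hypothetical, each function applies only when its optional filter matches, and the results are combined by multiplication, which is a choice made for this illustration.

```python
def function_score(query_score, doc, functions):
    """Multiply the query score by every function whose filter matches the document."""
    score = query_score
    for fn in functions:
        matches = fn.get("filter", lambda d: True)  # no filter means "always applies"
        if matches(doc):
            score *= fn["weight"](doc)
    return score


# Hypothetical functions mirroring two entries from the slide.
functions = [
    # Boost hotels with a balcony by a factor of 2.
    {"filter": lambda d: "balcony" in d["facilities"], "weight": lambda d: 2.0},
    # Use the popularity field's value as a factor.
    {"weight": lambda d: d["popularity"]},
]

hotel = {"facilities": ["balcony", "pool"], "popularity": 3.0}
score = function_score(1.5, hotel, functions)
print(score)
```

A hotel without a balcony skips the first function and keeps only the popularity factor, so the text match still decides the baseline and the functions reorder results around it.
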
  13. Aggregations

  14. None
  15. Recommendations

  16. Example: recommendations. [Diagram: the purchase histories of user A and user B, linking them to Product A, Product B, Product C, and Product D.] Users are represented as documents; products are represented as terms.
  17. Dumb approach to recommendation (no relevance). Popular != Relevant.
    {
      "query": { "terms": { "artists": user_likes } },
      "aggs": {
        "popular": {
          "terms": { "field": "artists", "exclude": user_likes }
        }
      }
    }
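
Why "popular != relevant" can be seen in a small Python imitation of that request. This is a sketch with invented data, not a client for Elasticsearch: each user is a list of liked artists, the `popular_among_similar` helper is hypothetical, and it counts artists among users who share at least one like, which is roughly what the terms query plus terms aggregation does.

```python
from collections import Counter


def popular_among_similar(users, user_likes):
    """Count artists liked by users who share a like, excluding the known likes."""
    liked = set(user_likes)
    counts = Counter()
    for artists in users:
        if liked & set(artists):  # the "terms" query: this user shares a like
            counts.update(a for a in set(artists) if a not in liked)  # the "exclude"
    return counts.most_common()


# Hypothetical users; "popstar" is liked by almost everyone.
users = [
    ["metallica", "slayer", "popstar"],
    ["metallica", "slayer", "popstar"],
    ["metallica", "popstar"],
    ["jazzman", "popstar"],
    ["jazzman", "popstar"],
    ["jazzman"],
]

popular = popular_among_similar(users, ["metallica"])
print(popular)
```

The top suggestion is the artist nearly everyone likes, which says little about this particular user's taste.
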
  18. Strong recommendations based on relevance. Use the score! Compare to background.
    {
      "query": { "terms": { "artists": user_likes } },
      "aggs": {
        "significant": {
          "significant_terms": { "field": "artists", "exclude": user_likes }
        }
      }
    }
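
"Compare to background" can be sketched in Python by scoring each artist on how much more frequent it is among matching users than among all users. The data and the `significant_artists` helper are hypothetical, and the score (absolute change times relative change) is modeled on the JLH heuristic as I understand it, so treat the exact formula as an assumption.

```python
def significant_artists(users, user_likes):
    """Rank artists by how over-represented they are among users sharing a like."""
    liked = set(user_likes)
    background = [set(u) for u in users]
    foreground = [u for u in background if liked & u]
    candidates = set().union(*foreground) - liked
    scores = {}
    for artist in candidates:
        fg = sum(artist in u for u in foreground) / len(foreground)
        bg = sum(artist in u for u in background) / len(background)
        if fg > bg:  # keep only artists more common in the foreground
            scores[artist] = (fg - bg) * (fg / bg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical users; "popstar" is liked by almost everyone.
users = [
    ["metallica", "slayer", "popstar"],
    ["metallica", "slayer", "popstar"],
    ["metallica", "popstar"],
    ["jazzman", "popstar"],
    ["jazzman", "popstar"],
    ["jazzman"],
]

ranked = significant_artists(users, ["metallica"])
print([artist for artist, _ in ranked])
```

With this data the universally liked artist drops below the one that is specific to the matching users, which is the difference between a popular suggestion and a relevant one.
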
  19. Super-connected nodes in graphs. We just figured out the way to surf only the meaningful connections in a graph! [Diagram: Concept A, Concept B, and Concept C, with connections labeled useful and useless.] (www.elastic.co)
  20. Meaningful Connections

  21. Example: Wikipedia. [Diagram: the John Lennon article and the Beatles article, with the terms Yoko Ono, John Lennon, United States, and Ringo Starr.] Wikipedia articles are documents; links and page titles are terms.
  22. Aggregation + Relevancy is how we look at the world

  23. Thanks! Honza Král @honzakral