Beyond the basics with Elasticsearch

Elasticsearch Inc

June 08, 2015

Transcript

  1. Bible concordance: A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.
  2. "Python" and "Django"
    [Figure: an inverted index in which each of the terms python, django, flask, jazz and web points to the files (file_1.txt to file_4.txt) that contain it.]
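
The inverted index on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene stores its index: the file contents below are invented (the transcript does not preserve which file contains which term), and `build_inverted_index` and `search_and` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical file contents, invented for illustration only.
FILES = {
    "file_1.txt": "python web",
    "file_2.txt": "python jazz",
    "file_3.txt": "python django web",
    "file_4.txt": "django flask",
}


def build_inverted_index(files):
    """Map each lowercased term to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for term in text.lower().split():
            index[term].add(name)
    return index


def search_and(index, *terms):
    """Return the sorted files that contain every one of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_inverted_index(FILES)
print(search_and(index, "Python", "Django"))
```

An AND query such as "Python" and "Django" is then just the intersection of two posting lists, which is why the lookup stays cheap even when the collection is large.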
  3. Geo Classification
    curl -XPUT 'localhost:9200/events/.percolator/city-warsaw' -d '{
      "query": {"filtered": {"filter": {
        "geo_shape": {"location": {"indexed_shape": {
          "index": "shapes", "type": "city", "id": "warsaw", "path": "area"
        }}}
      }}}
    }'
  4. Language Detection
    curl -XPUT 'localhost:9200/events/.percolator/lang-polish' -d '{
      "query": {"match": {"description": {
        "query": "cześć chrząszcz Żubrówka Wyborowa kur...",
        "type": "boolean",
        "minimum_should_match": 4
      }}}
    }'
  5. Classification
    curl -XPUT 'localhost:9200/events/conf/_percolate' -d '{
      "doc": {
        "title": "pywaw summit",
        "description": "konferencja zorganizowana dla wszystkich zainteresowanych w technologii związanej z Pythona...",
        "location": "40.73,-74.1"
      }
    }'
    {
      ...
      "matches": [
        {"_index": "events", "_id": "city-warsaw"},
        {"_index": "events", "_id": "lang-polish"},
        {"_index": "events", "_id": "topic-python"}
      ]
    }
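
The three percolator slides rely on one idea: queries are stored first, and each incoming document is then matched against all of them. The toy below sketches that flow in plain Python and is not the Elasticsearch implementation. The word list, the bounding-box coordinates and the document location are invented, a bounding box stands in for the indexed city shape, and `min_should_match`, `in_bounding_box` and `percolate` are hypothetical names.

```python
# Toy "percolator": queries are registered first, documents are matched later.

def min_should_match(field, words, minimum):
    """Build a query that matches when at least `minimum` of `words`
    occur as whitespace-separated tokens in doc[field]."""
    wanted = {w.lower() for w in words}

    def matches(doc):
        tokens = set(doc.get(field, "").lower().split())
        return len(wanted & tokens) >= minimum

    return matches


def in_bounding_box(field, south, west, north, east):
    """Build a query that matches when doc[field] = (lat, lon) is inside a box.
    The box is a crude substitute for a real city shape."""

    def matches(doc):
        if field not in doc:
            return False
        lat, lon = doc[field]
        return south <= lat <= north and west <= lon <= east

    return matches


# Registered queries keyed by id (hypothetical word list and coordinates).
REGISTERED = {
    "lang-polish": min_should_match(
        "description", ["dla", "w", "z", "i", "na", "wszystkich"], 4),
    "city-warsaw": in_bounding_box("location", 52.1, 20.8, 52.4, 21.3),
}


def percolate(doc):
    """Return the ids of all registered queries that match the document."""
    return sorted(qid for qid, matches in REGISTERED.items() if matches(doc))


doc = {
    "title": "pywaw summit",
    "description": "konferencja zorganizowana dla wszystkich "
                   "zainteresowanych w technologii związanej z Pythona",
    "location": (52.23, 21.01),  # assumed coordinates, not taken from the slide
}
print(percolate(doc))
```

The returned ids play the role of the `matches` array in the response on the slide: each one is a label that can be attached to the document before it is indexed.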
  6. Relevancy
    Standard is TF-IDF: good for text. Lucene adds factors on top (text length etc.). Sometimes that is not good enough: user-contributed "quality", time/space decay, ...
    [Figure: score (0 to 1) plotted against age (20 to 60) for the "gauss", "exp" and "lin" decay functions, annotated with reference, scale, offset and decay.]
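
The three decay curves named on this slide can be sketched as below. The formulas follow the Elasticsearch `function_score` documentation as I understand it, where a value exactly `scale` away from the reference (beyond the `offset`) scores `decay`. The function names and the default `decay=0.5` are assumptions for this sketch.

```python
import math


def _distance(value, reference, offset):
    """Distance from the reference point; anything within `offset` counts as 0."""
    return max(0.0, abs(value - reference) - offset)


def gauss_decay(value, reference, scale, offset=0.0, decay=0.5):
    """Bell-shaped decay: slow near the reference, then steep, then flat."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    d = _distance(value, reference, offset)
    return math.exp(-d ** 2 / (2.0 * sigma_sq))


def exp_decay(value, reference, scale, offset=0.0, decay=0.5):
    """Exponential decay: steep right away, never quite reaching zero."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, reference, offset))


def linear_decay(value, reference, scale, offset=0.0, decay=0.5):
    """Straight-line decay that hits zero and stays there."""
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(value, reference, offset)) / s)


# Score by age, with reference 20, offset 5 and scale 10 (illustrative values).
for age in (20, 30, 35, 45, 60):
    print(age,
          round(gauss_decay(age, 20, 10, offset=5), 3),
          round(exp_decay(age, 20, 10, offset=5), 3),
          round(linear_decay(age, 20, 10, offset=5), 3))
```

With these parameters all three curves stay at 1 up to age 25 and pass through 0.5 at age 35; they differ only in how quickly they fall away on either side of that point.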
  7. Hotel Search
    "function_score": {
      "query": {"match": {"name": "grand hotel"}},
      "functions": [
        {"filter": {"terms": {"facilities": ["balcony"]}}, "boost_factor": 2},
        {"gauss": {"field": "location", "scale": "1km", "reference": [51,0]}},
        {"field_value_factor": {"field": "popularity"}},
        {"random_score": {}}
      ]
    }
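
To see how the four functions in the hotel query interact, here is a rough sketch of the arithmetic. It assumes the function values are multiplied together and then multiplied with the query score, which I believe is the default combination mode; the slide does not say which mode is used. `hotel_score`, the precomputed `distance_km` field and the sample hotel are hypothetical.

```python
import math
import random


def hotel_score(query_score, hotel, random_value):
    """Combine a text-match score with the four scoring functions by multiplication."""
    factors = []
    # Filtered boost: applies only to hotels that have a balcony.
    if "balcony" in hotel["facilities"]:
        factors.append(2.0)
    # Gaussian distance decay with scale 1 km and an assumed decay of 0.5:
    # the factor is 0.5 at exactly 1 km from the reference point.
    factors.append(0.5 ** ((hotel["distance_km"] / 1.0) ** 2))
    # Popularity used directly as a multiplier.
    factors.append(hotel["popularity"])
    # Random factor in [0, 1) to shuffle otherwise similar results.
    factors.append(random_value)
    return query_score * math.prod(factors)


hotel = {"facilities": ["balcony", "wifi"], "distance_km": 1.0, "popularity": 3}
print(hotel_score(2.0, hotel, random.random()))
```

Because everything is multiplied, a single factor close to zero (a very distant hotel, or an unlucky random draw) can cancel out a strong text match, which is worth keeping in mind when tuning such a query.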
  8. Example: recommendations
    [Diagram: user A and user B, each linked through a purchase history to products (Product A to Product D).]
    Users are represented as documents; products are represented as terms.
  9. Dumb approach to recommendation (no relevance): Popular != Relevant
    {
      "query": {"terms": {"artists": user_likes}},
      "aggs": {"popular": {"terms": {"field": "artists", "exclude": user_likes}}}
    }
  10. Strong recommendations based on relevance: Use the score! Compare to background
    {
      "query": {"terms": {"artists": user_likes}},
      "aggs": {"significant": {"significant_terms": {"field": "artists", "exclude": user_likes}}}
    }
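
The difference between "popular" and "significant" can be shown on a toy data set. The sketch below counts artists among users who share a taste (the foreground) and then compares each artist's foreground share with its share across all users (the background). The scoring is modelled on the JLH heuristic as I understand it, absolute change multiplied by relative change; the user data and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical data: each user "document" holds the set of artists they like.
USERS = [
    {"Metallica", "Slayer", "Beatles"},
    {"Metallica", "Slayer", "Beatles"},
    {"Metallica", "Beatles"},
    {"Metallica", "Slayer", "Beatles"},
    {"Beatles", "Abba"},
    {"Beatles", "Abba"},
    {"Beatles"},
    {"Beatles", "Abba"},
]


def foreground(users, user_likes):
    """Users who share at least one liked artist (the role of the terms query)."""
    return [u for u in users if u & user_likes]


def popular(users, user_likes):
    """Plain counts inside the foreground set, most frequent first."""
    counts = Counter(a for u in foreground(users, user_likes)
                     for a in u - user_likes)
    return counts.most_common()


def significant(users, user_likes):
    """Artists over-represented in the foreground relative to all users."""
    fg = foreground(users, user_likes)
    fg_counts = Counter(a for u in fg for a in u - user_likes)
    bg_counts = Counter(a for u in users for a in u)
    scores = {}
    for artist, fg_count in fg_counts.items():
        fg_pct = fg_count / len(fg)
        bg_pct = bg_counts[artist] / len(users)
        if fg_pct > bg_pct:
            # Absolute change times relative change (JLH-style score).
            scores[artist] = (fg_pct - bg_pct) * (fg_pct / bg_pct)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


likes = {"Metallica"}
print(popular(USERS, likes))
print(significant(USERS, likes))
```

In this data every user likes the Beatles, so the plain count ranks them first even though that says nothing about Metallica fans in particular; the background comparison drops them and surfaces Slayer instead.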
  11. Super-connected nodes in graphs: We just figured out the way to surf only the meaningful connections in a graph!
    [Diagram: Concept A, Concept B and Concept C joined by connections labelled "useful" and "useless".]
    www.elastic.co
  12. Example: wikipedia
    [Diagram: the John Lennon article and the Beatles article, connected to the terms Yoko Ono, John Lennon, United States and Ringo Starr.]
    Wikipedia articles are documents; links and page titles are terms.