Elasticsearch and Recommendations

A quick overview of:
* Elasticsearch
* The Elastic Stack
* How to get recommendations from Elasticsearch, either in combination with Spark or through Elasticsearch aggregations

Philipp Krenn

December 20, 2017

Transcript

  1. $ curl http://localhost:9200
     {
       "name": "instance-0000000003",
       "cluster_name": "44d59d42507ebde50eec3dd749c45b86",
       "cluster_uuid": "4mZwuq5eS4ieREDzVq0B_Q",
       "version": {
         "number": "6.1.1",
         "build_hash": "bd92e7f",
         "build_date": "2017-12-17T20:23:25.338Z",
         "build_snapshot": false,
         "lucene_version": "7.1.0",
         "minimum_wire_compatibility_version": "5.6.0",
         "minimum_index_compatibility_version": "5.0.0"
       },
       "tagline": "You Know, for Search"
     }
  3. {
       "tokens": [
         { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 },
         { "token": "you", "start_offset": 25, "end_offset": 28, "type": "<ALPHANUM>", "position": 5 },
         ...
       ]
     }
  4. GET /_analyze
     {
       "char_filter": [ "html_strip" ],
       "tokenizer": "standard",
       "filter": [ "lowercase", "stop", "snowball" ],
       "text": "These are <em>not</em> the droids you are looking for."
     }
  5. {
       "tokens": [
         { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
         { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
         ...
       ]
     }
  6. Common
     POST movies/user/_search
     {
       "size": 0,
       "query": {
         "match": { "movies_liked": "Terminator" }
       },
       "aggregations": {
         "movies_like_terminator": {
           "terms": { "field": "movies_liked.keyword", "min_doc_count": 1 }
         }
       }
     }
  7. Significant
     POST movies/user/_search
     {
       "size": 0,
       "query": {
         "match": { "movies_liked": "Terminator" }
       },
       "aggregations": {
         "movies_like_terminator": {
           "significant_terms": { "field": "movies_liked.keyword", "min_doc_count": 1 }
         }
       }
     }
  8. Significance: JLH
     https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/bucket/significant/heuristics/JLHScore.java
     double subsetProbability = (double) subsetFreq / (double) subsetSize;
     double supersetProbability = (double) supersetFreq / (double) supersetSize;
     double absoluteProbabilityChange = subsetProbability - supersetProbability;
     if (absoluteProbabilityChange <= 0) {
         return 0;
     }
     double relativeProbabilityChange = (subsetProbability / supersetProbability);
     return absoluteProbabilityChange * relativeProbabilityChange;
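
The difference between the "Common" and "Significant" slides can be illustrated without a cluster. The following is a minimal Python sketch, not Elasticsearch code: `jlh_score` ports the arithmetic quoted on the JLH slide (the zero-division guard is an addition), while the `users` data set and the `recommend` helper are hypothetical and only imitate what a `terms` versus a `significant_terms` aggregation would rank for users who liked a seed movie. The seed movie itself is left out of both rankings for readability, which is also an assumption of this sketch.

```python
from collections import Counter


def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Port of the JLH arithmetic from the slide; the zero guards are an addition."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    subset_probability = subset_freq / subset_size
    superset_probability = superset_freq / superset_size
    absolute_change = subset_probability - superset_probability
    if absolute_change <= 0:
        return 0.0
    relative_change = subset_probability / superset_probability
    return absolute_change * relative_change


# Hypothetical toy index: one entry per user, holding the movies that user liked.
users = [
    {"Terminator", "Titanic", "Predator"},
    {"Terminator", "Titanic", "Predator"},
    {"Terminator", "Titanic"},
    {"Titanic", "Notting Hill"},
    {"Titanic", "Notting Hill"},
    {"Titanic"},
    {"Titanic", "Predator"},
    {"Notting Hill"},
]


def recommend(seed):
    """Return (common, significant) rankings for users who liked `seed`."""
    # Foreground set: the users matched by the query on the seed movie.
    foreground = [liked for liked in users if seed in liked]
    # Background counts: how often each movie is liked across all users.
    background_counts = Counter(movie for liked in users for movie in liked)
    foreground_counts = Counter(
        movie for liked in foreground for movie in liked if movie != seed
    )

    # "Common": rank by raw count in the foreground set.
    common = foreground_counts.most_common()

    # "Significant": rank by JLH score of foreground against background.
    significant = sorted(
        (
            (
                movie,
                jlh_score(count, len(foreground), background_counts[movie], len(users)),
            )
            for movie, count in foreground_counts.items()
        ),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return common, significant


if __name__ == "__main__":
    common, significant = recommend("Terminator")
    print("common:", common)
    print("significant:", significant)
```

In this toy data Titanic leads the count-based ranking simply because almost everyone likes it, while Predator leads the JLH ranking because it is liked far more often among Terminator fans than among all users.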