Elasticsearch and Recommendations

A quick overview of:
* Elasticsearch
* The Elastic Stack
* How to get recommendations from Elasticsearch, either in combination with Spark or through Elasticsearch aggregations

Philipp Krenn

December 20, 2017

Transcript

  1. $ curl http://localhost:9200
     {
       "name": "instance-0000000003",
       "cluster_name": "44d59d42507ebde50eec3dd749c45b86",
       "cluster_uuid": "4mZwuq5eS4ieREDzVq0B_Q",
       "version": {
         "number": "6.1.1",
         "build_hash": "bd92e7f",
         "build_date": "2017-12-17T20:23:25.338Z",
         "build_snapshot": false,
         "lucene_version": "7.1.0",
         "minimum_wire_compatibility_version": "5.6.0",
         "minimum_index_compatibility_version": "5.0.0"
       },
       "tagline": "You Know, for Search"
     }
  3. {
       "tokens": [
         { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 },
         { "token": "you", "start_offset": 25, "end_offset": 28, "type": "<ALPHANUM>", "position": 5 },
         ...
       ]
     }
  4. GET /_analyze
     {
       "char_filter": [ "html_strip" ],
       "tokenizer": "standard",
       "filter": [ "lowercase", "stop", "snowball" ],
       "text": "These are <em>not</em> the droids you are looking for."
     }
  5. {
       "tokens": [
         { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
         { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
         ...
       ]
     }
  6. Common
     POST movies/user/_search
     {
       "size": 0,
       "query": {
         "match": { "movies_liked": "Terminator" }
       },
       "aggregations": {
         "movies_like_terminator": {
           "terms": { "field": "movies_liked.keyword", "min_doc_count": 1 }
         }
       }
     }
  7. Significant
     POST movies/user/_search
     {
       "size": 0,
       "query": {
         "match": { "movies_liked": "Terminator" }
       },
       "aggregations": {
         "movies_like_terminator": {
           "significant_terms": { "field": "movies_liked.keyword", "min_doc_count": 1 }
         }
       }
     }
  8. Significance: JLH
     https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/bucket/significant/heuristics/JLHScore.java

     double subsetProbability = (double) subsetFreq / (double) subsetSize;
     double supersetProbability = (double) supersetFreq / (double) supersetSize;
     double absoluteProbabilityChange = subsetProbability - supersetProbability;
     if (absoluteProbabilityChange <= 0) {
         return 0;
     }
     double relativeProbabilityChange = (subsetProbability / supersetProbability);
     return absoluteProbabilityChange * relativeProbabilityChange;
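
The JLH excerpt on slide 8 translates almost line for line into Python, which makes it easy to try with concrete numbers. This is a sketch of just the lines shown on the slide, not the full Elasticsearch class. The sample figures are hypothetical: 6 users in the index, 3 of whom like Terminator (the subset), with the superset assumed to be the whole index.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Port of the JLH lines shown on slide 8."""
    subset_probability = subset_freq / subset_size
    superset_probability = superset_freq / superset_size
    absolute_change = subset_probability - superset_probability
    if absolute_change <= 0:
        # Terms that are not more frequent in the subset score zero.
        return 0.0
    relative_change = subset_probability / superset_probability
    return absolute_change * relative_change


# Hypothetical counts: (docs in subset with term, docs in index with term).
SUBSET_SIZE, SUPERSET_SIZE = 3, 6
COUNTS = {
    "Alien": (2, 2),         # liked only by Terminator fans
    "Forrest Gump": (3, 6),  # liked by everyone
}

for movie, (subset_freq, superset_freq) in COUNTS.items():
    score = jlh_score(subset_freq, SUBSET_SIZE, superset_freq, SUPERSET_SIZE)
    print(movie, round(score, 2))
```

"Alien" scores roughly 0.67 because its share doubles inside the subset (2/3 against 2/6), while "Forrest Gump" scores 0 because it is equally common everywhere. Multiplying the absolute change by the relative change favours terms that are both noticeably and disproportionately more frequent in the subset.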