Slide 1

Slide 1 text

From Keyword Search to Data Science

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

Kau and e-commerce

Slide 4

Slide 4 text

Architecture

[Diagram: BigQuery, Kafka, Indexer, Elasticsearch, Searcher, Monolith; arrows labeled "reads" (four) and "writes" (one)]

Slide 5

Slide 5 text

Searches

Slide 6

Slide 6 text

Phase: Analysis

{
  "title": "Modernes Wandbild ... Kunstdruck New York",
  "search_stats": [
    {
      "id_item": 306593629,
      "rank": 0.00012203718164007487,
      "term": "modernes wandbild new york"
    }
  ]
}

Slide 7

Slide 7 text

Phase: Analysis - Category Guessing

Slide 8

Slide 8 text

Phase: Analysis - Shorter queries

Slide 9

Slide 9 text

Phase: Search

Slide 10

Slide 10 text

Next steps

Slide 11

Slide 11 text

Keyword search

Slide 12

Slide 12 text

ELSER

Slide 13

Slide 13 text

Query Expansion

Slide 14

Slide 14 text

Query Expansion

Slide 15

Slide 15 text

Query expansion steps

Slide 16

Slide 16 text

Query Example

Slide 17

Slide 17 text

Retrieve candidates

PUT query_expansion_phrases
{
  "mappings": {
    "properties": {
      "candidate": {
        "type": "text",
        "similarity": "boolean",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

POST query_expansion_phrases/_doc
{
  "candidate": "nintendo switch controller"
}

Slide 18

Slide 18 text

Retrieve candidates

GET query_expansion_phrases/_search
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        { "match": { "candidate": "nintendo switch controller grün" } }
      ],
      "filter": [
        { "script": { "script": { "source": ... } } }
      ]
    }
  }
}

Slide 19

Slide 19 text

Retrieve candidates

// params.query = "nintendo switch controller grün"
if (params.query.indexOf(doc['candidate.keyword'].value) < 0) {
  return false;
}
def asList = Arrays.asList(/ /.split(params.query));
def tokenizer = new StringTokenizer(doc['candidate.keyword'].value, " ");
while (tokenizer.hasMoreElements()) {
  def term = tokenizer.nextElement();
  def isTermInQuery = asList.contains(term);
  // if the query does not contain the current term, we bail
  if (!isTermInQuery) {
    return false;
  }
}
return true;
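The filter script above can be mirrored outside Elasticsearch for quick experiments. The following is a minimal Python sketch of the same two checks (substring match, then whole-term match); the function name `is_candidate_contained` and the sample candidate list are illustrative assumptions, not part of the talk.

```python
def is_candidate_contained(query: str, candidate: str) -> bool:
    """Keep a candidate only if it occurs in the query as a substring
    and each of its space-separated terms is a whole query term."""
    # Same idea as params.query.indexOf(...) < 0 in the script.
    if candidate not in query:
        return False
    query_terms = query.split(" ")
    # Empty tokens are skipped, as StringTokenizer does.
    return all(term in query_terms for term in candidate.split(" ") if term)


# Hypothetical candidates, for illustration only.
query = "nintendo switch controller grün"
candidates = [
    "nintendo switch controller",
    "switch controller",
    "nintendo switch",
    "switch control",       # substring of the query, but "control" is not a full term
    "nintendo controller",  # terms match, but the phrase is not contiguous in the query
]
kept = [c for c in candidates if is_candidate_contained(query, c)]
print(kept)
```

The substring check rejects non-contiguous phrases, and the term check rejects candidates that only match part of a word.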

Slide 20

Slide 20 text

Retrieve candidates

Slide 21

Slide 21 text

Retrieve scores

PUT query_expansion_cohesion_scores
{
  "mappings": {
    "properties": {
      "item_term": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "query_term": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "score": { "type": "float" }
    }
  }
}

Slide 22

Slide 22 text

Retrieve scores: Sample document

POST query_expansion_cohesion_scores/_doc
{
  "query_term": "nintendo switch controller",
  "item_term": "joy-con",
  "score": 0.023316383085096128
}

Slide 23

Slide 23 text

Retrieve scores: Query

Slide 24

Slide 24 text

Correlate query & document terms

FROM query_expansion_cohesion_scores
| WHERE (
    query_term.keyword LIKE "nintendo switch controller"
    OR query_term.keyword LIKE "switch controller"
    OR query_term.keyword LIKE "nintendo switch"
  )
| STATS
    final_score = SUM(score) * POW(TO_DOUBLE(COUNT(item_term.keyword)) / 3, 0.01),
    query_terms = VALUES(query_term.keyword)
  BY item_term.keyword
| SORT final_score DESC
| LIMIT 10
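To make the aggregation in the query above easier to follow, here is a rough Python sketch of the same arithmetic: sum the scores per item term, then multiply by (matched rows / number of phrases) raised to 0.01. It assumes the divisor 3 in the query stands for the number of candidate phrases; the function name `correlate` and the input tuples are hypothetical.

```python
from collections import defaultdict


def correlate(rows, query_phrases, limit=10):
    """rows: iterable of (query_term, item_term, score) tuples.
    Returns (final_score, query_terms, item_term) tuples, best first."""
    wanted = set(query_phrases)
    total = defaultdict(float)   # SUM(score) per item term
    count = defaultdict(int)     # COUNT(item_term.keyword) per item term
    matched = defaultdict(set)   # VALUES(query_term.keyword) per item term
    for query_term, item_term, score in rows:
        if query_term not in wanted:
            continue
        total[item_term] += score
        count[item_term] += 1
        matched[item_term].add(query_term)
    # Assumption: the constant 3 in the query is the number of phrases.
    n = len(query_phrases)
    results = [
        (total[t] * (count[t] / n) ** 0.01, sorted(matched[t]), t)
        for t in total
    ]
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:limit]
```

With an exponent of 0.01 the coverage factor stays close to 1, so it only slightly penalizes item terms that match fewer of the phrases.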

Slide 25

Slide 25 text

Correlate query & document terms

"nintendo switch" "controller" "joy-con"

{
  "columns": [
    { "name": "final_score", "type": "double" },
    { "name": "query_terms", "type": "keyword" },
    { "name": "item_term.keyword", "type": "keyword" }
  ],
  "values": [
    [
      0.07394268180954819,
      [ "switch controller", "nintendo switch controller" ],

    ],
    [
      0.073837760835886,
      [ "switch controller", "nintendo switch", "nintendo switch controller" ],

    ],
    [
      0.05886813160032034,
      [ "switch controller", "nintendo switch", "nintendo switch controller" ],

    ],
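An ES|QL JSON response keeps column metadata in `columns` and row data in `values`, so each row has to be zipped with the column names before use. The sketch below shows one way to do that in Python; the helper name `esql_rows` is an assumption, and the sample response uses placeholder values rather than the figures from the slide.

```python
def esql_rows(response: dict) -> list[dict]:
    """Turn an ES|QL columns/values response into one dict per row."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]


# Placeholder response, for illustration only.
sample = {
    "columns": [
        {"name": "final_score", "type": "double"},
        {"name": "query_terms", "type": "keyword"},
        {"name": "item_term.keyword", "type": "keyword"},
    ],
    "values": [
        [0.5, ["phrase a", "phrase b"], "term x"],
        [0.25, ["phrase a"], "term y"],
    ],
}
rows = esql_rows(sample)
print(rows[0]["item_term.keyword"])
```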

Slide 26

Slide 26 text

Results

Slide 27

Slide 27 text

Next steps

Slide 28

Slide 28 text

Summary

Slide 29

Slide 29 text

Thank you

Slide 30

Slide 30 text

Thank you