Slide 1

Slide 1 text

ROSE: Robust Caches for Amazon Product Search Kanji Yomoda (@k-yomo) May 2022

Slide 2

Slide 2 text

Confidential & Proprietary 2021
Outline
● Difficulty of caching in a search system
● ROSE
  ○ Index Generation
  ○ Online Retrieval
  ○ Theoretical Analysis
  ○ Experiments
● Summary

Slide 3

Slide 3 text

Difficulty of caching in a search system
● High query cardinality inflates the cache size and diminishes search performance
● Typos, misspellings, and redundant variants of a query lead to cache misses
  ○ "Nike shoes", "Nike shoe", "Nike's shoe", "Nike shooes", "Shoes Nike"

Slide 4

Slide 4 text

Difficulty of caching in a search system
[Diagram: queries "Nike shoes", "Nike's shoes", and "Nike shooes"; components Cache and Search Backend (Search Engine, ML models).]

Slide 5

Slide 5 text

ROSE

Slide 6

Slide 6 text

ROSE
ROSE is a robust cache that maps an online query to cached queries.
[Diagram: queries "Nike shoes", "Nike shooes", and "Nike footwear"; components ROSE, Cache, and Search System (Search Engine, ML models).]

Slide 7

Slide 7 text

ROSE - Requirements
● The cache needs to capture query similarity, i.e. be robust to typos and semantic variance
● The cache size must not scale with the volume of queries
● The lookup cost needs to be constant-time

Slide 8

Slide 8 text

Index generation

Slide 9

Slide 9 text

Index generation

Slide 10

Slide 10 text

LSH - Locality Sensitive Hashing

Slide 11

Slide 11 text

LSH - Definition
R: threshold
c: approximation factor
h(x) = h(y): x and y collide (= same bucket)
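
The formal statement on this slide did not survive extraction. For reference, the textbook distance-based definition of an LSH family uses the same symbols R and c; treating it as the slide's formula is an assumption, and the distance d and the probabilities p_1, p_2 are names introduced here.

```latex
A family $\mathcal{H}$ is $(R, cR, p_1, p_2)$-sensitive if, for any $x, y$
and $h$ drawn uniformly at random from $\mathcal{H}$:
\begin{align*}
d(x, y) \le R  &\implies \Pr[h(x) = h(y)] \ge p_1, \\
d(x, y) \ge cR &\implies \Pr[h(x) = h(y)] \le p_2,
\end{align*}
with $c > 1$ and $p_1 > p_2$.
```

When the definition is stated with a similarity instead of a distance, the two inequalities on the left flip and c < 1.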

Slide 12

Slide 12 text

LSH - Minwise Hashing

Slide 13

Slide 13 text

LSH - Minwise Hashing
Example: S1={2,5,7,9}, S2={1,2,4,7,10}
● Jaccard similarity: S1∪S2={1,2,4,5,7,9,10}, S1∩S2={2,7}, |S1∩S2| / |S1∪S2| = 2/7
● Minwise Hashing: apply the same random shuffle to S1 and S2, then Pr(S1[0] == S2[0]) = 2/7
Jaccard similarity ≒ collision probability of Minwise Hashing
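
The 2/7 example can be checked exhaustively. The sketch below is illustrative only: it enumerates every ordering of S1∪S2 (elements outside the union cannot affect the result) and counts how often both sets have the same first element. The function names are hypothetical, not taken from the ROSE paper.

```python
from fractions import Fraction
from itertools import permutations


def jaccard(a, b):
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b|."""
    return Fraction(len(a & b), len(a | b))


def minhash(s, rank):
    """Element of s that comes first under the shared ordering `rank`."""
    return min(s, key=rank.__getitem__)


def collision_probability(a, b):
    """Fraction of orderings of a ∪ b under which a and b share a minhash."""
    universe = sorted(a | b)
    hits = total = 0
    for perm in permutations(universe):
        rank = {x: i for i, x in enumerate(perm)}
        hits += minhash(a, rank) == minhash(b, rank)
        total += 1
    return Fraction(hits, total)


S1 = {2, 5, 7, 9}
S2 = {1, 2, 4, 7, 10}
print(jaccard(S1, S2))                # 2/7
print(collision_probability(S1, S2))  # 2/7
```

Both values agree because the two minhashes collide exactly when the first element of the shuffled union lies in S1∩S2.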

Slide 14

Slide 14 text

LSH - Minwise Hashing
[Figure: Jaccard similarity]

Slide 15

Slide 15 text

ROSE - Requirements
● The cache needs to capture query similarity => LSH (Locality Sensitive Hashing)
● The cache size must not scale with the volume of queries
● The lookup cost needs to be constant-time

Slide 16

Slide 16 text

Reservoir sampling algorithm
Processes a stream of m numbers and generates R uniform samples using only an array of size R.

Slide 17

Slide 17 text

Reservoir sampling algorithm
Example: choose R=1 person uniformly at random from a stream whose length m is not known in advance. The i-th person replaces the current choice with probability 1/i: 1/1, 1/2, 1/3, ・・・
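
A minimal sketch of reservoir sampling for general R (the classic "Algorithm R"), assuming the stream is any Python iterable. The function name and the optional `rng` parameter are hypothetical and only there to make the example reproducible.

```python
import random


def reservoir_sample(stream, r, rng=None):
    """Return up to r items drawn uniformly from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if len(reservoir) < r:
            # The first r items always fill the reservoir.
            reservoir.append(item)
        else:
            # Keep the i-th item with probability r / i by drawing a slot
            # in [0, i) and replacing only if it falls inside the reservoir.
            j = rng.randrange(i)
            if j < r:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000), r=5, rng=random.Random(0))
print(len(sample))  # 5
```

With r=1 this reduces to the slide's example: the i-th person is kept with probability 1/i.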

Slide 18

Slide 18 text

ROSE - Requirements
● The cache needs to capture query similarity => LSH (Locality Sensitive Hashing)
● The cache size must not scale with the volume of queries => Reservoir sampling algorithm
● The lookup cost must be constant-time

Slide 19

Slide 19 text

Online retrieval

Slide 20

Slide 20 text

Online retrieval
1. Compute the LSH signature of the incoming query and look up the corresponding bucket in each hash table
2. Rank the cached queries in those buckets by similarity to the incoming query and return the top result
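
The two steps above can be sketched as a toy index. This is an illustration under stated assumptions, not the ROSE implementation: queries are tokenized on whitespace, each of the `NUM_TABLES` tables uses a single salted MinHash as its signature, and step 2 ranks by exact Jaccard similarity. All names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_TABLES = 4  # "L" in the analysis; value chosen arbitrarily for the demo


def tokens(query):
    """Lower-cased whitespace tokens (an assumed, simplistic tokenizer)."""
    return frozenset(query.lower().split())


def signature(token_set, table_id):
    """MinHash of the token set under the hash function of one table."""
    return min(
        hashlib.blake2b(f"{table_id}:{tok}".encode(), digest_size=8).digest()
        for tok in token_set
    )


def build_index(cached_queries):
    tables = [defaultdict(list) for _ in range(NUM_TABLES)]
    for query in cached_queries:
        toks = tokens(query)
        if not toks:
            continue  # nothing to hash
        for table_id, table in enumerate(tables):
            table[signature(toks, table_id)].append(query)
    return tables


def jaccard(a, b):
    return len(a & b) / len(a | b)


def lookup(tables, query):
    toks = tokens(query)
    if not toks:
        return None
    # Step 1: collect the matching bucket from every table.
    candidates = set()
    for table_id, table in enumerate(tables):
        candidates.update(table.get(signature(toks, table_id), []))
    if not candidates:
        return None
    # Step 2: rank candidates by similarity and return the best one.
    return max(candidates, key=lambda c: jaccard(tokens(c), toks))


index = build_index(["nike shoes", "adidas socks"])
print(lookup(index, "Shoes Nike"))  # nike shoes
```

Word-level tokens make reordered queries collide in every table; catching typos such as "shooes" would need a finer-grained representation (for example character n-grams), which this sketch leaves out.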

Slide 21

Slide 21 text

Online retrieval

Slide 22

Slide 22 text

Count-based 𝑘-selection

Slide 23

Slide 23 text

Count-based 𝑘-selection
Count the collisions of each candidate across the hash tables and rank the candidates by that count

Slide 24

Slide 24 text

ROSE - Requirements
● The cache needs to capture query similarity => LSH (Locality Sensitive Hashing)
● The cache size must not scale with the volume of queries => Reservoir sampling algorithm
● The lookup cost must be constant-time => Count-based 𝑘-selection

Slide 25

Slide 25 text

Theoretical Analysis
L = number of LSH tables, N = number of queries, T = average number of tokens per query, B = bucket size, N_B = number of buckets per LSH table
● Indexing step time complexity: O(L·N·T) => O(N) in practice, since L and T are small constants
● Retrieval step time complexity: O(LT·BL) => O(1) in practice, since L, B, and T are small constants (LT = calculating the hash values, BL = k-selection in the combined sets)
● Memory complexity: O(L·N_B·B) => memory usage is bounded and does not grow with the number of queries
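
The memory bound can be illustrated with capacity-limited buckets. This is a sketch under loud assumptions: bucket assignment is a random stand-in for a real LSH signature, the parameter values are arbitrary, and `BoundedBucket` and `stored_entries` are hypothetical names.

```python
import random


class BoundedBucket:
    """Bucket holding at most `capacity` items, maintained by reservoir sampling."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = item


def stored_entries(num_queries, num_tables=2, num_buckets=8, bucket_size=4, seed=0):
    """Index num_queries items and return how many entries are kept in total."""
    rng = random.Random(seed)
    tables = [
        [BoundedBucket(bucket_size, rng) for _ in range(num_buckets)]
        for _ in range(num_tables)
    ]
    for query_id in range(num_queries):
        for table in tables:
            # Random bucket choice stands in for the LSH signature lookup.
            table[rng.randrange(num_buckets)].add(query_id)
    return sum(len(bucket.items) for table in tables for bucket in table)


bound = 2 * 8 * 4  # L * N_B * B
print(stored_entries(10) <= bound)      # True
print(stored_entries(10_000) <= bound)  # True
```

However many queries are indexed, the number of stored entries never exceeds L·N_B·B, which is the point of the memory-complexity bullet.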

Slide 26

Slide 26 text

ROSE - Requirements
● The cache needs to capture query similarity => LSH (Locality Sensitive Hashing)
● The cache size must not scale with the volume of queries => Reservoir sampling algorithm
● The lookup cost must be constant-time => Count-based 𝑘-selection

Slide 27

Slide 27 text

Deployment in Amazon.com
● ROSE for query rewrite
  ○ Rewrite queries to improve the cache hit ratio and the search experience
● ROSE for Product Type Annotation
  ○ Identify the correct product type from the query and apply a product type filter

Slide 28

Slide 28 text

Experiment Results
Cached the intended product type of 5-10 million frequent queries and measured metrics with and without product type recognition

Slide 29

Slide 29 text

Experiment Results
With ROSE, most of the search traffic is covered with single-digit millisecond latency

Slide 30

Slide 30 text

Summary
● ROSE improved both
  ○ search performance, by rewriting tail queries and filtering by query type
  ○ search latency, by robust caching
● Several algorithms (LSH, MinHash, reservoir sampling) are used to reduce time and space complexity
● Query-similarity precision is kept by preserving lexical similarity and product type

Slide 31

Slide 31 text

References
● ROSE: Robust Caches for Amazon Product Search
● Locality Sensitive Hashing (LSH): The Illustrated Guide
● MinHashによる高速な類似検索 (Fast Similarity Search with MinHash)
● Some Rare LSH Gems for Large-scale Machine Learning

Slide 32

Slide 32 text

Thanks!