
Paper reading - ROSE: Robust Caches for Amazon Product Search


Kanji Yomoda

May 30, 2022


Transcript

  1. Confidential & Proprietary 2021 Outline • Difficulty of the cache in search system • ROSE ◦ Index Generation ◦ Online Retrieval ◦ Theoretical Analysis ◦ Experiments • Summary
  2. Difficulty of the cache in search system • High cardinality of queries increases cache size and degrades search performance • Typos, misspellings, and redundant variants of a query lead to cache misses ◦ "Nike shoes", "Nike shoe", "Nike's shoe", "Nike shooes", "Shoes Nike"
  3. Difficulty of the cache in search system • [Diagram with a Cache and a Search Backend (Search Engine, ML models); example queries: "Nike shooes", "Nike shoes", "Nike's shoes"]
  4. ROSE • [Diagram with ROSE, a Cache, and a Search System (Search Engine, ML models); example queries: "Nike shoes", "Nike shooes", "Nike footwear"] • ROSE is a robust cache that maps an online query to cached queries.
  5. ROSE - Requirements • Cache needs to capture query similarity, i.e. be robust to typos and semantic variance • Cache size must not scale with the volume of queries • Lookup cost needs to be constant-time
  6. LSH - Definition • R: threshold • c: approximation factor • h(x) = h(y): x and y collide (= fall into the same bucket)
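The formula on slide 6 did not survive in the transcript. For orientation, the textbook definition that uses the slide's symbols R and c can be written as below. The distance function d, the hash family \mathcal{H}, and the probabilities p_1 and p_2 are not named in the transcript and are assumptions here; the slide itself may have used a similarity-based variant.

```latex
% A family \mathcal{H} is (R, cR, p_1, p_2)-sensitive if, for all x and y
% and for h drawn uniformly at random from \mathcal{H}:
\[
\begin{aligned}
d(x, y) \le R \;&\Longrightarrow\; \Pr_{h \in \mathcal{H}}\bigl[h(x) = h(y)\bigr] \ge p_1,\\
d(x, y) \ge cR \;&\Longrightarrow\; \Pr_{h \in \mathcal{H}}\bigl[h(x) = h(y)\bigr] \le p_2,
\end{aligned}
\qquad c > 1,\; p_1 > p_2 .
\]
```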
  7. LSH - Minwise Hashing • Example: S1 = {2, 5, 7, 9}, S2 = {1, 2, 4, 7, 10} • Jaccard similarity: S1∪S2 = {1, 2, 4, 5, 7, 9, 10}, S1∩S2 = {2, 7}, |S1∩S2| / |S1∪S2| = 2/7 • Minwise hashing: order S1 and S2 by the same random shuffle and compare their first elements; Pr(S1[0] == S2[0]) = 2/7, which equals the Jaccard similarity
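The example on slide 7 can be checked numerically. The sketch below is illustrative and not taken from the paper: it applies one shared random ordering to both sets per trial and counts how often their first (minimum-rank) elements coincide. The function names, the trial count, and the seed are assumptions.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def minhash_collision_rate(a, b, trials=5000, seed=0):
    """Estimate Pr(min-hash of a == min-hash of b) by simulation."""
    rng = random.Random(seed)
    universe = sorted(a | b)
    hits = 0
    for _ in range(trials):
        # One random ordering shared by both sets (a fresh one per trial).
        order = universe[:]
        rng.shuffle(order)
        rank = {x: i for i, x in enumerate(order)}
        # The "first element after shuffling" is the element of lowest rank.
        if min(a, key=rank.__getitem__) == min(b, key=rank.__getitem__):
            hits += 1
    return hits / trials


S1 = {2, 5, 7, 9}
S2 = {1, 2, 4, 7, 10}
exact = jaccard(S1, S2)                    # 2/7
estimate = minhash_collision_rate(S1, S2)  # should land close to 2/7
```

The key detail is that both sets must be ordered by the same permutation; shuffling each set independently would not give the Jaccard similarity.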
  8. ROSE - Requirements • Cache needs to capture query similarity => LSH (Locality Sensitive Hashing) • Cache size must not scale with the volume of queries • Lookup cost needs to be constant-time
  9. Reservoir sampling algorithm • Processes a stream of m numbers and generates R uniform samples using only an array of size R
  10. Reservoir sampling algorithm • Example: choose R = 1 person uniformly at random from a stream of unknown length m • The i-th person replaces the current choice with probability 1/i: 1/1, 1/2, 1/3, ...
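The reservoir sampling idea from slides 9 and 10 can be sketched as follows. This is the classic single-pass variant (often called Algorithm R), written as an illustration and not as the paper's implementation; the function name and the `rng` parameter are assumptions.

```python
import random


def reservoir_sample(stream, r, rng=None):
    """Return r uniform samples from a stream using only an array of size r."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if len(reservoir) < r:
            # The first r items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i is kept with probability r / i and, if kept,
            # overwrites a uniformly chosen slot.
            j = rng.randrange(i)
            if j < r:
                reservoir[j] = item
    return reservoir


# With r = 1 the i-th item replaces the current choice with probability 1/i,
# matching the 1/1, 1/2, 1/3 sequence on slide 10.
picked = reservoir_sample(iter(range(1000)), r=1, rng=random.Random(42))
```

The stream length never needs to be known in advance, which is why the memory footprint stays at R regardless of m.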
  11. ROSE - Requirements • Cache needs to capture query similarity => LSH (Locality Sensitive Hashing) • Cache size must not scale with the volume of queries => Reservoir sampling algorithm • Lookup cost must be constant-time
  12. Online retrieval 1. Compute the LSH signature of the incoming query and look up the corresponding bucket in the hash tables 2. Rank the cached queries within the bucket by similarity to the new query and return the top result
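The two retrieval steps on slide 12 can be illustrated with a toy cache. Everything here is an assumption made for the sketch and is not ROSE's actual design: queries are split into lowercase word tokens, each table uses a single seeded MD5-based min-hash as its signature, buckets are unbounded lists, and candidates are ranked by token-level Jaccard similarity. The class name `TinyLSHCache` is hypothetical.

```python
import hashlib
from collections import defaultdict


def tokens(query):
    """Lowercased word tokens of a query (an assumed tokenization)."""
    return set(query.lower().split())


def minhash(token_set, seed):
    """Seeded min-hash: the smallest hash value over the tokens."""
    return min(hashlib.md5(f"{seed}:{t}".encode()).hexdigest() for t in token_set)


class TinyLSHCache:
    def __init__(self, num_tables=8):
        # One hash table per seed, each mapping signature -> cached queries.
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def add(self, query):
        toks = tokens(query)
        for seed, table in enumerate(self.tables):
            table[minhash(toks, seed)].append(query)

    def lookup(self, query):
        toks = tokens(query)
        if not toks:
            return None
        # Step 1: compute the signature per table and collect the buckets.
        candidates = set()
        for seed, table in enumerate(self.tables):
            candidates.update(table.get(minhash(toks, seed), []))
        if not candidates:
            return None

        # Step 2: rank candidates by similarity and return the top one.
        def similarity(cached):
            ct = tokens(cached)
            return len(toks & ct) / len(toks | ct)

        return max(candidates, key=similarity)


cache = TinyLSHCache()
cache.add("nike shoes")
cache.add("adidas running shorts")
hit = cache.lookup("Shoes Nike")     # same token set as "nike shoes"
miss = cache.lookup("laptop stand")  # shares no token with any cached query
```

Word-level tokens make the sketch robust to word order and casing only; catching typos such as "shooes" would need a finer representation, for example character n-grams.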
  13. ROSE - Requirements • Cache needs to capture query similarity => LSH (Locality Sensitive Hashing) • Cache size must not scale with the volume of queries => Reservoir sampling algorithm • Lookup cost must be constant-time => Count-based k-selection
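One plausible reading of count-based k-selection is sketched below: a cached query that shows up in more of the looked-up buckets is treated as more similar, so candidates are ranked by how many buckets contain them. This is an interpretation for illustration, and the paper's exact procedure may differ; the function name and the sample buckets are made up.

```python
from collections import Counter


def count_based_top_k(buckets, k):
    """Pick the k candidates that occur in the most buckets."""
    counts = Counter()
    for bucket in buckets:
        # Count each candidate at most once per bucket.
        counts.update(set(bucket))
    return [query for query, _ in counts.most_common(k)]


# Buckets retrieved from four hypothetical hash tables for one online query.
buckets = [
    ["nike shoes", "nike socks"],
    ["nike shoes"],
    ["nike shoes", "adidas shoes"],
    ["nike socks"],
]
top2 = count_based_top_k(buckets, k=2)
```

The work is proportional to the total number of entries in the retrieved buckets, so with bounded bucket size and a fixed number of tables it does not depend on how many queries are cached.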
  14. Theoretical Analysis • Notation: L = number of LSH hash tables, N = number of queries, T = average number of tokens per query, B = bucket size, NB = number of buckets in LSH • Indexing step, time complexity: O(L·N·T) => O(N) in practice since L and T are small constants • Retrieval step, time complexity: O(LT·BL) => O(1) in practice since L, B, and T are small constants (LT = calculating the hash values, BL = k-selection in the combined sets) • Memory complexity: O(L·NB·B) => memory usage does not increase with the number of queries N
  15. ROSE - Requirements • Cache needs to capture query similarity => LSH (Locality Sensitive Hashing) • Cache size must not scale with the volume of queries => Reservoir sampling algorithm • Lookup cost must be constant-time => Count-based k-selection
  16. Deployment in Amazon.com • ROSE for query rewrite ◦ Rewrite queries to improve the cache hit ratio and the search experience • ROSE for Product Type Annotation ◦ Identify the correct product type from the query and apply a product type filter
  17. Experiments Result • Cached the intended product type of 5-10 million frequent queries and measured metrics with and without product type recognition
  18. Experiments Result • With ROSE, most of the search traffic is covered with single-digit millisecond latency
  19. Summary • ROSE improved both search performance (by rewriting tail queries and filtering by product type) and search latency (by robust caching) • Several algorithms (LSH, MinHash, reservoir sampling) are used to reduce time and space complexity • Query similarity precision is kept by preserving lexical similarity and product type
  20. References • ROSE: Robust Caches for Amazon Product Search • Locality Sensitive Hashing (LSH): The Illustrated Guide • MinHashによる高速な類似検索 (Fast Similarity Search with MinHash) • Some Rare LSH Gems for Large-scale Machine Learning