system
• High query cardinality increases cache size and diminishes search performance
• Typos, misspellings, and redundant queries lead to cache misses
◦ “Nike shoes”, “Nike shoe”, “Nike’s shoe”, “Nike shooes”, “Shoes Nike”
ROSE is a robust cache that maps an online query to cached queries.
[Figure: Search Engine / ML models, Cache, and ROSE; example queries “Nike shoes”, “Nike shooes”, “Nike footwear”]
to capture the query similarity: robust to typos and semantic variance
• Cache size must not scale with the volume of queries
• Lookup cost must be constant-time
to capture the query similarity => LSH (Locality-Sensitive Hashing)
• Cache size must not scale with the volume of queries
• Lookup cost must be constant-time
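A minimal sketch of how a MinHash signature, one common LSH family for set similarity, can make near-duplicate queries look alike. The character-trigram shingling, the 64 hash functions, and the helper names (`shingles`, `minhash_signature`, `estimated_similarity`) are assumptions made for illustration; the slides do not specify these details.

```python
import hashlib


def shingles(query: str, n: int = 3) -> set:
    """Character n-grams of a lowercased, whitespace-normalized query."""
    text = " ".join(query.lower().split())
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def seeded_hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash; the seed selects one hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(query: str, num_hashes: int = 64) -> list:
    """For each hash function, keep the minimum hash over the query's shingles."""
    grams = shingles(query)
    return [min(seeded_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_similarity(a: str, b: str) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    for other in ("Nike shooes", "Nike shoe", "garden hose"):
        print(other, estimated_similarity("Nike shoes", other))
```

Queries that differ by a typo share most of their trigrams, so most signature slots agree, while unrelated queries share almost none.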
to capture the query similarity => LSH (Locality-Sensitive Hashing)
• Cache size must not scale with the volume of queries => Reservoir sampling algorithm
• Lookup cost must be constant-time
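Reservoir sampling keeps a fixed-size uniform sample of a stream, which is one way a bucket can stay bounded no matter how many queries hash into it. The sketch below uses the classic Algorithm R; the class name `ReservoirBucket` and its fields are hypothetical, and how ROSE applies the technique in detail is not stated in the slides.

```python
import random


class ReservoirBucket:
    """Fixed-capacity bucket holding a uniform random sample of the queries seen."""

    def __init__(self, capacity: int, rng: random.Random = None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of queries offered to this bucket
        self._rng = rng or random.Random()

    def add(self, query: str) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Bucket not full yet: always keep the query.
            self.items.append(query)
            return
        # Bucket full: keep the new query with probability capacity / seen,
        # replacing a uniformly chosen resident.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = query


if __name__ == "__main__":
    bucket = ReservoirBucket(capacity=4, rng=random.Random(0))
    for i in range(1000):
        bucket.add(f"query {i}")
    print(len(bucket.items), bucket.seen)
```

The bucket never holds more than `capacity` entries, so memory per bucket is independent of the number of queries offered to it.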
signature of this query and looking up the corresponding bucket in the hash tables
2. Rank the cached queries in the bucket by their similarity to the new query and return the top result
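The two retrieval steps above can be sketched as follows: hash the incoming query into one bucket per table, gather the cached queries found there, and rank them by similarity. This is an illustrative sketch only. The table count, the two-row bucket keys, the trigram Jaccard ranking, and the names `RobustCache`, `bucket_keys`, and `jaccard` are all assumptions, not details taken from the slides.

```python
import hashlib
from collections import defaultdict

NUM_TABLES = 8    # assumed number of hash tables (L)
ROWS_PER_KEY = 2  # assumed number of MinHash values combined into one bucket key


def shingles(query: str, n: int = 3) -> set:
    text = " ".join(query.lower().split())
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def seeded_hash(seed: int, token: str) -> int:
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def bucket_keys(query: str) -> list:
    """One bucket key per table, each built from ROWS_PER_KEY MinHash values."""
    grams = shingles(query)
    return [
        tuple(min(seeded_hash(t * ROWS_PER_KEY + r, g) for g in grams)
              for r in range(ROWS_PER_KEY))
        for t in range(NUM_TABLES)
    ]


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


class RobustCache:
    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(NUM_TABLES)]

    def index(self, query: str) -> None:
        for table, key in zip(self.tables, bucket_keys(query)):
            if query not in table[key]:
                table[key].append(query)

    def lookup(self, query: str):
        # Step 1: compute the signature and collect the matching buckets.
        candidates = set()
        for table, key in zip(self.tables, bucket_keys(query)):
            candidates.update(table.get(key, ()))
        if not candidates:
            return None  # cache miss
        # Step 2: rank candidates by similarity and return the top result.
        grams = shingles(query)
        return max(candidates, key=lambda c: jaccard(grams, shingles(c)))


if __name__ == "__main__":
    cache = RobustCache()
    cache.index("Nike shoes")
    print(cache.lookup("nike  SHOES"), cache.lookup("garden hose"))
```

A near-duplicate such as “Nike shooes” hits only if it shares at least one bucket key with a cached query, so the number of tables and rows per key trades recall against precision.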
to capture the query similarity => LSH (Locality-Sensitive Hashing)
• Cache size must not scale with the volume of queries => Reservoir sampling algorithm
• Lookup cost must be constant-time => Count-based 𝑘-selection
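One plausible reading of count-based 𝑘-selection, offered here as an assumption rather than the documented algorithm: count how many tables return each candidate, then pick the 𝑘 candidates with the most votes. Because a count can never exceed the number of tables, candidates can be grouped by count instead of sorted, so the work is bounded by the number of tables times the bucket size. The function name `count_based_top_k` is hypothetical.

```python
from collections import Counter


def count_based_top_k(buckets: list, k: int) -> list:
    """Return up to k candidates that appear in the most buckets."""
    if k <= 0:
        return []
    counts = Counter()
    for bucket in buckets:
        # Each table votes at most once per candidate (order-preserving dedupe).
        for candidate in dict.fromkeys(bucket):
            counts[candidate] += 1

    # Counts lie in 1..len(buckets), so group by count instead of sorting.
    by_count = [[] for _ in range(len(buckets) + 1)]
    for candidate, votes in counts.items():
        by_count[votes].append(candidate)

    top = []
    for votes in range(len(buckets), 0, -1):
        for candidate in by_count[votes]:
            top.append(candidate)
            if len(top) == k:
                return top
    return top


if __name__ == "__main__":
    retrieved = [
        ["nike shoes", "nike shoe"],
        ["nike shoes", "adidas shoes"],
        ["nike shoes", "nike shoe"],
    ]
    print(count_based_top_k(retrieved, 2))
```

Candidates returned by many tables collide with the query under many hash functions, which is why the vote count serves as a cheap proxy for similarity.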
of query T = average number of tokens, B = bucket size
Indexing Step Time Complexity: O(L·N·T) => O(N) in practice, since L and T are small constants. (L = number of LSH, N = number of queries, T = average number of tokens)
Retrieval Step Time Complexity: O(LT·BL) => O(1) in practice, since L, B, and T are small constants. (LT = calculating the hash values, BL = 𝑘-selection in the combined sets)
Memory Complexity: O(L·N_B·B) => memory usage does not grow with the number of queries. (N_B = number of buckets in LSH)
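As a small worked example of the memory bound, the sketch below multiplies the three factors of L·N_B·B. The concrete numbers are invented for illustration and do not come from the slides.

```python
def max_cached_entries(num_tables: int, buckets_per_table: int, bucket_size: int) -> int:
    """Upper bound on stored entries: L * N_B * B, independent of the query count N."""
    return num_tables * buckets_per_table * bucket_size


if __name__ == "__main__":
    # Hypothetical configuration: L = 8 tables, N_B = 1024 buckets, B = 4 entries per bucket.
    bound = max_cached_entries(8, 1024, 4)
    for num_queries in (10_000, 10_000_000):
        # The bound is the same whether ten thousand or ten million queries are indexed.
        print(num_queries, bound)
```

The number of indexed queries never appears in the bound, which is the sense in which memory stays flat as query volume grows.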
query rewrite
◦ Rewrite queries to improve cache hit ratio and search experience
• ROSE for Product Type Annotation
◦ Identify the correct product type from the query and apply a product type filter
performance by rewriting tail queries and filtering by query type
search latency by robust caching
• Several algorithms (LSH, MinHash, reservoir sampling) are used to reduce time and space complexity
• Query similarity precision is kept by preserving lexical similarity and product type