Semantic Equivalence of e-Commerce Queries

This presentation from the 2023 Workshop on e-Commerce and NLP discusses computing the semantic equivalence of e-commerce search queries. Query variation poses a challenge in e-commerce search because equivalent search intents can be expressed through queries that differ only on the surface. The presentation introduces a framework to recognize and leverage query equivalence to improve searcher and business outcomes. The approach addresses three problems: mapping queries to vector representations of search intent, identifying nearest-neighbor queries that express equivalent or similar intent, and optimizing for user or business objectives.

The framework uses both surface similarity and behavioral similarity to determine query equivalence. Surface similarity canonicalizes queries to account for variation in word inflection, word order, compounding, and noise words. Behavioral similarity uses historical search behavior to generate vector representations of query intent. An offline process is used to train a sentence similarity model, while an online nearest-neighbor approach handles unseen queries. In experimental evaluations, the approach outperforms popular sentence transformer models and achieves a Pearson correlation of 0.85 for query similarity on the ESCI query-query dataset (0.87 on an eBay internal dataset). The results highlight the potential of historical behavior data and trained models for recognizing and using query equivalence in e-commerce search. Further advancements and benchmark datasets are encouraged to facilitate the development of solutions for this problem.

Daniel Tunkelang

May 26, 2026

Transcript

  1. Search query != search intent.

    • Information retrieval researchers worry about queries that map to multiple intents. Example: jaguar.
    • Practitioners worry more about multiple queries that map to the same intent. Example: lightning to 3.5mm, iphone to aux.
  2. High-level strategy to leverage query equivalence.

    Map queries to vectors. Store in nearest-neighbor database. (i.e., optimize for user or business outcome)
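The "map to vectors, store in a nearest-neighbor database" step can be sketched with a brute-force in-memory index. This is only an illustration: `BruteForceIndex`, its methods, and the 3-d vectors are hypothetical, and a production system would more likely use an approximate nearest-neighbor store.

```python
import math


class BruteForceIndex:
    """Toy stand-in for a nearest-neighbor database (hypothetical, not from the talk)."""

    def __init__(self):
        self._items = []  # list of (query, unit-length vector)

    @staticmethod
    def _normalize(vector):
        # Scale to unit length so a dot product equals cosine similarity.
        # Assumes a non-zero vector.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, query, vector):
        self._items.append((query, self._normalize(vector)))

    def nearest(self, vector, k=1):
        """Return the k stored queries most similar to `vector`, best first."""
        probe = self._normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(probe, stored)), query)
            for query, stored in self._items
        ]
        scored.sort(reverse=True)
        return [(query, score) for score, query in scored[:k]]


# Made-up 3-d vectors purely for illustration.
index = BruteForceIndex()
index.add("iphone to aux", [0.9, 0.1, 0.0])
index.add("black tshirts for men", [0.0, 0.2, 0.9])
index.add("prop money", [0.1, 0.9, 0.1])

hits = index.nearest([0.85, 0.15, 0.05], k=1)
print(hits[0][0])
```

Normalizing on insert keeps the lookup a plain dot product, which is also the form most vector stores expect for cosine search.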
  3. Two strategies for recognizing equivalent queries.

    • Surface Similarity
      ◦ Variation in inflection, word order, compounding, noise words.
      ◦ Example: black tshirts for men = mens black t-shirt
    • Behavioral Similarity
      ◦ Queries lead to engagement with equivalent or similar results.
      ◦ Example: lightning to 3.5mm = iphone to aux
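The surface-similarity example on slide 3 can be reproduced with a small canonicalizer. This is a minimal sketch under stated assumptions: the noise-word list, the crude plural stripping, and the hyphen joining are hypothetical choices, not the rules used in the talk.

```python
import re

# Hypothetical noise-word list; a real system would curate or learn this.
NOISE_WORDS = {"for", "the", "a", "an", "of", "with"}


def naive_stem(token: str) -> str:
    """Strip a trailing 's' (a crude stand-in for real inflection handling)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def canonicalize(query: str) -> str:
    text = query.lower()
    text = text.replace("-", "")               # compounding: t-shirt -> tshirt
    text = re.sub(r"[^a-z0-9. ]+", " ", text)  # drop other punctuation, keep "3.5mm"
    tokens = [naive_stem(t) for t in text.split() if t not in NOISE_WORDS]
    return " ".join(sorted(tokens))            # word order: sort the tokens


print(canonicalize("black tshirts for men"))
print(canonicalize("mens black t-shirt"))
```

Both queries reduce to the same key, so they can share results. The behavioral pair (lightning to 3.5mm, iphone to aux) still produces different keys, which is why surface rules alone are not enough.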
  4. Query vectors are centroids of associated product vectors

    black tshirts for men ► product vectors [0.13, 0.81, … ] [0.09, 0.75, … ] … ► centroid [0.11, 0.79, … ]
    mens black t-shirt ► product vectors [0.13, 0.81, … ] [0.09, 0.77, … ] … ► centroid [0.12, 0.78, … ]
    ► cos > 0.98
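The centroid idea on slide 4 can be sketched in a few lines. The slide's vectors are truncated, so the 2-d values below are toy numbers that reuse only the visible components; `centroid` and `cosine` are illustrative helper names, not functions from the talk.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy data: vectors of products that searchers engaged with for each query.
engaged_products = {
    "black tshirts for men": [[0.13, 0.81], [0.09, 0.75]],
    "mens black t-shirt": [[0.13, 0.81], [0.09, 0.77]],
}

# Each query vector is the centroid of its engaged product vectors.
query_vectors = {q: centroid(vs) for q, vs in engaged_products.items()}

sim = cosine(
    query_vectors["black tshirts for men"],
    query_vectors["mens black t-shirt"],
)
print(sim > 0.98)
```

Because both queries lead to engagement with nearly the same products, their centroids almost coincide and the cosine clears the slide's 0.98 threshold.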
  5. Works well, but only for head and torso queries.

    • Offline approach works for queries with enough engagement history.
    • Would be expensive to compute aggregates of result vectors online.
    • Still, head and torso queries tend to represent a large fraction of traffic.
  6. Train online sentence transformer model for tail queries.

    • Train using (query1, query2, similarity) triples from offline model.
    • Oversample similar query pairs to increase sensitivity where it matters.
    • Fine-tune a pre-trained micro-BERT sentence transformer model.
    • Concatenate the output of a query classifier to the query keywords.
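The first two bullets of slide 6 describe a data-preparation step that can be sketched as follows. The transcript does not say how pairs are oversampled, so duplicating pairs above a similarity threshold is an assumption here, as are the `threshold` and `oversample` parameters and the toy vectors. The fine-tuning step itself is not shown.

```python
import itertools
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_triples(query_vectors, threshold=0.9, oversample=3):
    """Build (query1, query2, similarity) triples from offline query vectors.

    Pairs at or above `threshold` are repeated `oversample` times
    (an assumed oversampling scheme, not the talk's documented method).
    """
    triples = []
    for (q1, v1), (q2, v2) in itertools.combinations(query_vectors.items(), 2):
        sim = cosine(v1, v2)
        copies = oversample if sim >= threshold else 1
        triples.extend([(q1, q2, sim)] * copies)
    return triples


# Toy offline query vectors (made up for illustration).
offline_vectors = {
    "black tshirts for men": [0.11, 0.78],
    "mens black t-shirt": [0.11, 0.79],
    "iphone to aux": [0.90, 0.10],
}

triples = build_triples(offline_vectors)
print(len(triples))
```

Without oversampling, highly similar pairs are rare among all query pairs, so the model would see mostly dissimilar examples.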
  7. Results

    Model               Dataset Name        Pearson’s correlation
    query-sim-ecom      eBay Internal       0.87
    query-sim-ecom      ESCI query-query    0.85
    all-MiniLM-L12-v2   ESCI query-query    0.68

    Examples from ESCI of queries with low surface but high behavioral similarity:

    Query 1                       Query 2        cosine
    hdmi to galaxy s8             s9 hdmi        0.9993
    movie money                   prop money     0.9995
    cassette adapter for iphone   tape to aux    0.9993
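For readers unfamiliar with the metric on slide 7, Pearson's correlation can be computed as below. The transcript does not spell out which two score series are compared; presumably they are the model's predicted similarities and reference similarities for the same query pairs. The numbers here are made up and are not results from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four query pairs (illustration only).
reference = [0.99, 0.95, 0.40, 0.10]
predicted = [0.97, 0.90, 0.55, 0.20]

r = pearson(reference, predicted)
print(round(r, 2))
```

A value near 1 means the model ranks and spaces query pairs much as the reference does; it does not by itself show that any single score is accurate.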
  8. Summary

    • Queries with equivalent intent should yield equivalent experiences.
    • Query similarity can increase recall while preserving precision.
    • Signals can come from either surface or behavioral similarity.
    • Offline bag-of-documents model: queries as means of product vectors.
    • Fine-tune online Micro-BERT sentence transformer model for tail queries.
    • It just works!