Modeling Queries as Bags of Documents

Daniel Tunkelang
Aritra Mandal, Search @ eBay Inc.

Overview
• Motivation
  ◦ Why model queries as bags of documents?
• Approach
  ◦ How do we compute the query vectors?
• Applications
  ◦ What can we do with these query vectors?

Motivation
• Search queries don’t always look like product titles.
Queries: iphone, laptops, headphones
Product titles:
Straight Talk Apple iPhone 13, 128GB, Midnight - Prepaid Smartphone [Locked to Straight Talk]
HP 14 inch Laptop Intel Core i3-N305 8GB RAM 256GB SSD Moonlight Blue
Beats Solo3 Wireless On-Ear Headphones - Gold

What We Don’t Want
Query: iphone

Fundamental Misalignment
• Not just a matter of translation or rewording.
• Query intents vary in specificity and can be broad, while products are inherently singular.
• So we cannot simply translate the query into a hypothetical document. [Gao et al, 2023]

Query Understanding
• If a search application uses embedding-based retrieval, then it is essential to have robust query vectors!
[−0.9704, 0.2045, 0.1281 … ] Daniel

Note: Density is not Destiny!
• Most queries are short sequences of entities.
• For such queries, focus on precision and desirability.
• Embedding-based retrieval helps more for recall.
• Query similarity addresses tail queries for head intents.

Approach
• Model query as a bag of documents. [Mandal et al, 2023]
Query: mens black tshirts
Document vectors: [0.13, 0.81, …], [0.09, 0.75, …], [0.98, 0.77, …], …
Query vector: [0.11, 0.79, … ]

Frequent Queries
• Associate queries with products based on engagement.
• Can use clicks, purchases, or other engagement signals.
• Aggregate product vectors to obtain query representation.
• Specifically compute the mean and a kind of “variance”.
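
The aggregation step for frequent queries can be sketched as a mean over the vectors of the engaged products. This is a minimal illustration, not the authors' implementation: the function name `mean_vector`, the toy 2-d vectors, and the idea of weighting by click counts are all assumptions made for the example.

```python
def mean_vector(doc_vectors, weights=None):
    """Return the (optionally weighted) mean of equal-length document vectors."""
    if weights is None:
        # Unweighted case: every engaged product counts equally.
        weights = [1.0] * len(doc_vectors)
    total = float(sum(weights))
    dim = len(doc_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, doc_vectors)) / total
        for i in range(dim)
    ]


# Toy product vectors for one query (hypothetical values).
bag = [[0.0, 1.0], [1.0, 0.0]]

# Plain mean of the bag.
unweighted = mean_vector(bag)

# Hypothetical click counts used as weights: the first product got 3 clicks.
weighted = mean_vector(bag, weights=[3, 1])
```

Weighting by engagement is one possible design choice; the slides only state that product vectors are aggregated and that the mean is computed.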

Infrequent Queries
• Could obtain products from retrieval at query time.
• More efficient to train a sentence transformer model.

Applications
• Bag-of-documents model enables robust retrieval through better query understanding.

Query Similarity
Queries: black tshirts for men; mens black t-shirt
Document vectors ► query vector:
[0.13, 0.81, … ] [0.09, 0.75, … ] … ► [0.11, 0.79, … ]
[0.13, 0.81, … ] [0.09, 0.77, … ] … ► [0.12, 0.78, … ]
cos = 0.98
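
Comparing two queries through their bags can be sketched as the cosine between the two mean vectors. The function names and toy vectors below are assumptions for illustration; the vectors on the slide are truncated, so the example does not try to reproduce the 0.98 value.

```python
import math


def mean_vector(doc_vectors):
    """Mean of equal-length document vectors: the query vector for a bag."""
    n = len(doc_vectors)
    return [sum(vec[i] for vec in doc_vectors) / n for i in range(len(doc_vectors[0]))]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def query_similarity(bag_a, bag_b):
    """Cosine between the mean vectors of two bags of document vectors."""
    return cosine(mean_vector(bag_a), mean_vector(bag_b))


# Toy bags (hypothetical values): two near-identical intents and one unrelated intent.
bag_1 = [[0.2, 0.9], [0.1, 0.8]]
bag_2 = [[0.2, 0.9], [0.1, 0.85]]
bag_3 = [[0.9, 0.1], [0.8, 0.0]]

similar = query_similarity(bag_1, bag_2)
dissimilar = query_similarity(bag_1, bag_3)
```

Because the comparison happens in document space, two queries with different wording can score as near-equivalent when their engaged products overlap.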

Failing to Recognize Query Equivalence

Improving Recall and More
• Query similarity can replace token-level query expansion and query relaxation with a holistic approach.
• Identifying equivalent queries defragments search intents that are spread across multiple queries in autocomplete, search suggestions, analytics, etc.
• Can even relate keyword queries to browse nodes.
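
One way to picture "defragmenting" intents is to group queries whose vectors are nearly identical. The sketch below uses a union-find over pairwise cosines; the function name, the 0.95 threshold, and the toy vectors are hypothetical, and the slides do not say how equivalence is actually decided.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def group_equivalent_queries(query_vectors, threshold=0.95):
    """Group queries whose pairwise cosine meets the (hypothetical) threshold."""
    queries = list(query_vectors)
    parent = list(range(len(queries)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if cosine(query_vectors[queries[i]], query_vectors[queries[j]]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, query in enumerate(queries):
        groups.setdefault(find(i), []).append(query)
    return list(groups.values())


# Toy 2-d query vectors (hypothetical values).
vectors = {
    "black tshirts for men": [0.12, 0.78],
    "mens black t-shirt": [0.11, 0.79],
    "laptops": [0.90, 0.10],
}
groups = group_equivalent_queries(vectors)
```

This is single-linkage grouping, so chains of pairwise-similar queries end up in one group; a stricter threshold or a different clustering method may be preferable in practice.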

Same Intent in Search and Browse

Same Intent in Keywords vs. Facets

Query Specificity

Specificity = Variance
• Query vector is mean of the document vectors in the bag.
• Query specificity measures how tightly the document vectors in the bag cluster around the mean query vector.
• Specificity is a continuous measure that captures the intuition of a spectrum between broad and narrow intent.

Examples
clothing
mens shoes
mens af1 sz 9

Computing Specificity for Frequent Queries
• Mean of the cosines between the query vector and the document vectors.
Query: mens black tshirts
Document vectors: [0.13, 0.81, …], [0.09, 0.75, …], [0.98, 0.77, …], …
Query vector: [0.11, 0.79, … ]
Cosines: 0.82 0.75 0.81 0.79
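
The specificity computation described above can be sketched directly: take the mean of the bag as the query vector, then average the cosine between that vector and each document vector. The function names and toy bags are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def specificity(doc_vectors):
    """Mean cosine between the bag's mean vector and each document vector."""
    n = len(doc_vectors)
    dim = len(doc_vectors[0])
    # Query vector: mean of the document vectors in the bag.
    query = [sum(vec[i] for vec in doc_vectors) / n for i in range(dim)]
    return sum(cosine(query, vec) for vec in doc_vectors) / n


# Toy bags (hypothetical values): a tightly clustered bag and a spread-out one.
narrow_bag = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]]
broad_bag = [[1.0, 0.0], [0.0, 1.0]]

narrow_score = specificity(narrow_bag)
broad_score = specificity(broad_bag)
```

A tightly clustered bag scores close to 1, while a spread-out bag scores lower. If the four values shown on the slide (0.82, 0.75, 0.81, 0.79) are the per-document cosines, their mean would be roughly 0.79.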

Computing Specificity for Infrequent Queries
• Can compute based on retrieval at query time.
• Better to train a transformer-based regression model.

Informing Search Experience and Tradeoffs
• Low query specificity can trigger interface elements that elicit more signal from the searcher, e.g., refinements.
• Autocomplete can favor high-specificity queries, which are more likely to lead to a conversion.
• High specificity means that relevance is critical, while lower specificity suggests there is more room to trade off relevance for desirability or other factors.

Summary
• Bag-of-documents model aligns query and product vectors.
• Aggregate document vectors to obtain query vectors for frequent queries. Train a model for infrequent queries.
• Apply bag-of-documents to compute query similarity and specificity, improving retrieval and search experience.

Thank You!
Daniel Tunkelang
[email protected]
https://www.linkedin.com/in/dtunkelang/
https://dtunkelang.medium.com/
https://queryunderstanding.com/
http://contentunderstanding.com/
Aritra Mandal
[email protected]
https://www.linkedin.com/in/aritram
