Beyond Ranking: Focus on Content and Query Understanding

This 2025 talk, given as a teaser for Doug Turnbull and Trey Grainger's class on AI-powered search, makes the case for prioritizing content understanding and query understanding in your search applications to ensure relevance and thus enable better ranking. It also explains ways to use the bag-of-documents model for holistic query understanding.

Daniel Tunkelang

May 20, 2026

Transcript

  1. tl;dr: Ranking optimizes order of relevant results – at best. Content and query understanding ensures retrieval of relevant results.
  2. Let’s make the case for ranking. The searcher has an information need. Each result provides some utility. Sort all results by expected utility. Sounds reasonable, right?
  3. Example query: earbuds under $30. The problem here isn’t the ranking of results. It’s the failure to understand the query and the retrieval of irrelevant results.
  4. Problems with ranking. Similar results are equally relevant (van Rijsbergen, 1979). Ranking ≠ Relevance ≠ Utility (Turpin and Scholer, 2006). Utility of results is not additive. Non-relevant results = negative utility.
  5. A better approach. Content understanding: Establish a robust representation of the content. Query understanding: Establish a robust representation of the query. Align these to retrieve relevant content, then rank.
  6. Understanding the query means retrieving the right, relevant results. Ranking still matters, but it is secondary to relevance. Better yet, it can assume that results are relevant!
  7. Reductionist and Holistic Approaches. Reductionist: break the problem into parts. Holistic: solve the problem as a whole. Complementary approaches – so use both!
  8. Step 1: Content Understanding. Reductionist approach: Extract content attributes to populate structured data. Holistic approach: Populate vectors so that cosines reflect similarity / substitutability.
  9. Evaluating Content Understanding. Reductionist approach: Compute precision and recall of structured data. Holistic approach: Correlate vector similarity to ground truth. If you don’t have ground truth, use an LLM to generate it.
  10. Step 2: Query Understanding. Reductionist approach: Extract query attributes to populate structured data. Holistic approach: Bag-of-documents query vector that aggregates relevant result vectors.
  11. Evaluating Query Understanding. Reductionist approach: Compute precision and recall of query attributes. Holistic approach: Correlate query-result and query-query similarity to ground truth. Again, you can use an LLM to generate it.
  12. What’s a Bag of Documents? Search queries often don’t look like document titles. Queries: iphone; laptops; headphones. Document titles: "Straight Talk Apple iPhone 13, 128GB, Midnight - Prepaid Smartphone [Locked to Straight Talk]"; "HP 14 inch Laptop Intel Core i3-N305 8GB RAM 256GB SSD Moonlight Blue"; "Beats Solo3 Wireless On-Ear Headphones - Gold".
  13. Computing Bag-of-Documents Vector. query → documents → aggregate vector. Query: mens black tshirts. Vectors shown: [0.13, 0.81, …], [0.09, 0.75, …], [0.98, 0.77, …], … [0.11, 0.79, …]. Easy for head queries. More work to train a model that generalizes to tail and unseen queries.
  14. Computing Query Similarity. Queries: black tshirts for men; mens black t-shirt. Each query ► document vectors ► aggregate vector: [0.13, 0.81, …], [0.09, 0.75, …], … ► [0.11, 0.79, …] and [0.13, 0.81, …], [0.09, 0.77, …], … ► [0.12, 0.78, …]. The two aggregate vectors ► cos = 0.98.
  15. Improve Recall and More! Replace token-level expansion with holistic approach. Identifying equivalent queries defragments intents spread across queries in autocomplete, search suggestions, etc. Can even relate keyword queries to browse nodes!
  16. Computing Query Specificity. Query: mens black tshirts. Vectors shown: [0.13, 0.81, …], [0.09, 0.75, …], [0.98, 0.77, …], … [0.11, 0.79, …]. Values shown: 0.82, 0.75, 0.81, 0.79. Broad queries have low specificity, while narrow queries have high specificity.
  17. Optimize the Search Journey! Low query specificity can trigger interface elements that elicit more signal from the searcher, e.g., refinements. Autocomplete can favor high-specificity queries, which are more likely to lead to a conversion. High specificity makes relevance critical; low specificity means more room to trade off relevance for desirability.
  18. Ranking still matters! All results should be relevant, but not all relevant results are equally valuable to searchers. Ranking should reflect desirability, personalization, etc. In fact, getting query understanding and relevance right is what makes it possible for ranking to do its job!
  19. Summary. Ranking can only optimize after relevance is guaranteed. Invest in content and query understanding first. Apply reductionist and holistic methods together.
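
The bag-of-documents computations on slides 13, 14, and 16 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the speaker's implementation: the slides say only that the query vector "aggregates" relevant result vectors, so the mean of unit-normalized vectors used here is an assumption, as is reading specificity as the mean cosine between that aggregate and each document vector. The function names and the 2-dimensional toy vectors are invented for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def bag_of_documents(doc_vectors):
    """Aggregate relevant document vectors into one query vector.

    Assumption: the aggregate is the mean of unit-normalized vectors.
    """
    if not doc_vectors:
        raise ValueError("need at least one relevant document vector")
    unit = [normalize(v) for v in doc_vectors]
    dim = len(unit[0])
    return [sum(v[i] for v in unit) / len(unit) for i in range(dim)]


def specificity(doc_vectors):
    """Assumed definition: mean cosine between the aggregate vector
    and each of the document vectors it was built from."""
    query_vector = bag_of_documents(doc_vectors)
    return sum(cosine(query_vector, v) for v in doc_vectors) / len(doc_vectors)


# Toy vectors, invented for illustration (real embeddings have many dimensions).
narrow_a = [[0.10, 0.80], [0.12, 0.78], [0.08, 0.82]]  # e.g. one narrow query
narrow_b = [[0.11, 0.79], [0.09, 0.81], [0.13, 0.77]]  # e.g. a rephrasing of it
broad = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]           # results that differ a lot

# Query similarity: cosine between the two aggregate vectors.
similarity = cosine(bag_of_documents(narrow_a), bag_of_documents(narrow_b))

print("query similarity:", round(similarity, 3))
print("narrow specificity:", round(specificity(narrow_a), 3))
print("broad specificity:", round(specificity(broad), 3))
```

In this toy setup the two rephrased queries get nearly identical aggregate vectors, and the query whose relevant documents point in different directions scores lower on specificity. In an application, a specificity threshold (a hypothetical parameter that would need tuning on real data) could decide when to show refinements, in the spirit of slide 17.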