Reconsidering Relevance

This 2009 Google Tech Talk focuses on search concerns beyond relevance.

We've become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers into expecting only incremental improvements in relevance in the near future. And beyond web search, there are broad search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person's needs. And we still rely on two-word queries (on average) to communicate a user's information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of "interactive information retrieval", "exploratory search", and "human-computer information retrieval". This presentation discusses these initiatives and how they are helping to move "relevance" beyond today's outdated assumptions.

Daniel Tunkelang

May 25, 2026

Transcript

  1. howdy!

    © 2009 Endeca Technologies, Inc. All rights reserved.
    • 1988 – 1992
    • 1993 – 1998
    • 1999 –
  2. overview

    what is relevance?
    what’s wrong with relevance?
    what are the alternatives?
  3. iconic businesses of the 20th and 21st centuries

    I’m Feeling Lucky
  4. an interesting contrast

    “Search on the internet is solved. I always find what I need. But why not in the enterprise? Seems like a solution waiting to happen.”
    - a Fortune 500 CTO
  5. the real questions

    • What is “search on the internet” and why is it perceived as a solved problem?
    • What is “search in the enterprise” and why is it perceived as an unsolved problem?
    • And what does this have to do with relevance?
  6. easy vs. hard search problems

    • easy: where to buy Ender in Exile?
    • hard: good novel to read on the beach?
    • easy: proof that sorting has n log n lower bound?
    • hard: algorithm to sort partially ordered set, given a constant-time comparator?
  7. what is relevance? what’s wrong with relevance? what are the alternatives?
  8. defining relevance

    Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance.
    William Goffman, On relevance as a measure, 1964.
  9. let’s work top-down

    • information retrieval (IR) = study of retrieval of information (not data) from collection of written documents
      retrieved documents aim at satisfying user information need
  10. IR assumes information needs

    • user information need = natural language declaration of informational need of user
    • query = expression of user information need in input language provided by information system
  11. relevance drives IR modeling

    • modeling = studies algorithms used for ranking documents according to system assigned likelihood of relevance
    • model = a set of premises and an algorithm for ranking documents with regard to a user query
  12. a relevance-centric approach

    USER: information need, query, select from results
    SYSTEM: rank using IR model (tf-idf, PageRank)
  13. what is relevance? what’s wrong with relevance? what are the alternatives?
  14. our first communication problem

    information need → query
    • 2 words?
    • natural language?
    • telepathy?
  15. and the game of telephone continues

    query → rank using IR model
    • cumulative error
    • relevance is subjective
    • what Goffman said
  16. and hopefully users feel lucky

    rank using IR model → select from results
    • selection bias
    • inefficient channel
    • backup plan?
  17. queries are misinterpreted

    Results 1-10 out of about 344,000,000 for ir
  18. assumptions of relevance-centric approach

    • self-awareness
    • self-expression
    • model knows best
    • answer is a document
    • one-shot query
  19. what is relevance? what’s wrong with relevance? what are the alternatives?
  20. human-computer information retrieval

    • don’t just guess the user’s intent
      – optimize communication
    • increase user responsibility and control
      – require and reward human intellectual effort
    “Toward Human-Computer Information Retrieval”, Gary Marchionini
  21. a concrete use case

    • Colleague: Hey Daniel! You should check out what this guy Steve Pollitt’s been researching. Sounds right up your alley.
    • Daniel: Sure thing, I’ll look into it.
  22. practical considerations

    • which facets to show
    • which facet values to show
    • when to suggest faceted refinement
    • how to automate faceted classification
  23. query-driven clarification before refinement

    Matching Categories include:
    Appliances > Small Appliances > Irons & Steamers
    Appliances > Small Appliances > Microwaves & Steamers
    Bath > Sauna & Spas > Steamers
    Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
    Kitchen > Small Appliances > Steamers
  24. results-driven clarification before refinement

    Search: storage
  25. hcir cheats the precision / recall trade-off

    (figure: plot with axes recall and precision)
  26. set retrieval 2.0

    • set retrieval that responds to queries with
      – overview of the user's current context
      – organized set of options for exploration
    • contextual summaries of document sets
      – optimize system’s communication with user
    • query refinement options
      – optimize user’s communication with system
  27. hcir using set retrieval 2.0

    emphasize set summaries over ranked lists
    establish a dialog between the user and the data
    enable exploration and discovery
  28. think outside the (search) box

    • relevance-centric search solves many use cases
    • but not some of the most valuable ones
    • support interaction, exploration
    • human-computer information retrieval
  29. “Google's mission is to organize the world's information and make it universally accessible and useful.”
  30. thank you

    communication 1.0
    email: [email protected]
    communication 2.0
    blog: http://thenoisychannel.com
    twitter: http://twitter.com/dtunkelang
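
The "set retrieval 2.0" slide describes a system that answers a query with a summary of the matching set plus refinement options, rather than only a ranked list. The following is a minimal Python sketch of that idea, not the talk's or Endeca's implementation. The catalog `DOCS`, its fields (`category`, `brand`), and the `search` function are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy catalog; fields and values are illustrative assumptions.
DOCS = [
    {"title": "Garment steamer", "category": "Appliances", "brand": "Acme"},
    {"title": "Bamboo steamer", "category": "Kitchen", "brand": "Acme"},
    {"title": "Steamer pot insert", "category": "Kitchen", "brand": "Zenith"},
    {"title": "Storage bin", "category": "Home", "brand": "Zenith"},
]

FACET_FIELDS = ("category", "brand")


def search(query, filters=None):
    """Return the matching set and a facet summary instead of a ranked list.

    `filters` maps a facet field to the value the user has already selected.
    """
    filters = filters or {}
    matches = [
        doc for doc in DOCS
        if query.lower() in doc["title"].lower()
        and all(doc[field] == value for field, value in filters.items())
    ]
    # Summarize the current result set: count values for each facet
    # the user has not yet refined on. These counts are the refinement options.
    facets = {
        field: Counter(doc[field] for doc in matches)
        for field in FACET_FIELDS
        if field not in filters
    }
    return matches, facets


# First turn of the dialog: a short, ambiguous query.
matches, facets = search("steamer")

# Second turn: the user picks a facet value offered by the summary.
refined, remaining_facets = search("steamer", {"category": "Kitchen"})
```

In this sketch the facet counts play the role of the contextual summary (the system communicating with the user), and passing a selected value back as a filter plays the role of query refinement (the user communicating with the system).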