Set Retrieval 2.0

This 2008 MIT CSAIL presentation discusses the evolution and current state of search technology, highlighting user frustrations and the need for improved enterprise search solutions. It introduces 'Set Retrieval 2.0', which combines traditional search with user guidance, allowing for better communication and exploration of information. The framework emphasizes the importance of contextual summaries and user engagement to enhance search effectiveness and meet complex information needs.

Daniel Tunkelang

May 26, 2026

Transcript

  1. © 2008 Endeca Technologies, Inc. All rights reserved. Set Retrieval 2.0

    Daniel Tunkelang, Chief Scientist, Endeca
  2. howdy!

    • 1988 – 1992
    • 1993 – 1998
    • 1999 –
  3. overview

    what’s right with search today?
    what’s wrong with search today?
    how do we fix it?
  4. …though they do have complaints

    78% wish search engines could read their minds
    what frustrates users most?
      – 25%: deluge of results
      – 24%: too many paid listings
      – 19%: inability to understand their keywords
      – 19%: disorganized / random results
    Source: The State of Search, Autobytel & Kelton Research, Oct ’07
  5. web search vs. enterprise search

    “Search on the internet is solved. I always find what I need. But why not in the enterprise? Seems like a solution waiting to happen.”
    – a Fortune 500 CTO
  6. enterprise users really have complaints

    Why is Joe the Knowledge Worker so upset?
      – 49%: finding the information needed to do their job is difficult and time-consuming
      – 50%: findability within their organization is worse than on their own consumer-facing site
    Source: Market IQ Report on Findability, AIIM, June ’08
  7. the library and information science critique

    • models – relevance is subjective
    • evaluation – neglects interactivity
    • tools – no support for exploration
  8. the rebuttal

    “Tell us what to do, and we will do it.”
  9. we need to call a truce

    - real, effective systems
    - that support interaction
    - cost-effective to evaluate
  10. then vs. now

    • known-item search was an open problem
      – now it’s a commodity
    • library and information science ideas of the 80s
      – ahead of their time
    • now we can find known items
      – let’s tackle more ambitious information needs
  11. precision = fraction of retrieved documents that are relevant

    recall = fraction of relevant documents that are retrieved
    [diagram labels: retrieved documents, relevant documents, set retrieval]
  12. set retrieval 2.0 = set retrieval + guidance

    Did you mean: guidance
    Related Searches: Guidance Counselor Salary, Guidance Counselor Job Description, Definition of Guidance, Guidance Counseling, History of Guidance Counseling, Child Guidance, Career Guidance, What Is the Meaning of Guidance, Free Marriage Counseling, Problems in Marriage, Career Exploration, Role of School Counselor
  13. guidance vs. mind reading

    • system can’t read your mind
    • spouse / best friend can’t read your mind
    • sometimes you can’t read your own mind
  14. human-computer information retrieval

    • don’t just guess the user’s intent
      – optimize communication
    • de-emphasize the top ten documents
      – response is a set of documents
    • think beyond single queries
      – support refinement and exploration
  15. set retrieval 2.0

    • set retrieval that responds to queries with
      – overview of the user’s current context
      – organized set of options for exploration
    • contextual summaries of document sets
      – optimize system’s communication with user
    • query refinement options
      – optimize user’s communication with system
  16. query-driven clarification before refinement

    Matching Categories include:
      Appliances > Small Appliances > Irons & Steamers
      Appliances > Small Appliances > Microwaves & Steamers
      Bath > Sauna & Spas > Steamers
      Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
      Kitchen > Small Appliances > Steamers
  17. results-driven clarification before refinement

    Search: storage
  18. dynamic topic facet

    Subject
      Electronic data processing (1002)
      Distributed processing (937)
      Parallel processing (619)
      Computer networks (562)
      Fault-tolerant-computing (365)
      Show more…
  19. facets populated using entity extraction

    apple production
  20. hcir using set retrieval 2.0

    • emphasize set summaries over ranked lists
    • establish a dialog between the user and the data
    • enable exploration and discovery
  21. think outside the (search) box

    • best-first search works for many use cases
    • but not for some of the most valuable ones
    • set retrieval 2.0 = set retrieval + guidance
    • human-computer information retrieval
  22. thank you

    communication 1.0
      email: [email protected]
    communication 2.0
      blog: http://thenoisychannel.com
      twitter: http://twitter.com/dtunkelang
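Slide 11 defines precision and recall as ratios over two sets: the retrieved documents and the relevant documents. A minimal Python sketch of those two ratios, assuming documents are identified by string IDs; the IDs and the convention of returning 0.0 for an empty set are illustrative choices, not something the deck specifies.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall.

    Assumption: an empty retrieved (or relevant) set yields 0.0 rather than
    raising ZeroDivisionError; the slide does not address this edge case.
    """
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Because both measures are defined on the whole retrieved set rather than on a ranked list, neither depends on the order in which results are shown.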
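Slide 16 shows query-driven clarification: before the user refines anything, the system lists taxonomy categories that match the query. A rough sketch, assuming a query such as `steamer` (the slide does not show the query itself) and a simple case-insensitive substring match on the leaf category. The paths are copied from the slide except `Kitchen > Small Appliances > Toasters`, a hypothetical non-matching path added for contrast; `matching_categories` is an illustrative name, not an Endeca API.

```python
# Category paths from the slide 16 example, plus one hypothetical non-match.
CATEGORIES = [
    ("Appliances", "Small Appliances", "Irons & Steamers"),
    ("Appliances", "Small Appliances", "Microwaves & Steamers"),
    ("Bath", "Sauna & Spas", "Steamers"),
    ("Kitchen", "Bakeware & Cookware", "Cookware", "Open Stock Pots", "Double Boilers & Steamers"),
    ("Kitchen", "Small Appliances", "Steamers"),
    ("Kitchen", "Small Appliances", "Toasters"),  # hypothetical, should not match
]


def matching_categories(query, categories):
    """Return 'A > B > C' paths whose leaf name contains the query term.

    Assumption: matching is a case-insensitive substring test on the leaf
    only; a production system would likely use stemming and synonyms.
    """
    term = query.strip().lower()
    return [" > ".join(path) for path in categories if term in path[-1].lower()]


for path in matching_categories("steamer", CATEGORIES):
    print(path)
```

Showing the full category path, not just the leaf, is what lets the user tell a clothes steamer from a food steamer before committing to a refinement.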
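Slides 15 and 18 describe answering a query with a contextual summary of the whole result set, such as a subject facet with per-value counts, together with refinement options. The sketch below illustrates that idea with plain facet counting. The documents, the `subjects` field, and the helpers `facet_summary` and `refine` are hypothetical; this is not a description of how Endeca's engine computes facets.

```python
from collections import Counter

# Hypothetical result set for some query; each document carries subject facet values.
RESULTS = [
    {"id": 1, "subjects": ["Distributed processing", "Computer networks"]},
    {"id": 2, "subjects": ["Parallel processing"]},
    {"id": 3, "subjects": ["Distributed processing", "Fault-tolerant computing"]},
    {"id": 4, "subjects": ["Parallel processing", "Distributed processing"]},
]


def facet_summary(docs, facet, top_k=5):
    """Summarize the whole set (not just a top-ten list) as (value, count) pairs."""
    counts = Counter(value for doc in docs for value in doc[facet])
    return counts.most_common(top_k)


def refine(docs, facet, value):
    """Narrow the current set to documents tagged with the selected facet value."""
    return [doc for doc in docs if value in doc[facet]]


print(facet_summary(RESULTS, "subjects"))
narrowed = refine(RESULTS, "subjects", "Parallel processing")
print([doc["id"] for doc in narrowed])
print(facet_summary(narrowed, "subjects"))
```

Selecting "Parallel processing" narrows the set to documents 2 and 4, and the summary is recomputed over the narrowed set; recomputing after each step is what keeps the guidance tied to the user's current context.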
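Slide 19 mentions populating facets through entity extraction. Real systems use trained extractors; as a clearly simplified stand-in, this sketch swaps in naive dictionary (gazetteer) lookup. The gazetteer, the facet names `Location` and `Organization`, the sample texts, and the helper names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical gazetteer: lowercase surface form -> (facet name, facet value).
# Dictionary lookup is a naive stand-in for a real entity extractor.
GAZETTEER = {
    "washington": ("Location", "Washington"),
    "china": ("Location", "China"),
    "usda": ("Organization", "USDA"),
}

# Hypothetical documents matching a query like "apple production".
DOCS = [
    "Notes on apple production in Washington.",
    "Apple production statistics for China.",
    "USDA survey of apple production in Washington.",
]


def extract_entities(text):
    """Return the set of (facet, value) pairs whose surface form appears as a word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {GAZETTEER[token] for token in tokens if token in GAZETTEER}


def populate_facets(docs):
    """Count, per facet, how many documents mention each extracted value."""
    facets = {}
    for text in docs:
        for facet, value in extract_entities(text):
            facets.setdefault(facet, Counter())[value] += 1
    return facets


for facet, counts in populate_facets(DOCS).items():
    print(facet, counts.most_common())
```

Because `extract_entities` returns a set, a document that mentions the same entity twice still counts once, so the facet counts are document counts, matching the style of the counts on slide 18.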