Is Search Broken?

This 2008 Carnegie Mellon Night presentation at Fidelity's Center for Advanced Technology discusses Endeca's approach to faceted search and human-computer information retrieval (HCIR).

Daniel Tunkelang

May 24, 2026

Transcript

  1. © 2008 Endeca Technologies, Inc. All rights reserved. Is Search Broken?!

    Daniel Tunkelang, Chief Scientist, Endeca
  2. howdy!

    • 1992: Bachelor’s + Master’s from MIT in CS + Math
    • 1998: PhD from CMU in CS (ACO program)
    • 1999: Co-founded Endeca!
    • 2008: ???
  3. overview

    • Who is Endeca?
    • Is search broken?
    • If it is, what can we do about it?
  4. who / what is endeca?

    • Software to help people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions.
    • 500+ customers
    • $108M revenue in 2007
  5. search hits a wall in knowledge management

    Current Search: it outsourcing
  6. search even hits a wall on the web

    Results 1-10 out of about 344,000,000 for ir
  7. or do they?

    78% wish search engines could read their minds.
    What frustrates users most?
    – 25%: deluge of results
    – 24%: too many paid listings
    – 19%: inability to understand their keywords
    – 19%: disorganized / random results
    The State of Search, Autobytel & Kelton Research, Oct ’07
  8. web search vs. enterprise search

    “Search on the internet is solved. I always find what I need. But why not in the enterprise? Seems like a solution waiting to happen.”
    - a Fortune 500 CTO
  9. precision = fraction of retrieved documents that are relevant

    recall = fraction of relevant documents that are retrieved
    (Diagram labels: retrieved documents, relevant documents)
  10. the truth, nothing but the truth

    why improve precision?
  11. the truth, the whole truth, nothing but the truth

    what we want…
  12. which should we favor?

    Precision…to avoid annoying users with irrelevant results?
    Recall…to make sure we don’t throw away results the user wants / needs?
  13. you get what you pay for

    • There are easy use cases…
      – 30% of queries are navigational.
      – 30% of queries lead to Wikipedia pages.
      – Users won’t pay, but advertisers will!
    • …and hard use cases.
      – Queries where recall matters.
      – Exploratory search.
      – Enterprises will pay for insight.
  14. technology alone can’t provide insight

    • The system can’t read your mind.
    • Your spouse / best friend can’t read your mind.
    • Sometimes you can’t read your own mind.
  15. technology is a catalyst

    • Computers are good at analysis.
    • People are good at using what they know.
    • How do we get the best of both worlds?
  16. human-computer information retrieval

    • Instead of guessing the user’s intent, optimize communication.
    • De-emphasize the top ten documents; response is a set of documents.
    • Think beyond single queries; support refinement and exploration.
  17. endeca's approach: guided summarization

    • Set retrieval that responds to queries with
      – an overview of the user's current context.
      – an organized set of options for incremental exploration.
    • Contextual summaries of document sets optimize system’s communication with user.
    • Query refinement options optimize user’s communication with system.
  18. guided summarization for ecommerce

    Matching Categories include:
    Appliances > Small Appliances > Irons & Steamers
    Appliances > Small Appliances > Microwaves & Steamers
    Bath > Sauna & Spas > Steamers
    Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
    Kitchen > Small Appliances > Steamers
  19. Guided summarization starts with faceted search.
  20. dynamic topic facet

    Subject
    Electronic data processing (1002)
    Distributed processing (937)
    Parallel processing (619)
    Computer networks (562)
    Fault-tolerant-computing (365)
    Show more…

    Subject
    Artificial intelligence (227)
    Automatic theorem proving (9)
    Client/server computing (185)
    Computer algorithms (110)
    Computer architecture (162)
    Computer networks (552)
    Computer programs (139)
    Computer security (151)
    Computer software (253)
    Computers (124)
    Database management (277)
    Distributed processing (937)
    Electronic data processing (1002)
    Electronic digital computers (148)
    Fault-tolerant computing (365)
    High performance computing (244)
    History (11)
    Information technology (145)
    Java (77)
    Law and legislation (70)
    Logic, Symbolic and mathematical (16)
    Mathematics (70)
    Mobile communication systems (54)
    Operating systems (87)
    Parallel processing (619)
    Research (83)
    Software engineering (197)
    Supercomputers (139)
    Web databases (54)
    Wireless communication systems (97)
  21. facets populated using entity extraction

    apple production
  22. cutting through facets to show the big picture

    Search: storage
  23. guided summarization – a summary

    Guided summarization enables a dialog between the user and the data, supporting exploration and discovery.
  24. think outside the box

    • Search works for many use cases.
    • But not for some of the most valuable ones.
    • Focus on human-computer information retrieval.
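
The precision and recall definitions on slide 9 can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `precision_recall` and the document IDs are made up for the example and do not come from the deck.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 documents truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here half of what was retrieved is relevant (precision 0.5), but only two of the five relevant documents were found (recall 0.4), which is the trade-off slide 12 asks about.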
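
The guided summarization idea on slides 17 to 19 (respond with a set of documents, summarize that set, and offer refinement options) can be sketched as plain facet counting. This is a minimal sketch under stated assumptions, not Endeca's implementation: the toy catalog, the field names `department` and `price_band`, and the helpers `search`, `facet_counts`, and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy catalog; titles, fields, and values are illustrative only.
DOCS = [
    {"title": "Garment steamer", "department": "Appliances", "price_band": "under $50"},
    {"title": "Microwave vegetable steamer", "department": "Appliances", "price_band": "under $50"},
    {"title": "Facial steamer", "department": "Bath", "price_band": "$50-$100"},
    {"title": "Bamboo steamer basket", "department": "Kitchen", "price_band": "under $50"},
    {"title": "Electric food steamer", "department": "Kitchen", "price_band": "$50-$100"},
    {"title": "Cast iron skillet", "department": "Kitchen", "price_band": "under $50"},
]


def search(docs, term):
    """Set retrieval: return every document whose title contains the term."""
    term = term.lower()
    return [doc for doc in docs if term in doc["title"].lower()]


def facet_counts(docs, fields):
    """Summarize a result set: count the values present for each facet field."""
    return {field: Counter(doc[field] for doc in docs) for field in fields}


def refine(docs, field, value):
    """Narrow the current result set to documents with the chosen facet value."""
    return [doc for doc in docs if doc[field] == value]


# The user searches, sees an overview of the result set, then refines.
results = search(DOCS, "steamer")
summary = facet_counts(results, ["department", "price_band"])
kitchen = refine(results, "department", "Kitchen")
kitchen_summary = facet_counts(kitchen, ["price_band"])

print(len(results), dict(summary["department"]))
print([doc["title"] for doc in kitchen])
```

The counts are recomputed on whatever set the user is currently looking at, so each refinement yields a fresh summary and a fresh set of options rather than a ranked top ten.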