Slide 9
Google acknowledges that matching on query
terms alone is pretty terrible.
“Direct “Boolean” matching of query terms has well known limitations,
and in particular does not identify documents that do not have the
query terms, but have related words [...] The problem here is that
conventional systems index documents based on individual terms,
rather than on concepts. Concepts are often expressed in phrases [...]
Accordingly, there is a need for an information retrieval system and
methodology that can comprehensively identify phrases in a large
scale corpus, index documents according to phrases, search and rank
documents in accordance with their phrases, and provide additional
clustering and descriptive information about the documents. [...]”
- Information retrieval system for archiving multiple document
versions, granted 2017 (link)
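
To make the limitation in the quote concrete, here is a minimal sketch of strict Boolean term matching. It is an illustration only, not the method described in the patent: the function names, the two sample documents, and the query are all made up for this example.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return [word for word in stripped if word]


def boolean_match(query, document):
    """Return True only if every query term appears in the document."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))


# Hypothetical documents: both are about the same concept,
# but only the first one uses the literal query terms.
docs = {
    "a": "Cheap flights to Paris from London",
    "b": "Low-cost airfare for your trip to the French capital",
}

query = "cheap flights paris"
matches = [doc_id for doc_id, text in docs.items() if boolean_match(query, text)]
print(matches)
```

Document "b" covers the same concept with related words, yet it is never returned because none of the literal query terms occur in it. That is the gap the quote says indexing by phrases and concepts is meant to close.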