Phrase-based searching in an information retrieval system
First Claim
1. A computer-implemented method comprising:
obtaining, from a phrase-based index for an Internet search engine, a list of documents from a collection of documents available via the Internet that contain a first phrase, the first phrase being relevant to a query;
for each document in the list, determining, using related phrase information stored in the index for each document in the list of documents, whether the document includes one or more related phrases of the first phrase, where each related phrase has an actual co-occurrence rate of the related phrase and the first phrase in the document collection that exceeds an expected co-occurrence rate of the related phrase and the first phrase in the document collection;
ranking the documents in the list based on a quantity of related phrases determined for each document, so that documents with more related phrases are ranked higher than documents with fewer related phrases; and
selecting at least some of the highest-ranked documents to include in a result to the query.
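The ranking and selection steps of the claim can be illustrated with a minimal sketch. It is only one possible reading, not the patented implementation: the index layout (a dict mapping a phrase to `(doc_id, phrases_in_doc)` postings), the function name `select_documents`, the `top_k` parameter, and the tie-break on document id are all assumptions made for illustration. The set of related phrases is taken as already known.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical index layout: phrase -> list of (doc_id, phrases found in that document).
Posting = Tuple[str, FrozenSet[str]]
PhraseIndex = Dict[str, List[Posting]]


def select_documents(
    index: PhraseIndex,
    related: Set[str],
    first_phrase: str,
    top_k: int,
) -> List[str]:
    """Rank documents containing first_phrase by how many related phrases they hold."""
    # Step 1: obtain the list of documents that contain the first phrase.
    postings = index.get(first_phrase, [])

    # Step 2: for each document, count the related phrases it includes.
    scored = [(len(related & phrases), doc_id) for doc_id, phrases in postings]

    # Step 3: more related phrases ranks higher; ties are broken by doc id
    # (the tie-break is an assumption, the claim does not specify one).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Step 4: select some of the highest-ranked documents for the result.
    return [doc_id for _, doc_id in scored[:top_k]]
```

Calling `select_documents(index, related, "australian shepherd", top_k=10)` would return at most ten document ids, with the documents matching the most related phrases first.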
2 Assignments
0 Petitions
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.
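As a rough illustration of how one phrase might be judged to predict another, the sketch below flags a phrase as related when its actual co-occurrence with a given phrase exceeds the expected co-occurrence. The text here does not say how the expected rate is computed; treating the two phrases as statistically independent (expected rate equals the product of their individual document rates) is an assumption of this sketch, as are the function name `related_phrases` and the input layout.

```python
from collections import Counter
from typing import Dict, Set


def related_phrases(doc_phrases: Dict[str, Set[str]], first_phrase: str) -> Set[str]:
    """Return phrases whose actual co-occurrence with first_phrase exceeds the expected one.

    doc_phrases maps a document id to the set of phrases found in that document
    (a hypothetical layout chosen for this example).
    """
    n = len(doc_phrases)
    doc_freq: Counter = Counter()    # number of documents containing each phrase
    joint_freq: Counter = Counter()  # number of documents containing phrase AND first_phrase

    for phrases in doc_phrases.values():
        doc_freq.update(phrases)
        if first_phrase in phrases:
            joint_freq.update(p for p in phrases if p != first_phrase)

    related = set()
    for phrase, joint in joint_freq.items():
        # actual rate   = joint / n
        # expected rate = (doc_freq[first] / n) * (doc_freq[phrase] / n)  (independence assumption)
        # Both sides are multiplied by n * n so the comparison stays in integers.
        if joint * n > doc_freq[first_phrase] * doc_freq[phrase]:
            related.add(phrase)
    return related
```

Comparing integer counts rather than floating-point rates avoids rounding surprises when the actual and expected rates are exactly equal, in which case the phrase is not treated as related.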
233 Citations
18 Claims
10. A system for selecting documents from a document collection in response to a query, the system comprising:
one or more memory devices configured to store executable instructions; and
one or more processors configured to execute the stored instructions to cause the system to:
obtain, from a phrase-based index for an Internet search engine, a list of documents from a collection of documents available via the Internet that contain a first phrase, the first phrase being relevant to a query,
for each document in the list, determine, using related phrase information stored in the index for each document in the list of documents, whether the document includes one or more related phrases of the first phrase, where each related phrase has an actual co-occurrence rate of the related phrase and the first phrase in the document collection that exceeds an expected co-occurrence rate of the related phrase and the first phrase in the document collection,
rank the documents in the list based on a quantity of related phrases determined for each document, so that documents with more related phrases are ranked higher than documents with fewer related phrases, and
select at least some of the highest-ranked documents to include in a result to the query.
Specification