Phrase-based searching in an information retrieval system
First Claim
1. A computer-implemented method of selecting documents in a document collection in response to a query, the method comprising:
receiving a query including a first phrase and a second phrase;
retrieving, by at least one processor of a computing system, a posting list of documents containing the first phrase;
for each document in the posting list:
accessing, by at least one processor of the computing system, a list of related phrases of the first phrase, wherein the list indicates whether a related phrase is present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection, wherein the first phrase predicts an occurrence of a related phrase based on a measure of an actual co-occurrence rate of the related phrase and the first phrase in the document collection exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection;
comparing, by at least one processor of the computing system, the second phrase to the list of related phrases that are present in the document; and
when the comparison indicates that the second phrase is a related phrase of the first phrase that is present in the document, then selecting the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase.
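The steps of this claim can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `select_documents`, the dictionary layouts `POSTING_LISTS` and `RELATED_PRESENT`, and the sample phrases are all hypothetical. The point it shows is that the second phrase is checked against per-document related-phrase data stored with the first phrase, so no posting list for the second phrase is ever fetched.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory index (an assumption for illustration only).
# Posting lists: phrase -> IDs of documents containing that phrase.
# Note that no posting list for the second phrase is stored or needed here.
POSTING_LISTS: Dict[str, List[int]] = {
    "australian shepherd": [1, 2, 3],
}

# For each (phrase, document) pair: the related phrases of that phrase
# that are actually present in the document.
RELATED_PRESENT: Dict[Tuple[str, int], Set[str]] = {
    ("australian shepherd", 1): {"blue merle", "agility training"},
    ("australian shepherd", 2): {"red merle"},
    ("australian shepherd", 3): {"blue merle"},
}


def select_documents(
    first_phrase: str,
    second_phrase: str,
    posting_lists: Dict[str, List[int]],
    related_present: Dict[Tuple[str, int], Set[str]],
) -> List[int]:
    """Select documents containing both phrases using only the first
    phrase's posting list and its per-document related-phrase data."""
    selected: List[int] = []
    # Retrieve the posting list of the first phrase only.
    for doc_id in posting_lists.get(first_phrase, []):
        # Related phrases of the first phrase that are present in this document.
        present = related_present.get((first_phrase, doc_id), set())
        # Compare the second phrase against that list.
        if second_phrase in present:
            selected.append(doc_id)
    return selected


result = select_documents(
    "australian shepherd", "blue merle", POSTING_LISTS, RELATED_PRESENT
)
print(result)  # [1, 3]
```

In this toy data, document 2 contains the first phrase but not the related phrase "blue merle", so it is skipped.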
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.
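The idea that one phrase "predicts" another can be illustrated with a small Python sketch. The first claim says only that the measure compares an actual co-occurrence rate with an expected one; the specific choices below are assumptions made for illustration: the expected rate is computed as if the two phrases occurred independently, the measure is the ratio of actual to expected, and the threshold value and the names `cooccurrence_gain` and `predicts` are hypothetical.

```python
from typing import Set


def cooccurrence_gain(docs_a: Set[int], docs_b: Set[int], n_docs: int) -> float:
    """Ratio of actual to expected co-occurrence rate for two phrases.

    actual   = |A and B| / N
    expected = (|A| / N) * (|B| / N)   (assumes independent occurrence)
    ratio    = |A and B| * N / (|A| * |B|)
    """
    if n_docs <= 0:
        raise ValueError("n_docs must be positive")
    if not docs_a or not docs_b:
        return 0.0  # a phrase that never occurs cannot co-occur
    both = len(docs_a & docs_b)
    return both * n_docs / (len(docs_a) * len(docs_b))


def predicts(docs_a: Set[int], docs_b: Set[int], n_docs: int,
             threshold: float = 1.0) -> bool:
    """True when phrase A is treated as predicting phrase B, i.e. the
    actual rate exceeds the expected rate by more than `threshold` times.
    The default threshold is an assumption, not a value from the source."""
    return cooccurrence_gain(docs_a, docs_b, n_docs) > threshold


# Toy collection of 100 documents (made-up data).
docs_first = set(range(0, 10))       # 10 documents contain the first phrase
docs_related = set(range(5, 15))     # 10 documents, 5 shared with docs_first
docs_common = set(range(0, 100, 2))  # 50 documents, 5 shared with docs_first

print(cooccurrence_gain(docs_first, docs_related, 100))  # 5.0
print(predicts(docs_first, docs_related, 100))           # True
print(predicts(docs_first, docs_common, 100))            # False
```

The last case shows a phrase that co-occurs exactly as often as chance would suggest (ratio 1.0), so it is not treated as predicted.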
236 Citations
19 Claims
1. A computer-implemented method of selecting documents in a document collection in response to a query, the method comprising:
receiving a query including a first phrase and a second phrase;
retrieving, by at least one processor of a computing system, a posting list of documents containing the first phrase;
for each document in the posting list:
accessing, by at least one processor of the computing system, a list of related phrases of the first phrase, wherein the list indicates whether a related phrase is present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection, wherein the first phrase predicts an occurrence of a related phrase based on a measure of an actual co-occurrence rate of the related phrase and the first phrase in the document collection exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection;
comparing, by at least one processor of the computing system, the second phrase to the list of related phrases that are present in the document; and
when the comparison indicates that the second phrase is a related phrase of the first phrase that is present in the document, then selecting the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for selecting documents from a document collection in response to a query, the system comprising:
one or more memory devices configured to store executable instructions; and
one or more processors configured to execute the stored instructions to cause the system to:
receive a query including a first phrase and a second phrase;
retrieve a posting list of documents containing the first phrase;
for each document in the posting list:
access a list of related phrases of the first phrase, wherein the list indicates whether a related phrase is present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection based on a measure of an actual co-occurrence rate of the related phrase and the first phrase in the document collection exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection;
compare the second phrase to the list of related phrases that are present in the document; and
when the comparison indicates that the second phrase is a related phrase of the first phrase that is present in the document, then select the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
Specification