Phrase-based searching in an information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.
10 Claims
1. A computer-implemented method of selecting documents in a document collection in response to a query, the method comprising:
- receiving a query;
- identifying an incomplete phrase in the query, wherein other phrases predicted by the incomplete phrase in the document collection include only phrase extensions of the incomplete phrase;
- replacing the incomplete phrase with a phrase extension, wherein the phrase extension of the incomplete phrase is a super-sequence of the incomplete phrase that begins with the incomplete phrase, and wherein the incomplete phrase predicts the phrase extension based on a measure of an actual co-occurrence rate of the phrase extension and the incomplete phrase exceeding an expected co-occurrence rate of the phrase extension and the incomplete phrase in the document collection, the expected co-occurrence rate of the phrase extension and the incomplete phrase being a function of a plurality of occurrences of the phrase extension and the incomplete phrase in the document collection;
- selecting documents from the document collection containing the phrase extension; and
- storing the selected documents in a memory as part of a search result.
Dependent claims: 2.

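The prediction test recited in claim 1 compares an actual co-occurrence rate with an expected one. A minimal sketch follows, under two labeled assumptions that the claim text does not fix: the expected rate is modeled as the product of the two phrases' individual document rates (an independence model), and "exceeding" is tested as the ratio of actual to expected being above a hypothetical `threshold` parameter. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseCounts:
    """Document counts for a phrase pair (hypothetical container)."""
    total_docs: int      # documents in the collection
    docs_with_j: int     # documents containing phrase j (e.g. the incomplete phrase)
    docs_with_k: int     # documents containing phrase k (e.g. the phrase extension)
    docs_with_both: int  # documents containing both j and k


def actual_rate(c: PhraseCounts) -> float:
    """Observed fraction of documents in which j and k co-occur."""
    return c.docs_with_both / c.total_docs


def expected_rate(c: PhraseCounts) -> float:
    """Assumption: j and k occur independently, so the expected
    co-occurrence rate is the product of their individual rates."""
    return (c.docs_with_j / c.total_docs) * (c.docs_with_k / c.total_docs)


def prediction_measure(c: PhraseCounts) -> float:
    """Ratio of actual to expected co-occurrence rate (0.0 if undefined)."""
    expected = expected_rate(c)
    return actual_rate(c) / expected if expected > 0 else 0.0


def predicts(c: PhraseCounts, threshold: float = 1.0) -> bool:
    """True when the actual rate exceeds the expected rate by the given factor.
    The threshold value is a hypothetical tuning parameter."""
    return prediction_measure(c) > threshold


# Illustrative, made-up counts: j in 50 docs, k in 40 docs, both in 20 of 1000.
counts = PhraseCounts(total_docs=1000, docs_with_j=50, docs_with_k=40, docs_with_both=20)
print(prediction_measure(counts), predicts(counts))
```

A ratio near 1 means the two phrases co-occur about as often as chance would suggest under the independence assumption; a larger ratio is what the sketch treats as one phrase predicting the other.
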
3. A computer-implemented method of selecting documents in a document collection in response to a query, the method comprising:
- receiving a query including a first phrase and a second phrase;
- retrieving a posting list of documents containing the first phrase;
- for each document in the posting list:
  - accessing a list indicating related phrases of the first phrase that are present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection, wherein the first phrase predicts a related phrase based on a measure of an actual co-occurrence rate of the related phrase and the first phrase exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection, the expected co-occurrence rate of the related phrase and the first phrase being a function of a plurality of occurrences of the related phrase and the first phrase in the document collection;
  - responsive to the list of related phrases indicating that the second phrase is present in a document, selecting the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase; and
- storing the selected documents in a memory as part of a search result.
Dependent claims: 4, 5, 6.

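The two-phrase lookup in claim 3 can be sketched as a single pass over the first phrase's posting list. The index layout here is an assumption for illustration: each posting carries the set of the first phrase's related phrases that occur in that document, and the sketch only applies when the second phrase is one of those related phrases. `Posting`, `PhraseIndex`, and `select_documents` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Posting:
    """One entry in a phrase's posting list (hypothetical layout)."""
    doc_id: str
    # Related phrases of the indexed phrase that are present in this document.
    related_present: FrozenSet[str]


# Assumed index shape: phrase -> posting list for that phrase.
PhraseIndex = Dict[str, List[Posting]]


def select_documents(index: PhraseIndex, first: str, second: str) -> List[str]:
    """Return ids of documents containing both phrases.

    Only the posting list of `first` is read; the presence of `second`
    is decided from the per-document related-phrase list, so the posting
    list of `second` is never retrieved.
    """
    selected: List[str] = []
    for posting in index.get(first, []):
        if second in posting.related_present:
            selected.append(posting.doc_id)
    return selected  # held in memory as part of the search result
```

The design trade-off in this sketch is extra storage per posting in exchange for skipping a second posting-list retrieval and intersection at query time.
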
7. A computer readable storage medium storing a computer program executable by a processor for selecting documents in a document collection in response to a query, the actions of the computer program comprising:
- receiving a query;
- identifying an incomplete phrase in the query, wherein other phrases predicted by the incomplete phrase in the document collection include only phrase extensions of the incomplete phrase;
- replacing the incomplete phrase with a phrase extension, wherein the phrase extension of the incomplete phrase is a super-sequence of the incomplete phrase that begins with the incomplete phrase, and wherein the incomplete phrase predicts the phrase extension based on a measure of an actual co-occurrence rate of the phrase extension and the incomplete phrase exceeding an expected co-occurrence rate of the phrase extension and the incomplete phrase in the document collection, the expected co-occurrence rate of the phrase extension and the incomplete phrase being a function of a plurality of occurrences of the phrase extension and the incomplete phrase in the document collection;
- selecting documents from the document collection containing the phrase extension; and
- storing the selected documents in a memory as part of a search result.

8. A computer readable storage medium storing a computer program executable by a processor for selecting documents in a document collection in response to a query, the actions of the computer program comprising:
- receiving a query including a first phrase and a second phrase;
- retrieving a posting list of documents containing the first phrase;
- for each document in the posting list:
  - accessing a list indicating related phrases of the first phrase that are present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection, wherein the first phrase predicts a related phrase based on a measure of an actual co-occurrence rate of the related phrase and the first phrase exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection, the expected co-occurrence rate of the related phrase and the first phrase being a function of a plurality of occurrences of the related phrase and the first phrase in the document collection;
  - responsive to the list of related phrases indicating that the second phrase is present in a document, selecting the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase; and
- storing the selected documents in a memory as part of a search result.

9. A system for selecting documents in a document collection in response to a query, comprising:
- a memory configured for storing a phrase data repository, comprising a list of incomplete phrases; and
- a processor configured for operating a query processing module configured to cause an apparatus to:
  - receive a query;
  - identify, in the query, an incomplete phrase from the list of incomplete phrases, wherein other phrases predicted by the incomplete phrase in the document collection include only phrase extensions of the incomplete phrase;
  - replace the incomplete phrase with a phrase extension, wherein the phrase extension of the incomplete phrase is a super-sequence of the incomplete phrase that begins with the incomplete phrase, and wherein the incomplete phrase predicts the phrase extension based on a measure of an actual co-occurrence rate of the phrase extension and the incomplete phrase exceeding an expected co-occurrence rate of the phrase extension and the incomplete phrase in the document collection, the expected co-occurrence rate of the phrase extension and the incomplete phrase being a function of a plurality of occurrences of the phrase extension and the incomplete phrase in the document collection;
  - select documents from the document collection containing the phrase extension; and
  - store the selected documents in a memory as part of a search result.

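The identify-and-replace steps that claim 9 assigns to the query processing module can be sketched as a lookup against a stored table of incomplete phrases. Everything concrete here is an assumption for illustration: the repository is modeled as a plain dictionary mapping each incomplete phrase to one already-chosen phrase extension, tokenization is whitespace splitting on lowercased text, and longer incomplete phrases are matched first. The names `is_phrase_extension` and `expand_query` are hypothetical.

```python
from typing import Dict, List, Tuple


def is_phrase_extension(incomplete: List[str], extension: List[str]) -> bool:
    """An extension is a strictly longer token sequence that begins
    with the incomplete phrase."""
    return len(extension) > len(incomplete) and extension[:len(incomplete)] == incomplete


def expand_query(query: str, repository: Dict[str, str]) -> str:
    """Replace each incomplete phrase found in the query with its extension."""
    tokens = query.lower().split()
    # Tokenize the repository, drop empty or invalid entries,
    # and try longer incomplete phrases first.
    table: List[Tuple[List[str], List[str]]] = []
    for incomplete, extension in repository.items():
        inc, ext = incomplete.lower().split(), extension.lower().split()
        if inc and is_phrase_extension(inc, ext):
            table.append((inc, ext))
    table.sort(key=lambda pair: len(pair[0]), reverse=True)

    out: List[str] = []
    i = 0
    while i < len(tokens):
        for inc, ext in table:
            if tokens[i:i + len(inc)] == inc:
                # If the query already spells out the full extension,
                # consume all of it so it is not duplicated.
                already_full = tokens[i:i + len(ext)] == ext
                out.extend(ext)
                i += len(ext) if already_full else len(inc)
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical repository entry.
repo = {"president of": "president of the united states"}
print(expand_query("President of", repo))
```

The expanded query string would then be used for document selection in place of the original.
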
10. A system for selecting documents in a document collection in response to a query, comprising:
- a memory configured for storing a phrase-based index comprising a list of phrases and a plurality of phrase posting lists associated with a phrase from the list of phrases; and
- a processor configured for operating a query processing module configured to cause an apparatus to:
  - receive a query including a first phrase and a second phrase;
  - retrieve, from the phrase-based index, a posting list of documents containing the first phrase;
  - for each document in the posting list:
    - access a list indicating related phrases of the first phrase that are present in the document, the first phrase predicting the occurrence of each of the related phrases in the document collection, wherein the first phrase predicts a related phrase based on a measure of an actual co-occurrence rate of the related phrase and the first phrase exceeding an expected co-occurrence rate of the related phrase and the first phrase in the document collection, the expected co-occurrence rate of the related phrase and the first phrase being a function of a plurality of occurrences of the related phrase and the first phrase in the document collection;
    - responsive to the list of related phrases indicating that the second phrase is present in a document, select the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase; and
  - store the selected documents in a memory as part of a search result.

Specification