Phrase-based searching in an information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results and from the index.
14 Claims
1. A method of selecting documents in a document collection in response to a query, the method comprising:
receiving a query;
identifying a plurality of phrases in the query, wherein at least one phrase is a multiple word phrase;
identifying a phrase extension of at least one of the identified phrases; and
selecting documents from the document collection containing at least one phrase from a set including phrases in the query and the phrase extension.
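By way of illustration only, the steps of claim 1 might be sketched as follows. The phrase list, the extension table, the index, and the greedy longest-match segmentation are all hypothetical choices made for this example; the claim does not prescribe how phrases or extensions are identified.

```python
# Hypothetical data, not taken from the patent: a list of known phrases,
# a table of phrase extensions, and an index from phrase to document ids.
KNOWN_PHRASES = {"president of", "hot dog"}
EXTENSIONS = {"president of": ["president of the united states"]}
INDEX = {
    "president of": {1, 2},
    "president of the united states": {2, 3},
    "hot dog": {4},
}


def identify_phrases(query, known=KNOWN_PHRASES, max_len=5):
    """Segment the query into known phrases by greedy longest match."""
    words = query.lower().split()
    phrases, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in known:
                phrases.append(candidate)
                i += n
                break
        else:
            i += 1  # no known phrase starts at this word
    return phrases


def select_documents(query):
    """Select documents containing at least one query phrase or extension."""
    phrases = identify_phrases(query)
    candidates = set(phrases)
    for phrase in phrases:
        candidates.update(EXTENSIONS.get(phrase, []))
    docs = set()
    for phrase in candidates:
        docs |= INDEX.get(phrase, set())
    return docs
```

With this sample data, `select_documents("president of")` returns `{1, 2, 3}`: documents 1 and 2 contain the query phrase itself, and document 3 is reached only through the phrase extension.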
3. A method of selecting documents in a document collection in response to a query, the method comprising:
receiving a query;
identifying an incomplete phrase in the query;
replacing the incomplete phrase with a phrase extension; and
selecting documents from the document collection containing the phrase extension.
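A minimal sketch of the replacement step in claim 3 is given below, assuming a lookup table from incomplete phrases to their extensions. The table `EXTENSION_OF`, the index contents, and the function names are hypothetical and serve only to make the example self-contained.

```python
# Hypothetical table mapping an incomplete phrase to its phrase extension,
# both stored as word tuples.
EXTENSION_OF = {
    ("president", "of"): ("president", "of", "the", "united", "states"),
}
# Hypothetical index: phrase -> ids of documents containing it.
INDEX = {"president of the united states": {2, 3}}


def replace_incomplete(query):
    """Return (rewritten query, phrase extensions present in the result)."""
    words = query.lower().split()
    out, used, i = [], [], 0
    while i < len(words):
        for incomplete, extension in EXTENSION_OF.items():
            if tuple(words[i:i + len(extension)]) == extension:
                step = len(extension)   # already complete, keep as is
            elif tuple(words[i:i + len(incomplete)]) == incomplete:
                step = len(incomplete)  # incomplete, substitute the extension
            else:
                continue
            out.extend(extension)
            used.append(" ".join(extension))
            i += step
            break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out), used


def select_documents(query):
    """Select documents containing the phrase extension(s) of the query."""
    _, extensions = replace_incomplete(query)
    docs = set()
    for extension in extensions:
        docs |= INDEX.get(extension, set())
    return docs
```

Matching on word tuples rather than raw substrings avoids accidental hits inside longer words, and checking for the full extension first keeps an already complete phrase from being extended twice.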
6. A method of selecting documents in a document collection in response to a query, the method comprising:
receiving a query including a first phrase and second phrase;
retrieving a posting list of documents containing the first phrase;
for each document in the posting list:
accessing a list indicating related phrases of the first phrase that are present in the document; and
responsive to the list of related phrases indicating that the second phrase is present in a document, selecting the document to include in a result to the query, without retrieving a posting list of documents containing the second phrase.
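The lookup in claim 6 might be sketched as follows, assuming each posting carries the set of related phrases found in that document. The `Posting` class, the sample phrases, and the function name are hypothetical; the sketch also assumes the second phrase is one of the related phrases recorded for the first.

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One entry in a posting list (hypothetical layout)."""
    doc_id: int
    # Related phrases of the indexed phrase that are present in this document.
    related_present: set = field(default_factory=set)


# Hypothetical posting lists keyed by phrase.
POSTINGS = {
    "australian shepherd": [
        Posting(1, {"blue merle", "agility training"}),
        Posting(2, {"red merle"}),
        Posting(3),
    ],
}


def select_with_related(first, second, postings=POSTINGS):
    """Select documents containing both phrases using only the first
    phrase's posting list; the second phrase's list is never fetched."""
    result = []
    for posting in postings.get(first, []):
        if second in posting.related_present:
            result.append(posting.doc_id)
    return result
```

Here `select_with_related("australian shepherd", "blue merle")` returns `[1]` after reading a single posting list, which is the saving the claim describes.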
10. A method of ranking documents included in a search result in response to a query, the query comprising at least one query phrase, the method comprising:
for each document in the search result, accessing a related phrase bit vector for a query phrase, wherein each bit of the bit vector indicates the presence or absence of a related phrase of the query phrase; and
sorting the documents in the search results by the value of their related phrase bit vectors, such that the document with the highest value related phrase bit vector is ranked highest in the search result.
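As a rough sketch of the ranking in claim 10, each related phrase bit vector can be held as an integer and the documents sorted by that integer. The document ids and vector values below are invented for the example.

```python
def rank_by_bit_vector(vectors):
    """Sort document ids by the integer value of their related phrase
    bit vector, highest value first."""
    return sorted(vectors, key=lambda doc_id: vectors[doc_id], reverse=True)


# Hypothetical search result: document id -> related phrase bit vector.
vectors = {"d1": 0b0101, "d2": 0b1100, "d3": 0b0011}
ranking = rank_by_bit_vector(vectors)  # ["d2", "d1", "d3"]
```

Because the vector is compared as a number, a single high-order bit outweighs any combination of lower-order bits. This ordering is meaningful only if the bit positions are themselves arranged by importance, which is an assumption of this sketch.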
12. A method of ranking documents included in a search result in response to a query, the query comprising at least one query phrase, the method comprising:
for each document in the search result:
accessing a related phrase bit vector for a phrase of the query, wherein each bit of the bit vector indicates the presence or absence of a related phrase of the query phrase;
for each bit indicating the presence of a related phrase of the query phrase, adding a predetermined number of points associated with the bit to a score for the document; and
sorting the documents in the search results by their document scores.
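The point-based scoring of claim 12 might look like the following sketch. The point values in `POINTS`, the convention that index 0 corresponds to the least significant bit, and the sample vectors are all assumptions made for illustration.

```python
# Hypothetical points per bit position; POINTS[i] applies to bit i,
# counting from the least significant bit.
POINTS = [10, 5, 3, 1]


def score(vector, points=POINTS):
    """Add the predetermined points for every bit that is set."""
    return sum(p for i, p in enumerate(points) if (vector >> i) & 1)


def rank_by_score(vectors, points=POINTS):
    """Sort document ids by score, highest first."""
    return sorted(vectors, key=lambda d: score(vectors[d], points), reverse=True)


# Hypothetical search result: document id -> related phrase bit vector.
vectors = {"d1": 0b0101, "d2": 0b1100, "d3": 0b0011}
ranking = rank_by_score(vectors)  # ["d3", "d1", "d2"]
```

With these values the scores are 13 for `d1`, 4 for `d2`, and 15 for `d3`. The resulting order differs from a sort on the raw vector value, since each bit contributes its own weight rather than a power of two.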
14. A method of providing an information retrieval system, the method comprising:
automatically identifying valid phrases in a document collection comprising a plurality of documents, wherein the valid phrases contain multiple word phrases;
indexing the documents according to valid phrases contained in the documents;
receiving a search query;
identifying phrases contained in the query;
selecting documents according to the identified phrases; and
ranking the selected documents according to the identified phrases.
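The overall flow of claim 14 can be sketched end to end as shown below. The claim does not state how valid phrases are identified, so this example substitutes a simple document-frequency threshold (`min_docs`) as a stand-in; that threshold, the n-gram length limit, the ranking by number of matched phrases, and the sample documents are all assumptions of the sketch.

```python
from collections import Counter, defaultdict


def ngrams(words, max_len=3):
    """Yield every multi-word sequence of 2..max_len words."""
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def identify_valid_phrases(docs, min_docs=2):
    """Stand-in rule: a phrase is valid if it occurs in >= min_docs documents."""
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(ngrams(text.lower().split())))
    return {phrase for phrase, count in doc_freq.items() if count >= min_docs}


def build_index(docs, valid):
    """Index each document under the valid phrases it contains."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in set(ngrams(text.lower().split())):
            if phrase in valid:
                index[phrase].add(doc_id)
    return index


def search(query, index, valid):
    """Select documents by query phrases, ranked by number of phrases matched."""
    query_phrases = {p for p in ngrams(query.lower().split()) if p in valid}
    hits = Counter()
    for phrase in query_phrases:
        for doc_id in index.get(phrase, ()):
            hits[doc_id] += 1
    return [d for d, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical document collection.
DOCS = {
    1: "the australian shepherd is a herding dog",
    2: "an australian shepherd with a blue merle coat",
    3: "blue merle coat colors in dogs",
    4: "hot dog recipes",
}
valid = identify_valid_phrases(DOCS)
index = build_index(DOCS, valid)
results = search("australian shepherd blue merle", index, valid)  # [2, 1, 3]
```

Document 2 ranks first because it matches both query phrases, while documents 1 and 3 each match one and are ordered by id as a tie-break.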
Specification