Phrase-based indexing in an information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results and from the index.
163 Citations
3 Claims
1. A method of indexing documents in a document collection, the method comprising:
providing a list of phrases;
identifying for a given document, each phrase in the document;
for each phrase in the document, identifying a related phrase also present in the document; and
for each phrase in the document, storing in a posting list of the phrase an identifier of the document, and an indication of each related phrase also present in the document.
3. A method of indexing documents in a document collection, the method comprising:
providing a list of valid phrases, wherein each phrase on the list appears a minimum number of times in the document collection, and predicts at least one other phrase;
accessing a plurality of documents in the document collection;
for each accessed document, identifying each phrase in the document, from the list of valid phrases; and
for each identified phrase in the document, storing in a posting list of the phrase an identifier of the document.
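
The indexing steps recited in claims 1 and 3 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (find_phrases, valid_phrases, index_document, related, collection_counts, predicts, min_count, max_len) is hypothetical. It assumes whitespace tokenization of lowercased text, a precomputed map of related phrases, and a precomputed map of which phrases each phrase predicts; the claims do not say how those maps are derived, so the sketch takes them as inputs.

```python
from collections import defaultdict


def find_phrases(tokens, phrase_list, max_len=5):
    """Return every listed phrase that occurs in the token sequence.

    Assumption: a phrase is a run of 1..max_len consecutive tokens.
    """
    found = set()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            if candidate in phrase_list:
                found.add(candidate)
    return found


def valid_phrases(candidates, collection_counts, predicts, min_count=2):
    """Claim 3 style filter (hypothetical inputs).

    Keep phrases that appear at least min_count times in the collection
    and that predict at least one other phrase.
    """
    return {
        p for p in candidates
        if collection_counts.get(p, 0) >= min_count and predicts.get(p)
    }


def index_document(doc_id, text, phrase_list, related, index):
    """Claim 1 style indexing of a single document.

    For each listed phrase in the document, append to that phrase's
    posting list the document identifier together with the related
    phrases that are also present in the same document.
    """
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    present = find_phrases(tokens, phrase_list)
    for phrase in sorted(present):
        related_present = sorted(related.get(phrase, set()) & present)
        index[phrase].append((doc_id, related_present))


# Toy data, invented purely for illustration.
phrase_list = {"australian shepherd", "blue merle", "agility training"}
related = {
    "australian shepherd": {"blue merle", "agility training"},
    "blue merle": {"australian shepherd"},
}

index = defaultdict(list)
index_document(1, "The Australian Shepherd has a blue merle coat",
               phrase_list, related, index)
index_document(2, "Agility training for beginners",
               phrase_list, related, index)

# Claim 3 style filtering with invented counts and prediction data.
kept = valid_phrases(
    {"australian shepherd", "blue merle"},
    collection_counts={"australian shepherd": 3, "blue merle": 1},
    predicts={"australian shepherd": {"blue merle"},
              "blue merle": {"australian shepherd"}},
)
```

In this sketch each posting list entry is a (document identifier, related phrases present) pair, which mirrors the final step of claim 1. Under claim 3, the output of valid_phrases would serve as the phrase list, and each posting list entry need only carry the document identifier.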
Specification