Multiple index based information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with relevance rank ordered documents. The secondary index stores excess documents from the posting lists in document order.
15 Claims
1. A computer implemented method for indexing documents with respect to a first phrase, wherein each document has a document identifier, the method comprising:
storing a primary index of phrases;
storing a secondary index of phrases;
establishing a list of documents that contain the first phrase;
partitioning the documents, by operation of a processor adapted to manipulate data within a computer system, in the list into at least a first portion comprising higher ranked documents in the list, and a second portion comprising lesser ranked documents in the list, based on ranking the documents in the list by a relevance score;
storing the first portion in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents; and
based on the partitioning, storing the second portion in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents, wherein an identifier indicating a reference to the secondary index is stored in the primary index, and is associated with the stored first portion.
Dependent claims: 2, 3, 4, 5, 6, 7
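The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory dictionaries, the `PrimaryEntry` class, the `index_phrase` function, the `primary_capacity` cutoff, and the `secondary:<phrase>` reference format are all hypothetical names chosen for illustration, and the relevance scores are assumed to be supplied by some other component.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PrimaryEntry:
    """Posting list for one phrase as held in the (hypothetical) primary index."""
    ranked_doc_ids: List[int]            # higher ranked documents, best first
    secondary_ref: Optional[str] = None  # identifier referencing the secondary index


def index_phrase(
    phrase: str,
    scored_docs: Dict[int, float],       # document identifier -> relevance score
    primary: Dict[str, PrimaryEntry],
    secondary: Dict[str, List[int]],
    primary_capacity: int,               # assumed cutoff between the two portions
) -> PrimaryEntry:
    # Rank by descending relevance score; ties are broken by document
    # identifier so the result is deterministic (an assumption of this sketch).
    ranked = sorted(scored_docs, key=lambda doc_id: (-scored_docs[doc_id], doc_id))

    # Partition into the higher ranked and lesser ranked portions.
    first_portion = ranked[:primary_capacity]
    second_portion = ranked[primary_capacity:]

    entry = PrimaryEntry(ranked_doc_ids=first_portion)
    if second_portion:
        # The lesser ranked documents are stored in numerical order of
        # document identifier, not in rank order.
        ref = f"secondary:{phrase}"
        secondary[ref] = sorted(second_portion)
        # The primary index keeps a reference to the secondary posting list.
        entry.secondary_ref = ref

    primary[phrase] = entry
    return entry


if __name__ == "__main__":
    primary: Dict[str, PrimaryEntry] = {}
    secondary: Dict[str, List[int]] = {}
    scores = {7: 0.9, 3: 0.2, 12: 0.7, 5: 0.4, 9: 0.1}
    entry = index_phrase("blue sky", scores, primary, secondary, primary_capacity=2)
    print(entry.ranked_doc_ids)              # rank order
    print(secondary[entry.secondary_ref])    # document identifier order
```

In this sketch the primary list stays short and rank ordered, while the overflow list is ordered by document identifier and is reachable only through the reference stored with the primary entry.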
8. A computer implemented method for indexing documents with respect to a first phrase, wherein each document has a document identifier, the method comprising:
storing a primary index of phrases;
storing a secondary index of phrases;
establishing a list of documents that contain the first phrase;
ranking, by operation of a processor adapted to manipulate data within a computer system, the documents in the list by a relevance score;
storing a first portion of the list comprising higher ranked documents in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents, wherein the first portion includes a first section wherein each document listed in the first section includes a first plurality of relevance attributes, and a second section wherein each document listed in the second section comprises a second plurality of relevance attributes that are a subset of the first set of relevance attributes, and wherein the documents listed in the first section are ranked higher than the documents listed in the second section; and
storing a second portion of the list comprising lesser ranked documents in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents.
Dependent claims: 9, 10
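The sectioned primary portion recited in claim 8 can be sketched as follows. This is an illustrative assumption, not the patented implementation: the attribute names (`score`, `phrase_count`, `anchor_hits`), the function `build_portions`, and the two size parameters are hypothetical. The claim itself only requires that the second section carry a subset of the first section's relevance attributes and that first-section documents rank higher.

```python
from typing import Dict, List, Tuple

# Hypothetical relevance attributes. The second tuple is a subset of the first.
FULL_ATTRS = ("score", "phrase_count", "anchor_hits")
REDUCED_ATTRS = ("score", "phrase_count")


def build_portions(
    docs: List[Dict],            # each dict has "doc_id" plus every FULL_ATTRS key
    first_section_size: int,     # assumed number of documents kept with full attributes
    primary_capacity: int,       # assumed total size of the primary portion
) -> Tuple[List[Dict], List[Dict], List[int]]:
    # Rank by descending relevance score, breaking ties by document identifier.
    ranked = sorted(docs, key=lambda d: (-d["score"], d["doc_id"]))
    top = ranked[:primary_capacity]

    # First section: highest ranked documents, stored with the full attribute set.
    first_section = [
        {"doc_id": d["doc_id"], **{a: d[a] for a in FULL_ATTRS}}
        for d in top[:first_section_size]
    ]
    # Second section: next ranked documents, stored with the reduced attribute set.
    second_section = [
        {"doc_id": d["doc_id"], **{a: d[a] for a in REDUCED_ATTRS}}
        for d in top[first_section_size:]
    ]
    # Secondary portion: remaining documents, in document identifier order.
    secondary_portion = sorted(d["doc_id"] for d in ranked[primary_capacity:])
    return first_section, second_section, secondary_portion


if __name__ == "__main__":
    docs = [
        {"doc_id": 4, "score": 0.5, "phrase_count": 2, "anchor_hits": 1},
        {"doc_id": 1, "score": 0.9, "phrase_count": 5, "anchor_hits": 3},
        {"doc_id": 8, "score": 0.7, "phrase_count": 3, "anchor_hits": 0},
        {"doc_id": 6, "score": 0.1, "phrase_count": 1, "anchor_hits": 0},
        {"doc_id": 2, "score": 0.3, "phrase_count": 1, "anchor_hits": 0},
    ]
    first, second, overflow = build_portions(docs, first_section_size=1, primary_capacity=3)
    print(first)
    print(second)
    print(overflow)
```

Storing fewer attributes for the lower section trades per-document detail for space, which is one plausible reason for such a layout; the claim text does not state the motivation.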
11. A computer readable storage medium storing a computer program executable by a processor for indexing documents with respect to a first phrase, the actions of the computer program comprising:
storing a primary index of phrases;
storing a secondary index of phrases;
establishing a list of documents that contain the first phrase;
partitioning the documents in the list into at least a first portion comprising higher ranked documents in the list, and a second portion comprising lesser ranked documents in the list, based on ranking the documents in the list by a relevance score;
storing the first portion in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents; and
based on the partitioning, storing the second portion in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents, wherein an identifier indicating a reference to the secondary index is stored in the primary index, and is associated with the stored first portion.
Dependent claims: 12, 13, 14, 15
Specification