Multiple index based information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with relevance rank ordered documents. The secondary index stores excess documents from the posting lists in document order.
12 Claims
1. A computer-implemented method for indexing documents with respect to a phrase, wherein each document has a document identifier, the method comprising:
establishing a list of documents that contain the phrase;
ranking the documents in the list by a relevance score;
storing a first portion of the list comprising higher ranked documents in a primary index in rank order of the relevance scores; and
storing a second portion of the list comprising lesser ranked documents in a secondary index in numerical order of the document identifiers.
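The steps of claim 1 can be illustrated with a minimal sketch. It assumes relevance scores are already available as a mapping from document identifier to score, and that the size of the first portion is a fixed parameter. The names `split_posting_list` and `primary_size`, and the tie-breaking rule, are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple


def split_posting_list(
    scores: Dict[int, float], primary_size: int
) -> Tuple[List[int], List[int]]:
    """Split one phrase's posting list into primary and secondary portions.

    `scores` maps each document identifier to its relevance score for the
    phrase (assumed to be computed elsewhere).
    """
    # Rank all documents that contain the phrase, highest score first.
    # Ties are broken by document identifier (an assumption made here
    # only to keep the result deterministic).
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    # First portion: the higher ranked documents, kept in rank order.
    primary = ranked[:primary_size]
    # Second portion: the lesser ranked documents, stored in numerical
    # order of their document identifiers.
    secondary = sorted(ranked[primary_size:])
    return primary, secondary


example_scores = {17: 0.2, 4: 0.9, 42: 0.5, 8: 0.7, 23: 0.1}
primary, secondary = split_posting_list(example_scores, primary_size=2)
print(primary)    # [4, 8]
print(secondary)  # [17, 23, 42]
```

Keeping the secondary portion in identifier order means it never needs re-sorting when scores change, while the primary portion preserves rank order for fast retrieval of the best documents.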
11. A method of providing an information retrieval system, the method comprising:
storing a primary index including primary phrase posting lists, each posting list associated with a phrase and including up to a maximum number of documents that contain the phrase, the documents rank ordered by respective relevance scores;
storing a secondary index including secondary phrase posting lists, each posting list associated with a primary phrase posting list in the primary index, and including documents that contain the phrase and which have relevance scores less than the relevance score of a lowest ranked document in the primary posting list for the phrase, the documents ordered by document identifier;
receiving a search query comprising at least one phrase;
responsive to the search query containing a first phrase having a primary posting list and a secondary posting list and a second phrase having only a primary posting list, intersecting the primary posting list of the first phrase with the primary posting list of the second phrase to obtain a first set of common documents, and intersecting the secondary posting list of the first phrase with the primary posting list of the second phrase to obtain a second set of common documents, and conjoining the first and second sets of common documents; and
ranking the common documents.
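The query handling recited in claim 11 can be sketched as follows, for the case where the first phrase has both a primary and a secondary posting list and the second phrase has only a primary posting list. This is an illustrative sketch, not the patented implementation: the function name `query_two_phrases` and the `query_scores` mapping (a per-document score used for the final ranking) are assumptions introduced here.

```python
from typing import Dict, List


def query_two_phrases(
    first_primary: List[int],
    first_secondary: List[int],
    second_primary: List[int],
    query_scores: Dict[int, float],
) -> List[int]:
    """Return documents containing both phrases, ranked by query score."""
    # Posting list of the second phrase as a set for fast membership tests.
    second_docs = set(second_primary)
    # First set: primary list of phrase 1 intersected with primary of phrase 2.
    first_set = {d for d in first_primary if d in second_docs}
    # Second set: secondary list of phrase 1 intersected with primary of phrase 2.
    second_set = {d for d in first_secondary if d in second_docs}
    # Conjoin the two sets of common documents.
    common = first_set | second_set
    # Rank the common documents (hypothetical scoring; ties by identifier).
    return sorted(common, key=lambda d: (-query_scores[d], d))


ranked = query_two_phrases(
    first_primary=[4, 8],
    first_secondary=[17, 23, 42],
    second_primary=[42, 8, 99],
    query_scores={8: 1.3, 42: 0.9},
)
print(ranked)  # [8, 42]
```

Document 8 is found through the primary list of the first phrase and document 42 through its secondary list, so both intersections are needed to obtain the complete result.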
12. An information retrieval system, comprising:
a primary index including primary phrase posting lists, each posting list associated with a phrase and including up to a maximum number of documents that contain the phrase, the documents rank ordered by respective relevance scores; and
a secondary index including secondary phrase posting lists, each posting list associated with a primary phrase posting list in the primary index, and including documents that contain the phrase and which have relevance scores less than the relevance score of a lowest ranked document in the primary posting list for the phrase, the documents ordered by document identifier.
Specification