Multiple index based information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with documents ordered by relevance rank. The secondary index stores excess documents from the posting lists in document order.
15 Claims
1. A computer implemented method of indexing documents of a document collection, wherein each indexed document has a document identifier, the method comprising:
storing a posting list of identifiers of documents of the document collection that contain a phrase;
partitioning the posting list, by operation of at least one processor, into at least a first portion including identifiers of higher ranked documents and a second portion including identifiers of lesser ranked documents, the partitioning being based on a relevance score for a document identified in the posting list indicating the document's relevance to the phrase;
storing the first portion in a primary index; and
storing the second portion in a secondary index.
Dependent claims: 2, 3, 4, 5, 6
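As an informal illustration of the partitioning recited in claim 1, the following Python sketch splits one phrase's posting list into a primary and a secondary portion. It is not the patented implementation: the function name `partition_posting_list`, the fixed `primary_size` cutoff, the sample scores, and the choice to keep the secondary portion in document-identifier order (suggested by the abstract, not by claim 1) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def partition_posting_list(
    scores: Dict[int, float], primary_size: int
) -> Tuple[List[int], List[int]]:
    """Split one phrase's posting list into (primary, secondary) portions.

    `scores` maps document identifier -> relevance score for the phrase.
    `primary_size` is a hypothetical cutoff; the claim does not fix one.
    """
    # Rank identifiers by descending relevance score, breaking ties by id.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    # Higher ranked documents go to the primary portion, kept in rank order.
    primary = ranked[:primary_size]
    # Assumption taken from the abstract: the excess is kept in document order.
    secondary = sorted(ranked[primary_size:])
    return primary, secondary


# Hypothetical relevance scores for documents containing one phrase.
posting = {101: 0.2, 102: 0.9, 103: 0.5, 104: 0.7, 105: 0.1}

primary_index: Dict[str, List[int]] = {}
secondary_index: Dict[str, List[int]] = {}
first, second = partition_posting_list(posting, primary_size=2)
primary_index["information retrieval"] = first      # [102, 104]
secondary_index["information retrieval"] = second   # [101, 103, 105]
```

A fixed-size cutoff is only one possible rule; a score threshold would satisfy the same claim language.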
7. A non-transitory computer-readable storage medium storing a computer program executable by at least one processor for indexing documents, actions of the computer program comprising:
determining a set of documents related to a first phrase;
determining a relevance score for each of the documents in the set, the relevance score for a document indicating the document's relevance to the first phrase;
partitioning the set of documents by the relevance scores into at least a first sub-set and a second sub-set, the first sub-set having relevance scores higher than the second sub-set;
generating a posting list for the first phrase partitioned across at least two tiers, the posting list comprising document entries that include a document identifier and the relevance score for the document, wherein a first tier stores document entries from the first sub-set and a second tier stores document entries from the second sub-set;
repeating the determining, partitioning, and generating for a second phrase; and
storing the posting lists in an index.
Dependent claims: 8, 9
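The sequence of actions in claim 7 can be pictured with a short Python sketch. Everything named here is hypothetical: `TieredPostingList`, `build_index`, the substring test used to find related documents, the occurrence-count scorer `count_score`, and the `cutoff` parameter are stand-ins chosen for the example, since the claim does not say how relatedness or relevance is computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Entry = Tuple[int, float]  # (document identifier, relevance score)


@dataclass
class TieredPostingList:
    first_tier: List[Entry] = field(default_factory=list)   # higher scores
    second_tier: List[Entry] = field(default_factory=list)  # lower scores


def count_score(phrase: str, text: str) -> float:
    """Toy relevance score (an assumption): occurrences of the phrase."""
    return float(text.count(phrase))


def build_index(
    documents: Dict[int, str],
    phrases: Iterable[str],
    score: Callable[[str, str], float],
    cutoff: int,
) -> Dict[str, TieredPostingList]:
    index: Dict[str, TieredPostingList] = {}
    for phrase in phrases:  # repeated for each phrase
        # Determine the set of documents related to the phrase.
        related = [doc_id for doc_id, text in documents.items() if phrase in text]
        # Determine a relevance score for each related document.
        entries = [(doc_id, score(phrase, documents[doc_id])) for doc_id in related]
        # Partition by score: the top `cutoff` entries form the first sub-set.
        entries.sort(key=lambda entry: (-entry[1], entry[0]))
        index[phrase] = TieredPostingList(entries[:cutoff], entries[cutoff:])
    return index  # posting lists stored in one index


docs = {
    1: "phrase index and phrase index again",
    2: "a phrase index over posting list data",
    3: "posting list, posting list, posting list",
    4: "phrase index with one posting list",
}
index = build_index(docs, ["phrase index", "posting list"], count_score, cutoff=1)
```

Each entry carries both the document identifier and its score, so a later stage can rank results without recomputing relevance.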
10. A computer based method executed by one or more processors, the method comprising:
receiving a search query, which includes at least a first phrase and a second phrase, at a server that is in communication with an index server system storing a plurality of posting lists, at least some of the posting lists being associated with a phrase and including document identifiers for documents containing the phrase, and wherein the document identifiers in at least some posting lists are partitioned based on a relevance score that indicates a respective document's relevance to the phrase of the posting list, such that a first portion of the posting list contains document identifiers of documents with a higher relevance score than documents identified in a second portion of the posting list;
responsive to the first phrase having a partitioned posting list and the second phrase having a partitioned posting list, intersecting the first portion of the posting list of the first phrase with the first portion of the posting list of the second phrase;
determining that the intersection includes a sufficient result set; and
ranking the documents in the result set.
Dependent claims: 11, 12, 13, 14, 15
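The query-time steps of claim 10 might look roughly like the following Python sketch. The names `PartitionedPostings` and `search`, the `min_results` threshold for a "sufficient" result set, and ranking by the sum of per-phrase scores are assumptions made for illustration. The fallback that widens the intersection to the second portions is also an assumption; claim 10 itself only covers the case where the first-portion intersection is sufficient.

```python
from typing import Dict, List, NamedTuple


class PartitionedPostings(NamedTuple):
    first: Dict[int, float]   # doc id -> score, higher ranked portion
    second: Dict[int, float]  # doc id -> score, lesser ranked portion


def search(
    a: PartitionedPostings, b: PartitionedPostings, min_results: int
) -> List[int]:
    """Return ranked document ids for a two-phrase query."""
    # Intersect the first portions of the two posting lists.
    scores_a, scores_b = dict(a.first), dict(b.first)
    common = set(scores_a) & set(scores_b)
    if len(common) < min_results:
        # Assumption (not recited in claim 10): if the result set is too
        # small, widen the intersection to include the second portions.
        scores_a.update(a.second)
        scores_b.update(b.second)
        common = set(scores_a) & set(scores_b)
    # Assumption: rank by summed per-phrase relevance, ties broken by id.
    return sorted(common, key=lambda d: (-(scores_a[d] + scores_b[d]), d))


# Hypothetical posting lists for the two query phrases.
first_phrase = PartitionedPostings(
    first={1: 0.9, 2: 0.8, 3: 0.7}, second={4: 0.3, 5: 0.2}
)
second_phrase = PartitionedPostings(
    first={2: 0.6, 3: 0.95, 6: 0.5}, second={1: 0.4, 4: 0.1}
)

top = search(first_phrase, second_phrase, min_results=2)    # [3, 2]
wider = search(first_phrase, second_phrase, min_results=3)  # [3, 2, 1, 4]
```

Intersecting only the higher ranked portions first keeps the common case cheap, because the lesser ranked portions are read only when the first pass falls short.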
Specification