Multiple index based information retrieval system
First Claim
1. A computer-based method executed by one or more processors, the method comprising:
- receiving a search query, which includes at least a first phrase and a second phrase, at a server that is in communication with an index server system storing a plurality of posting lists, at least some of the plurality of posting lists being associated with a phrase and including document identifiers for documents containing the phrase, and wherein the document identifiers in the at least some posting lists are partitioned based on a relevance score that indicates a respective document's relevance to the phrase of the posting list, such that a first portion of the posting list contains document identifiers of documents with a higher relevance score than documents identified in a second portion of the posting list;
- responsive to the first phrase having a partitioned posting list and the second phrase lacking a partitioned posting list;
- intersecting the first portion of the posting list of the first phrase with the posting list of the second phrase to generate a first set of common documents, and intersecting the second portion of the posting list of the first phrase with the posting list of the second phrase to generate a second set of common documents;
- ranking the documents in a combination of the first set of common documents and the second set of common documents; and
- providing highest ranked documents in the combination as search results for the search query.
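The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PostingList` class, the `search` function, the `score` callable, and the `top_k` cutoff are all hypothetical names introduced here, and the claim does not say how the final ranking score is computed.

```python
from dataclasses import dataclass, field


@dataclass
class PostingList:
    """Hypothetical posting list for one phrase."""
    # Document IDs with the higher relevance scores for the phrase.
    first_portion: list[int]
    # Lower-scored document IDs; empty when the list is not partitioned.
    second_portion: list[int] = field(default_factory=list)

    @property
    def partitioned(self) -> bool:
        return bool(self.second_portion)


def search(first: PostingList, second: PostingList, score, top_k: int = 10):
    """Handle the case where only the first phrase has a partitioned list.

    `score` is an assumed callable mapping a document ID to a ranking score.
    """
    if not first.partitioned or second.partitioned:
        raise ValueError("sketch covers only the partitioned/unpartitioned case")

    second_docs = set(second.first_portion)
    # Intersect each portion of the first list with the whole second list.
    first_common = [d for d in first.first_portion if d in second_docs]
    second_common = [d for d in first.second_portion if d in second_docs]

    # Rank the combination of both sets and keep the highest ranked documents.
    ranked = sorted(first_common + second_common, key=score, reverse=True)
    return ranked[:top_k]
```

For example, with a first list split into `[1, 2, 3]` and `[4, 5, 6]` and an unpartitioned second list `[2, 5, 9]`, the two intersections yield documents 2 and 5, which are then ordered by whatever score function is supplied.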
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with documents in relevance rank order. The secondary index stores excess documents from the posting lists in document order.
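The primary/secondary split described in the abstract can be sketched as follows. This is an illustration under assumptions: the function name `partition_posting_list`, the `primary_capacity` limit, and the use of ascending document ID as "document order" are not stated in the text above.

```python
def partition_posting_list(scored_docs, primary_capacity):
    """Split one phrase's posting list into primary and secondary parts.

    scored_docs: dict mapping document ID -> relevance score for the phrase.
    primary_capacity: assumed maximum number of entries kept in the primary index.
    """
    # Order by descending relevance; break ties by document ID for determinism.
    by_relevance = sorted(scored_docs, key=lambda d: (-scored_docs[d], d))

    # Primary index: top documents, kept in relevance rank order.
    primary = by_relevance[:primary_capacity]
    # Secondary index: the excess documents, stored in document order.
    secondary = sorted(by_relevance[primary_capacity:])
    return primary, secondary
```

A posting list that fits within the capacity produces an empty secondary part, which corresponds to a phrase whose posting list is not partitioned.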
16 Claims
9. A search system comprising:
- a primary index storing a first portion for each of a plurality of posting lists, each posting list being associated with a respective phrase;
- a secondary index storing a second portion for at least some of the plurality of posting lists, wherein the first portion of a posting list for a phrase includes document identifiers of documents with a higher relevance score than documents identified in the second portion of the posting list for the phrase, the relevance score indicating a respective document's relevance to the phrase;
- at least one processor; and
- memory storing instructions that, when executed by the at least one processor cause the search system to perform operations including;
- receive a search query that includes at least a first phrase and a second phrase,
- responsive to determining that the first phrase has a posting list with a first portion and a second portion and that the second phrase has a posting list with a first portion but not a second portion;
- intersect the first portion of the posting list of the first phrase with the posting list of the second phrase to generate a first set of common documents, and intersect the second portion of the posting list of the first phrase with the posting list of the second phrase to generate a second set of common documents,
- rank the documents in a combination of the first set of common documents and the second set of common documents, and
- provide highest ranked documents in the combination as search results for the search query.
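A rough sketch of how the two indexes of claim 9 might be laid out in memory is given below. The `SearchSystem` class, its dictionary-based indexes, the `score` callable, and the `top_k` parameter are hypothetical choices made for illustration; the claim does not prescribe any of them.

```python
class SearchSystem:
    """Hypothetical two-index layout: phrase -> list of document IDs."""

    def __init__(self, primary, secondary, score):
        self.primary = primary      # first portion, present for every phrase
        self.secondary = secondary  # second portion, present for some phrases
        self.score = score          # assumed callable: document ID -> rank score

    def query(self, first_phrase, second_phrase, top_k=10):
        # The sketch handles only the claimed case: the first phrase has both
        # portions, the second phrase has a first portion but no second portion.
        if first_phrase not in self.secondary or second_phrase in self.secondary:
            raise NotImplementedError("case not covered by this sketch")

        second_docs = set(self.primary[second_phrase])
        # Intersect each portion of the first phrase's list with the second list.
        first_common = [d for d in self.primary[first_phrase] if d in second_docs]
        second_common = [d for d in self.secondary[first_phrase] if d in second_docs]

        # Rank the combined sets and return the highest ranked documents.
        ranked = sorted(first_common + second_common, key=self.score, reverse=True)
        return ranked[:top_k]
```

Keeping the secondary index as a separate mapping makes the partition test a simple membership check on the phrase, which is one possible reading of "has a posting list with a first portion but not a second portion".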
Specification