Multiple index based information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with relevance rank ordered documents. The secondary index stores excess documents from the posting lists in document order.
45 Claims
1. A computer based method executed by one or more processors, the method comprising:
receiving a search query comprising at least one phrase at a server that is in communication with an index server system that includes:
(a) a primary index server system that stores a primary index including primary phrase posting lists, each primary phrase posting list being associated with a phrase and including up to a maximum number of documents that contain the phrase and comprising references to documents in rank order based on relevance of the phrase to each respective document and
(b) a secondary index server system including secondary phrase posting lists, each secondary phrase posting list being associated with a primary phrase posting list in the primary index, and including documents that contain the phrase that is associated with the primary phrase posting list in the primary index and which have relevance scores less than the relevance score of a lowest ranked document in the primary posting list for the phrase;
responsive to the search query containing a first phrase having a primary posting list and a secondary posting list and a second phrase having only a primary posting list, intersecting the primary posting list of the first phrase with the primary posting list of the second phrase to obtain a first set of common documents, and intersecting the secondary posting list of the first phrase with the primary posting list of the second phrase to obtain a second set of common documents, and conjoining the first and second sets of common documents to provide a result set; and
ranking the common documents in the result set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
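The query-processing steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `(doc_id, score)` posting layout, the function names, and the choice to rank by the sum of per-phrase scores are all assumptions made for illustration, since the claim does not specify how the final ranking is computed.

```python
from typing import Dict, List, Tuple

# Hypothetical posting layout: (doc_id, relevance score of the phrase for that document).
Posting = Tuple[int, float]


def intersect(a: List[Posting], b: List[Posting]) -> Dict[int, float]:
    """Return {doc_id: summed score} for documents that appear in both lists."""
    b_scores = dict(b)
    return {doc: score + b_scores[doc] for doc, score in a if doc in b_scores}


def answer_query(primary_a: List[Posting],
                 secondary_a: List[Posting],
                 primary_b: List[Posting]) -> List[int]:
    """First phrase has primary and secondary lists; second phrase has only a primary list."""
    first_set = intersect(primary_a, primary_b)     # primary(A) with primary(B)
    second_set = intersect(secondary_a, primary_b)  # secondary(A) with primary(B)
    # Conjoin the two sets; they are disjoint because a document sits in
    # either the primary or the secondary list of the first phrase.
    result = {**first_set, **second_set}
    # Assumed ranking: higher summed score first, doc_id as a tie-breaker.
    return sorted(result, key=lambda doc: (-result[doc], doc))


# Example data (invented for illustration).
primary_a = [(7, 0.9), (3, 0.8)]
secondary_a = [(1, 0.2), (5, 0.1)]
primary_b = [(5, 0.7), (7, 0.6), (9, 0.5)]
ranked = answer_query(primary_a, secondary_a, primary_b)
```

In this example document 7 is found through the primary list of the first phrase and document 5 through its secondary list, so both reach the result set even though only document 7 is among the top-ranked documents for the first phrase.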
9. An information retrieval system for retrieving information from a corpus of documents, the system comprising:
a primary index server system comprising a primary index, the primary index including primary phrase posting lists, each primary phrase posting list being associated with a phrase and including up to a maximum number of documents of the corpus that contain the phrase, the documents being ranked relative to one another in the primary index in rank order by respective relevance scores; and
a secondary index server system comprising a secondary index, the secondary index including secondary phrase posting lists, each secondary phrase posting list being associated with a primary phrase posting list in the primary index, and including documents that contain the phrase that is associated with the primary phrase posting list in the primary index and which have relevance scores less than the relevance score of a lowest ranked document in the primary phrase posting list for the phrase,
wherein the primary index server system comprises multiple machines, and wherein each phrase is assigned an identification number and has a primary phrase posting list located on one of the machines.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
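One way to picture the last limitation of claim 9 is the sketch below, in which each phrase receives a numeric identifier and its primary posting list is stored on exactly one machine. The class name, the sequential ID assignment, and the modulo placement rule are assumptions; the claim only requires that each phrase have an identification number and that its list reside on one of the machines.

```python
from typing import Dict, List


class PrimaryIndexCluster:
    """Toy model of a primary index spread over several machines (hypothetical)."""

    def __init__(self, num_machines: int) -> None:
        self.num_machines = num_machines
        self.phrase_ids: Dict[str, int] = {}
        # One dict per machine, mapping phrase ID -> primary posting list.
        self.machines: List[Dict[int, List[int]]] = [{} for _ in range(num_machines)]

    def phrase_id(self, phrase: str) -> int:
        """Assign the next sequential ID on first sight (assumed scheme)."""
        return self.phrase_ids.setdefault(phrase, len(self.phrase_ids))

    def machine_for(self, phrase: str) -> int:
        """Assumed placement rule: phrase ID modulo the number of machines."""
        return self.phrase_id(phrase) % self.num_machines

    def store(self, phrase: str, posting_list: List[int]) -> None:
        pid = self.phrase_id(phrase)
        self.machines[pid % self.num_machines][pid] = posting_list

    def lookup(self, phrase: str) -> List[int]:
        pid = self.phrase_ids[phrase]
        return self.machines[pid % self.num_machines][pid]


# Example data (invented for illustration).
cluster = PrimaryIndexCluster(num_machines=3)
cluster.store("information retrieval", [12, 4, 30])
cluster.store("posting list", [4, 7])
cluster.store("search query", [30])
cluster.store("index server", [7, 12])
```

With three machines, the fourth phrase gets ID 3 and wraps around to machine 0, so a query server can locate any primary posting list from the phrase ID alone.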
20. An information retrieval system, comprising:
a primary index server system storing primary phrase posting lists, each primary phrase posting list being associated with a phrase and including up to a maximum number of documents that contain the phrase, the documents being rank ordered by respective relevance scores;
a secondary index server system storing secondary phrase posting lists, each secondary phrase posting list being associated with a primary phrase posting list in the primary index, and including documents that contain the phrase that is associated with the primary phrase posting list in the primary index and which have relevance scores less than the relevance score of a lowest ranked document in the primary phrase posting list for the phrase, the documents being ordered by respective document identifiers;
a front end server configured to receive a search query that includes at least one phrase; and
a search server configured, in response to the search query containing a first phrase having a primary posting list and a secondary posting list and a second phrase having only a primary posting list, to intersect the primary posting list of the first phrase with the primary posting list of the second phrase to obtain a first set of common documents, and to intersect the secondary posting list of the first phrase with the primary posting list of the second phrase to obtain a second set of common documents, and conjoin the first and second sets of common documents, and to rank the common documents.
Dependent claims: 21, 22, 23, 24
25. An information retrieval system for indexing documents with respect to a first phrase, wherein each document has a document identifier, the system comprising:
a primary index server system storing primary phrase posting lists;
a secondary index server system storing secondary phrase posting lists;
an index server system configured for:
ranking the documents in the list by a relevance score;
storing a first portion of the list comprising higher ranked documents in the primary index server system, the higher ranked documents of the first portion stored relative to one another in the primary index server system in rank order of the respective relevance scores of the ranked documents, wherein the first portion includes a first section wherein each document listed in the first section includes a first plurality of relevance attributes, and a second section wherein each document listed in the second section comprises a second plurality of relevance attributes that are a subset of the first set of relevance attributes, and wherein the documents listed in the first section are ranked higher than the documents listed in the second section; and
storing a second portion of the list comprising lesser ranked documents in the secondary index server system, the lesser ranked documents of the second portion stored relative to one another in the secondary index server system in numerical order of the respective document identifiers of the ranked documents.
Dependent claims: 26, 27
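The indexing steps of claim 25 can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names (`score`, `in_title`, `anchor_hits`), the parameters `primary_max` and `first_section_size`, and the dictionary-based document records are hypothetical and do not come from the claim.

```python
from typing import Dict, List, Tuple

# Hypothetical relevance attributes; the second tuple is a subset of the first.
FULL_ATTRS = ("score", "in_title", "anchor_hits")
SUBSET_ATTRS = ("score", "in_title")


def partition_posting_list(docs: List[Dict], primary_max: int,
                           first_section_size: int
                           ) -> Tuple[List[Tuple[int, Dict]], List[int]]:
    """Split one phrase's document list into a primary and a secondary portion."""
    # Rank all documents by relevance score, highest first.
    ranked = sorted(docs, key=lambda d: (-d["score"], d["doc_id"]))
    head, tail = ranked[:primary_max], ranked[primary_max:]

    primary = []
    for rank, doc in enumerate(head):
        # First section keeps the full attribute set; second section keeps a subset.
        keep = FULL_ATTRS if rank < first_section_size else SUBSET_ATTRS
        primary.append((doc["doc_id"], {k: doc[k] for k in keep}))

    # Lesser ranked documents are stored in numerical order of document identifier.
    secondary = sorted(doc["doc_id"] for doc in tail)
    return primary, secondary


# Example data (invented for illustration).
docs = [
    {"doc_id": 10, "score": 5, "in_title": False, "anchor_hits": 2},
    {"doc_id": 11, "score": 9, "in_title": True, "anchor_hits": 7},
    {"doc_id": 12, "score": 1, "in_title": False, "anchor_hits": 0},
    {"doc_id": 13, "score": 7, "in_title": True, "anchor_hits": 1},
    {"doc_id": 14, "score": 3, "in_title": False, "anchor_hits": 0},
]
primary, secondary = partition_posting_list(docs, primary_max=3, first_section_size=1)
```

Here the primary portion holds documents 11, 13 and 10 in rank order, with only document 11 carrying the full attribute set, while the secondary portion holds documents 12 and 14 sorted by identifier.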
28. A computer based method executed by one or more processors, the method comprising:
receiving a search query, which includes at least one phrase, at a server that is in communication with an index server system storing a plurality of phrase posting lists, each phrase posting list being associated with a phrase and including document identifiers for documents containing the phrase, and wherein the document identifiers in at least some phrase posting lists are partitioned based on a relevance score indicating a respective document's relevance to the phrase of the posting list such that a first portion of the posting list contains document identifiers of documents for which the phrase of the posting list has more relevance than documents identified in a second portion of the posting list,
responsive to the search query containing a first phrase having a partitioned posting list and a second phrase having a partitioned posting list, intersecting the first portion of the posting list of the first phrase with the first portion of the posting list of the second phrase without processing the remainders of each of the respective posting lists, to identify a result set of documents having the first phrase and the second phrase; and
ranking the documents in the result set.
Dependent claims: 29, 30, 31, 32, 33, 34, 35
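The partial intersection described in claim 28 can be sketched as below. The dictionary layout with `first` and `rest` keys and the position-based ranking are assumptions for illustration; the claim states only that the first portions are intersected without processing the remainders, and that the result set is then ranked.

```python
from typing import Dict, List

# Hypothetical layout: "first" holds the more relevant doc IDs in rank order,
# "rest" holds the remainder of the partitioned posting list.
PartitionedList = Dict[str, List[int]]


def intersect_first_portions(list_a: PartitionedList,
                             list_b: PartitionedList) -> List[int]:
    """Intersect only the first portions; the "rest" entries are never read."""
    pos_a = {doc: i for i, doc in enumerate(list_a["first"])}
    pos_b = {doc: i for i, doc in enumerate(list_b["first"])}
    common = pos_a.keys() & pos_b.keys()
    # Assumed ranking: smaller combined rank position first, doc ID breaks ties.
    return sorted(common, key=lambda doc: (pos_a[doc] + pos_b[doc], doc))


# Example data (invented for illustration).
phrase_a = {"first": [4, 8, 2], "rest": [1, 9]}
phrase_b = {"first": [4, 9, 2], "rest": [8]}
result = intersect_first_portions(phrase_a, phrase_b)
```

Documents 8 and 9 each contain both phrases, but because one of their occurrences lies in a remainder, they are not found by this step. That is the trade-off the claim describes: less work per query in exchange for considering only the most relevant part of each list.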
36. An information retrieval system for retrieving information from a corpus of documents, the system comprising:
a front end server configured for receiving a search query, which includes at least one phrase; and
an index server system storing a plurality of posting lists, each phrase posting list being associated with a phrase and including document identifiers for documents containing the phrase, wherein the document identifiers in at least some phrase posting lists are partitioned based on a relevance score indicating a respective document's relevance to the phrase of the posting list such that a first portion of the posting list contains document identifiers of documents for which the phrase of the posting list has more relevance than documents identified in a second portion of the posting list,
wherein the index server system is configured to, responsive to the search query containing a first phrase having a partitioned posting list and a second phrase having a partitioned posting list, intersect the first portion of the posting list of the first phrase with the first portion of the posting list of the second phrase without processing the remainders of each of the respective posting lists, to identify a result set of documents having the first phrase and the second phrase.
Dependent claims: 37, 38, 39, 40, 41, 42, 43, 44, 45
Specification