HYBRID TERM AND DOCUMENT-BASED INDEXING FOR SEARCH QUERY RESOLUTION
First Claim
1. A method of distributing on a computing cluster an inverted index comprising terms respectively associated with posting lists of document identifiers (DocIDs), comprising:
- organizing n computers into B banks;
distributing document IDentifiers (DocIDs) appearing in posting lists of an inverted index among the B banks of computers, each posting list corresponding to a search term;
within a bank of the B banks, distributing portions of the DocIDs, which appear in a large posting list and are distributed to that bank, to a plurality of the computers within that bank;
within that bank, assigning responsibility to produce posting list results for a small posting list term to fewer of the computers of that bank; and
providing for the distribution of DocIDs appearing in the small posting list, which are not already distributed thereto, to its assigned computer(s).
3 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatuses relate to hosting an inverted index for term-based document searching. According to disclosed aspects, each bank of a plurality of banks receives a plurality of Document IDentifiers (DocIDs) in the inverted index, and within each bank, posting lists for each term are determined large or small. DocIDs for large posting lists are distributed among computers in a bank while responsibility for producing DocIDs identifiers in a small posting list are distributed by term to one or fewer computers in the bank. During operation, each term of a query is distributed to each bank, and then for small terms, only those computers assigned responsibility for a given term need to search for responsive DocIDs. DocIDs can be redistributed among computers in a bank such that results are presented from the computers that would have produced those results in a cluster having a pure DocIDs distribution scheme.
-
Citations
21 Claims
-
1. A method of distributing on a computing cluster an inverted index comprising terms respectively associated with posting lists of document identifiers (DocIDs), comprising:
-
organizing n computers into B banks; distributing document IDentifiers (DocIDs) appearing in posting lists of an inverted index among the B banks of computers, each posting list corresponding to a search term; within a bank of the B banks, distributing portions of the DocIDs, which appear in a large posting list and are distributed to that bank, to a plurality of the computers within that bank; within that bank, assigning responsibility to produce posting list results for a small posting list term to fewer of the computers of that bank; and providing for the distribution of DocIDs appearing in the small posting list, which are not already distributed thereto, to its assigned computer(s). - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computer cluster for providing searching of an inverted index comprising posting lists of document identifiers of documents in which each term of a plurality of terms appears, comprising:
n computers organized into B banks, each computer operable for storing data assigned to it, wherein each computer of a respective bank stores a portion of document identifiers that are assigned to that bank and which are associated with a large posting list, and all the document identifiers assigned to that bank which are associated with a small posting list corresponding to a term assigned to that computer. - View Dependent Claims (11, 12, 13)
-
14. A method of identifying documents potentially relevant to a term-based query, comprising:
-
receiving a query comprising search terms; using a computer cluster of n computers organized into B banks, the computer cluster hosting an inverted index comprising posting lists of DocIDs in which each term of a plurality of terms appears, and each computer of a respective bank stores a portion of DocIDs that are assigned to that bank and which are associated with a large posting list, and all the DocIDs assigned to that bank, which are associated with a small posting list corresponding to a term assigned to that computer; distributing the search terms to each bank; and in each bank, for any term corresponding to a small posting list, retrieving its corresponding smaller posting list from the computer to which it was assigned, and for any term corresponding to a large posting list, retrieving a portion of its corresponding posting list from each computer of the bank. - View Dependent Claims (15)
-
-
16. A method of organizing a computer cluster for supporting term-based searching of an inverted index, comprising:
-
dividing n computers of the computer cluster into B banks; distributing selections of document identifiers of an inverted index among the B banks, wherein at least some of the document identifiers are distributed to fewer than all of the B banks; and distributing the document identifiers assigned to each bank among the computers of that bank, wherein B is selected for balancing an aggregate search throughput of the computer cluster with respective search latencies for individual searches. - View Dependent Claims (17, 18, 19, 20, 21)
-
Specification