Index server architecture using tiered and sharded phrase posting lists
First Claim
1. A phrase-based indexing system comprising:
- a first tier of index servers including N index servers, each of which stores a portion of a phrase posting list for each of a plurality of phrases, each phrase posting list being associated with a phrase and a list of documents having at least one occurrence of the phrase; and
- M additional tiers of index servers, wherein:
  - M is one to a predetermined number;
  - each Mth tier includes T index servers, where T is an integer multiple of N when M equals one, and where T is an integer multiple of the number of index servers T in the (M−1)th tier when M is greater than or equal to two; and
  - each Mth tier index server stores a portion of a phrase posting list for each of a plurality of phrases, each phrase posting list being associated with a phrase and a list of documents having at least one occurrence of the phrase.
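The sizing rule in claim 1 can be made concrete with a small sketch. The claim does not say how a posting list is split into portions, so the sketch assumes a hypothetical scheme in which document ID modulo the tier's server count picks the server; the function names and sample data are illustrative only. Under that assumption, the "integer multiple" rule has a useful consequence: because N divides T, `doc_id % T % N == doc_id % N`, so server j of a later tier only holds documents owned by first-tier server j mod N. The same posting list is sharded at two tier sizes below purely to show that alignment.

```python
def validate_tier_sizes(tier_sizes):
    """Check the sizing rule in claim 1: every tier after the first has an
    integer multiple of the previous tier's server count."""
    if not tier_sizes or tier_sizes[0] < 1:
        raise ValueError("the first tier needs at least one index server")
    for prev, cur in zip(tier_sizes, tier_sizes[1:]):
        if cur < 1 or cur % prev != 0:
            raise ValueError(f"tier of {cur} servers is not a multiple of {prev}")


def shard_posting_list(doc_ids, num_servers):
    """Split one phrase posting list into num_servers portions using
    doc_id % num_servers (a hypothetical sharding scheme)."""
    shards = [[] for _ in range(num_servers)]
    for doc_id in sorted(doc_ids):
        shards[doc_id % num_servers].append(doc_id)
    return shards


def first_tier_server(server_index, n_first_tier):
    """Because N divides T, doc_id % T % N == doc_id % N, so server j of a
    later tier only holds documents owned by first-tier server j % N."""
    return server_index % n_first_tier


tier_sizes = [4, 8, 24]  # N = 4, then T = 8 (2 x 4), then T = 24 (3 x 8)
validate_tier_sizes(tier_sizes)

posting_list = [3, 7, 12, 18, 21, 40]  # hypothetical doc IDs for one phrase
tier1 = shard_posting_list(posting_list, tier_sizes[0])
tier2 = shard_posting_list(posting_list, tier_sizes[1])
print(tier1)  # [[12, 40], [21], [18], [3, 7]]
```

A size list such as `[4, 6]` is rejected, since 6 is not an integer multiple of 4.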
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the identified phrases and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
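The abstract's point about reducing processing and communication at individual index servers can be illustrated with a simplified conjunctive (AND) query. This is a minimal sketch, not the patented query scheduler: it assumes every phrase posting list is sharded with the same hypothetical doc-ID-modulo scheme, so all entries for a given document sit on one server and each server can intersect its own portions locally. Visiting the shortest portion first stands in, loosely, for the cost-based schedule optimization; a server whose portion for some phrase is empty does no intersection work at all. The phrase strings and doc IDs are made up for the example.

```python
def shard(doc_ids, num_servers):
    """Hypothetical scheme: doc_id % num_servers picks the server."""
    shards = [[] for _ in range(num_servers)]
    for doc_id in sorted(doc_ids):
        shards[doc_id % num_servers].append(doc_id)
    return shards


def run_conjunctive_query(shards_by_phrase, num_servers):
    """Return (matching doc IDs, servers that did no intersection work).

    Every phrase is sharded with the same scheme, so a document's entries
    for all phrases sit on one server and each server can intersect its own
    portions without exchanging posting data with other servers.
    """
    matches, skipped = [], []
    for server in range(num_servers):
        # Visit the shortest portion first, as a cheap stand-in for
        # cost-based query scheduling.
        portions = sorted(
            (shards[server] for shards in shards_by_phrase.values()), key=len
        )
        if not portions[0]:
            skipped.append(server)  # some phrase has no documents here
            continue
        candidates = set(portions[0])
        for portion in portions[1:]:
            candidates &= set(portion)
            if not candidates:
                break  # stop early once the intersection is empty
        matches.extend(candidates)
    return sorted(matches), skipped


N = 4
shards_by_phrase = {
    "phrase posting list": shard([3, 7, 12, 18, 21, 40], N),
    "index server": shard([7, 12, 13, 40], N),
}
print(run_conjunctive_query(shards_by_phrase, N))  # ([7, 12, 40], [2])
```

Server 2 is skipped because the second phrase has no documents in its shard, so no intersection is attempted there.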
8 Claims
Specification