Index server architecture using tiered and sharded phrase posting lists
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
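The abstract does not define how "possible phrasifications" are produced. As a minimal sketch, assume a phrasification is any way of splitting the query into contiguous word groups, where each multi-word group must be a known phrase and single words are always allowed as a fallback. The function name `phrasifications`, the `known_phrases` set, and the `max_len` limit are hypothetical and do not come from the patent.

```python
from typing import Iterator, List, Set


def phrasifications(words: List[str],
                    known_phrases: Set[str],
                    max_len: int = 3) -> Iterator[List[str]]:
    """Yield every split of `words` into contiguous phrases.

    Assumption: a multi-word group is allowed only if it appears in
    `known_phrases`; a single word is always allowed.
    """
    if not words:
        yield []
        return
    for n in range(1, min(max_len, len(words)) + 1):
        head = " ".join(words[:n])
        if n == 1 or head in known_phrases:
            # Recurse on the remaining words and prepend the chosen phrase.
            for rest in phrasifications(words[n:], known_phrases, max_len):
                yield [head] + rest


# Hypothetical phrase set, for illustration only.
known = {"new york", "new york restaurants"}
candidates = list(phrasifications(["new", "york", "restaurants"], known))
```

Under these assumptions, each candidate is one possible set of phrases whose posting lists a query schedule would need to fetch, so a scheduler could compare candidates by estimated cost before choosing one.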
16 Claims
1. A method comprising:
determining a phrase posting list for a phrase, the phrase posting list identifying a set of web-pages that include the phrase;
determining a cost for the phrase posting list; and
assigning the phrase posting list to one of a plurality of tiers based on the cost, where each tier has a respective minimum cost used to assign the phrase posting list and where higher cost results in the phrase posting list being assigned to a tier with higher performance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An indexing system comprising:
an index having at least two tiers, each tier being associated with a minimum cost;
at least one processor; and
memory storing instructions that, when executed by the at least one processor, cause the indexing system to:
determine a phrase posting list for a phrase, the phrase posting list identifying a set of web-pages that include the phrase;
determining a cost for the phrase posting list; and
assigning the phrase posting list to one of the at least two tiers based on the cost by determining a tier having a minimum cost less than or equal to the cost for the phrase, where the minimum cost for a next tier is greater than the cost for the phrase.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
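The tier selection rule recited in claims 1 and 9 can be illustrated with a short sketch: pick the tier whose minimum cost is less than or equal to the posting list's cost, where the next tier's minimum cost is greater than that cost. The independent claims do not say how the cost is computed, so it is taken as an input here. The `Tier` class, the tier names, and the threshold values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    min_cost: float  # minimum cost required for assignment to this tier


def assign_tier(cost: float, tiers: Sequence[Tier]) -> Tier:
    """Return the tier with the largest min_cost that is <= cost."""
    ordered = sorted(tiers, key=lambda t: t.min_cost)
    thresholds = [t.min_cost for t in ordered]
    # bisect_right counts the thresholds <= cost; the last of those tiers
    # is the one whose successor has a min_cost greater than cost.
    idx = bisect_right(thresholds, cost) - 1
    if idx < 0:
        raise ValueError("cost is below the minimum cost of every tier")
    return ordered[idx]


# Hypothetical tiers, ordered from lowest to highest performance.
TIERS = [
    Tier("disk", 0.0),
    Tier("flash", 1_000.0),
    Tier("ram", 100_000.0),
]

chosen = assign_tier(2_500.0, TIERS)  # falls between the flash and ram thresholds
```

Sorting by `min_cost` and using a binary search keeps the lookup logarithmic in the number of tiers. A cost exactly equal to a threshold is assigned to that tier, which matches the "less than or equal to" wording of claim 9.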
Specification