INDEX SERVER ARCHITECTURE USING TIERED AND SHARDED PHRASE POSTING LISTS
First Claim
1. A method of indexing documents based on phrases occurring in the documents, the method comprising:
selecting a phrase posting list associated with a phrase, and identifying a plurality of documents having at least one occurrence of the phrase;
determining a length of the phrase posting list;
responsive to the length of the phrase posting list being less than a first predetermined length, associating the phrase posting list with one of a plurality of first tier index servers;
responsive to the length of the phrase posting list being greater than the first predetermined length:
dividing the phrase posting list into a plurality of shards, each shard including a subset of the plurality of the documents; and
associating each phrase posting list shard with a corresponding selected second tier index server, wherein the number of shards correspond to the number of second tier index servers.
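The steps of the claim can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the IndexServer class, the assign_posting_list function, the hash-based choice of first tier server, and the modulo-based sharding by document ID are all assumptions made for the example. The claim also does not say what happens when the list length equals the predetermined length; the sketch shards such a list.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class IndexServer:
    """Hypothetical stand-in for an index server; maps phrase -> doc IDs."""
    name: str
    postings: dict = field(default_factory=dict)


def assign_posting_list(phrase, doc_ids, threshold, tier1, tier2):
    """Place one phrase posting list and return the names of the servers used."""
    if len(doc_ids) < threshold:
        # Short list: keep it whole on one first tier server.
        # Assumption: the server is picked by a stable hash of the phrase.
        server = tier1[zlib.crc32(phrase.encode("utf-8")) % len(tier1)]
        server.postings[phrase] = list(doc_ids)
        return [server.name]

    # Long list: divide it into one shard per second tier server.
    # Assumptions: a list exactly at the threshold is also sharded, and
    # integer doc IDs are spread by doc ID modulo the server count.
    shards = [[] for _ in tier2]
    for doc_id in doc_ids:
        shards[doc_id % len(tier2)].append(doc_id)
    for server, shard in zip(tier2, shards):
        server.postings[phrase] = shard
    return [server.name for server in tier2]
```

Called with a threshold of 4, two first tier servers, and three second tier servers, a two-document posting list lands whole on a single first tier server, while a ten-document list is split into three shards, one per second tier server.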
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
Specification