Query scheduling using hierarchical tiers of index servers
First Claim
1. A computer-implemented method for generating a query schedule for executing a query for information from a corpus of documents, the query comprising a phrase tree, the phrase tree having a plurality of nodes including phrase nodes and operator nodes, each phrase node being associated with a phrase of the query, the query being executed by a plurality of index servers, wherein at least some different ones of the index servers are associated with different phrase posting lists for phrases contained in the corpus of documents;
- the method comprising;
determining, via at least one processor of a computer system, a query cost estimate for each phrase node of the phrase tree;
assigning, via at least one processor of the computer system, each phrase node to an index server associated with the phrase posting list for a phrase of the phrase node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the phrase node;
assigning, via at least one processor of the computer system, each operator node to an index server associated with the phrase posting list for a phrase of a child phrase node of the operator node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the child phrase node; and
generating, via at least one processor of the computer system, a query schedule for executing the query on a plurality of index servers, the query schedule based on at least one query cost estimate for a phrase node of the phrase tree.
2 Assignments
0 Petitions
Accused Products
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are the indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in an cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule based on the phrases is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
225 Citations
15 Claims
-
1. A computer-implemented method for generating a query schedule for executing a query for information from a corpus of documents, the query comprising a phrase tree, the phrase tree having a plurality of nodes including phrase nodes and operator nodes, each phrase node being associated with a phrase of the query, the query being executed by a plurality of index servers, wherein at least some different ones of the index servers are associated with different phrase posting lists for phrases contained in the corpus of documents;
- the method comprising;
determining, via at least one processor of a computer system, a query cost estimate for each phrase node of the phrase tree; assigning, via at least one processor of the computer system, each phrase node to an index server associated with the phrase posting list for a phrase of the phrase node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the phrase node; assigning, via at least one processor of the computer system, each operator node to an index server associated with the phrase posting list for a phrase of a child phrase node of the operator node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the child phrase node; and generating, via at least one processor of the computer system, a query schedule for executing the query on a plurality of index servers, the query schedule based on at least one query cost estimate for a phrase node of the phrase tree. - View Dependent Claims (2, 3, 4, 5)
- the method comprising;
-
6. A computer program product embodied on a computer readable storage medium comprising instructions that when executed causes a computer system to:
-
generate a query schedule for executing a query for information from a corpus of documents, the query comprising a phrase tree, the phrase tree having a plurality of nodes including phrase nodes and operator nodes, each phrase node being associated with a phrase of the query, the query executed by a plurality of index servers, wherein at least some different ones of the index servers are associated with different phrase posting lists for phrases contained in the corpus of documents; determine a query cost estimate for each node of the phrase tree; assign each phrase node to an index server associated with the phrase posting list for a phrase of the phrase node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the phrase node; assign each operator node to an index server associated with the phrase posting list for a phrase of a child phrase node of the operator node, the assigned index server being selected from the plurality of index servers for execution of a portion of the query associated with the child phrase node; and optimize the query schedule based on a cost associated with execution of the query. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A computer-implemented method for generating a query schedule for executing a query for information from a corpus of documents, the query comprising a phrase tree, the phrase tree having a plurality of nodes including phrase nodes and operator nodes, each phrase node being associated with a phrase of the query, the query executed by a plurality of index servers, wherein at least some different ones of the index servers are associated with different phrase posting lists for phrases contained in the corpus of documents;
- the method comprising;
determining, via at least one processor of a computer system, a query cost estimate for phrase nodes of the phrase tree; assigning, via at least one processor of the computer system, ones of the phrase nodes to respective index servers associated with the phrase posting list for a phrase of the phrase node, each respective index server being selected from the plurality of index servers for execution of a portion of the query associated with the respective phrase node; assigning, via at least one processor of the computer system, ones of the operator nodes to respective index servers associated with a child phrase node of the respective operator node, each respective index server being selected from the plurality of index servers for execution of a portion of the query associated with the child phrase node; and generating, via at least one processor of the computer system, a query schedule for executing the query on a plurality of index servers, the query schedule based on at least one query cost estimate for a phrase node of the phrase tree. - View Dependent Claims (12, 13, 14, 15)
- the method comprising;
Specification