Integrated distributed query processor for data grids
First Claim
1. A method for reducing query response time for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion indicates a version of data desired by a user in the distributed query processing results;
b. determining from among an available set of distributed query processing solutions, a solution with the minimum overall cost for query execution, the determination comprising applying an integrated cost model to each available query processing solution and thereafter selecting the solution with the lowest overall cost, wherein the integrated cost model is based on at least the following cost model factors:
i. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
ii. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
iii. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
iv. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
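Steps a and b can be illustrated with a minimal Python sketch. It is not the claimed cost model: the `Plan` fields, the additive cost formula, the `staleness_weight` parameter, and the use of the freshness criterion as both a filter and a penalty are all assumptions made for illustration, and factors such as available memory, index access paths, and join algorithm type are omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Plan:
    """One candidate query processing solution (hypothetical structure)."""
    name: str
    replica_version: int        # version of the copy this plan reads
    tuples: int                 # size of the copy, in tuples
    selectivity: float          # fraction of tuples that survive the predicates
    cpu_speed: float            # tuples processed per second at the host
    link_cost_per_tuple: float  # communication cost per shipped tuple


def overall_cost(plan: Plan, latest_version: int, staleness_weight: float) -> float:
    """Assumed additive cost: processing + communication + staleness penalty."""
    processing = plan.tuples / plan.cpu_speed
    transfer = plan.tuples * plan.selectivity * plan.link_cost_per_tuple
    staleness = staleness_weight * (latest_version - plan.replica_version)
    return processing + transfer + staleness


def choose_plan(plans: Iterable[Plan], min_version: int, latest_version: int,
                staleness_weight: float = 1.0) -> Plan:
    """Apply the cost model to every eligible plan and return the cheapest."""
    # Assumption: the user's freshness criterion is a minimum acceptable version.
    eligible = [p for p in plans if p.replica_version >= min_version]
    if not eligible:
        raise ValueError("no plan satisfies the freshness criterion")
    return min(eligible,
               key=lambda p: overall_cost(p, latest_version, staleness_weight))
```

Under these assumptions, tightening `min_version` shrinks the candidate set, so a stricter freshness requirement can force a more expensive plan.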
Abstract
A method for processing a distributed query in a network of computational resources is provided. The method includes receiving a user-defined data freshness criterion and a distributed query from a user. The user-defined data freshness criterion is based on the version of the distributed query results desired by the user. An integrated cost model is formulated to optimize the execution of the distributed query. The integrated cost model is based on one or more integrated cost model factors. Thereafter, an objective function is constructed, based on the processing cost for each of the one or more copies of the one or more relations, and a data transmission cost for the transfer of the one or more copies of the one or more relations from a first to a second computational resource. Subsequently, an optimal solution of the objective function is calculated by using one or more heuristic approaches.
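The objective function described in the abstract, a processing cost per copy plus a transmission cost between resources, can be sketched as follows. The abstract does not say which heuristic is used, so this example swaps in a simple hill-climbing local search. The chain-shaped join order, the `proc_cost` and `trans_cost` table layouts, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

ProcCost = List[Dict[str, float]]        # per relation: hosting site -> processing cost
TransCost = Dict[str, Dict[str, float]]  # site -> site -> transmission cost


def objective(assignment: List[str], proc_cost: ProcCost,
              trans_cost: TransCost) -> float:
    """Processing cost of each chosen copy plus transfers along the chain."""
    processing = sum(proc_cost[rel][site] for rel, site in enumerate(assignment))
    # Assumption: results are shipped from each relation's site to the next one.
    transfer = sum(trans_cost[a][b] for a, b in zip(assignment, assignment[1:]))
    return processing + transfer


def hill_climb(proc_cost: ProcCost, trans_cost: TransCost) -> Tuple[List[str], float]:
    """Local search: move one relation to another copy while the cost drops."""
    # Start from the copy with the cheapest processing cost for each relation.
    assignment = [min(costs, key=costs.get) for costs in proc_cost]
    best = objective(assignment, proc_cost, trans_cost)
    improved = True
    while improved:
        improved = False
        for rel, costs in enumerate(proc_cost):
            for site in costs:
                if site == assignment[rel]:
                    continue
                candidate = assignment[:rel] + [site] + assignment[rel + 1:]
                cost = objective(candidate, proc_cost, trans_cost)
                if cost < best:
                    assignment, best, improved = candidate, cost, True
    return assignment, best
```

Hill climbing stops at a local minimum, so it does not guarantee the optimal assignment in general; it is shown only as one example of a heuristic approach.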
14 Claims
1. A method for reducing query response time for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion indicates a version of data desired by a user in the distributed query processing results;
b. determining from among an available set of distributed query processing solutions, a solution with the minimum overall cost for query execution, the determination comprising applying an integrated cost model to each available query processing solution and thereafter selecting the solution with the lowest overall cost, wherein the integrated cost model is based on at least the following cost model factors:
i. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
ii. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
iii. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
iv. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer program product comprising a non-transitory computer usable medium having computer readable program code embodied therein for reducing query response time for processing a distributed query in a network of a plurality of computational resources, wherein at least one or more of the plurality of computational resources host one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the computer readable program code adapted to:
a. receive a user-defined data freshness criterion, wherein the user-defined data freshness criterion indicates a version of data desired by a user in the distributed query processing results;
b. determine from among an available set of distributed query processing solutions, a solution with the minimum overall cost for query execution, the determination comprising applying an integrated cost model to each available query processing solution and thereafter selecting the solution with the lowest overall cost, wherein the integrated cost model is based on at least the following cost model factors:
i. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
ii. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of computational resources;
iii. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
iv. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
Dependent claims: 11, 12, 13, 14
Specification