INTEGRATED DISTRIBUTED QUERY PROCESSOR FOR DATA GRIDS
First Claim
1. A method for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion is based on the version of the distributed query processing results desired by a user;
b. formulating an integrated cost model for optimizing the execution of the distributed query, the formulation being based on one or more integrated cost model factors, the one or more integrated cost model factors comprising;
i. the user-defined data freshness criterion;
ii. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
iii. the plurality of computational resources;
iv. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
v. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
vi. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
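Factors (i) and (ii) can be read as a filter on candidate copies that is applied before any cost is computed. The following is a minimal sketch under assumptions that the claim does not state: versions are modeled as integers, the user-defined freshness criterion is modeled as a maximum allowed version lag behind the newest copy of a relation, and the names `Replica`, `fresh_enough`, and `max_lag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    relation: str   # name of the replicated relation
    node: str       # computational resource hosting this copy
    version: int    # assumed integer version; higher means fresher
    size_mb: float  # size of this copy (a database related parameter)


def fresh_enough(replicas, max_lag):
    """Keep copies whose version is within max_lag of the newest copy of the same relation."""
    newest = {}
    for r in replicas:
        newest[r.relation] = max(newest.get(r.relation, r.version), r.version)
    return [r for r in replicas if newest[r.relation] - r.version <= max_lag]


# Hypothetical catalog: lazy replication has left the copy on n2 three versions behind.
replicas = [
    Replica("orders", "n1", version=10, size_mb=200.0),
    Replica("orders", "n2", version=7, size_mb=200.0),
    Replica("customers", "n2", version=4, size_mb=50.0),
]
candidates = fresh_enough(replicas, max_lag=1)
print([(r.relation, r.node) for r in candidates])
```

With `max_lag=1` the stale copy of `orders` on `n2` is excluded; a looser criterion such as `max_lag=3` would admit it and give the optimizer more placement choices.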
Abstract
A method for processing a distributed query in a network of computational resources is provided. The method includes receiving a user-defined freshness criterion and a distributed query from a user. The user-defined data freshness criterion is based on the version of the distributed query results desired by the user. An integrated cost model is formulated to optimize the execution of the distributed query. The integrated cost model is based on one or more integrated cost model factors. Thereafter, an objective function is constructed, based on the processing cost for each of the one or more copies of the one or more relations, and a data transmission cost for the transfer of the one or more copies of the one or more relations from a first to a second computational resource. Subsequently, an optimal solution of the objective function is calculated by using one or more heuristic approaches.
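The objective function described in the abstract combines a processing cost for each copy with a data transmission cost between computational resources. A minimal sketch of that idea follows, under loudly stated assumptions: processing cost is taken as copy size divided by the hosting node's processing speed, transmission cost is copy size times a per-megabyte link cost to a single target node, and the heuristic is a simple greedy choice per relation. The function names `copy_cost` and `choose_copies` and all numbers are hypothetical; this section does not specify the patent's actual heuristics.

```python
def copy_cost(size_mb, node, target, speed, link_cost):
    """Processing cost at the hosting node plus transfer cost to the target node."""
    processing = size_mb / speed[node]
    transfer = 0.0 if node == target else size_mb * link_cost[(node, target)]
    return processing + transfer


def choose_copies(copies, target, speed, link_cost):
    """Greedy heuristic (assumed): for each relation, pick the copy with the lowest cost."""
    plan = {}
    for relation, node, size_mb in copies:
        cost = copy_cost(size_mb, node, target, speed, link_cost)
        if relation not in plan or cost < plan[relation][1]:
            plan[relation] = (node, cost)
    # Objective value: sum of the per-relation costs of the chosen copies.
    return plan, sum(cost for _, cost in plan.values())


speed = {"n1": 100.0, "n2": 50.0}                      # MB processed per second (assumed)
link_cost = {("n1", "n2"): 0.02, ("n2", "n1"): 0.02}   # seconds per MB transferred (assumed)
copies = [
    ("orders", "n1", 200.0),
    ("orders", "n2", 200.0),
    ("customers", "n2", 50.0),
]
plan, total = choose_copies(copies, target="n1", speed=speed, link_cost=link_cost)
print(plan, total)
```

In this toy instance the copy of `orders` already on the target node wins because it avoids any transfer, while `customers` must be shipped from `n2` since that is its only copy.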
12 Claims
1. A method for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion is based on the version of the distributed query processing results desired by a user;
b. formulating an integrated cost model for optimizing the execution of the distributed query, the formulation being based on one or more integrated cost model factors, the one or more integrated cost model factors comprising;
i. the user-defined data freshness criterion;
ii. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
iii. the plurality of computational resources;
iv. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
v. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
vi. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
Dependent claims: 2, 3, 4, 5, 6
7. The method according to claim 1 further comprising determination of an optimal solution of the integrated cost model by evaluation of one or more heuristic approaches, the one or more heuristic approaches comprising one or more parallel plan construction heuristics approaches and one or more computational node heuristic approaches.
9. A computer program product for processing a distributed query in a network of a plurality of computational resources, wherein at least one or more of the plurality of computational resources host one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the computer program product comprising:
a. program instruction means for receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion is based on the version of the distributed query processing results desired by a user;
b. program instruction means for formulating an integrated cost model for optimizing the execution of the distributed query, the formulation being based on one or more integrated cost model factors, the one or more integrated cost model factors comprising;
i. the user-defined data freshness criterion;
ii. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
iii. the plurality of computational resources;
iv. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of computational resources;
v. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
vi. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
Dependent claims: 10, 11, 12
Specification