Integrated distributed query processor for data grids

US 8,103,656 B2
Filed: 02/20/2009
Issued: 01/24/2012
Est. Priority Date: 02/20/2008
Status: Active Grant

Abstract

A method for processing a distributed query in a network of computational resources is provided. The method includes receiving a user-defined freshness criterion and a distributed query from a user. The user-defined data freshness criterion is based on the version of the distributed query results desired by the user. An integrated cost model is formulated to optimize the execution of the distributed query. The integrated cost model is based on one or more integrated cost model factors. Thereafter, an objective function is constructed, based on the processing cost for each of the one or more copies of the one or more relations, and a data transmission cost for the transfer of the one or more copies of the one or more relations from a first to a second computational resource. Subsequently, an optimal solution of the objective function is calculated by using one or more heuristic approaches.
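
The objective function described in the abstract can be sketched as follows. This is only an illustrative reading, not the patented method: the `Copy` fields, the per-megabyte link cost table, and the single `result_site` are assumptions introduced here, and the greedy per-relation choice stands in for the unspecified "heuristic approaches".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    relation: str           # relation this copy replicates
    site: str               # computational resource hosting the copy
    processing_cost: float  # hypothetical processing cost at that site
    size_mb: float          # hypothetical amount of data to ship


def objective(choice, link_cost_per_mb, result_site):
    """Total cost = processing cost of each chosen copy + its transmission cost."""
    total = 0.0
    for copy in choice:
        total += copy.processing_cost
        if copy.site != result_site:
            # Transmission cost from the hosting site (first resource)
            # to the site assembling the result (second resource).
            total += copy.size_mb * link_cost_per_mb[(copy.site, result_site)]
    return total


def greedy_choice(copies, link_cost_per_mb, result_site):
    """Assumed heuristic: for each relation, keep the copy that is cheapest alone."""
    best = {}
    for copy in copies:
        cost = objective([copy], link_cost_per_mb, result_site)
        if copy.relation not in best or cost < best[copy.relation][0]:
            best[copy.relation] = (cost, copy)
    return [copy for _, copy in best.values()]


copies = [
    Copy("orders", "A", 4.0, 10.0),
    Copy("orders", "B", 2.0, 10.0),
    Copy("customers", "B", 1.0, 5.0),
]
link = {("A", "C"): 0.1, ("B", "C"): 0.5}
chosen = greedy_choice(copies, link, "C")
print([(c.relation, c.site) for c in chosen], objective(chosen, link, "C"))
```

In this simplified form the objective is a sum of independent per-relation terms, so the greedy choice happens to be optimal. Once join costs couple the choices across relations, that no longer holds, which is presumably why heuristics are needed.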

15 Citations

14 Claims

1. A method for reducing query response time for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
- a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion indicates a version of data desired by a user in the distributed query processing results;
  
  b. determining from among an available set of distributed query processing solutions, a solution with the minimum overall cost for query execution, the determination comprising applying an integrated cost model to each available query processing solution and thereafter selecting the solution with the lowest overall cost, wherein the integrated cost model is based on at least the following cost model factors;
  
  i. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
  
  ii. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
  
  iii. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
  
  iv. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method according to claim 1, wherein the lazy replication technique enables periodic updating of the one or more relations in the plurality of computational resources.
  - 3. The method according to claim 1, wherein applying the integrated cost model to a query processing solution comprises construction of an objective function, wherein the objective function is based on a processing cost for each of the one or more copies of the one or more relations and a data transmission cost for transfer of the one or more copies of the one or more relations from the first computational resource and the second computational resource.
  - 4. The method according to claim 3, wherein the processing cost is calculated based on a join processing cost and a local processing cost at each of the plurality of computational resources.
  - 5. The method according to claim 4, wherein the join processing cost and the local processing cost are calculated based on the cost model factors.
  - 6. The method according to claim 3, wherein the data transmission cost is calculated based on the cost model factors.
  - 7. The method according to claim 1 wherein selecting the solution with the lowest overall cost comprises evaluation of one or more heuristic approaches, the one or more heuristic approaches comprising one or more parallel plan construction heuristics approaches and one or more computational node heuristic approaches.
  - 8. The method according to claim 7, wherein selecting the solution with the lowest overall cost further comprises:
    - a. identifying a join ordering plan from one or more join ordering plans generated by the one or more heuristics approaches, the identification being based on the integrated cost model factors;
      
      b. selecting a copy from the one or more copies of the one or more relations based on the one or more integrated cost model factors;
      
      c. determining a parallel plan using the plurality of computational resources hosting the one or more relations referenced in the distributed query; and
      
      d. identifying a fast parallel execution plan for selecting one or more of the plurality of computational resources that host zero copies of the one or more relations referenced in the distributed query.
  - 9. The method according to claim 1, wherein the integrated cost model is additionally based on at least one of the following cost model factors:
    - i. the user defined data freshness criterion; and
      
      ii. the plurality of computational resources.
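
As a rough illustration of claim 1, the sketch below applies a cost function to each candidate solution and keeps the cheapest one. Everything concrete in it is an assumption made for the example: the class and field names, the integer `version` compared against a minimum version as the freshness test, the doubling penalty when a copy does not fit in memory, and the cost formulas themselves. Index access paths and join algorithm types, which the claim also lists, are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    available_memory_mb: float
    processing_speed: float  # hypothetical unit: MB processed per second


@dataclass(frozen=True)
class Replica:
    relation: str
    resource: Resource
    version: int      # higher is fresher; lazily replicated copies may lag
    size_mb: float


@dataclass(frozen=True)
class Solution:
    replicas: tuple   # one Replica per relation referenced by the query
    result_site: str  # name of the resource that assembles the result


def meets_freshness(solution, min_version):
    """Assumed reading of the freshness criterion: a minimum acceptable version."""
    return all(r.version >= min_version for r in solution.replicas)


def integrated_cost(solution, selectivity, link_cost):
    cost = 0.0
    for r in solution.replicas:
        scan = r.size_mb / r.resource.processing_speed
        if r.size_mb > r.resource.available_memory_mb:
            scan *= 2.0  # assumed penalty when the copy exceeds available memory
        cost += scan
        if r.resource.name != solution.result_site:
            # Only rows surviving the local predicate are shipped.
            shipped_mb = r.size_mb * selectivity[r.relation]
            cost += shipped_mb * link_cost[(r.resource.name, solution.result_site)]
    return cost


def select_solution(solutions, min_version, selectivity, link_cost):
    """Return the lowest-cost solution that satisfies the freshness criterion."""
    eligible = [s for s in solutions if meets_freshness(s, min_version)]
    if not eligible:
        return None
    return min(eligible, key=lambda s: integrated_cost(s, selectivity, link_cost))
```

With a stale copy on a fast local node and a fresh copy on a slower remote node, lowering `min_version` lets the cheaper stale copy win, while raising it forces the more expensive fresh copy. That is the trade-off between freshness and response time that the claim folds into one cost model.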

10. A computer program product comprising a non-transitory computer usable medium having computer readable program code embodied therein for reducing query response time for processing a distributed query in a network of a plurality of computational resources, wherein at least one or more of the plurality of computational resources host one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the computer readable program code adapted to:
- a. receive a user-defined data freshness criterion, wherein the user-defined data freshness criterion indicates a version of data desired by a user in the distributed query processing results;
  
  b. determine from among an available set of distributed query processing solutions, a solution with the minimum overall cost for query execution, the determination comprising applying an integrated cost model to each available query processing solution and thereafter selecting the solution with the lowest overall cost, wherein the integrated cost model is based on at least the following cost model factors;
  
  i. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
  
  ii. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of computational resources;
  
  iii. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
  
  iv. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
- Dependent claims (11, 12, 13, 14):
  - 11. The computer program product according to claim 10, wherein the computer readable program code adapted to apply the integrated cost model to a query processing solution comprises instructions for construction of an objective function, the objective function is based on a processing cost for each of the one or more copies and a data transmission cost for transfer of one or more relations from the first computational resource and the second computational resource.
  - 12. The computer program product according to claim 10 further comprising computer readable program code adapted to select the solution with the lowest overall cost comprises instructions for evaluation of one or more heuristic approaches, the one or more heuristic approaches include one or more parallel plan construction heuristics approaches and one or more computational node heuristic approaches.
  - 13. The computer program product according to claim 12, wherein the computer readable program code adapted to select the solution with the lowest overall cost further comprises:
    - a. instructions for identifying a join ordering plan from one or more join ordering plans generated by the one or more heuristics approaches, the identification being based on the one or more integrated cost model factors;
      
      b. instructions for selecting a copy from the one or more copies of the one or more relations based on the one or more integrated cost model factors;
      
      c. instructions for determining a parallel plan using the plurality of computational resources that host one or more relations referenced in the distributed query; and
      
      d. instructions for identifying a fast parallel execution plan for selecting one or more of the plurality of computational resources that host zero copies of the one or more relations referenced in the distributed query.
  - 14. The computer program product according to claim 10, wherein the integrated cost model is additionally based on at least one of the following cost model factors:
    - i. the user defined data freshness criterion; and
      
      ii. the plurality of computational resources.
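
Step a of claims 8 and 13 (identifying one join ordering plan among several candidates) could look something like the sketch below. The claims do not say how candidate orderings are scored, so the scoring used here is a stand-in: a common textbook estimate that sums the intermediate result sizes of a left-deep plan, computed from relation cardinalities and join selectivities. All function and parameter names are hypothetical.

```python
from itertools import permutations


def intermediate_sizes(order, cardinality, join_selectivity):
    """Estimated row counts after each join of a left-deep plan."""
    joined = [order[0]]
    rows = float(cardinality[order[0]])
    sizes = []
    for rel in order[1:]:
        rows *= cardinality[rel]
        for prev in joined:
            # A missing pair defaults to 1.0, i.e. no join predicate (cross product).
            rows *= join_selectivity.get(frozenset((prev, rel)), 1.0)
        joined.append(rel)
        sizes.append(rows)
    return sizes


def plan_cost(order, cardinality, join_selectivity):
    """Assumed cost of a join order: the sum of its intermediate result sizes."""
    return sum(intermediate_sizes(order, cardinality, join_selectivity))


def pick_join_order(candidates, cardinality, join_selectivity):
    """Identify the cheapest ordering among the candidate plans."""
    return min(candidates, key=lambda o: plan_cost(o, cardinality, join_selectivity))


cardinality = {"A": 1000, "B": 100, "C": 10}
join_selectivity = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.1,
}
# Exhaustive enumeration is used only because the example has three relations;
# a heuristic generator would supply far fewer candidates for larger queries.
candidates = list(permutations(cardinality))
best = pick_join_order(candidates, cardinality, join_selectivity)
print(best, plan_cost(best, cardinality, join_selectivity))
```

Joining the two small relations first keeps the intermediate result small, so orderings that leave the largest relation for last score best in this example.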

Current Assignee: Infosys Limited
Original Assignee: Infosys Technologies Limited (Infosys Limited)
Inventors: Krishnamoorthy, Srikumar; Saple, Avdhoot Kishore; Achutharao, Prahalad Haldhoderi
Primary Examiner(s): Mofiz, Apu
Assistant Examiner(s): Nguyen, Thu Nga
Application Number: US12/389,473
Publication Number: US 20090281987A1
Time in Patent Office: 1,068 Days
Field of Search: None
US Class Current: 707/713
CPC Class Codes: G06F 16/24542 (Plan optimisation); G06F 16/2471 (Distributed queries)
