INTEGRATED DISTRIBUTED QUERY PROCESSOR FOR DATA GRIDS

US 20090281987A1
Filed: 02/20/2009
Published: 11/12/2009
Est. Priority Date: 02/20/2008
Status: Active Grant

Abstract

A method for processing a distributed query in a network of computational resources is provided. The method includes receiving a user-defined data freshness criterion and a distributed query from a user. The user-defined data freshness criterion is based on the version of the distributed query results desired by the user. An integrated cost model is formulated to optimize the execution of the distributed query. The integrated cost model is based on one or more integrated cost model factors. Thereafter, an objective function is constructed, based on the processing cost for each of the one or more copies of the one or more relations, and a data transmission cost for the transfer of the one or more copies of the one or more relations from a first to a second computational resource. Subsequently, an optimal solution of the objective function is calculated by using one or more heuristic approaches.
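The interplay between the freshness criterion and the two cost terms named in the abstract can be illustrated with a small sketch. It is not the patented method: the `Replica` type, the `choose_replica` function, the version-threshold reading of "freshness", and both cost formulas (size divided by processing speed, size times a per-link cost) are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical description of one lazily replicated copy of a relation."""
    node: str        # computational resource hosting this copy
    version: int     # version of the copy (higher means fresher)
    size_mb: float   # size of the copy


def choose_replica(replicas, min_version, query_node, speed_mb_s, link_cost_per_mb):
    """Return (replica, cost) for the cheapest copy that is fresh enough.

    Assumed cost model: processing cost = size / node speed, and
    transmission cost = size * per-MB cost of the link to the query node.
    """
    # Freshness criterion: discard copies older than the requested version.
    fresh = [r for r in replicas if r.version >= min_version]
    if not fresh:
        raise ValueError("no replica satisfies the freshness criterion")

    def cost(r):
        processing = r.size_mb / speed_mb_s[r.node]
        # Shipping is free when the copy already sits on the query node.
        if r.node == query_node:
            transmission = 0.0
        else:
            transmission = r.size_mb * link_cost_per_mb[(r.node, query_node)]
        return processing + transmission

    best = min(fresh, key=cost)
    return best, cost(best)


# Example with made-up numbers: node A is cheapest but holds a stale copy.
replicas = [Replica("A", 3, 100.0), Replica("B", 5, 100.0), Replica("C", 5, 100.0)]
speed = {"A": 50.0, "B": 20.0, "C": 25.0}
links = {("A", "Q"): 0.01, ("B", "Q"): 0.02, ("C", "Q"): 0.05}
best, total = choose_replica(replicas, min_version=4, query_node="Q",
                             speed_mb_s=speed, link_cost_per_mb=links)
print(best.node, total)
```

With a required version of 4, the stale copy on node A is excluded and the copy on node B is selected; relaxing the requirement to version 1 makes node A the cheapest choice. This shows how a looser freshness criterion can reduce the total cost.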

12 Claims

1. A method for processing a distributed query in a network of a plurality of computational resources, wherein at least one of the plurality of computational resources hosts one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the method comprising:
- a. receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion is based on the version of the distributed query processing results desired by a user;
  
  b. formulating an integrated cost model for optimizing the execution of the distributed query, the formulation being based on one or more integrated cost model factors, the one or more integrated cost model factors comprising:
  
  i. the user-defined data freshness criterion;
  
  ii. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
  
  iii. the plurality of computational resources;
  
  iv. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of the computational resources;
  
  v. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
  
  vi. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
  - 2. The method according to claim 1, wherein the lazy replication technique enables periodic updating of the one or more relations in the plurality of computational resources.
  - 3. The method according to claim 1, wherein the formulation of the integrated cost model comprises construction of an objective function, the objective function is based on a processing cost for each of the one or more copies of the one or more relations and a data transmission cost for transfer of the one or more copies of the one or more relations from the first computational resource to the second computational resource.
  - 4. The method according to claim 3, wherein the processing cost is calculated based on a join processing cost and a local processing cost at each of the plurality of computational resources.
  - 5. The method according to claim 4, wherein the join processing cost and the local processing cost are calculated based on the one or more integrated cost model factors.
  - 6. The method according to claim 3, wherein the data transmission cost is calculated based on the one or more integrated cost model factors.
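Claims 3 to 6 state only what the objective function is based on, not its exact form. One possible reading, given purely as an illustration, is an additive cost. All symbols here are assumptions introduced for this sketch: $R$ is the set of computational resources used by a plan, $L$ is the set of links over which copies are shipped, and $C_{\text{local}}$, $C_{\text{join}}$ and $C_{\text{tx}}$ stand for the local processing, join processing and data transmission costs.

```latex
C_{\text{total}}
  = \sum_{r \in R} \bigl( C_{\text{local}}(r) + C_{\text{join}}(r) \bigr)
  + \sum_{(i,j) \in L} C_{\text{tx}}(i,j)
```

Under this reading, each term would in turn be a function of the integrated cost model factors of claim 1, such as available memory, processing speed, copy size, predicate selectivity and link communication cost.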

7. The method according to claim 1 further comprising determination of an optimal solution of the integrated cost model by evaluation of one or more heuristic approaches, the one or more heuristic approaches comprising one or more parallel plan construction heuristics approaches and one or more computational node heuristic approaches.
  - 8. The method according to claim 7, wherein the determination of the optimal solution further comprises:
    - a. identifying a join ordering plan from one or more join ordering plans generated by the one or more heuristics approaches, the identification being based on the one or more integrated cost model factors;
      
      b. selecting a copy from the one or more copies of the one or more relations based on the one or more integrated cost model factors;
      
      c. determining a parallel plan using the plurality of computational resources hosting the one or more relations referenced in the distributed query; and
      
      d. identifying a fast parallel execution plan for selecting one or more of the plurality of computational resources that host zero copies of the one or more relations referenced in the distributed query.
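Step (a) of claim 8 selects a join ordering plan, but the claims do not say which heuristic generates the candidates. As a minimal sketch of one common option, the snippet below builds a greedy left-deep join order that always adds the relation producing the smallest estimated intermediate result. The function name `greedy_join_order`, the input shapes, and the use of intermediate row counts as the cost are assumptions for illustration, not details taken from the patent.

```python
def greedy_join_order(cardinality, selectivity):
    """Greedy left-deep join ordering (illustrative heuristic only).

    cardinality: {relation name: estimated row count}
    selectivity: {frozenset({r1, r2}): join predicate selectivity};
                 a missing pair is treated as a cross product (1.0).
    Returns (join order, sum of estimated intermediate result sizes).
    """
    remaining = set(cardinality)
    # Start from the smallest relation; the name breaks ties deterministically.
    first = min(remaining, key=lambda r: (cardinality[r], r))
    order = [first]
    remaining.remove(first)
    rows = float(cardinality[first])
    total_cost = 0.0

    while remaining:
        def result_rows(r):
            # Apply every join predicate between r and the relations joined so far.
            sel = 1.0
            for joined in order:
                sel *= selectivity.get(frozenset((joined, r)), 1.0)
            return rows * cardinality[r] * sel

        # Pick the relation that keeps the next intermediate result smallest.
        nxt = min(remaining, key=lambda r: (result_rows(r), r))
        rows = result_rows(nxt)
        total_cost += rows
        order.append(nxt)
        remaining.remove(nxt)

    return order, total_cost


# Example with made-up statistics.
cardinality = {"orders": 1000, "customers": 100, "items": 5000}
selectivity = {
    frozenset(("orders", "customers")): 0.01,
    frozenset(("orders", "items")): 0.001,
}
order, cost = greedy_join_order(cardinality, selectivity)
print(order, cost)
```

For these statistics the order is customers, orders, items, because joining items directly to customers would be a cross product. A fuller implementation would score each candidate order with the integrated cost model and then continue with copy selection and parallel plan construction as in steps (b) to (d).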

9. A computer program product for processing a distributed query in a network of a plurality of computational resources, wherein at least one or more of the plurality of computational resources host one or more relations, the one or more relations belonging to one or more databases, the one or more relations being replicated using a lazy replication technique to form one or more copies of the one or more relations, the computer program product comprising:
- a. program instruction means for receiving a user-defined data freshness criterion, wherein the user-defined data freshness criterion is based on the version of the distributed query processing results desired by a user;
  
  b. program instruction means for formulating an integrated cost model for optimizing the execution of the distributed query, the formulation being based on one or more integrated cost model factors, the one or more integrated cost model factors comprising:
  
  i. the user-defined data freshness criterion;
  
  ii. an information freshness measure of the one or more relations, the information freshness measure being based on the version of the one or more copies of the one or more relations across the plurality of computational resources;
  
  iii. the plurality of computational resources;
  
  iv. one or more computational resource parameters, the one or more computational resource parameters comprising available memory of each of the plurality of computational resources and processing speed of each of the plurality of computational resources;
  
  v. one or more database related parameters, the one or more database related parameters comprising one or more index access paths, one or more join algorithm types, size of one or more copies of the one or more relations, and selectivity of one or more local and join predicates; and
  
  vi. one or more cost parameters, the one or more cost parameters comprising communication cost for a link between a first computational resource and a second computational resource.
  - 10. The computer program product according to claim 9, wherein the program instruction means for formulation of the integrated cost model comprises program instruction means for construction of an objective function, the objective function is based on a processing cost for each of the one or more copies and a data transmission cost for transfer of one or more relations from the first computational resource to the second computational resource.
  - 11. The computer program product according to claim 9 further comprising program instruction means for determination of an optimal solution of the integrated cost model by evaluation of one or more heuristic approaches, the one or more heuristic approaches include one or more parallel plan construction heuristics approaches and one or more computational node heuristic approaches.
  - 12. The computer program product according to claim 11, wherein the program instruction means for determination of the optimal solution further comprises:
    - a. program instruction means for identifying a join ordering plan from one or more join ordering plans generated by the one or more heuristics approaches, the identification being based on the one or more integrated cost model factors;
      
      b. program instruction means for selecting a copy from the one or more copies of the one or more relations based on the one or more integrated cost model factors;
      
      c. program instruction means for determining a parallel plan using the plurality of computational resources that host one or more relations referenced in the distributed query; and
      
      d. program instruction means for identifying a fast parallel execution plan for selecting one or more of the plurality of computational resources that host zero copies of the one or more relations referenced in the distributed query.

Current Assignee: Infosys Limited
Original Assignee: Infosys Technologies Limited (Infosys Limited)
Inventors: Saple, Avdhoot Kishore; Krishnamoorthy, Srikumar; Achutharao, Prahalad Haldhoderi
Granted Patent: US 8,103,656 B2
US Class Current: 1/1
CPC Class Codes:
G06F 16/24542 Plan optimisation
G06F 16/2471 Distributed queries
