Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases
First Claim
1. A machine implemented method for automatically determining an optimal execution strategy for a request comprising query, update or transaction operations on a distributed database system having a plurality of data site, with each site having a computer and data storage facility, and said sites are interconnected by communication lines, said machine implemented method comprising the steps of:
- A. inputting an ad hoc relational query or update request from a first computer process that formulates the relational query or update request as a compacted tree having lead nodes that are select, project or join operators and non-leaf nodes that are union, intersection, difference, delete, modify or insert operators to a second computer process which thereafter performs step B;
B. for each node in the compacted tree perform the following starting at the root node;
1. if the current node is a leaf node perform the following;
a. materialization planning to choose the data sites from which data is to be accessed;
b. local process planning to determine which operations can be processed locally at each site determined by materialization planning and estimate parameters of resultant data from the local operations;
c. non-local process planning to determine strategy for remaining operations which can not be performed locally without transmission of data between sites;
d. execution strategy building to build an execution strategy for each data site in the distributed database system at which data is processed by access or manipulation;
2.
2 Assignments
0 Petitions
Accused Products
Abstract
In a Distributed Database System (DDS), database management and transaction management are extended to a distributed environment among a plurality of local sites which each have transaction server, file server, and data storage facilities. The Materialization and Access Planning (MAP) method of a distributed query, update, or transaction is an important part of the processing of the query, update, or transaction. Materialization and access planning results in a strategy for processing a query, update, or transaction in the distributed database management system (DSDBMS). Materialization consists of selecting data copies used to process the query, update, or transaction. This step is necessary since data may be stored at more than one site (i.e., computer) on the network. Access planing consists of choosing the execution order of operations and the actual execution site of each operation. Three access planning methods are used: General (Response), General (Total) and Initial Feasible Solution (IFS). For a distributed query, General (Response) and General (Total) decrease the communication cost and increase the local processing costs as compared to the IFS.
-
Citations
19 Claims
-
1. A machine implemented method for automatically determining an optimal execution strategy for a request comprising query, update or transaction operations on a distributed database system having a plurality of data site, with each site having a computer and data storage facility, and said sites are interconnected by communication lines, said machine implemented method comprising the steps of:
-
A. inputting an ad hoc relational query or update request from a first computer process that formulates the relational query or update request as a compacted tree having lead nodes that are select, project or join operators and non-leaf nodes that are union, intersection, difference, delete, modify or insert operators to a second computer process which thereafter performs step B; B. for each node in the compacted tree perform the following starting at the root node; 1. if the current node is a leaf node perform the following; a. materialization planning to choose the data sites from which data is to be accessed; b. local process planning to determine which operations can be processed locally at each site determined by materialization planning and estimate parameters of resultant data from the local operations; c. non-local process planning to determine strategy for remaining operations which can not be performed locally without transmission of data between sites; d. execution strategy building to build an execution strategy for each data site in the distributed database system at which data is processed by access or manipulation; 2. - View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
-
-
2. if the current node is a non-leaf node perform the following:
-
a. for each child node make the child a root of a subtree and perform step 1 above for the subtree; and b. having processed all child nodes of the current non-leaf node perform the following for the current non-leaf node; 1. materialization planning to choose the data sites from which data is to be accessed; 2. local process planning to determine which operations can be processed locally at each site determined by materialization planning and estimate parameters of resultant data from the local operations; 3. non-local process planning to determine strategy for remaining operations which can not be performed locally without transmisson of data between sites; 4. execution strategy building to build an execution strategy for each data site in the distributed database system at which data is processed by access or manipulation; and
thenC. outputting execution strategy commands from said second computer process to a third computer process that coordinates the execution of the execution strategy commands at sites within the distributed database
-
Specification