Calibration of logical cost formulae for queries in a heterogeneous DBMS using synthetic database
First Claim
1. A database access system for optimizing database queries in a heterogeneous distributed database system, the system comprising:
a first database machine incorporating a first relational database management system and accompanying first database;
a second database machine incorporating a relational database management system and accompanying second database;
the first and second relational database management systems being different but conforming at least to a predetermined structured query language (SQL);
communication means for electronic bidirectional communications between the different database machines;
means coupled to the communication means for sending and receiving an electronic message to and from any of the database machines, the message containing data defining a database query;
a data access logical cost model comprising logical cost formulae for optimizing queries in each database in the system;
a synthetic database for use in calibrating the data access logical cost model for each relational database management system in the distributed database system;
means for querying the synthetic database on each database machine to determine cost coefficients for use in said logical cost formulae to calibrate the data access logical cost model; and
means responsive to a database query for accessing each of the first and second databases of said first and second database machines in accordance with a least cost index obtained from said data access logical cost model.
Abstract
A programmable machine system and method are provided for managing electronic data access among multiple different relational databases in a network distributed database environment. The machine is programmed so that it can construct cost-effective access strategies for any of the participating databases without any DBMS-specific cost models. The system provides query optimization across different database management systems in a network distributed database environment based on a calibrating database that relies only on typical relational database statistics. Cost data is developed by running queries in the various databases against the calibrating database. A logical cost model is constructed using the resulting cost data and is used to estimate the cost of a given query based on logical characteristics of the DBMS, the relations, and the query itself. The cost of a complex query is estimated using primitive queries. Optimal query access strategies are thereby designed and used to control execution of the queries across relational databases controlled by two or more different database management systems.
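The calibration step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the logical cost of a primitive query is linear in the number of tuples scanned (an initialization term plus a per-tuple term). The `Probe` record, the function names, and the timings are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One timed probe query against the synthetic database (hypothetical record)."""
    tuples_scanned: int  # known in advance, because the synthetic data is controlled
    elapsed_ms: float    # measured on the DBMS being calibrated


def fit_coefficients(probes):
    """Least-squares fit of elapsed_ms = c0 + c1 * tuples_scanned.

    c0 stands for per-query initialization cost and c1 for per-tuple cost.
    The linear form is an assumption made for this sketch.
    """
    n = len(probes)
    if n < 2:
        raise ValueError("need at least two probes")
    mean_x = sum(p.tuples_scanned for p in probes) / n
    mean_y = sum(p.elapsed_ms for p in probes) / n
    sxx = sum((p.tuples_scanned - mean_x) ** 2 for p in probes)
    if sxx == 0:
        raise ValueError("probes must vary in tuples_scanned")
    sxy = sum((p.tuples_scanned - mean_x) * (p.elapsed_ms - mean_y) for p in probes)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def estimate_ms(coeffs, tuples_scanned):
    """Apply calibrated coefficients to estimate the cost of a new query."""
    c0, c1 = coeffs
    return c0 + c1 * tuples_scanned


# Made-up timings for two different DBMSs running the same probe queries.
measurements = {
    "dbms_a": [Probe(1000, 12.0), Probe(2000, 22.0), Probe(4000, 42.0)],
    "dbms_b": [Probe(1000, 8.0), Probe(2000, 13.0), Probe(4000, 23.0)],
}
calibrated = {name: fit_coefficients(p) for name, p in measurements.items()}
```

Because the same probes run against every participating DBMS, each system gets its own coefficients for the same formula, which is what lets the optimizer compare them without a vendor-specific cost model.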
20 Claims
1. A database access system for optimizing database queries in a heterogeneous distributed database system, the system comprising:
a first database machine incorporating a first relational database management system and accompanying first database;
a second database machine incorporating a relational database management system and accompanying second database;
the first and second relational database management systems being different but conforming at least to a predetermined structured query language (SQL);
communication means for electronic bidirectional communications between the different database machines;
means coupled to the communication means for sending and receiving an electronic message to and from any of the database machines, the message containing data defining a database query;
a data access logical cost model comprising logical cost formulae for optimizing queries in each database in the system;
a synthetic database for use in calibrating the data access logical cost model for each relational database management system in the distributed database system;
means for querying the synthetic database on each database machine to determine cost coefficients for use in said logical cost formulae to calibrate the data access logical cost model; and
means responsive to a database query for accessing each of the first and second databases of said first and second database machines in accordance with a least cost index obtained from said data access logical cost model.
Dependent claims: 2, 3, 4, 5
6. In a system for accessing data in a plurality of relational computer databases on a distributed network of database machines, a method of structuring access strategies based on derived cost models for at least two participating database management systems (DBMS), each DBMS having a structured query language (SQL), but differing associated cost models, the method comprising:
constructing a model database wherein all rows, columns and relational structures are known and controlled;
conducting a series of access tests on each participating DBMS using said model database;
deriving access cost data for each participating DBMS according to said access tests;
storing the access cost data as a logical cost model in a data-dictionary/catalog;
determining an optimum application plan for subsequent distributed database queries relying on the logical cost model stored in said data-dictionary/catalog; and
executing subsequent queries to return data from the distributed databases in accordance with said optimum application plan.
Dependent claims: 7, 8, 9, 10, 11
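The step of determining an optimum application plan from a stored cost model might look roughly like the sketch below. It assumes a catalog that maps each (DBMS, primitive operation) pair to an initialization cost and a per-tuple cost, and it scores a candidate plan as the sum of its primitive steps. The catalog layout, the primitive names, the coefficient values, and the plans are hypothetical, and network transfer cost is deliberately left out to keep the example short.

```python
# Hypothetical catalog entries: (init_ms, per_tuple_ms) per DBMS and primitive.
CATALOG = {
    ("dbms_a", "scan"): (2.0, 0.010),
    ("dbms_a", "join"): (5.0, 0.030),
    ("dbms_b", "scan"): (3.0, 0.005),
    ("dbms_b", "join"): (4.0, 0.050),
}


def step_cost(dbms, primitive, tuples):
    """Estimated cost of one primitive operation on one DBMS."""
    init_ms, per_tuple_ms = CATALOG[(dbms, primitive)]
    return init_ms + per_tuple_ms * tuples


def plan_cost(plan):
    """A complex query is costed as the sum of its primitive steps."""
    return sum(step_cost(dbms, primitive, tuples) for dbms, primitive, tuples in plan)


def cheapest_plan(plans):
    """Return the name of the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda name: plan_cost(plans[name]))


# Two made-up candidate plans for the same distributed join.
plans = {
    "join_at_a": [("dbms_b", "scan", 2000), ("dbms_a", "scan", 1000), ("dbms_a", "join", 3000)],
    "join_at_b": [("dbms_a", "scan", 1000), ("dbms_b", "scan", 2000), ("dbms_b", "join", 3000)],
}
best = cheapest_plan(plans)
```

With these illustrative coefficients the join is estimated to be cheaper on `dbms_a`, so `best` names the plan that ships the scanned rows there.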
12. A computer system for querying a plurality of databases on a network of database computers comprising:
a data storage means in each of the computers for holding database data-dictionaries/catalogs and database access mechanisms;
input and display means connected to each of the computers for inputting database queries to the system and displaying results in human readable format;
communications means for transmitting and receiving queries and results by and between the plurality of database computers;
each computer including a database management system (DBMS) having a DBMS query mechanism for accessing a database in response to a database query;
a cost model including a data-dictionary/catalog containing cost data based on coefficients of composite costs including at least CPU and I/O operations relative to a database query, derived by running each individual database computer's DBMS query mechanism against a model database for each of the tested DBMSs in the network; and
means for building query access strategies for each database computer's DBMS based on the cost data stored in said data-dictionary/catalog.
Dependent claims: 13, 14, 15, 16, 17
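One way to picture coefficients of a composite cost with CPU and I/O components is a two-variable least-squares fit, sketched below. It assumes that each test query against the model database has a known CPU operation count and I/O operation count, and that elapsed time is a weighted sum of the two. That additive form, the function name `fit_cpu_io`, and the sample values are assumptions for illustration only.

```python
def fit_cpu_io(samples):
    """Fit elapsed_ms = k_cpu * cpu_ops + k_io * io_ops by least squares.

    samples is a list of (cpu_ops, io_ops, elapsed_ms) tuples. The 2x2 normal
    equations are solved directly with Cramer's rule.
    """
    s_cc = sum(c * c for c, _, _ in samples)
    s_ii = sum(i * i for _, i, _ in samples)
    s_ci = sum(c * i for c, i, _ in samples)
    s_ct = sum(c * t for c, _, t in samples)
    s_it = sum(i * t for _, i, t in samples)
    det = s_cc * s_ii - s_ci * s_ci
    if det == 0:
        raise ValueError("samples must vary CPU and I/O counts independently")
    k_cpu = (s_ct * s_ii - s_it * s_ci) / det
    k_io = (s_it * s_cc - s_ct * s_ci) / det
    return k_cpu, k_io


# Made-up test runs: (cpu_ops, io_ops, elapsed_ms).
samples = [
    (1000, 10, 7.0),
    (2000, 10, 9.0),
    (1000, 40, 22.0),
]
k_cpu, k_io = fit_cpu_io(samples)
```

The test queries need to vary CPU work and I/O work independently; if every run scales both together, the determinant vanishes and the two coefficients cannot be separated.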
18. A method of using a programmable system to perform electronic data management among a plurality of electronic relational databases and corresponding DBMSs in a network distributed environment, said programmable system having a plurality of machine components each including a data storage device, a display device, and a communications means for interconnecting the machine components for bidirectional data communications therebetween, the method comprising:
constructing a model database in which all relational structures and components are known and controlled;
running a series of controlled access tests against the model database by each of the plurality of DBMSs in the network;
tracking and recording resulting cost data for each access test;
storing said cost data as a logical cost model in a network data-dictionary/catalog file in the data storage device;
determining an optimum application plan for subsequent distributed database queries relying on the logical cost model stored in said data-dictionary/catalog; and
executing subsequent queries to return data from the distributed databases in accordance with said optimum application plan.
Dependent claims: 19, 20
Specification