Dynamic query optimization using partial information
First Claim
1. A method applied post-compilation time for operating a data processing system having a relational database to dynamically optimize and execute a set of operations forming all or part of a query, based on the state of the data at the time of execution, said set comprising a plurality of operations and said method comprising, in sequential order,at the time said set of operations is to be executed, statistically sampling at least one relation resident in said relational database of said data processing system on which said set of operations is to be executed,first executing an algorithm utilizing the statistical samples obtained as a result of said sampling step to determine the order in which said plurality of operations are to be executed based on the samples obtained during said sampling step to optimize said set of operations based on the state of the data at the time of said sampling step, said first executing step including executing the set of operations on the samples in at least one order of execution and selecting the order of execution that results in the best performance for said data processing system, andsecond executing on said data processing system said set of operations on said at least one relation in an order selected by said first executing step.
9 Assignments
0 Petitions
Accused Products
Abstract
A method for executing a query comprising a sequence of operations to be performed on one or more relational databases comprises statistically sampling the relational databases at the times the operations are to be executed and then dynamically optimizing the performance of the operations based on the statistical samples obtained as a result of the sampling step.
234 Citations
4 Claims
-
1. A method applied post-compilation time for operating a data processing system having a relational database to dynamically optimize and execute a set of operations forming all or part of a query, based on the state of the data at the time of execution, said set comprising a plurality of operations and said method comprising, in sequential order,
at the time said set of operations is to be executed, statistically sampling at least one relation resident in said relational database of said data processing system on which said set of operations is to be executed, first executing an algorithm utilizing the statistical samples obtained as a result of said sampling step to determine the order in which said plurality of operations are to be executed based on the samples obtained during said sampling step to optimize said set of operations based on the state of the data at the time of said sampling step, said first executing step including executing the set of operations on the samples in at least one order of execution and selecting the order of execution that results in the best performance for said data processing system, and second executing on said data processing system said set of operations on said at least one relation in an order selected by said first executing step.
-
2. A method applied post-compilation time for operating a data processing system having a relational database and including a plurality of nodes to dynamically optimize and execute sets of operations forming all or part of a query, based on the state of the data at the time of execution, said method comprising, in sequential order,
at each of a plurality of times said sets of operations are to be executed, statistically sampling at least one relation on which said set of operations is to be executed, said at least one relation being distributed over a plurality of nodes and said sampling comprising obtaining at least one sample of each of said at least one relation from each of said nodes, first executing an algorithm utilizing the statistical samples obtained as a result of said sampling step, said step of executing an algorithm including a step of ascertaining a plurality of ranges for the values of at least one attribute to said samples, observing the values of the ranges obtained from said samples to determine whether redistribution is necessary in order to result in the best performance for the data processing system, and, in response to such determining if indicated by the values obtained from said samples, redistributing all or part of said at least one relation among said nodes so that the tuples from said at least one of said relations at each node have a value of said at least one attribute which falls in a corresponding one of said ranges resulting in a better resource utilization, thereby to optimize said set of operations based on the state of the data at the time of said sampling step, and second executing on said data processing system said set of operations on said at least one relation, as optimized by said first executing step.
-
3. A method applied post-compilation time for dynamically optimizing and executing a query of data in a relational database system, which query is divisible into a plurality of query fragments each of which comprises a sequence of at least one operation to be performed by at least one processor on at least one input relational database resident in said at least one processor, the query to be executed to produce at least one resulting relational database, said method comprising the sequential steps of
first statistically sampling in at least one processor at least a first input relational database of a first of said query fragments at the time the first query fragment is to be executed and, based on the first statistical sampling of the data then existing in said first input relational database, executing an algorithm to determine an optimized order of execution of the operations of said first query fragment, executing the operations of said first query fragment in the optimized order to produce at least a first resulting relational database, said first resulting relational database of said first query fragment serving as a second input relational database for a second of said query fragments, and second statistically sampling in at least one processor said second input database at the time the second input query fragment is to be executed and, based on the second statistical sampling of the data then existing in said second input relational database, executing an algorithm by at least one data processor to determine an optimized order of execution of the operations of said second query fragment, and executing the operations of the second query fragment in the optimized order to produce at least a second resulting relational database, wherein the step of executing an algorithm with respect to each of the first and second query fragments includes the steps of executing the operations in the respective query fragment in at least one order of execution on the samples obtained in the respective one of the first and second sampling steps, and selecting an order of execution of operations in the respective query fragments that results in the best performance of the at least one processor.
-
4. A method applied post-compilation time for dynamically optimizing and executing a plurality of operations forming part of a query by a data processing system having a relational database on at least one relation based on the state of the data at the time of execution, said method comprising, in sequential order,
at the time said operations are to be executed, statistically sampling said at least one relation resident in said relational database of said data processing system on which said set of operations is to be executed, utilizing the statistical samples obtained as a result of said sampling step to determine the order in which said operations are to be performed, based on the then existing state of said relations, said utilizing step including the steps of executing the set of operations on the samples in at least one order of execution and selecting the order of execution that results in the best performance for the data processing system, and executing said set of operations on said at least one relation by said data processing system in the order selected by said utilizing step.
Specification