Database configuration analysis
First Claim
1. A method of configuring a database through a database configuration system, the database configuration system having a computing device, the method comprising:
- a) determining at least two configurations of a database system, each configuration containing one or more tables, indices, and/or views;
b) sampling a first configuration and a second configuration from the at least two configurations;
c) determining a representative workload containing at least one query executable against the database system;
d) sampling a sample workload from at least a portion of the representative workload;
e) estimating a cost of executing the sample workload based on the first configuration and the second configuration;
f) determining a probability of correct selection of the first configuration and the second configuration;
g) determining a probability of correct selection of the sample workload based at least in part on the probability of correct selection of the first configuration and the second configuration and a sensitivity parameter that represents an estimated difference in cost between the first configuration and the second configuration;
h) updating the sample workload when the probability of correct selection of the sample workload does not exceed a target probability threshold; and
i) when the probability of correct selection of the sample workload exceeds the target probability threshold, configuring the data store based on a best configuration of the first configuration and the second configuration based on the estimated cost.
3 Assignments
0 Petitions
Accused Products
Abstract
To determine a configuration for a database system, a plurality of queries may be sampled from a representative workload using statistical inference to compute the probability of correctly selecting one of a plurality of evaluation configurations. The probability of correctly selecting may determine which and/or how many queries to sample, and/or may be compared to a target probability threshold to determine if more queries must be sampled. The configuration from the plurality of configurations with the lowest estimated cost of executing the representative workload may be determined based on the probability of selecting correctly. Estimator variance may be reduced through a stratified sampling scheme that leverages commonality, such as an average cost of execution, between queries based on query templates. The applicability of the Central Limit Theorem may be verified and used to determine which and/or how many queries to sample.
-
Citations
20 Claims
-
1. A method of configuring a database through a database configuration system, the database configuration system having a computing device, the method comprising:
-
a) determining at least two configurations of a database system, each configuration containing one or more tables, indices, and/or views; b) sampling a first configuration and a second configuration from the at least two configurations; c) determining a representative workload containing at least one query executable against the database system; d) sampling a sample workload from at least a portion of the representative workload; e) estimating a cost of executing the sample workload based on the first configuration and the second configuration; f) determining a probability of correct selection of the first configuration and the second configuration; g) determining a probability of correct selection of the sample workload based at least in part on the probability of correct selection of the first configuration and the second configuration and a sensitivity parameter that represents an estimated difference in cost between the first configuration and the second configuration; h) updating the sample workload when the probability of correct selection of the sample workload does not exceed a target probability threshold; and i) when the probability of correct selection of the sample workload exceeds the target probability threshold, configuring the data store based on a best configuration of the first configuration and the second configuration based on the estimated cost. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A computer readable storage medium having computer executable components comprising:
-
a) a data store component configured to store a data store against which queries may be executed; b) an optimizer component configured to determine a cost of executing a query against the data store component having a given configuration; c) a workload module configured to determine a sample workload containing a plurality of queries sampled from a representative workload for the data store component; d) a configuration module configured to determine a plurality of evaluation configurations from within a configuration space to be evaluated in configuring the data store component; e) a cost module configured to determine an estimated cost of executing the sample workload based on the evaluation configurations and a sampling method; and f) a probability module configured to determine a probability of correct selection of the sample workload and to output the best configuration, that satisfies a sensitivity parameter, from the evaluation configurations without determining a true difference in costs between the evaluation configurations, such that the probability of correct selection of the sample workload is greater than a target probability threshold. - View Dependent Claims (16, 17, 18)
-
-
19. A database configuration system comprising:
-
a) a data store for storing query information; b) a memory in which machine instructions are stored; and c) a processor that is coupled to the memory and the data store, the processor executing the machine instructions to carry out a plurality of functions, the machine instructions including; i) means for determining a sample workload from a representative workload containing a plurality of queries; ii) means for determining a plurality of evaluation configurations from a configuration space for a database system, individual evaluation configurations including any combination of one or more tables, indices, and/or views; iii) means for estimating a cost of executing the sample workload against the database system having an individual evaluation configuration, the cost of executing the sample workload is a time to execute the plurality of queries and includes an upper bound and a lower bound, the means for estimating further configured to employ a sensitivity parameter to avoid sampling a large fraction of the sample workload for evaluation configurations with similar costs; and iv) means for determining a probability of selecting a correct workload, based on the upper and lower bounds of the cost of executing, such that the probability of selecting a correct workload exceeds a target probability threshold. - View Dependent Claims (20)
-
Specification