DATABASE CONFIGURATION ANALYSIS
First Claim
1. A method comprising:
- a) determining at least two configurations of a database system, each configuration containing one or more tables, indices, and/or views;
b) sampling a first configuration and a second configuration from the at least two configurations;
c) determining a representative workload containing at least one query executable against the database system;
d) sampling a sample workload from at least a portion of the representative workload;
e) estimating a cost of executing the configuration workload based on the first configuration and the second configuration;
f) determining a probability of correct selection of the first configuration and the second configuration;
g) determining a probability of correct selection of the sample workload based at least in part on the probability of correct selection of the first configuration and the second configuration;
h) updating the sample workload if the probability of correct selection of the sample workload does not exceed a target probability threshold; and
i) if the probability of correct selection of the sample workload exceeds the target probability threshold, configuring the data store based on a best configuration of the first configuration and the second configuration based on the estimated cost.
3 Assignments
0 Petitions
Accused Products
Abstract
To determine a configuration for a database system, a plurality of queries may be sampled from a representative workload using statistical inference to compute the probability of correctly selecting one of a plurality of evaluation configurations. The probability of correctly selecting may determine which and/or how many queries to sample, and/or may be compared to a target probability threshold to determine if more queries must be sampled. The configuration from the plurality of configurations with the lowest estimated cost of executing the representative workload may be determined based on the probability of selecting correctly. Estimator variance may be reduced through a stratified sampling scheme that leverages commonality, such as an average cost of execution, between queries based on query templates. The applicability of the Central Limit Theorem may be verified and used to determine which and/or how many queries to sample.
55 Citations
20 Claims
-
1. A method comprising:
-
a) determining at least two configurations of a database system, each configuration containing one or more tables, indices, and/or views;
b) sampling a first configuration and a second configuration from the at least two configurations;
c) determining a representative workload containing at least one query executable against the database system;
d) sampling a sample workload from at least a portion of the representative workload;
e) estimating a cost of executing the configuration workload based on the first configuration and the second configuration;
f) determining a probability of correct selection of the first configuration and the second configuration;
g) determining a probability of correct selection of the sample workload based at least in part on the probability of correct selection of the first configuration and the second configuration;
h) updating the sample workload if the probability of correct selection of the sample workload does not exceed a target probability threshold; and
i) if the probability of correct selection of the sample workload exceeds the target probability threshold, configuring the data store based on a best configuration of the first configuration and the second configuration based on the estimated cost. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A computer readable medium having computer executable components comprising:
-
a) a data store component for storing a data store against which queries may be executed;
b) an optimizer component for determining a cost of executing a query against the data store component having a given configuration;
c) a workload module for determining a sample workload containing a plurality of queries sampled from a representative workload for the data store component;
d) a configuration module for determining a plurality of evaluation configurations from within a configuration space to be evaluated in configuring the data store component;
e) a cost module for determining an estimated cost of executing the sample workload based on the evaluation configurations and a sampling method; and
f) a probability module for determining a probability of correct selection of the sample workload and outputting the best configuration from the evaluation configurations such that the probability of correct selection of the sample workload is greater than a target probability threshold. - View Dependent Claims (16, 17, 18)
-
-
19. A database configuration system comprising:
-
a) a data store for storing query information;
b) a memory in which machine instructions are stored; and
c) a processor that is coupled to the memory and the data store, the processor executing the machine instructions to carry out a plurality of functions, the machine instructions including;
i) means for determining a sample workload from a representative workload containing a plurality of queries;
ii) means for determining a plurality of evaluation configurations from a configuration space for a database system, each evaluation configuration including any combination of one or more tables, indices, and/or views;
iii) means for estimating a cost of executing the representative workload against the database system having an indicated configuration from the evaluation configurations; and
iv) means for determining a probability of selecting a correct workload such that the probability of selecting a correct workload exceeds a target probability threshold. - View Dependent Claims (20)
-
Specification