Method for estimating the cost of query processing
First Claim
1. A computer-based method of determining a cost model, said cost model modeling the cost of executing a query operator, said query operator in a user query to be executed against a database, said method comprising:
- a. determining at least one input feature of a plurality of training queries, said at least one input feature, for which values are either known or unknown, which is assumed to be relevant to said cost model, said input features automatically determined using profiling techniques and said at least one input feature either derived or observed from statistics obtained by traversing a tree representation of data stored in said database;
b. estimating values for said at least one input feature based on statistics Sp stored by said database, andwhen values for said at least one input feature are unknown;
said Sp is a function of a cardinality of a path query p under said tree T, number of p'"'"'s children under T;
number of p'"'"'s descendants under T; and
number of pages requested in order to answer said path query p, and said statistics Sp stored by said database are simple path statistics comprising the cardinality of a simple path under an XML tree, the number of simple path children under an XML tree, the number of simple path descendants under an XML tree, and the number of pages requested in order to answer a simple path query, said simple path expressions defining a linear chain of node tests that are connected by child axes;
c. executing said plurality of training queries containing said query operator and said at least one input feature and determining whether cost of executing said query operator changes as values of said at least one input feature are varied and if so, statistically fitting a model to correlate observed costs of executing said query operator with said at least one input feature; and
d. estimating, based on said model and said at least one input feature, a cost for executing said query operator in said user query.
1 Assignment
0 Petitions
Accused Products
Abstract
Provided is a method for modeling the cost of XML as well as relational operators. As with traditional relational cost estimation, a set of system catalog statistics that summarizes the XML data is exploited; however, the novel use of a set of simple path statistics is also proposed. A new statistical learning technique called transform regression is utilized instead of detailed analytical models to predict the overall cost of an operator. Additionally, a query optimizer in a database is enabled to be self-tuning, automatically adapting to changes over time in the query workload and in the system environment.
-
Citations
15 Claims
-
1. A computer-based method of determining a cost model, said cost model modeling the cost of executing a query operator, said query operator in a user query to be executed against a database, said method comprising:
-
a. determining at least one input feature of a plurality of training queries, said at least one input feature, for which values are either known or unknown, which is assumed to be relevant to said cost model, said input features automatically determined using profiling techniques and said at least one input feature either derived or observed from statistics obtained by traversing a tree representation of data stored in said database; b. estimating values for said at least one input feature based on statistics Sp stored by said database, and when values for said at least one input feature are unknown; said Sp is a function of a cardinality of a path query p under said tree T, number of p'"'"'s children under T;
number of p'"'"'s descendants under T; and
number of pages requested in order to answer said path query p, and said statistics Sp stored by said database are simple path statistics comprising the cardinality of a simple path under an XML tree, the number of simple path children under an XML tree, the number of simple path descendants under an XML tree, and the number of pages requested in order to answer a simple path query, said simple path expressions defining a linear chain of node tests that are connected by child axes;c. executing said plurality of training queries containing said query operator and said at least one input feature and determining whether cost of executing said query operator changes as values of said at least one input feature are varied and if so, statistically fitting a model to correlate observed costs of executing said query operator with said at least one input feature; and d. estimating, based on said model and said at least one input feature, a cost for executing said query operator in said user query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. An article of manufacture comprising a computer usable medium having computer readable program code embodied therein which implements the determination of a cost model, said cost model modeling the cost of executing a query operator, said query operator in a user query to be executed against a database, said medium comprising modules implementing:
-
a. determining at least one input feature of a plurality of training queries, said at least one input feature, for which values are either known or unknown, which is assumed to be relevant to said cost model, said input features automatically determined using profiling techniques and said at least one input feature either derived or observed from statistics obtained by traversing a tree representation of data stored in said database; b. estimating values for said at least one input feature based on statistics Sp stored by said database, and when values for said at least one input feature are unknown; said Sp is a function of a cardinality of a path query p under a tree T, number of p'"'"'s children under T;
number of p'"'"'s descendants under T; and
number of pages requested in order to answer said path query p, said statistics Sp stored by said database are simple path statistics comprising the cardinality of a simple path under an XML tree, the number of simple path children under an XML tree, the number of simple path descendants under an XML tree, and the number of pages requested in order to answer a simple path query, said simple path expressions defining a linear chain of node tests that are connected by child axes;c. executing said plurality of training queries containing said query operator and said at least one input feature and determining whether cost of executing said query operator changes as values of said at least one input feature are varied and if so, statistically fitting a model to correlate observed costs of executing said query operator with said at least one input feature; and d. estimating, based on said model and said at least one input feature, a cost for executing said query operator in said user query. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
Specification