System and method for database query optimization
First Claim
1. A computer-implemented method for generating a plan for executing a database query, comprising:
- (a) representing said database query as a query tree including one or more levels of logical expressions, each logical expression including zero or more logical expressions as inputs, a subset of said inputs representing one or more subtrees, each subtree having a top logical expression and zero or more logical expressions as inputs, each level other than a top level having one or more logical expressions that are input to a higher level logical expression at a preceding level, one of said logical expressions representing a root expression, each logical expression including zero or more inputs and zero or more outputs;
(b) for each said logical expression in said query tree, determining group attributes for each said logical expression, said group attributes including a set of characteristic inputs and a set of characteristic outputs, said characteristic inputs representing inputs required for a particular logical expression and zero or more logical expressions at one or more subsequent levels to said particular expression, said characteristic outputs representing outputs generated by a specified logical expression and zero or more logical expressions at one or more preceding levels to said specified logical expression;
(c) storing in a memory a search data structure used to store a plurality of groups, each said group representing equivalent expressions having common group attributes, each said group including at least one logical expression, zero or more equivalent expressions associated therewith, and one or more plans, each of said plans implementing at least one expression associated with each said group;
(d) for each said logical expression in said query tree, storing each said logical expression in a corresponding one of said groups in said search data structure associated with each said logical expression'"'"'s group attributes;
(e) obtaining a plurality of rules for use in generating one or more equivalent expressions or one or more plans;
(f) partitioning said query tree into one or more levels of subproblems, each subproblem including one or more of said logical expressions stored in said search data structure, a first level having one of said subproblems representing said root expression, each subproblem at each subsequent level representing one of said inputs to a corresponding logical expression at a preceding level, each subproblem associated with one of said groups in said search data structure;
(g) determining at least one plan for each of said subproblems, said determining step further comprising the steps of;
(1) selecting a set of one or more rules for application to one or more expressions of a particular one of said subproblems;
(2) generating zero or more equivalent expressions from applying said set of rules to one or more expressions associated with said particular subproblem;
(3) for each said generated equivalent expressions,(i) providing an associated set of group attributes for each said generated equivalent expression,(ii) using said group attributes and said generated equivalent expression to determine if a duplicate expression matching said generated equivalent expression is currently stored in said search data structure, and(iii) storing said generated equivalent expression in a group of said search data structure having equivalent group attributes when no duplicate expression as said generated equivalent expression exists in said search data structure; and
(4) forming one or more plans from one or more expressions associated with said particular subproblem; and
(h) generating a plan for said database query from said plans associated with each of said subproblems.
6 Assignments
0 Petitions
Accused Products
Abstract
A system and method for optimizing a database query with improved performance enhancements is herein disclosed. The database query consists of one or more logical expressions. Through the repeated application of one or more rules, the logical expressions are transformed into physical expressions and in some cases, execution plans that implement the database query. Each expression has associated with it a set of group attributes that specifies its characteristic inputs and outputs and a cost that estimates the computational expense for executing the expression. The group attributes are used to categorize similar expressions into groups that are stored in a search data structure. They are also used to track duplicate expressions. The cost associated with an expression is used to guide the search process to consider those expressions that will produce low cost plans. The cost is estimated in accordance with a six-fold criteria with each criterion weighted to account for the context of the expression and the application'"'"'s particular computing environment. The query optimizer is rule-based including transformation and implementation rules that are used to perform transformations on the logical expressions in a subproblem in order to produce a plan. A OnceGuidance guidance method is used to select a set of rules in certain cases that prevent the regeneration of an existing expression.
188 Citations
22 Claims
-
1. A computer-implemented method for generating a plan for executing a database query, comprising:
-
(a) representing said database query as a query tree including one or more levels of logical expressions, each logical expression including zero or more logical expressions as inputs, a subset of said inputs representing one or more subtrees, each subtree having a top logical expression and zero or more logical expressions as inputs, each level other than a top level having one or more logical expressions that are input to a higher level logical expression at a preceding level, one of said logical expressions representing a root expression, each logical expression including zero or more inputs and zero or more outputs; (b) for each said logical expression in said query tree, determining group attributes for each said logical expression, said group attributes including a set of characteristic inputs and a set of characteristic outputs, said characteristic inputs representing inputs required for a particular logical expression and zero or more logical expressions at one or more subsequent levels to said particular expression, said characteristic outputs representing outputs generated by a specified logical expression and zero or more logical expressions at one or more preceding levels to said specified logical expression; (c) storing in a memory a search data structure used to store a plurality of groups, each said group representing equivalent expressions having common group attributes, each said group including at least one logical expression, zero or more equivalent expressions associated therewith, and one or more plans, each of said plans implementing at least one expression associated with each said group; (d) for each said logical expression in said query tree, storing each said logical expression in a corresponding one of said groups in said search data structure associated with each said logical expression'"'"'s group attributes; (e) obtaining a plurality of rules for use in generating one or more equivalent expressions or one or more plans; (f) partitioning said query tree into one or more levels of subproblems, each subproblem including one or more of said logical expressions stored in said search data structure, a first level having one of said subproblems representing said root expression, each subproblem at each subsequent level representing one of said inputs to a corresponding logical expression at a preceding level, each subproblem associated with one of said groups in said search data structure; (g) determining at least one plan for each of said subproblems, said determining step further comprising the steps of; (1) selecting a set of one or more rules for application to one or more expressions of a particular one of said subproblems; (2) generating zero or more equivalent expressions from applying said set of rules to one or more expressions associated with said particular subproblem; (3) for each said generated equivalent expressions, (i) providing an associated set of group attributes for each said generated equivalent expression, (ii) using said group attributes and said generated equivalent expression to determine if a duplicate expression matching said generated equivalent expression is currently stored in said search data structure, and (iii) storing said generated equivalent expression in a group of said search data structure having equivalent group attributes when no duplicate expression as said generated equivalent expression exists in said search data structure; and (4) forming one or more plans from one or more expressions associated with said particular subproblem; and (h) generating a plan for said database query from said plans associated with each of said subproblems. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A database query optimization system, comprising:
-
a memory for storing a query tree representing a database query, said query tree including one or more levels of logical expressions, each logical expression including zero or more logical expressions as inputs, a subset of said inputs representing one or more subtrees, each level other than a top level having one or more logical expressions that are input to a higher level logical expression at a preceding level, one of said logical expressions representing a root expression, a search data structure including a plurality of groups, each said group representing equivalent expressions having common group attributes, each said group including at least one logical expression from said query tree, zero or more equivalent expressions associated therewith, one or more plans, and one or more contexts, each of said plans implementing at least one expression associated with said group and having an optimization goal and a cost, each of said contexts having an associated optimization goal and representing ones of said plans that are compatible with said context'"'"'s associated optimization goal, and a plurality of rules for use in generating one or more equivalent expressions or one or more plans; a search engine for generating one or more plans that execute said database query, said search engine including instructions that determine group attributes for each said logical expression in said query tree, said group attributes including a set of characteristic inputs and a set of characteristic outputs, said characteristic inputs representing inputs required for a particular logical expression and zero or more logical expressions at one or more subsequent levels to said particular expression, said characteristic outputs representing outputs generated by a specified logical expression and zero or more logical expressions at one or more preceding levels to said specified logical expression, store each said logical expression of said query tree in a corresponding one of said groups in said search data structure associated with said logical expression'"'"'s group attributes, partition said database query into one or more subproblems, each subproblem including one or more of said logical expressions and an optimization goal, a first level having one of said subproblems representing said root expression, each subproblem at each subsequent level representing one of said inputs to a corresponding top logical expression at a preceding level, each of said subproblems associated with one of said groups corresponding to said top logical expression in said subproblem, formulate at least one plan for each of said subproblems from one or more expressions associated with each said subproblems, each said plan formulated by generating zero or more equivalent expressions from applying a set of rules to one or more expressions associated with a particular subproblem, each said generated expression having a set of group attributes associated therewith, said group attributes used to determine if said generated expression exists in said search data structure and to store said generated expression into an appropriate group in said search data structure, and generate a plan for said database query as a combination of each of said subproblem'"'"'s plans. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
-
16. A computer readable storage medium for storing data for access by programs being executed on a data processing system, said medium comprising:
-
a query tree representing a database query, said query tree including one or more levels of logical expressions, each logical expression including zero or more logical expressions as inputs, a subset of said inputs representing one or more subtrees, each level other than a top level having one or more logical expressions that are input to a higher level logical expression at a preceding level, one of said logical expressions representing a root expression; a search data structure including a plurality of groups, each said group representing equivalent expressions having common group attributes, each said group including at least one logical expression from said query tree, zero or more equivalent expressions associated therewith, one or more plans, and one or more contexts, each of said plans implementing at least one expression associated with said group and having an optimization goal and a cost, each of said contexts having an associated optimization goal and representing ones of said plans that are compatible with said context'"'"'s associated optimization goal; a plurality of rules for use in generating one or more equivalent expressions or one or more plans; a search engine for generating one or more plans that execute said database query, said search engine including instructions that determine group attributes for each said logical expression in said query tree, said group attributes including a set of characteristic inputs and a set of characteristic outputs, said characteristic inputs representing inputs required for a particular logical expression and zero or more logical expressions at one or more subsequent levels to said particular expression, said characteristic outputs representing outputs generated by a specified logical expression and zero or more logical expressions at one or more preceding levels to said specified logical expression, store each said logical expression of said query tree in a corresponding one of said groups in said search data structure associated with said logical expression'"'"'s group attributes, partition said database query into one or more subproblems, each subproblem including one or more of said logical expressions and an optimization goal, a first level having one of said subproblems representing said root expression, each subproblem at each subsequent level representing one of said inputs to a corresponding top logical expression at a preceding level, each of said subproblems associated with one of said groups corresponding to said top logical expression in said subproblem, formulate at least one plan for each of said subproblems from one or more expressions associated with each said subproblems, each said plan formulated by generating zero or more equivalent expressions from applying a set of rules to one or more expressions associated with a particular subproblem, each said generated expression having a set of group attributes associated therewith, said group attributes used to determine if said generated expression exists in said search data structure and to store said generated expression into an appropriate group in said search data structure, and generate a plan for said database query as a combination of each of said subproblem'"'"'s plans. - View Dependent Claims (17, 18, 19, 20, 21, 22)
-
Specification