System and method for optimizing database queries
First Claim
1. A method for generating a plan for executing a database query, comprising:
- (a) representing said database query as a query tree including one or more levels of expressions, each expression including zero or more expressions as inputs, each level of expression other than a top level expression having one or more expressions that are input to a higher level expression at a preceding level, the top level expression representing a root expression;
(b) storing in a memory a plurality of implementation and transformation rules, an implementation rule transforming a logical expression into a physical expression, a transformation rule transforming a logical expression into an equivalent logical expression;
(c) partitioning said query tree into one or more levels of subproblems, each subproblem including at least one logical expressions, a top subproblem level having one of said subproblems representing said root expression, each subproblem at each subsequent subproblem level representing one of said inputs to a corresponding higher level expression at a preceding level;
(d) determining an order for analyzing each of said subproblems, said order analyzing said subproblems from a lowest subproblem level to a top subproblem level such that each input is analyzed prior to its corresponding higher level expression;
(e) determining a first plan and a base cost for each subproblem, wherein the first plan comprises only physical expressions; and
(f) using said order, starting at said lowest subproblem level,(1) generating equivalent physical expressions for least one logical expression in each subproblem by applying at least one transformation rule and at least one implementation rule,(i) for each equivalent physical expression determine an associated equivalent-expression cost,(ii) if the equivalent-expression cost is less than the base cost, then denoting the associated equivalent physical expression as a plan and the equivalent-expression cost as the plan cost,(iii) selecting the plan with the lowest plan cost for use in a next higher subproblem level, and(2) repeatedly performing said generating step for higher subproblem levels until a plan is selected for said top subproblem level.
6 Assignments
0 Petitions
Accused Products
Abstract
A system and method for optimizing a database query is herein disclosed. The system consists of a search engine and a database implementor that determines an optimal plan for executing a SQL query. The SQL query is represented as a query tree consisting of a number of nested expressions. The search engine generates a number of plans from which an optimal plan is selected. Plans are generated through the application of a set of rules consisting of implementation and transformation rules. Implementation rules are used to obtain plans. Transformation rules are used to determine equivalent expressions. A plan for the query tree entails finding plans for each expression within the tree where each plan is generated in accordance with a prescribed set of rules. The database implementor selects the set of rules such that more promising plans are generated rather than generating all possible plans. In a preferred embodiment of the invention, multiple passes are made by the search engine in order to determine the optimal plan. In a first pass, implementation rules are used in order to generate a first plan having a cost that is used as a threshold when generating for additional plans. In each subsequent pass, a set of implementation and transformation rules is used to generate one or more plans whose cost does not exceed the threshold. An optimal plan is selected from the generated plans as the one having the lowest cost.
-
Citations
28 Claims
-
1. A method for generating a plan for executing a database query, comprising:
-
(a) representing said database query as a query tree including one or more levels of expressions, each expression including zero or more expressions as inputs, each level of expression other than a top level expression having one or more expressions that are input to a higher level expression at a preceding level, the top level expression representing a root expression; (b) storing in a memory a plurality of implementation and transformation rules, an implementation rule transforming a logical expression into a physical expression, a transformation rule transforming a logical expression into an equivalent logical expression; (c) partitioning said query tree into one or more levels of subproblems, each subproblem including at least one logical expressions, a top subproblem level having one of said subproblems representing said root expression, each subproblem at each subsequent subproblem level representing one of said inputs to a corresponding higher level expression at a preceding level; (d) determining an order for analyzing each of said subproblems, said order analyzing said subproblems from a lowest subproblem level to a top subproblem level such that each input is analyzed prior to its corresponding higher level expression; (e) determining a first plan and a base cost for each subproblem, wherein the first plan comprises only physical expressions; and (f) using said order, starting at said lowest subproblem level, (1) generating equivalent physical expressions for least one logical expression in each subproblem by applying at least one transformation rule and at least one implementation rule, (i) for each equivalent physical expression determine an associated equivalent-expression cost, (ii) if the equivalent-expression cost is less than the base cost, then denoting the associated equivalent physical expression as a plan and the equivalent-expression cost as the plan cost, (iii) selecting the plan with the lowest plan cost for use in a next higher subproblem level, and (2) repeatedly performing said generating step for higher subproblem levels until a plan is selected for said top subproblem level. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method for generating at least one plan for executing a database query, said method comprising the steps of:
-
(a) providing a task scheduler that manages execution of tasks placed in a task data structure, said task scheduler selecting one or more tasks for execution from said task data structure; (b) storing in a memory a query tree representing said database query, said query tree containing a plurality of logical expressions, each logical expression including an operator having zero or more inputs, each input being a logical expression, a subset of said expressions representing subtrees having one or more inputs, one such logical expression being a root expression; (c) storing in a memory a set of rules, a first subset of said rules including zero or more transformation rules, a second subset of said rules including zero or more implementation rules, each of said transformation rules transforming a logical expression into an equivalent expression, each of said implementation rules transforming a logical expression into a physical expression; (d) storing in a memory a search data structure for storing a plurality of groups, each group including at least one logical expression from said query tree, zero or more equivalent expressions associated therewith and at least one plan, said equivalent expressions including zero or more equivalent logical expressions and zero or more equivalent physical expressions; (e) placing onto said task data structure an OPTIMIZE-- GROUP task for the group associated with said root expression; (f) when executing an OPTIMIZE-- GROUP task for a specific group; determining if a suitable plan for said specific group exists in said search data structure, and if no suitable plan exists for said specific group, for each logical expression in said specific group, placing onto said task data structure an OPTIMIZE-- EXPRESSION task for each of said logical expressions; (g) when executing an OPTIMIZE-- EXPRESSION task for a specific logical expression; for at least one rule in the first and second subsets of rules, placing on said task data structure an APPLY-- RULE task for said specific logical expression; (h) when executing an APPLY-- RULE task for a p articular expression and a specific rule; (i) applying said specific rule zero or more times producing zero or more new equivalent expressions, (ii) storing each of said new equivalent expressions in said search data structure, (iii) for each new equivalent physical expression, placing onto said task data structure an OPTIMIZE-- INPUTS task for said new equivalent physical expression, and (iv) for each new equivalent logical expression, placing onto said task data structure an OPTIMIZE-- EXPRESSION task for s aid new equivalent logical expression; and (i) when executing an OPTIMIZE-- INPUTS task for a designated expression, providing a plan for said designated expression once plans for each input to said designated expression are obtained. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A database query optimization system, comprising:
-
a query tree representing a database query, said query tree including one or more levels of expressions, each expression including zero or more expressions as inputs, each level of expression other than a top level expression having one or more expressions that are input to a higher level expression at a preceding level, the top level expression representing a root expression; a memory for storing a plurality of implementation and transformation rules, each transformation rule transforming a logical expression into an equivalent logical expression, each implementation rule transforming a logical expression into a physical expression; a search engine for generating one or more plans that execute said database query, said search engine including instructions that partition said database query into one or more subproblems, each subproblem including at least one logical expression, a top subproblem level having one of said subproblems representing said root expression, each subproblem at each subsequent subproblem level representing one of said inputs to a corresponding higher level expression at a preceding level, determine an order for analyzing each of said subproblems from a lowest subproblem level to the top subproblem level such that each input is analyzed prior to its corresponding higher level expression, determine a first plan and a base cost for each subproblem, wherein the first plan comprises only physical expressions; using said order, starting at said lowest subproblem level, (1) generate equivalent physical expressions for at least one logical expression in each subproblem by applying at least one transformation rule and at least one implementation rule, (i) for each equivalent physical expression determine an associated equivalent-expression cost. (ii) if the equivalent-expression cost is less than the base cost, then denote the associated equivalent physical expression as a plan and the equivalent-expression cost as the plan cost, (iii) select the plan with the lowest plan cost for use in a next higher subproblem level, and (2) repeatedly perform said generating step for higher subproblem levels until a plan is selected for said top subproblem level; and a database implementor for selecting one or more of said transformation rules for application to said logical expressions in each of said subproblems. - View Dependent Claims (20, 21, 22, 23)
-
-
24. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
-
a query tree representing a database query, said query tree including one or more levels of expressions, each expression including zero or more expressions as inputs, each level of expression other than a top level having one or more expressions that are input to a higher level expression at a preceding level, the top level expression representing a root expression; a plurality of implementation and transformation rules, each transformation rule transforming a logical expression into an equivalent logical expression, each implementation rule transforming a logical expression into a physical expression; a search engine for generating one or more plans that execute said database query, said search engine including instructions that partition said database query into one or more subproblems, each subproblem including at least one logical expressions, a top subproblem level having one of said subproblems representing said root expression, each subproblem at each subsequent subproblem level representing one of said inputs to a corresponding higher level expression at a preceding level, determine an order for analyzing each of said subproblems from a lowest subproblem level to top subproblem level such that each input is analyzed prior to its corresponding higher level expression, determine a first plan and a base cost for each subproblem, wherein the first plan comprises only physical expressions; using said order, starting at said lowest subproblem level, (1) generate equivalent physical expressions for at least one logical expression in each subproblem by applying at least one transformation rule and at least one implementation rule, (i) for each equivalent physical expression determine an associated equivalent-expression cost, (ii) if the equivalent-expression cost is less than the base cost, then denote the associated equivalent physical expression as a plan and the equivalent-expression cost as the plan cost, (iii) select the plan with the lowest plan cost for use in a next higher subproblem level, and (2) repeatedly perform said generating step for higher subproblem levels until a plan is selected for said top subproblem level; and a database implementor for selecting one or more of said transformation rules for application to said logical expressions in each of said subproblems. - View Dependent Claims (25, 26, 27, 28)
-
Specification