Enumerating projections in SQL queries containing outer and full outer joins in the presence of inner joins
First Claim
1. A method of optimizing an SQL query in a computer having a memory, the SQL query being used by the computer to direct information retrieval from a relational database stored in an electronic storage device, the method comprising the steps of:
- (a) generating an expression tree for the query in the memory of the computer, wherein the query comprises at least one outer join;
(b) removing projection operations from the expression tree in the memory of the computer by moving the projection operations to a top node of the expression tree;
(c) generating projection sets and required sets for the expression tree in the memory of the computer; and
(d) enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets.
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention discloses a method and apparatus for the enumeration of projections (i.e., "SELECT DISTINCT" operations) in SQL queries containing outer and full outer joins in the presence of inner joins without encountering any regression in performance. The present invention removes projections from a given user query by moving the projections to the top of an expression tree representation of the query, wherein the projection removal is performed using algebraic identities rather than rule-based transformations. The present invention also discloses several methods of enumerating different plans or schedules for projection operations and binary operations in the given user query. The present invention can significantly reduce the execution time of a query by selecting the optimal schedule for binary operations and projections between the binary operations. However, the present invention ensures that there is no regression in performance by comparing the cost of the query with the cost of enumerated plans or schedules, thereby ensuring that the optimizations or transformations do not introduce performance penalties.
76 Citations
39 Claims
-
1. A method of optimizing an SQL query in a computer having a memory, the SQL query being used by the computer to direct information retrieval from a relational database stored in an electronic storage device, the method comprising the steps of:
-
(a) generating an expression tree for the query in the memory of the computer, wherein the query comprises at least one outer join; (b) removing projection operations from the expression tree in the memory of the computer by moving the projection operations to a top node of the expression tree; (c) generating projection sets and required sets for the expression tree in the memory of the computer; and (d) enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (2, 3)
-
-
4. A method of optimizing an SQL query in a computer having a memory, the SQL query being used by the computer to direct information retrieval from a relational database stored in an electronic storage device, the method comprising the steps of:
-
(a) generating an expression tree for the query in the memory of the computer, wherein the query comprises at least one outer join; and (b) removing projection operations from the expression tree by moving the projection operations to the top of the expression tree in the memory of the computer, further comprising the steps of determining whether a first operand of a binary operation in the expression tree has a projection operation specified therein, determining whether a second operand of the binary operation in the expression tree does not have a projection operation specified therein, and retaining virtual attributes from the second operand for use in subsequent operations in the expression tree when the second operand does not have a projection operation specified therein. - View Dependent Claims (5, 6, 7, 8, 9)
-
-
10. A method of optimizing an SQL query in a computer having a memory, the SQL query being used by the computer to direct information retrieval from a relational database stored in an electronic storage device, the method comprising the steps of:
-
(a) accepting an SQL query from an operator into the memory of the computer; (b) translating the SQL query into a expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; (c) generating an exhaustive enumeration of all possible plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise reorderings of binary and projection operations performed by the query, the generating step further comprising the step of constructing the plans incrementally from the bottom up using the expression tree by combining, at each step, two subtrees of the expression tree to obtain a new subtree of the expression tree provided all conditions specified in the two subtrees are satisfied, and by scheduling a distinct projection operation at a root node of the new subtree.
-
-
11. A method of optimizing an SQL query in a computer having a memory, the SQL query being used by the computer to direct information retrieval from a relational database stored in an electronic storage device, the method comprising the steps of:
-
(a) accepting an SQL query from an operator into the memory of the computer; (b) translating the SQL query into an expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; and (c) using a heuristic to enumerate a plurality of plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise all possible schedules for join operations and a limited number of schedules for distinct projection operations, wherein the using step further comprises the steps of; (1) pushing projection operations down the expression tree in the memory of the computer; (2) constructing a projection set in the memory of the computer from the expression tree, wherein the projection set is comprised of distinct projection operations specified in the expression tree; (3) constructing a required set in the memory of the computer for each subtree of the expression tree containing a distinct projection operation at a top of the subtree, wherein the required set is comprised of all relations that must be present in the subtree before the distinct projection operation can be scheduled at a root node of the subtree; and (4) enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (12, 13)
-
-
14. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for generating an expression tree for the query in the memory of the computer, wherein the query comprises at least one outer join; (d) means, performed by the computer, for removing projection operations from the expression tree in the memory of the computer by moving the projection operations to a top node of the expression tree; (e) means, performed by the computer, for generating projection sets and required sets for the expression tree in the memory of the computer; and (f) means, performed by the computer, for enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (15, 16)
-
-
17. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for generating an expression tree for the query in the memory of the computer, wherein the query is comprised of at least one outer join; and (d) means, performed by the computer, for removing projection operations from the expression tree by moving the projection operations to the top of the expression tree in the memory of the computer, further comprising means for determining whether a first operand of a binary operation in the expression tree has a projection operation specified therein, means for determining whether a second operand of the binary operation in the expression tree does not have a projection operation specified therein, and means for retaining virtual attributes from the second operand for use in subsequent operations in the expression tree when the second operand does not have a projection operation specified therein. - View Dependent Claims (18, 19, 20, 21, 22)
-
-
23. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for translating the SQL query into a expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; (d) means, performed by the computer, for generating an exhaustive enumeration of all possible plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise reorderings of binary and projection operations performed by the query, the means for generating further comprising means for constructing the plans incrementally from the bottom up using the expression tree by combining, at each step, two subtrees of the expression tree to obtain a new subtree of the expression tree provided all conditions specified in the two subtrees are satisfied, and means for scheduling a distinct projection operation at a root node of the new subtree.
-
-
24. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for translating the SQL query into an expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; and (d) means, performed by the computer, for using a heuristic to enumerate a plurality of plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise all possible schedules for join operations and a limited number of schedules for distinct protection operations, wherein the means for using comprises; (1) means, performed by the computer, for pushing projection operations down the expression tree in the memory of the computer; (2) means, performed by the computer, for constructing a projection set in the memory of the computer from the expression tree, wherein the projection set is comprised of distinct projection operations specified in the expression tree; (3) means, performed by the computer, for constructing a required set in the memory of the computer for each subtree of the expression tree containing a distinct projection operation at a top of the subtree, wherein the required set is comprised of all relations that must be present in the subtree before the distinct projection operation can be scheduled at a root node of the subtree; and (4) means, performed by the computer, for enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (25, 26)
-
-
27. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) generating an expression tree for the query in the memory of the computer; (b) removing projection operations from the expression tree in the memory of the computer by moving the projection operations to a top node of the expression tree; (c) generating projection sets and required sets for the expression tree in the memory of the computer; and (d) enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (28, 29)
-
-
30. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) generating an expression tree for the query in the memory of the computer, wherein the query comprises at least one outer join; and (b) removing projection operations from the expression tree by moving the projection operations to the top of the expression tree in the memory of the computer, further comprising the steps of determining whether a first operand of a binary operation in the expression tree has a projection operation specified therein, determining whether a second operand of the binary operation in the expression tree does not have a projection operation specified therein, and retaining virtual attributes from the second operand for use in subsequent operations in the expression tree when the second operand does not have a projection operation specified therein. - View Dependent Claims (31, 32, 33, 34, 35)
-
-
36. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) accepting an SQL query from an operator into the memory of the computer; (b) translating the SQL query into a expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; (c) generating an exhaustive enumeration of all possible plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise reorderings of binary and projection operations performed by the query, the generating step further comprising the step of constructing the plans incrementally from the bottom up using the expression tree by combining, at each step, two subtrees of the expression tree to obtain a new subtree of the expression tree provided all conditions specified in the two subtrees are satisfied, and by scheduling a distinct projection operation at a root node of the new subtree.
-
-
37. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) accepting an SQL query from an operator into the memory of the computer; (b) translating the SQL query into an expression tree in the memory of the computer, wherein the expression tree is comprised of interior nodes representing operations and leaves representing base relations, and wherein the expression tree is comprised of one or more expression subtrees; and (c) using a heuristic to enumerate a plurality of plans for the expression tree in the memory of the computer, wherein the enumerated plans comprise all possible schedules for join operations and a limited number of schedules for distinct protection operations, wherein the using step comprises the steps of; (1) pushing projection operations down the expression tree in the memory of the computer; (2) constructing a projection set in the memory of the computer from the expression tree, wherein the projection set is comprised of distinct projection operations specified in the expression tree; (3) constructing a required set in the memory of the computer for each subtree of the expression tree containing a distinct projection operation at a top of the subtree, wherein the required set is comprised of all relations that must be present in the subtree before the distinct projection operation can be scheduled at a root node of the subtree; and (4) enumerating plans for the expression tree in the memory of the computer, wherein the enumerated plans contain reorderings of the projection operations and binary operations in accordance with the projection sets and required sets. - View Dependent Claims (38, 39)
-
Specification