Optimization of SQL queries using universal quantifiers, set intersection, and max/min aggregation in the presence of nullable columns
First Claim
1. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
- (a) propagating column nullability values in the memory of the computer through various SQL operations of the query, wherein the column nullability values are selected from a group comprising null, non-null, and nullable; and
(b) optimizing the SQL operations of the query in the memory of the computer based on the propagated column nullability values.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus for optimizing SQL queries by propagating and exploiting column nullability. Column nullability is identified and propagated using a three-valued logic, wherein a column of a table can be identified nullability information is exploited to optimize query operations through transformations. In one aspect of the present invention, quantified predicates (such as ">ALL") are transformed into simple predicates involving singleton subqueries so that indexing can be exploited. In another aspect of the present invention, "is not null" predicates are generated and pushed for certain aggregate queries. In still another aspect of the present invention, intersect operations are transformed into joins. The end result is that the present invention can significantly enhance the performance of the queries.
114 Citations
38 Claims
-
1. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) propagating column nullability values in the memory of the computer through various SQL operations of the query, wherein the column nullability values are selected from a group comprising null, non-null, and nullable; and (b) optimizing the SQL operations of the query in the memory of the computer based on the propagated column nullability values. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine whether it contains a universal quantified predicate between an outer column and an inner column returned by a subquery; (b) examining the query in the memory of the computer to determine nullability values for the outer and inner columns, wherein the nullability values are selected from a group comprising null, non-null, and nullable; (c) creating a derived table in the memory of the computer representing one or more boundary values for the inner column and the nullability value for the inner column, wherein the boundary values are selected from a group comprising a maximum value and a minimum value returned from the subquery, and wherein the derived table contains only one tuple; and (d) transforming the query in the memory of the computer using the derived table to replace the universal quantified predicate with a simple predicate involving a singleton subquery containing the outer column and columns of the derived table. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
-
-
19. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MAX aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and (4) whether the column referenced in the MAX aggregation function is nullable; and(b) transforming the query in the memory of the computer to include a new predicate for the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MAX aggregation function in the SELECT clause, the column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and the column referenced in the MAX aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
20. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MIN aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and (4) whether the column referenced in the MIN aggregation function is nullable; and(b) transforming the query in the memory of the computer to include a new predicate for the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MIN aggregation function in the SELECT clause, the column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and the column referenced in the MIN aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
21. A method of optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine whether the query contains an INTERSECT operation and whether an output of the query is distinct; (b) examining the query in the memory of the computer to determine whether each output column of the query has at least one of its corresponding input columns being non-null; and (c) transforming the query in the memory of the computer by replacing the INTERSECT operation with a join query when the query contains an INTERSECT operation, the output of the query is distinct, and each output column of the query has at least one of its corresponding input columns being non-null. - View Dependent Claims (22, 28)
-
-
23. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for propagating column nullability values in the memory of the computer through various SQL operations of the query, wherein the column nullability values are selected from a group comprising null, non-null, and nullable; and (d) means, performed by the computer, for optimizing the SQL operations of the query in the memory of the computer based on the propagated column nullability values.
-
-
24. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for examining the query in the memory of the computer to determine whether it contains a universal quantified predicate between an outer column and an inner column returned by a subquery; (d) means, performed by the computer, for examining the query in the memory of the computer to determine nullability values for the outer and inner columns, wherein the nullability values are selected from a group comprising null, non-null, and nullable; (e) means, performed by the computer, for creating a derived table in the memory of the computer representing one or more boundary values for the inner column and the nullability value for the inner column, wherein the boundary values are selected from a group comprising a maximum value and a minimum value returned from the subquery, and wherein the derived table contains only one tuple; and (f) means, performed by the computer, for transforming the query in the memory of the computer using the derived table to replace the universal quantified predicate with a simple predicate involving a singleton subquery containing the outer column and columns of the derived table. - View Dependent Claims (25, 26, 27, 29)
-
-
30. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MAX aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and (4) whether the column referenced in the MAX aggregation function is nullable; and(d) means, performed by the computer, for transforming the query in the memory of the computer to include a new predicate for the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MAX aggregation function in the SELECT clause, the column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and the column referenced in the MAX aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
31. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MIN aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and (4) whether the column referenced in the MIN aggregation function is nullable; and(d) means, performed by the computer, for transforming the query in the memory of the computer to include a new predicate for the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MIN aggregation function in the SELECT clause, the column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and the column referenced in the MIN aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
32. An apparatus for optimizing an SQL query, comprising:
-
(a) a computer having a memory and an electronic storage device coupled thereto, the data storage device storing a relational database; (b) means, performed by the computer, for accepting the SQL query into the memory of the computer, the SQL query being performed by the computer to retrieve data from a relational database stored in the computer; (c) means, performed by the computer, for examining the query in the memory of the computer to determine whether the query contains an INTERSECT operation and whether an output of the query is distinct; (d) means, performed by the computer, for examining the query in the memory of the computer to determine whether each output column of the query has at least one of its corresponding input columns being non-null; and (e) means, performed by the computer, for transforming the query in the memory of the computer by replacing the INTERSECT operation with a join query when the query contains an INTERSECT operation, the output of the query is distinct, and each output column of the query has at least one of its corresponding input columns being non-null. - View Dependent Claims (33)
-
-
34. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) propagating column nullability values in the memory of the computer through various SQL operations of the query, wherein the column nullability values are selected from a group comprising null, non-null, and nullable; and (b) optimizing the SQL operations of the query in the memory of the computer based on the propagated column nullability values.
-
-
35. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine whether it contains a universal quantified predicate between an outer column and an inner column returned by a subquery; (b) examining the query in the memory of the computer to determine nullability values for the outer and inner columns, wherein the nullability values are selected from a group comprising null, non-null, and nullable; (c) creating a derived table in the memory of the computer representing one or more boundary values for the inner column and the nullability value for the inner column, wherein the boundary values are selected from a group comprising a maximum value and a minimum value returned from the subquery, and wherein the derived table contains only one tuple; and (d) transforming the query in the memory of the computer using the derived table to replace the universal quantified predicate with a simple predicate involving a singleton subquery containing the outer column and columns of the derived table.
-
-
36. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MAX aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and (4) whether the column referenced in the MAX aggregation function is nullable; and(b) transforming the query in the memory of the computer to include a new predicate involving the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MAX aggregation function in the SELECT clause, the column referenced in the MAX aggregation function has a descending index with a null value as a highest collating value, and the column referenced in the MAX aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
37. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine;
(1) whether the query contains a GROUP BY operation on a table without any GROUP BY columns, (2) whether the query contains only a MIN aggregation function in a SELECT clause of the query, (3) whether a column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and (4) whether the column referenced in the MIN aggregation function is nullable; and(b) transforming the query in the memory of the computer to include a new predicate for the column as a Boolean factor in a WHERE clause when the query contains the GROUP BY operation on the table without any GROUP BY columns, the query contains only the MIN aggregation function in the SELECT clause, the column referenced in the MIN aggregation function has an ascending index with a null value as a lowest collating value, and the column referenced in the MIN aggregation function is nullable, wherein the new predicate is of the form "COLUMN IS NOT NULL."
-
-
38. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform method steps for optimizing an SQL query in a computer having a memory, the SQL query being performed by the computer to retrieve data from a relational database stored in an electronic storage device coupled to the computer, the method comprising the steps of:
-
(a) examining the query in the memory of the computer to determine whether the query contains an INTERSECT operation and whether an output of the query is distinct; (b) examining the query in the memory of the computer to determine whether each output column of the query has at least one of its corresponding input columns being non-null; and (c) transforming the query in the memory of the computer by replacing the INTERSECT operation with a join query when the query contains an INTERSECT operation, the output of the query is distinct, and each output column of the query has at least one of its corresponding input columns being non-null.
-
Specification