Method, system, and program for optimizing the processing of queries involving set operators
First Claim
Patent Images
1. A method for processing a query including query operations and a set operation on two input tables, comprising:
- applying two or more distribution rules at one time to transform the query, wherein the distribution rules correspond to the query operations;
performing the query operations on each input table separately to produce two intermediate result tables, wherein the query operations exploit available indexes; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables.
1 Assignment
0 Petitions
Accused Products
Abstract
Provided is a method, system, and program for processing a query including a query operation on a table derived from a set operation on two result tables. The query operation is performed on each result table separately to produce two intermediate result tables. The set operator is then applied to the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operation on the table derived from the set operation performed on the two result tables.
70 Citations
49 Claims
-
1. A method for processing a query including query operations and a set operation on two input tables, comprising:
-
applying two or more distribution rules at one time to transform the query, wherein the distribution rules correspond to the query operations;
performing the query operations on each input table separately to produce two intermediate result tables, wherein the query operations exploit available indexes; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16)
merging the query operations with each subselect query;
performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables.
-
-
4. The method of claim 3, wherein indexes are available for the base tables subject to the subselect queries, further comprising:
using the indexes when performing each merged query operation on the at least one base table.
-
5. The method of claim 1, wherein the set operation comprises one of a UNION ALL, EXCEPT ALL or INTERSECT ALL operation.
-
6. The method of claim 5, wherein at least one of the query operations comprises a WHERE clause including predicates, and wherein the WHERE clause is applied to each of the input tables separately to generate the two intermediate result tables.
-
7. The method of claim 5, wherein at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
distributing the WHERE clause to each of the subselect queries to generate intermediate subselect queries, wherein the WHERE clause is distributed to each subselect to generate the intermediate result tables to which the set operation is applied.
-
8. The method of claim 4, wherein the at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables and wherein the WHERE clause is part of a parent query block, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
-
distributing the WHERE clause to each of the subselect queries to generate first intermediate subselect queries;
merging the parent query block with each of the first intermediate subselect queries to generate second intermediate subselect queries; and
executing the second intermediate subselect queries against the input tables to produce the intermediate result tables to which the set operation is applied.
-
-
9. The method of claim 1, wherein at least one of the query operations comprises a column function and the set operation comprises a UNION ALL, wherein performing the query operations comprises:
-
performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation; and
aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables.
-
-
10. The method of claim 9, wherein the column function comprises an AVERAGE function on one column, and wherein performing the column function on each of the input tables comprises:
-
determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
determining a number of rows in each of the input tables to produce two intermediate count values; and
dividing a total of the two intermediate sum values by a total of the two intermediate count numbers to produce the AVERAGE on the column of the UNION ALL of the two input tables.
-
-
12. The method of claim 4, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, and wherein the JOIN of the table is applied to each of the input tables separately to produce the two intermediate result tables.
-
13. The method of claim 4, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied.
-
14. The method of claim 13, wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising:
-
determining duplicate rows in the intermediate final result table; and
removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
16. The method of claim 14, wherein the duplicate rows are removed before the final result table is materialized in a storage device.
-
11. A method for processing a query including a query operation on a table derived from a set operation on two input tables, comprising:
-
performing the query operation on each input table separately to produce two intermediate result tables, wherein the query operation comprises a column function and a set operation comprises a UNION ALL, and wherein performing the query operation comprises;
(i) performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation, wherein the column function comprises a STANDARD DEVIATION function on one column, and wherein performing the column function on each of the input tables comprises;
(a) determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
(b) determining a number of rows in each of the input tables to produce two count values;
(c) determining a sum of all the values squared in the column in each of the input tables to produce two intermediate sum of squared values for the two intermediate result tables;
(d) setting a total of the two count values to n;
(e) setting a total of the two intermediate sum values to Σ
x ;
(f) setting a total of the two intermediate sums of the squared values to Σ
x2; and
(g) calculating the standard deviation on the column of the UNION ALL of the two input tables as;
(ii) aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables.
-
-
15. A method for processing a query including a query operation on a table derived from a set operation on two input tables, comprising:
-
performing the query operation on each input table separately to produce two intermediate result tables, wherein each input table results from a subselect query on at least one base table, and wherein performing the query operation on each input table to produce the two intermediate result tables comprises;
(i) merging the query operation with each subselect query; and
(ii) performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables, wherein indexes are available for the base tables subject to the subselect queries, and wherein the indexes are used when performing each merged query operation on the at least one base table;
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables;
wherein the query operation comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising;
(i) distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied, wherein distributing the join of the table comprises;
(a) generating an extended join table including one column having row identifiers for each row in the table to join, wherein the extended join table is applied in the JOIN operation to each of the intermediate result tables separately, wherein the determined duplicate rows have the same value for the row identifier column and the same values for the columns from the result tables included in the intermediate final result; and
wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising;
(i) determining duplicate rows in the intermediate final result table; and
(ii) removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
17. A system for processing a database query including query operations and a set operation on two input tables, comprising:
-
a computer readable medium including input tables;
means for applying two or more distribution rules at one time to transform the query, wherein the distribution rules correspond to the query operations;
means for performing the query operations on each input table separately to produce two intermediate result tables in the computer readable medium, wherein the query operations exploit available indexes; and
means for applying the set operator on the two intermediate result tables to produce a final result table in the computer readable medium that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 32)
a storage device, wherein the final result table is generated in the computer readable medium without having to materialize in the storage device any intermediate result tables.
-
-
19. The system of claim 17, further comprising:
-
a storage device including at least one base table, wherein each input table results from a subselect query on at least one base table, wherein the means for performing the query operations on each input table to produce the two intermediate result tables further performs;
merging the query operations with each subselect query;
performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables.
-
-
20. The system of claim 19, wherein the computer readable medium further includes indexes available for the base tables subject to the subselect queries, further comprising:
means for using the indexes when performing each merged query operation on the at least one base table.
-
21. The system of claim 17, wherein the set operation comprises one of a UNION ALL, EXCEPT ALL or INTERSECT ALL operation.
-
22. The system of claim 21, wherein at least one of the query operations comprises a WHERE clause including predicates, and wherein the WHERE clause is applied to each of the input tables separately to generate the two intermediate result tables.
-
23. The system of claim 21, wherein at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
means for distributing the WHERE clause to each of the subselect queries to generate intermediate subselect queries, wherein the WHERE clause is distributed to each subselect to generate the intermediate result tables to which the set operation is applied.
-
24. The system of claim 20, wherein at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables and wherein the WHERE clause is part of a parent query block, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
-
means for distributing the WHERE clause to each of the subselect queries to generate first intermediate subselect queries;
means for merging the parent query block with each of the first intermediate subselect queries to generate second intermediate subselect queries; and
means for executing the second intermediate subselect queries against the input tables to produce the intermediate result tables to which the set operation is applied.
-
-
25. The system of claim 17, wherein at least one of the query operations comprises a column function and the set operation comprises a UNION ALL, wherein the means for performing the query operations performs:
-
performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation; and
aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables.
-
-
26. The system of claim 25, wherein the column function comprises an AVERAGE function on one column, and wherein the means for performing the column function on each of the input tables further performs:
-
determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
determining a number of rows in each of the input tables to produce two intermediate count values; and
dividing a total of the two intermediate sum values by a total of the two intermediate count numbers to produce the AVERAGE on the column of the UNION ALL of the two input tables.
-
-
28. The system of claim 20, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, and wherein the JOIN of the table is applied to each of the input tables separately to produce the two intermediate result tables.
-
29. The system of claim 20, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
means for distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied.
-
30. The system of claim 29, wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, and wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising:
-
means for determining duplicate rows in the intermediate final result table; and
means for removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
32. The system of claim 30, wherein the duplicate rows are removed before the final result table is materialized in a storage device.
-
27. A system for processing a database query including a query operation on a table derived from a set operation on two input tables, comprising:
-
a computer readable medium including input tables;
means for performing the query operation on each input table separately to produce two intermediate result tables in the computer readable medium, wherein the query operation comprises a column function and the set operation comprises a UNION ALL, wherein the means for performing the query operation performs;
(i) performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation, wherein the column function comprises a STANDARD DEVIATION function on one column, and wherein the means for performing the column function on each of the input tables performs;
(a) determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
(b) determining a number of rows in each of the input tables to produce two count values;
(c) determining a sum of all the values squared in the column in each of the input tables to produce two intermediate sum of squared values for the two intermediate result tables;
(d) setting a total of the two count values to n;
(e) setting a total of the two intermediate sum values to Σ
x; and
(f) setting a total of the two intermediate sums of the squared values to Σ
x2;
calculating the standard deviation on the column of the UNION ALL of the two input tables as;
and (ii) aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables; and
means for applying the set operator on the two intermediate result tables to produce a final result table in the computer readable medium that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables.
-
-
31. A system for processing a database query including a query operation on a table derived from a set operation on two input tables, comprising:
-
a computer readable medium including input tables;
means for performing the query operation on each input table separately to produce two intermediate result tables in the computer readable medium, a storage device including at least one base table, wherein each input table results from a subselect query on at least one base table, wherein the means for performing the query operation on each input table to produce the two intermediate result tables further performs;
(i) merging the query operation with each subselect query;
(ii) performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables, wherein the computer readable medium further includes indexes available for the base tables subject to the subselect queries, and wherein the indexes are used when performing each merged query operation on the at least one base table;
means for applying the set operator on the two intermediate result tables to produce a final result table in the computer readable medium that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables;
wherein the query operation comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising;
(i) means for distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied,
wherein the means for distributing the join of the table performs;
(a) generating an extended join table including one column having row identifiers for each row in the table to join, wherein the extended join table is applied in the JOIN operation to each of the intermediate result tables separately, wherein the determined duplicate rows have the same value for the row identifier column and the same values for the columns from the result tables included in the intermediate final result; and
wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, and wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising;
(i) means for determining duplicate rows in the intermediate final result table; and
(ii) means for removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
33. An article of manufacture including code for processing a query including a query operations and a set operation on two input tables by:
-
applying two or more distribution rules at one time to transform the query, wherein the distribution rules correspond to the query operations;
performing the query operations on each input table separately to produce two intermediate result tables, wherein the query operations exploit available indexes; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables. - View Dependent Claims (34, 35, 36, 37, 38, 40, 41, 42, 44, 45, 46, 48)
merging the query operations with each subselect query;
performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables.
-
-
36. The article of manufacture of claim 35, wherein indexes are available for the base tables subject to the subselect queries, further comprising:
using the indexes when performing each merged query operation on the at least one base table.
-
37. The article of manufacture of claim 33, wherein the set operation comprises one of a UNION ALL, EXCEPT ALL or INTERSECT ALL operation.
-
38. The article of manufacture of claim 37, wherein at least one of the query operations comprises a WHERE clause including predicates, and wherein the WHERE clause is applied to each of the input tables separately to generate the two intermediate result tables.
-
40. The article of manufacture of claim 36, wherein at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables and wherein the WHERE clause is part of a parent query block, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
-
distributing the WHERE clause to each of the subselect queries to generate first intermediate subselect queries;
merging the parent query block with each of the first intermediate subselect queries to generate second intermediate subselect queries; and
executing the second intermediate subselect queries against the input tables to produce the intermediate result tables to which the set operation is applied.
-
-
41. The article of manufacture of claim 33, wherein at least one of the query operations comprises a column function and the set operation comprises a UNION ALL, wherein performing the query operations comprises:
-
performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation; and
aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables.
-
-
42. The article of manufacture of claim 41, wherein the column function comprises an AVERAGE function on one column, and wherein performing the column function on each of the input tables comprises:
-
determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
determining a number of rows in each of the input tables to produce two intermediate count values; and
dividing a total of the two intermediate sum values by a total of the two intermediate count numbers to produce the AVERAGE on the column of the UNION ALL of the two input tables.
-
-
44. The article of manufacture of claim 36, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, and wherein the JOIN of the table is applied to each of the input tables separately to produce the two intermediate result tables.
-
45. The article of manufacture of claim 36, wherein at least one of the query operations comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further compromising:
distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied.
-
46. The article of manufacture of claim 45, wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising:
-
determining duplicate rows in the intermediate final result table; and
removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
48. The article of manufacture of claim 46, wherein the duplicate rows are removed before the final result table is materialized in a storage device.
-
39. The article of manufacture of clam 37, wherein at least one of the query operations comprises a WHERE clause including predicates to apply to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising:
distributing the WHERE clause to each of the subselect queries to generate intermediate subselect queries, wherein the WHERE clause is distributed to each subselect to generate the intermediate result tables to which the set operation is applied.
-
43. An article of manufacture including code for processing a query including a query operation on a table derived from a set operation on two input tables by:
-
performing the query operation on each input table separately to produce two intermediate result tables, wherein the query operation comprises a column function and the set operation comprises a UNION ALL, wherein performing the query operation comprises;
(i) performing the column function on each of the input tables separately to produce at least one value for each input table subject to the set operation, wherein the column function comprises a STANDARD DEVIATION function on one column, and wherein performing the column function on each of the input tables comprises;
(a) determining a sum of all the values in the column in each of the input tables to produce two intermediate sum values for the two input tables;
(b) determining a number of rows in each of the input tables to produce two count values;
(c) determining a sum of all the values squared in the column in each of the input tables to produce two intermediate sum of squared values for the two intermediate result tables;
(d) setting a total of the two count values to n;
(e) setting a total of the two intermediate sum values to Σ
x; and
(f) setting a total of the two intermediate sums of the squared values to Σ
x2;
calculating the standard deviation on the column of the UNION ALL of the two input tables as;
and (ii) aggregating the at least one value for each of the input tables to produce at least one aggregate value that is a same as the at least one value that would have been produced by applying the column function to the UNION ALL of the input tables; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables.
-
-
47. An article of manufacture including code for processing a query including a query operation on a table derived from a set operation on two input tables by:
-
performing the query operation on each input table separately to produce two intermediate result tables, wherein each input table results from a subselect query on at least one base table, wherein performing the query operation on each input table to produce the two intermediate result tables comprises;
(i) merging the query operation with each subselect query;
(ii) performing each merged query operation on the at least one base table subject to the subselect to produce the two intermediate result tables, wherein indexes are available for the base tables subject to the subselect queries, and wherein the indexes are used when performing each merged query operation on the at least one base table; and
applying the set operator on the two intermediate result tables to produce a final result table that is a same result table that would have been produced by performing the query operation on a result table derived from the set operation performed on the two input tables;
wherein the query operation comprises a JOIN of a table to the table derived from the set operation on the two input tables, wherein two subselect queries are provided to generate the two intermediate result tables to which the set operation is applied, further comprising;
(i) distributing the JOIN of the table to each of the subselect queries to generate intermediate subselect queries, wherein the distributed JOIN is performed on each subselect to generate the intermediate result tables to which the set operation is applied, wherein distributing the join of the table comprises;
(a) generating an extended join table including one column having row identifiers for each row in the table to join, wherein the extended join table is applied in the JOIN operation to each of the intermediate result tables separately, wherein the determined duplicate rows have the same value for the row identifier column and the same values for the columns from the result tables included in the intermediate final result; and
wherein the set operation comprises one of a UNION, INTERSECT, and EXCEPT, wherein an intermediate final result table comprises the tables derived from the set operation on the intermediate result tables, further comprising;
(i) determining duplicate rows in the intermediate final result table; and
(ii) removing the duplicate rows from the intermediate final result table to generate the final result table.
-
-
49. A method for processing a query including query operations and a set operation on two input tables, comprising:
-
applying two or more distribution rules at one time to transform the query, wherein the distribution rules correspond to the query operations;
performing the query operations on each input table separately to produce two intermediate result tables, wherein the query operations exploit available indexes;
applying the set operator on the two intermediate result tables to produce a first result table;
when one of the query operations is an aggregation operation, applying the aggregation operation to the first result table to produce a second result table that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables; and
when none of the query operations includes an ALL suffix, removing duplicate rows from the first result table to produce a third result table that is a same result table that would have been produced by performing the query operations on a result table derived from the set operation performed on the two input tables.
-
Specification