Analyzing, optimizing and rewriting queries using matching and compensation between query and automatic summary tables
First Claim
1. A method of optimizing a query in a computer system, the query being performed by the computer system to retrieve data from a database stored on the computer system, the method comprising the steps of:
- (a) analyzing the query using math and compensation between the query and one or more automatic summary tables to determine whether expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table, wherein the automatic sum table is generated using a full SELECT statement involving one or more nested GROUP BY operations and HAVING clauses; and
(b) rewriting the query to use the automatic summary table when the expressions occurring in the query, but not in the automatic summary, can be derived using the automatic summary table.
3 Assignments
0 Petitions
Accused Products
Abstract
A method, apparatus, and article of manufacture for optimizing database queries using a derived summary table, wherein a definition of the summary table is based on a full select statement, including, but not limited to, a derived table involving nested GROUP BY operations and complex HAVING clauses with subqueries or joins, that is materialized in the table and describes how the summary table was derived. A query is analyzed using matching/compensation tests between the query and the definition of the summary table (that is, a query by itself) to determine whether expressions occurring anywhere in the query, but not in the summary table, can be derived using either the content in the summary table alone, or after combining (through some relational operator) the content of the summary table with other base tables, and hence the query is subsumed by or overlaps with the summary table definition.
101 Citations
198 Claims
-
1. A method of optimizing a query in a computer system, the query being performed by the computer system to retrieve data from a database stored on the computer system, the method comprising the steps of:
-
(a) analyzing the query using math and compensation between the query and one or more automatic summary tables to determine whether expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table, wherein the automatic sum table is generated using a full SELECT statement involving one or more nested GROUP BY operations and HAVING clauses; and
(b) rewriting the query to use the automatic summary table when the expressions occurring in the query, but not in the automatic summary, can be derived using the automatic summary table. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66)
-
2. The method of claim 1, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
3. The method of claim 1, wherein the query and the statement that generates the automatic summary tables include subqueries.
-
4. The method of claim 1, wherein the analyzing step comprises the step of analyzing the query using compensation tests between the query and the automatic summary table to determine whether expressions occurring anywhere in the query, but not in the automatic summary table, can be derived after combining the automatic summary table with one or more other base tables through relational operators.
-
5. The method of claim 1, wherein the analyzing step comprises the step of determining whether there is a match between the query and the automatic summary table using a bottom-up traversal of boxes in query graph models (QGMs) for the query and the automatic summary table that tries to establish matches between query and automatic summary table, until it reaches a top of the QGM for the automatic summary table.
-
6. The method of claim 5, wherein the data determining step further comprises the steps of:
-
performing a navigator function to identify candidate subsumee and subsumer pairs from the QGMs for the query and the automatic summary table in an order such that the bottom-up traversal of the QGMs for the query and the automatic summary table is satisfied; and
performing a match function that takes as input the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and runs information on whether the subsumee matches with the subsumer.
-
-
7. The method of claim 6, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
8. The method of claim 6, wherein the match function assumes that children of the subsumee and the subsumer have been matched already.
-
9. The method of claim 6, wherein the analyzing step comprises the step of determining whether there is a match between the query and the automatic summary table using a top-down recursive traversal of the QGMs for the query and the automatic summary table that tries to establish matches between query and automatic summary table.
-
10. The method of claim 9, wherein the determining step further comprises the step of:
performing a match function that takes as input at least one candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and returns information on whether the subsumee matches with the subsumer.
-
11. The method of claim 10, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
12. The method of claim 10, wherein the match function recursively invokes itself using as input children of the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table to attempt to match the children of candidate subsumee and subsumer pair before matching the candidate subsumee and subsumer pair.
-
13. The method of claim 12, wherein the match function considers patterns of subsumee and subsumer in the QGMs, as well as compensation for the matches between the children of the candidate subsumee and subsumer.
-
14. The method of claim 13, wherein a rejoin child is a subsumee child that does not match with any subsumer child.
-
15. The method of claim 13, wherein an extra child is a subsumer child that does not match with any subsumee child.
-
16. The method of claim 13, wherein an extra join is a join between an extra child and the rest of the subsumer.
-
17. The method of claim 13, wherein the match function must satisfy the following conditions:
-
(1) there must be at least one subsumee child that matches with some subsumer child, and (2) the subsumee and the subsumer must be of a same operation type.
-
-
18. The method of claim 17, wherein the patterns are selected from a group of patterns comprising:
-
(1) exact child matches, including;
(1.1) SELECT boxes with one to-one child matches, (1.2) self joins, and (1.3) GROUP-BY boxes, (2) non-exact child marches, including;
(2.1) GROUP-BY boxes with SELECT-only child compensation, (2.2) GROUP BY boxes with GROUP BY child compensation, (2.3) SELECT boxes with SELECT-only child compensation, and (2.4) SELECT boxes with a single child matching a GROUP BY child compensation.
-
-
19. The method of claim 18, wherein the exact child matches comprises the step of assuming that any matches that exist among the children of the candidate subsumee and subsumer pair are exact matches.
-
20. The method of claim 18, wherein the SELECT boxes with one-to-one child matches require that the candidate subsumee and subsumer arc SELECT boxes, and that (i) each child of the subsumee matches with at most one child of the subsumer, and (ii) no two children of the subsumee match to a same child of the subsumer.
-
21. The method of claim 20, wherein a match is established when the following conditions are satisfied:
(1) each subsumer predicate that is not an extra join predicate is semantically equivalent with, some subsumee predicate;
(2) any join between an extra and a non-extra child in the subsumer is a loss-less join;
(3) any subsumee predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns; and
(4) each subsumee output column is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
22. The method of claim 20, wherein the compensation for such a match is a SELECT box that (i) joins back all rejoin children among themselves and with the subsumer, (ii) contains all the subsumee predicates that do not appear in the subsumer, and (iii) recomputes the subsumee output columns from the subsumer output columns and/or the rejoin columns.
-
23. The method of claim 20, wherein the candidate subsumee and subsumer are SELECT boxes and the conditions (i) and/or (ii) of claim 20 are not satisfied.
-
24. The method of claim 23, wherein the match function invokes itself multiple times trying to establish a match between the candidate subsumee and subsumer.
-
25. The method of claim 24, where the match function satisfies conditions (i) and (ii) of claim 20 every time it invokes itself.
-
26. The method of claim 25, wherein the match function satisfies conditions (i) and (ii) of claim 20 by taking into account some of the alternative child matches and disregarding others.
-
27. The method of claim 18, wherein the candidate subsumee and subsumer are GROUP-BY boxes.
-
28. The method of claim 27, wherein a match is established when the following conditions are satisfied:
-
(1) every subsumee grouping column must be semantically equivalent with some subsumer grouping column;
(2) if the subsumee and subsumer grouping sets are the same, then each subsumee aggregate output column must be semantically equivalent with some subsumer aggregate output column; and
(3) if the subsumee and subsumer grouping sets are not the same, then each subsumee aggregate output column must be derivable from the subsumer'"'"'s output columns.
-
-
29. The method of claim 28, wherein compensation is necessary only if the subsumee'"'"'s grouping items are a proper subset of the subsumer'"'"'s grouping items.
-
30. The method of claim 29, wherein the compensation comprises a GROUP-BY box that has a set of same grouping items as the subsumee, and derives the subsumee'"'"'s output columns from the subsumer'"'"'s output columns.
-
31. The method of claim 27, wherein derivations for aggregate functions are selected from a group comprising:
-
(1) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer,(2) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(z) column of the subsumer and z is any non-nullable column used in the subsumer,(3) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(y) column of the subsumer,(4) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer if x is non-nullable,(5) SUM(x)>
SUM(sm), where “
sm”
is the SUM(y) column of the subsumer,(6) SUM(x)>
SUM(y*cnt), where x is non-nullable, y is in the subsumer'"'"'s grouping set and “
cnt”
is the COUNT(*) column of the subsumer,(7) AVG(x)>
SUM(sm)/SUM(cnt), where “
sm”
is the SUM(y) column of the subsumer, and “
cnt”
is the COUNT(y) column of the subsumer or the COUNT(*) column of the subsumer if the x column is non-nullable,(8) MAX(x)>
MAX(mx), where “
mx”
is the MAX(y) column of the subsumer or MAX(y), where y is in the subsumer'"'"'s grouping set,(9) MIN(x)>
MIN(mn), where “
mn”
is the MIN(y) column of the subsumer, or MIN(y), where y is in the subsumer'"'"'s grouping set,(10) COUNT(distinct x)>
COUNT(y), where y is in the subsumer'"'"'s grouping set,(11) SUM(distinct x)>
SUM(y), where y is in the subsumer'"'"'s grouping set,(12) VARIANCE(x), which is AVG(x2)−
AVG(x)2, and(13) STDDEV(x), which is sqrt(VARIANCE(x)).
-
-
32. The method of claim 31, wherein x is a column used in the subsumee, y is a column used in the subsumer, and that x and y are semantically equivalent.
-
33. The method of claim 18, wherein the non-exact child matches further comprise the step of matching boxes in the QGMs whose children do not match exactly, by considering the boxes in the QGMs that comprise the compensations for the non-exact could matches.
-
34. The method of claim 33, wherein child compensation boxes in the QGMs are included in a compensation that is required for the match.
-
35. The method of claim 34, further comprising the step of pulling up the child compensation boxes in the QGMs.
-
36. The method of claim 35, wherein the pulling up step requires that the following condition is met:
a general pullup condition, wherein the columns referenced by expressions inside a child compensation must be either rejoin columns or be derivable from the subsumer'"'"'s output columns.
-
37. The method of claim 18, wherein the subsumee and subsumer are GROUP-BY boxes and the child compensation comprises a single SELECT box, and the SELECT box must be pulled up above the subsumer GROUP-BY box, and the general pullup condition must be satisfied.
-
38. The method of claim 37, wherein the child compensation may perform rejoins.
-
39. The method of claim 38, wherein a match is established when the following conditions are satisfied:
-
(1) each grouping item in the subsumee must be derivable from the subsumer'"'"'s grouping items and/or the rejoin columns; and
(2) each subsumee output column that is an aggregate function must be derivable from the subsumer'"'"'s output columns.
-
-
40. The method of claim 37, wherein the compensation includes the pulled up SELECT box, which may be followed by a GROUP BY box.
-
41. The method of claim 40, wherein, if the SELECT box does not perform any rejoins, the GROUP BY compensation is necessary only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set.
-
42. The method of claim 40, wherein, if the SELECT box performs any rejoins, then regrouping is required in the compensation only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set, or the rejoin changes the duplicity of the non-rejoin rows.
-
43. The method of claim 18, wherein the subsumee and subsumer are GROUP BY boxes and the child compensation contains at least one GROUP BY box and a zero or more SELECT boxes.
-
44. The method of claim 43, wherein a match is established when the following condition is satisfied:
the lowest GROUP-BY box in the child compensation matches with the subsumer.
-
45. The method of claim 44, wherein the match between the lowest GROUP-BY box in the child compensation and the subsumer comprises the steps of claims 37 through 42.
-
46. The method of claim 43, wherein the compensation includes (i) the compensation for the match between the lowest GROUP-BY box in the child compensation and the subsumer (ii) a copy of each child-compensation box above the lowest GROUP-BY box in the child compensation, and (iii) a copy of the subsumee.
-
47. The method of claim 18, wherein the subsumee and subsumer are SELECT boxes and the child compensation comprises a single SELECT box that may perform rejoins.
-
48. The method of claim 47, wherein a match is established when the following conditions are satisfied:
-
(1) conditions 2 and 4 of claim 21 must be satisfied, (2) each subsumer predicate that is not an extra join predicate must be semantically equivalent with some predicate that appears either in the subsumee or in one of the child compensation boxes, and (3) any subsumee or child-compensation predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
-
49. The method of claim 47, wherein the compensation includes any predicates that appear in the subsumee and the child compensation boxes but not in the subsumer.
-
50. The method of claim 18, wherein the subsumee and subsumer are SELECT boxes and only one of the subsumee'"'"'s children matches with one of the subumer'"'"'s children and all the rest of the subsumee children are rejoin children.
-
51. The method of claim 50, wherein the compensation for the single child match contains at least one GROUP BY box and any number of SELECT boxes.
-
52. The method of claim 51, wherein the conditions for a match are the same as SELECT boxes with SELECT-only child compensation, with the addition of the general pullup condition for the GROUP by boxes in the child compensation, such that any grouping column in the child-compensation GROUP BY boxes must be derivable from the subsumer'"'"'s output columns.
-
53. The method of claim 18, wherein the matching or compensation expressions further comprise the step of testing for expression equivalence.
-
54. The method of claim 53, wherein the testing step further comprises the step of testing for semantic equivalence between a subsumee expression and a subsumer expression.
-
55. The method of claim 54, wherein the testing step further comprises the step of translating the subsumee expression into an equivalent expression that is valid within a context of the subsumer.
-
56. The method of claim 55, wherein the translated subsumee expression uses columns that are generated by the subsumer'"'"'s children.
-
57. The method of claim 56, wherein when the subsumee and subsumer children match exactly, then for each column that is produced by a non-rejoin subsumee child, there is an equivalent column that is produced by the subsumer'"'"'s matching child, and replacing each column in the subsumee expression with its equivalent subsumer column.
-
58. The method of claim 56, wherein when the subsumee and subsumer children do not match exactly, translating a column that appears within a subsumee expression.
-
59. The method of claim 58, further comprising following a derivation of the column from the subsumee to the subsumee child that produces the column, and then jumping to an equivalent column inside a top box of the corresponding child compensation, and following a derivation of a compensation column down the compensation stack until the matching subsumer child is reached, so that a subsumee column is translated to an expression that uses the columns of some subsumer children, and the expression is valid within the context of the subsumer.
-
60. The method of claim 56, further comprising the step of using the expression translation to compensate predicates and to derive subsumee expressions out of the subsumer output columns.
-
61. The method of claim 18, wherein predicate subsumption requires that each subsumer predicate “
- subsumes”
some subsumee predicate.
- subsumes”
-
62. The method of claim 61, wherein a first predicate subsumes a second predicate if the set of rows that satisfy the second predicate is a subset of the set of rows that satisfy the first predicate.
-
63. The method of claim 62, wherein a subsumee predicate that is properly subsumed by a subsumer predicate must be included in the compensation.
-
64. The method of claim 62, wherein if the two predicates are equivalent, then the subsumee predicate does not need to be compensated for.
-
65. The method of claim 18, wherein the date predicates further comprise the step of recognizing date predicates that are on a year or month boundary and, if necessary, compensating such predicates using the year and month columns produced by the subsumer.
-
66. The method of claim 18, further comprising the step of handling structural inequivalence between the automatic summary table and the query in the navigator.
-
2. The method of claim 1, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
-
67. A computer-implemented apparatus for optimizing a query, comprising:
-
(a) a computer system;
(b) logic, performed by the computer system for (1) analyzing the query using matching and compensation between the query and one or more automatic summary tables to determine whether expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table, wherein the automatic summary table is generated using a fill SELECT statement involving one or more nested GROUP BY operations and HAVING clauses; and
(2) rewriting the query to use the automatic summary table when the expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table. - View Dependent Claims (68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 177, 197, 198)
-
68. The apparatus of claim 67, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
69. The apparatus of claim 67, wherein the query and the statement that generates the automatic summary tables include subqueries.
-
70. The apparatus of claim 67, wherein the analyzing logic comprises logic for analyzing the query using compensation tests between the query and the automatic summary cable to determine whether expressions occurring anywhere in the query, but not in the automatic summary table, can be derived after combining the automatic summary table with one or more other base tables through relational operators.
-
71. The apparatus of claim 67, wherein the analyzing logic comprises logic for determining whether there is a match between the query and the automatic summary table using a bottom-up traversal of boxes in query graph models (QGMs) for the query and the automatic summary table that tries to establish matches between query and automatic summary table, until it reaches a top of the QGM for the automatic summary table.
-
72. The apparatus of claim 71, wherein the determining logic further comprises logic for:
-
performing a navigator function to identify candidate subsumee and subsumer pairs from the QGMs for the query and the automatic summary table in an order such that the bottom-up traversal of the QGMs for the query and the automatic summary table is satisfied; and
performing a match function that takes as input the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and returns information on whether the subsumee matches with the subsumer.
-
-
73. The apparatus of claim 72, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
74. The apparatus of claim 72, wherein the match function assumes that children of the subsumee and the subsumer have been matched already.
-
75. The apparatus of claim 72, wherein the analyzing logic comprises logic for determining whether there is a match between the query and the automatic summary table using a top-down recursive traversal of the QGMs for the query and the automatic summary table that tries to establish matches between query and automatic summary table.
-
76. The apparatus of claim 75, wherein the determining logic further comprises logic for:
performing a match function that takes as input at least one candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and returns information on whether the subsumee matches with the subsumer.
-
77. The apparatus of claim 76, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
78. The apparatus of claim 76, wherein the match function recursively invokes itself using as input children of the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table to attempt to match the children of candidate subsumee and subsumer pair before matching the candidate subsumee and subsumer pair.
-
79. The apparatus of claim 78, wherein the match function considers patterns of subsumee and subsumer in the QGMs, as well as compensation for the matches between the children of the candidate subsumee and subsumer.
-
80. The apparatus of claim 79, wherein a rejoin child is a subsumee child that does not match with any subsumer child.
-
81. The apparatus of claim 79, wherein an extra child is a subsumer child that does not match with any subsumee child.
-
82. The apparatus of claim 79, wherein an extra join is a join between an extra child and the rest of the subsumer.
-
83. The apparatus of claim 79, wherein the match function must satisfy the following conditions:
-
(1) there must be at least one subsumee child that matches with some subsumer child, and (2) the subsumee and the subsumer must be of a same operation type.
-
-
84. The apparatus of claim 83, wherein the patterns are selected from a group of patterns comprising:
-
(1) exact child matches, including;
(1.1) SELECT boxes with one-to-one child matches, (1.2) self joins, and (1.3) GROUP-BY boxes, (2) non-exact child matches, including;
(2.1) GROUP-BY boxes with SELECT-only child compensation, (2.2) GROUP BY boxes with GROUP BY child compensation, (2.3) SELECT boxes with SELECT-only child compensation, and (2.4) SELECT boxes with a single child matching a GROUP BY child compensation.
-
-
85. The apparatus of claim 84, wherein the exact child matches comprises logic for assuming that any matches that exist among the children of the candidate subsumee and subsumer pair are exact matches.
-
93. The apparatus of claim 84, wherein the candidate subsumee and subsumer are GROUP-BY boxes.
-
94. The apparatus of claim 93, wherein a match is established when the following conditions are satisfied:
-
(1) every subsumee grouping column must be semantically equivalent with some subsumer grouping column;
(2) if the subsumee and subsumer grouping sets are the same, then each subsumee aggregate output column must be semantically equivalent with some subsumer aggregate output column; and
(3) if the subsumee and subsumer grouping sets are not the same, then each subsumee aggregate output column must be derivable from the subsumer'"'"'s output columns.
-
-
95. The apparatus of claim 94, wherein compensation is necessary only if the subsumee'"'"'s grouping items are a proper subset of the subsumer'"'"'s grouping items.
-
96. The apparatus of claim 95, wherein the compensation comprises a GROUP-BY box that has a set of same grouping items as the subsumee, and derives the subsumee'"'"'s output columns from the subsumer'"'"'s output columns.
-
97. The apparatus of claim 93, wherein derivations for aggregate functions are selected from a group comprising:
-
(1) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer,(2) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(z) column of the subsumer and z is any non-nullable column used in the subsumer,(3) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(y) column of the subsumer,(4) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer if x is non-nullable,(5) SUM(x)>
SUM(sm), where “
sm”
is the SUM(y) column of the subsumer,(6) SUM(x)>
SUM(y*cnt), where x is non-nullable, y is in the subsumer'"'"'s grouping set and “
cnt”
is the COUNT(*) column of the subsumer,(7) AVG(x)>
SUM(sm)/SUM(cnt), where “
sm”
is the SUM(y) column of the subsumer, and “
cnt”
is the COUNT(y) column of the subsumer or the COUNT(*) column of the subsumer if the x column is non-nullable,(8) MAX(x)>
MAX(mx), where “
mx”
is the MAX(y) column of the subsumer or MAX(y), where y is in the subsumer'"'"'s grouping set,(9) MIN(x)>
MIN(mn), where “
mn”
is the MIN(y) column of the subsumer, or MIN(y), where y is in the subsumer'"'"'s grouping set,(10) COUNT(distinct x)>
COUNT(y), where y is in the subsumer'"'"'s grouping set,(11) SUM(distinct x)>
SUM(y), where y is in the subsumer'"'"'s grouping set,(12) VARIANCE(x), which is AVG(x2)−
AVG(x)2, and(13) STDDEV(x), which is sqrt(VARIANCE(x)).
-
-
98. The apparatus of claim 97, wherein x is a column used in the subsumee, y is a column used in the subsumer, and that x and y are semantically equivalent.
-
99. The apparatus of claim 84, wherein the non-exact child matches further comprise the logic of matching boxes in the QGMs whose children do not match exactly, by considering the boxes in the QGMs that comprise the compensations for the non-exact child matches.
-
100. The apparatus of claim 99, wherein child compensation boxes in the QGMs are included in a compensation that is required for the match.
-
101. The apparatus of claim 100, further comprising logic for pulling up the child compensation boxes in the QGMs.
-
102. The apparatus of claim 101, wherein the pulling up logic requires that the following condition is met:
a general pullup condition, wherein the columns referenced by expressions inside a child compensation must be either rejoin columns or be derivable from the subsumer'"'"'s output columns.
-
103. The apparatus of claim 84, wherein the subsumee and subsumer are GROUP-BY boxes and the child compensation comprises a single SELECT box, and the SELECT box must be pulled up above the subsumer GROUP-BY box, and the general pullup condition must be satisfied.
-
104. The apparatus of claim 103, wherein the child compensation may perform rejoins.
-
105. The apparatus of claim 104, wherein a match is established when the following conditions are satisfied:
-
(1) each grouping item in the subsumee must be derivable from the subsumer'"'"'s grouping items and/or the rejoin columns; and
(2) each subsumee output column that is an aggregate function must be derivable from the subsumer'"'"'s output columns.
-
-
106. The apparatus of claim 103, wherein the compensation includes the pulled up SELECT box, which may be followed by a GROUP BY box.
-
107. The apparatus of claim 106, wherein, if the SELECT box does not perform any rejoins, the GROUP BY compensation is necessary only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set.
-
108. The apparatus of claim 106, wherein, if the SELECT box perform any rejoins, then regrouping is required in the compensation only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set, or the rejoin changes the duplicity of the non-rejoin rows.
-
109. The apparatus of claim 84, wherein the subsumee and subsumer are GROUP BY boxes and t child compensation contains at least one GROUP BY box and a zero or more SELECT boxes.
-
110. The apparatus of claim 109, wherein a match is established when the following condition is satisfied:
the lowest GROUP-BY box in the child compensation matches with the subsumer.
-
111. The apparatus of claim 110, wherein the match between the lowest GROUP-BY box in the child compensation and the subsumer comprises logic for claims 103 through 108.
-
112. The apparatus of claim 109, wherein the compensation includes (i) the compensation for the match between the lowest GROUP-BY box in the child compensation and the subsumer, (ii) a copy of each child-compensation box above the lowest GROUP-BY box in the child compensation, and (iii) a copy of the subsumee.
-
113. The apparatus of claim 84, wherein the subsumee and subsumer are SELECT boxes and the child compensation comprises a single SELECT box that may perform rejoins.
-
114. The apparatus of claim 113, wherein a match is established when the following conditions are satisfied:
-
(1) conditions 2 and 4 of claim 87 must be satisfied, (2) each subsumer predicate that is not an extra join predicate must be semantically equivalent with some predicate that appears either in the subsumee or in one of the child compensation boxes, and (3) any subsumee or child-compensation predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
-
115. The apparatus of claim 113, wherein the compensation includes any predicates that appear in the subsumee and the child compensation boxes but not in the subsumer.
-
116. The apparatus of claim 84, wherein the subsumee and subsumer are SELECT boxes and only one of the subsumee'"'"'s children matches with one of the subsumer'"'"'s children and all the rest of the subsumee children are rejoin children.
-
117. The apparatus of claim 116, wherein the compensation for the single child match contains at least one GROUP BY box and any number of SELECT boxes.
-
118. The apparatus of claim 117, wherein the conditions for a match are the same as SELECT boxes with SELECT-only child compensation, with the addition of the general pullup condition for the GROUP by boxes in the child compensation, such that any grouping column in the child-compensation GROUP BY boxes must be derivable from the subsumer'"'"'s output columns.
-
119. The apparatus of claim 84, wherein the matching or compensation expressions further comprise logic for testing for expression equivalence.
-
120. The apparatus of claim 119, where the resting logic further comprises logic for testing for semantic equivalence between a subsumee expression and a subsumer expression.
-
121. The apparatus of claim 120, wherein the testing logic further comprises logic for translating the subsumee expression into an equivalent expression that is valid within a context of the subsumer.
-
122. The apparatus of claim 121, wherein the translated subsumee expression uses columns that are generated by the subsumer'"'"'s children.
-
123. The apparatus of claim 122, wherein when the subsumee and subsumer children match exactly, then for each column that is produced by a non-rejoin subsumee child, there is an equivalent column that is produced by the subsumer'"'"'s matching child, and replacing each column in the subsumee expression with its equivalent subsumer column.
-
124. The apparatus of claim 122, wherein when the subsumee and subsumer children do not match exactly, translating a column that appears within a subsumee expression.
-
125. The apparatus of claim 124, farther comprising following a derivation of the column from the subsumee to the subsumee child that produces the column, and then jumping to an equivalent column inside a top box of the corresponding child compensation, and following a derivation of a compensation column down the compensation stack until the matching subsumer child is reached, so that a subsumee column is translated to an expression that uses the columns of some subsumer children, and the expression is valid within the context of the subsumer.
-
126. The apparatus of claim 122, further comprising logic for using the expression translation to compensate predicates and to derive subsumee expressions out of the subsumer output columns.
-
127. The apparatus of claim 84, wherein predicate subsumption requires that each subsumer predicate “
- subsumes”
some subsumee predicate.
- subsumes”
-
128. The apparatus of claim 127, wherein a first predicated subsumes a second predicate if the set of rows that satisfy the second predicate is a subset of the set of rows that satisfy the first predicate.
-
129. The apparatus of claim 128, wherein a subsumee predicate that is properly subsumed by a subsumer predicate must be included in the compensation.
-
130. The apparatus of claim 128, wherein if the two predicates are equivalent, then the subsumee predicate does not need to be compensated for.
-
131. The apparatus of claim 84, wherein the date predicates further comprise logic for recognizing date predicates that are on a year or month boundary and, if necessary, compensating such predicates using the year and month columns produced by the subsumer.
-
132. The apparatus of claim 84, further comprising logic for handling structural inequivalence between the automatic summary table and the query in the navigator.
-
177. The method of claim 176, wherein the match between the lowest GROUP-BY box in the child compensation and the subsumer comprises the steps of claim 103 through 108.
-
197. The method of claim 131, wherein the date predicates further comprise the step of recognizing date predicates that are on a year or month boundary and, if necessary, compensating such predicates using the year and month columns produced by the subsumer.
-
198. The method of claim 132, further comprising the step of handling structural inequivalence between the automatic summary table and the query in the navigator.
-
68. The apparatus of claim 67, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
-
86. The apparatus of clam 84, wherein the SELECT boxes with one-to-one child matches require that the candidate subsumee and subsumer are SELECT boxes, and that (i) each child of the subsumee matches with at most one child of the subsumer, and (ii) no two children of the subsumee match to a same child of the subsumer.
- View Dependent Claims (87, 88, 89, 90, 91, 92)
-
87. The apparatus of claim 86, wherein a match is established when the following conditions are satisfied:
(1) each subsumer predicate that is not an extra join predicate is semantically equivalent with some subsumee predicate;
(2) any join between an extra and a non-extra child in the subsumer is a loss-less join;
(3) any subsumee predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns; and
(4) each subsumee output column is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
88. The apparatus of claim 86, wherein the compensation for such a match is a SELECT box that (i) joins back all rejoin children among themselves and with the subsumer (ii) contains all the subsumee predicates that do not appear in the subsumer, and (iii) recomputes the subsumee output column from the subsumer output columns and/or the rejoin columns.
-
89. The apparatus of claim 86, wherein the candidate subsumee and subsumer are SELECT boxes and the conditions (i) and/or (ii) of claim 86 are not satisfied.
-
90. The apparatus of claim 89, wherein the match function invokes itself multiple times trying to establish a match between the candidate subsumee and subsumer.
-
91. The apparatus of claim 90, wherein the match function satisfies conditions (i) and (ii) of claim 86 every time it invokes itself.
-
92. The apparatus of claim 91, wherein the match function satisfies conditions (i) and (ii) of claim 86 by taking into account some of die alternative child matches and disregarding others.
-
87. The apparatus of claim 86, wherein a match is established when the following conditions are satisfied:
-
133. An article of manufacture embodying logic for performing method steps for optimizing a query, the query being performed by a computer system to retrieve data from a database stored in a data storage device coupled to the computer system, the method comprising the steps of:
-
(a) analyzing the query using matching and compensation between the query and one or more automatic summary tables to determine whether expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table, wherein the automatic summary table is generated using a full SELECT statement involving one or more nested GROUP BY operations and HAVING clauses; and
(b) rewriting the query to use the automatic summary table when the expressions occurring in the query, but not in the automatic summary table, can be derived using the automatic summary table. - View Dependent Claims (134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196)
-
134. The method of claim 133, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
135. The method of claim 133, wherein the query and the statement that generates the automatic summary tables include subqueries.
-
136. The method of claim 133, wherein the analyzing step comprises the step of analyzing the query using compensation tests between the query and the automatic summary table to determine whether expressions occurring anywhere in the query, but not in the automatic summary table, can be derived after combining the automatic summary table with one or more other base tables through relational operators.
-
137. The method of claim 133, wherein the analyzing step comprises the step of determining whether there is a match between the query and the automatic summary table using a bottom-up traversal of boxes in query graph models (QGMs) for the query and the automatic summary table that tries to establish matches between query and automatic summary table, until it reaches a top of the QGM for the automatic summary table.
-
138. The method of claim 137, wherein the determining step further comprises the steps of:
-
performing a navigator function to identify candidate subsumee and subsumer pairs from the QGMS for the query and the automatic summary table in an order such that the bottom-up traversal of the QGMs for the query and the automatic summary table is satisfied; and
performing a match function that takes as input the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and returns information on whether the subsumee matches with the subsumer.
-
-
139. The method of claim 138, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
140. The method of claim 138, wherein the match function assumes that children of the subsumee and the subsumer have been matched already.
-
141. The method of claim 138, wherein the analyzing step comprises the step of determining whether there is a match between the query and the automatic summary table using a top-down recursive traversal of the QGMs for the query and the automatic summary table that tries to establish matches between query and automatic summary table.
-
142. The method of claim 141, wherein the determining step further comprises the step of:
performing a match function that takes as input at least one candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table, and returns information on whether the subsumee matches with the subsumer.
-
143. The method of claim 142, wherein the match function returns information that describes a compensation for the match, if a match exists.
-
144. The method of claim 142, wherein the match function recursively invokes itself using as input children of the candidate subsumee and subsumer pair from the QGMs for the query and the automatic summary table to attempt to match the children of candidate subsumee and subsumer pair before matching the candidate subsumee and subsumer pair.
-
145. The method of claims 144, wherein the match function considers patterns of subsumee and subsumer in the QGMs, as well as compensation for the matches between the children of the candidate subsumee and subsumer.
-
146. The method of claim 145, wherein a rejoin child is a subsumee child that does not match with any subsumer child.
-
147. The method of claim 145, wherein an extra child is a subsumer child that does not match with any subsumee child.
-
148. The method of claim 145, wherein an extra join is a join between an extra child and the rest of the subsumer.
-
149. The method of claim 145, wherein the match function must satisfy the following conditions:
-
(1) there must be at least one subsumee child that matches with some subsumer child, and (2) the subsumee and the subsumer must be of a same operation type.
-
-
150. The method of claim 149, wherein the patterns are selected from a group of patterns comprising:
-
(1) exact child matches, including;
(1.1) SELECT boxes with one-to-one child matches, (1.2) self joins, and (1.3) GROUP-BY boxes, (2) non-exact child matches, including;
(2.1) GROUP-BY boxes with SELECT-only child compensation, (2.2) GROUP BY boxes with GROUP BY child compensation, (2.3) SELECT boxes with SELECT-only child compensation, and (2.4) SELECT boxes with a single child matching a GROUP BY child compensation.
-
-
151. The method of claim 150, wherein the exact child matches comprises the step of assuming that any matches that exist among the children of the candidate subsumee and subsumer pair are exact matches.
-
152. The method of claim 150, wherein the SELECT boxes with one-to-one child matches require that the candidate subsumee and subsumer are SELECT boxes, and that (i) each child of the subsumee matches with at most one child of the subsumer, and (ii) no two children of the subsumee match to a same child of the subsumer.
-
153. The method of claim 152, where a match is established when the following conditions are satisfied:
-
(1) each subsumer predicate that is not an extra join predicate is semantically equivalent with some subsumee predicate;
(2) any join between an extra and a non-extra child in the subsumer is a loss-less join;
(3) any subsumee predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns; and
(4) each subsumee output column is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
-
154. The method of claim 152, wherein the compensation for such a match is a SELECT box that (i) joins back all rejoin children among themselves and with the subsumer, (ii) contains all the subsumee predicates that do not appear in the subsumer, and (iii) recomputes the subsumee output columns from the subsumer output columns and/or the rejoin columns.
-
155. The method of claim 152, wherein the candidate subsumee and subsumer are SELECT boxes and the conditions (i) and/or (ii) of claim 152 are not satisfied.
-
156. The method of claim 155, wherein the match function invokes itself multiple times trying to establish a match between the candidate subsumee and subsumer.
-
157. The method of claim 156, wherein the match function satisfies conditions (i) and (ii) of claim 152 every time it invokes itself.
-
158. The method of claim 157, wherein the match function satisfies conditions (i) and (ii) of claim 152 by taking into account some of the alternative child matches and disregarding others.
-
159. The method of claim 150, wherein the candidate subsumee and subsumer are GROUP-BY boxes.
-
160. The method of claim 159, wherein a match is established when the following conditions are satisfied:
-
(1) every subsumee grouping column must be semantically equivalent with some subsumer grouping column;
(2) if the subsumee and subsumer grouping sets are the same, then each subsumee aggregate output column must be semantically equivalent with some subsumer aggregate output column; and
(3) if the subsumee and subsumer grouping sets are not the same, then each subsumee aggregate output column must be derivable from the subsumer'"'"'s output columns.
-
-
161. The method of claim 160, when compensation is necessary only if the subsumee'"'"'s grouping items are a proper subset of the subsumer'"'"'s grouping items.
-
162. The method of claim 161, wherein the compensation comprises a GROUP-BY box that has a set of same grouping items as the subsumee, and derives the subsumee'"'"'s output columns from the subsumer'"'"'s output columns.
-
163. The method of claim 159, wherein derivations for aggregate functions are selected from a group comprising:
-
(1) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer,(2) COUNT(*)>
SUM(cnt), where “
cnt”
is the COUNT(z) column of the subsumer and z is any non-nullable column used in the subsumer,(3) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(y) column of the subsumer,(4) COUNT(x)>
SUM(cnt), where “
cnt”
is the COUNT(*) column of the subsumer if x is non-nullable,(5) SUM(x)>
SUM(sm), where “
sm”
is the SUM(y) column of the subsumer,(6) SUM(x)>
SUM(y*cnt), where x is non-nullable, y is in the subsumer'"'"'s grouping set and “
cnt”
is the COUNT(*) column of the subsumer,(7) AVG(x)>
SUM(sm)/SUM(cnt), where “
sm”
is the SUM(y) column of the subsumer, and “
cnt”
is the COUNT(y) column of the subsumer or the COUNT(*) column of the subsumer if the x column is non-nullable,(8) MAX(x)>
MAX(mx), where “
mx”
is the MAX(y) column of the subsumer or MAX(y), where y is in the subsumer'"'"'s grouping set,(9) MIN(x)>
MIN(mn), where “
mn”
is the MIN(Y) column of the subsumer, or MIN(y), where y is in the subsumer'"'"'s grouping set,(10) COUNT(distinct x)>
COUNT(y), where y is in the subsumer'"'"'s grouping set,(11) SUM(distinct x)>
SUM(y), where y is in the subsumer'"'"'s grouping set,(12) VARIANCE(x), which is AVG(x2)−
AVG(x)2, and(13) STDDEV(x), which is sqrt(VARIANCE(x)).
-
-
164. The method of claim 163, wherein x is a column used in the subsumee, y is a column used in the subsumer, and that x and y are semantically equivalent.
-
165. The method of claim 150, wherein the non-exact child matches further comprise the step of matching boxes in the QGMs whose children do not match exactly, by considering the boxes in the QGMs that comprise the compensations for the non-exact child matches.
-
166. The method of claim 165, wherein child compensation boxes in the QGMs are included in a compensation that is required for the match.
-
167. The method of claim 166, further comprising the step of pulling up the child compensation boxes in the QGMs.
-
168. The method of claim 167, wherein the pulling up step requires that the following condition is met:
a general pullup condition, wherein the columns referenced by expressions inside a child compensation must be either rejoin columns or be derivable from the subsumer'"'"'s output columns.
-
169. The method of claim 150, wherein the subsumee and subsumer are GROUP-BY boxes and the child compensation comprises a single SELECT box, and the SELECT box must be pulled up above the subsumer GROUP-BY box, and the general pullup condition must be satisfied.
-
170. The method of claim 169, wherein the child compensation may perform rejoins.
-
171. The method of claim 170, wherein a match is established when the following conditions are satisfied:
-
(1) each grouping item in the subsumee must be derivable from the subsumer'"'"'s grouping items and/or the rejoin columns; and
(2) each subsumee output column that is an aggregate function must be derivable from the subsumer'"'"'s output columns.
-
-
172. The method of claim 169, wherein the compensation includes the pulled up SELECT box, which may be followed by a GROUP BY box.
-
173. The method of claim 172, wherein if the SELECT box does not perform any rejoins, the GROUP BY compensation is necessary only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set.
-
174. The method of claim 172, wherein, if the SELECT box performs any rejoins, then regrouping is required in the compensation only if the subsumee'"'"'s grouping set is not the same as the subsumer'"'"'s grouping set, or the rejoin changes the duplicity of the non-rejoin rows.
-
175. The method of claim 150, wherein the subsumee and subsumer are GROUP BY boxes and the child compensation contains at least one GROUP BY box and a zero or more SELECT boxes.
-
176. The method of claim 175, wherein a match is established when the following condition is satisfied:
the lowest GROUP-BY box in the child compensation matches with the subsumer.
-
178. The method of claim 175, wherein the compensation includes (i) the compensation for the match between the lowest GROUP-BY box in the child compensation and the subsumer, (ii) a copy of each child-compensation box above the lowest GROUP-BY box in the child compensation, and (ii) a copy of the subsumee.
-
179. The method of claim 150, wherein the subsumee and subsumer are SELECT boxes and the child compensation comprises a single SELECT box that may perform rejoins.
-
180. The method of claim 179, wherein a match is established when the following conditions are satisfied:
-
(1) conditions 2 and 4 of claim 153 must be satisfied, (2) each subsumer predicate that is not an extra join predicate must be semantically equivalent with some predicate that appears either in the subsumee or in one of the child compensation boxes, and (3) any subsumee or child-compensation predicate that does not have a matching subsumer predicate is derivable from the subsumer'"'"'s output columns and/or the rejoin columns.
-
-
181. The method of claim 179, wherein the compensation includes any predicates that appear in the subsumee and the child compensation boxes but not in the subsumer.
-
182. The method of claim 150, wherein the subsumee and subsumer are SELECT boxes and only one of the subsumee'"'"'s children matches with one of the subsumer'"'"'s children and all the rest of the subsumee child are rejoin children.
-
183. The method of claim 182, wherein the compensation for the single child match contains at least one GROUP BY box and any number of SELECT boxes.
-
184. The method of claim 183, wherein the conditions for a match are the same as SELECT boxes with SELECT-only child compensation, with the addition of the general pullup condition for the GROUP by boxes in the child compensation, such that any grouping column in the child-compensation GROUP BY boxes must be derivable from the subsumer'"'"'s output columns.
-
185. The method of claim 150, wherein the matching or compensation expressions further comprise the step of testing for expression equivalence.
-
186. The method of claim 185, wherein the testing step further comprises the step of testing for semantic equivalence between a subsumee expression and a subsumer expression.
-
187. The method of claim 186, wherein the testing step further comprises the step of translating the subsumee expression into an equivalent expression that is valid within a context of the subsumer.
-
188. The method of claim 187, wherein the translated subsumee expression uses columns that are generated by the subsumer'"'"'s children.
-
189. The method of claim 188, wherein when the subsumee and subsumer children match exactly, then for each column that is produced by a non-rejoin subsumee child, there is an equivalent column that is produced by the subsumer'"'"'s matching child, and replacing each column in the subsumee expression with its equivalent subsumer column.
-
190. The method of claim 188, wherein when the subsumee and subsumer children do not match exactly, translating a column that appears within a subsumee expression.
-
191. The method of claim 190, further comprising following a derivation of the column from the subsumee to the subsumee child that produces the column, and then jumping to an equivalent column inside a top box of the corresponding child compensation, and following a derivation of a compensation column down the compensation stack until the matching subsumer child is reached, so that a subsumee column is translated to an expression that uses the columns of some subsumer children, and the expression is valid within the context of the subsumer.
-
192. The method of claim 188, further comprising the step of using the expression translation to compensate predicates ad to derive subsumee expressions out of the subsumer output columns.
-
193. The method of claim 150, wherein predicate subsumption requires that each subsumer predicate “
- subsumes”
some subsumee predicate.
- subsumes”
-
194. The method of claim 193, wherein a first predicate subsumes a second predicate if the set of rows that satisfy the second predicate is a subset of the set of rows that satisfy the first predicate.
-
195. The method of claim 194, wherein a subsumee predicate that is properly subsumed by a subsumer predicate must be included in the compensation.
-
196. The method of claim 194, wherein if the two predicates are equivalent, then the subsumee predicate does not need to be compensated for.
-
134. The method of claim 133, wherein the query and the statement that generates the automatic summary tables include complex scalar expressions.
-
Specification
- Resources
-
Current AssigneeGoogle LLC (Alphabet Inc.)
-
Original AssigneeInternational Business Machines Corporation
-
InventorsZaharioudakis, Markos, Urata, Monica Sachiye, Paskin, Mark A., Sun, Yang, Pirahesh, Mir Hamid, Leung, Ting Yu, Lapis, George, Cochrane, Roberta Jo
-
Primary Examiner(s)CHANNAVAJJALA, SRIRAMA T
-
Application NumberUS09/502,821Time in Patent Office1,810 DaysField of Search707 1- 10, 707100-1041, 707200-206US Class Current1/1CPC Class CodesG06F 16/24539 using cached or materialise...Y10S 707/99932 Access augmentation or opti...Y10S 707/99933 Query processing, i.e. sear...Y10S 707/99934 Query formulation, input pr...Y10S 707/99935 Query augmenting and refini...