Aggregations performance estimation in database systems
First Claim
1. A method for estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database to execute a set of queries, the method comprising:
- determining a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determining a cost TCa associated with executing the set of queries using the set of proposed aggregations; and
calculating the potential performance gain as a function of the minimum cost TCm, the maximum cost TCf, and the cost TCa.
2 Assignments
0 Petitions
Accused Products
Abstract
A method, apparatus, and software are disclosed for efficient estimation of the performance gain associated with using a set of proposed aggregations, or summaries of data, in a database. This estimate is used in selecting which aggregations are materialized in an OLAP system or relational database. The estimate is based on the minimum and maximum costs of executing a given set of queries, as well as the cost of executing the given set of queries using the set of proposed aggregations. By expressing the estimate as a unitless constant with known upper and lower limits, the system conveys information as to the performance gain in a form that is readily understood by the user.
59 Citations
48 Claims
-
1. A method for estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database to execute a set of queries, the method comprising:
-
determining a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determining a cost TCa associated with executing the set of queries using the set of proposed aggregations; and
calculating the potential performance gain as a function of the minimum cost TCm, the maximum cost TCf, and the cost TCa. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
7. The method, according to claim 1, wherein the minimum cost TCm is determined by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
8. The method, according to claim 7, wherein determining the best theoretical aggregation for answering a query comprises:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
9. The method, according to claim 1, wherein the minimum cost TCm is determined by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
10. A method for estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database to execute a set of queries, the method comprising:
-
determining a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determining a benefit of using each aggregation of the set of proposed aggregations to execute the set of queries;
summing the determined benefits over all of the aggregations of the set of proposed aggregations; and
calculating the potential performance gain as a ratio of the sum of the determined benefits to a difference between the maximum cost TCf and the minimum cost TCm. - View Dependent Claims (11, 12, 13, 14, 15)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
13. The method, according to claim 10, wherein the minimum cost TCm is determined by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
14. The method, according to claim 13, wherein determining the best theoretical aggregation for answering a query comprises:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
15. The method, according to claim 10, wherein the minimum cost TCm is determined by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
16. A method for estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database to execute a set of queries, the method comprising:
-
determining a maximum cost TCf associated with executing the set of queries as a product of a size of the detailed data and a number of queries in the set of queries;
determining a minimum cost TCm associated with executing the set of queries;
determining a cost TCa associated with executing the set of queries using the set of proposed aggregations by, for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query, and summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data; and
calculating the potential performance gain as a ratio of (TCf−
TCa) to (TCf−
TCm).
-
-
17. A computer-readable medium for use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, the computer-readable medium having computer-executable instructions for:
-
determining a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determining a cost TCa associated with executing the set of queries using the set of proposed aggregations; and
calculating the potential performance gain as a function of the minimum cost TCm, the maximum cost TCf, and the cost TCa. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
23. The computer-readable medium, according to claim 17, having further computer-executable instructions for determining the minimum cost TCm by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
24. The computer-readable medium, according to claim 23, having further computer-executable instructions for determining the best theoretical aggregation for answering a query by:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
25. The computer-readable medium, according to claim 17, having further computer-executable instructions for determining the minimum cost TCm by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
26. A computer-readable medium for use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, the computer-readable medium having computer-executable instructions for:
-
determining a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determining a benefit of using each aggregation of the set of proposed aggregations to execute the set of queries;
summing the determined benefits over all of the aggregations of the set of proposed aggregations; and
calculating the potential performance gain as a ratio of the sum of the determined benefits to a difference between the maximum cost TCf and the minimum cost TCm. - View Dependent Claims (27, 28, 29, 30, 31)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
29. The computer-readable medium, according to claim 26, having further computer-executable instructions for determining the minimum cost TCm by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
30. The computer-readable medium, according to claim 29, having further computer-executable instructions for determining the best theoretical aggregation for answering a query by:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
31. The computer-readable medium, according to claim 26, having further computer-executable instructions for determining the minimum cost TCm by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
32. A computer-readable medium for use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, the computer-readable medium having computer-executable instructions for:
-
determining a maximum cost TCf associated with executing the set of queries as a product of a size of the detailed data and a number of queries in the set of queries;
determining a minimum cost TCm associated with executing the set of queries;
determining a cost TCa associated with executing the set of queries using the set of proposed aggregations by, for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query, and summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data; and
calculating the potential performance gain as a ratio of (TCf−
TCa) to (TCf−
TCm).
-
-
33. For use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, a computer arrangement configured to:
-
determine a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determine a cost TCa associated with executing the set of queries using the set of proposed aggregations; and
calculate the potential performance gain as a function of the minimum cost TCm, the maximum cost TCf, and the cost TCa. - View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
39. The computer arrangement, according to claim 33, further configured to determine the minimum cost TCm by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
40. The computer arrangement, according to claim 39, further configured to determine the best theoretical aggregation for answering a query by:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
41. The computer arrangement, according to claim 33, further configured to determine the minimum cost TCm by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
42. For use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, a computer arrangement configured to:
-
determine a minimum cost TCm and a maximum cost TCf associated with executing the set of queries;
determine a benefit of using each aggregation of the set of proposed aggregations to execute the set of queries;
sum the determined benefits over all of the aggregations of the set of proposed aggregations; and
calculate the potential performance gain as a ratio of the sum of the determined benefits to a difference between the maximum cost TCf and the minimum cost TCm. - View Dependent Claims (43, 44, 45, 46, 47)
for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query; and
summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data.
-
-
45. The computer arrangement, according to claim 42, further configured to determine the minimum cost TCm by:
-
for each query of the set of queries, determining a best theoretical aggregation for answering the query; and
summing sizes of the determined best theoretical aggregations.
-
-
46. The computer arrangement, according to claim 45, further configured to determine the best theoretical aggregation for answering a query by:
-
identifying a granularity of the query; and
generating, as the best theoretical aggregation, an aggregation having the identified granularity.
-
-
47. The computer arrangement, according to claim 42, further configured to determine the minimum cost TCm by:
-
generating a random set As of sample aggregations from a known aggregation space;
determining a total estimated size TSS of the sample aggregations;
sorting the sample aggregations in order of size;
determining a largest fanned aggregation al;
interpolating a total size TSL of aggregations in the random set As that are no larger than the largest fanned aggregation al; and
calculating the minimum cost TCm as a product of the total estimated size TSS of the sample aggregations and a ratio TKS/TSL of a total size TKS of the fanned aggregations to the total size TSL of the aggregations in the random set As that are no larger than the largest fanned aggregation al.
-
-
48. For use in estimating a potential performance gain of using a set of proposed aggregations that aggregate detailed data in a database, a computer arrangement configured to:
-
determine a maximum cost TCf associated with executing the set of queries as a product of a size of the detailed data and a number of queries in the set of queries;
determine a minimum cost TCm associated with executing the set of queries;
determine a cost TCa associated with executing the set of queries using the set of proposed aggregations by, for each query of the set of queries, determining a best aggregation of the proposed set of aggregations for answering the query, and summing sizes of the determined best aggregations and, for each query for which no aggregation of the proposed set of aggregations is sufficiently detailed to answer, a size of the detailed data; and
calculate the potential performance gain as a ratio of (TCf−
TCa) to (TCf−
TCm).
-
Specification