APPARATUS, SYSTEM, AND METHOD FOR PERFORMING FAST APPROXIMATE COMPUTATION OF STATISTICS ON QUERY EXPRESSIONS
Abstract
An apparatus, system, and method are disclosed for performing fast approximate computation of statistics on query expressions in order to improve query optimization within a database management system by accurately and quickly estimating the sizes of intermediate query results. This is accomplished by analyzing a query for join instructions and identifying a fact table and a dimension table within those join instructions. Then, frequency statistics corresponding to distinct values within the fact table are retrieved from a catalog table. Those frequency statistics are used in combination with a full scan of the dimension table to accurately and quickly estimate frequency statistics for an expected join between the fact table and the dimension table. The estimated frequency statistics corresponding to the expected join may then be used in such operations as query optimization.
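The estimation step described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `estimate_join_statistics`, the dictionary standing in for catalog frequency statistics, and the sample `store_id`/`region` data are all assumptions made for illustration. Each dimension row is scanned once and weighted by the number of fact rows that share its join key, so the fact table itself is never read.

```python
from collections import Counter, defaultdict


def estimate_join_statistics(fact_key_freq, dimension_rows, dim_key):
    """Estimate per-column value frequencies of (fact JOIN dimension).

    fact_key_freq:  {join-key value: number of fact rows with that value};
                    a hypothetical stand-in for catalog frequency statistics.
    dimension_rows: list of dicts, one per dimension row (the full scan).
    dim_key:        name of the dimension column the fact table joins on.
    """
    column_freq = defaultdict(Counter)
    join_size = 0
    for row in dimension_rows:
        # Number of fact rows that would match this dimension row.
        weight = fact_key_freq.get(row[dim_key], 0)
        if weight == 0:
            continue  # no matching fact rows, so nothing reaches the join
        join_size += weight
        for column, value in row.items():
            column_freq[column][value] += weight
    return join_size, {col: dict(freq) for col, freq in column_freq.items()}


# Hypothetical data: 1,000 fact rows spread over three store keys.
fact_key_freq = {1: 500, 2: 300, 3: 200}
dimension_rows = [
    {"store_id": 1, "region": "east"},
    {"store_id": 2, "region": "west"},
    {"store_id": 3, "region": "east"},
    {"store_id": 4, "region": "north"},  # no fact rows reference this key
]

join_size, stats = estimate_join_statistics(fact_key_freq, dimension_rows, "store_id")
print(join_size)        # estimated number of rows in the join result
print(stats["region"])  # estimated frequency of each region in the join result
```

If the catalog records counts for only some of the distinct join-key values, the result is an approximation rather than an exact figure, which is consistent with the "approximate computation" wording of the abstract.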
6 Claims
1. A computer program product comprising a computer readable medium having:
- computer usable program code programmed to perform fast approximate computation of statistics on query expressions within a database management system (DBMS) by accurately estimating the sizes of intermediate query results based on frequency statistics, the operations of the computer program product comprising:
analyzing a query expression for join instructions;
identifying a fact table and a dimension table from the join instructions;
retrieving frequency statistics from a catalog table corresponding to distinct values within one or more join columns of the fact table;
generating a frequency statistics table comprising the frequency statistics retrieved from the catalog table;
estimating frequency statistics corresponding to each column of a join result between the fact table and the dimension table by generating a statistical view comprising a join of the generated frequency statistics table and the dimension table by using the generated frequency statistics table to simulate the fact table; and
populating the statistical view with the estimated frequency statistics.
Dependent claims: 2, 3, 4
5. An apparatus to perform fast approximate computation of statistics on database query expressions by accurately estimating the sizes of intermediate query results based on frequency statistics, the apparatus comprising:
a query analysis module configured to analyze a query expression for join instructions;
an identification module configured to identify a fact table and a dimension table from the join instructions and from the relative sizes of the fact table and the dimension table;
a statistics retrieval module configured to retrieve frequency statistics from a catalog table, the frequency statistics corresponding to distinct values within a join column of the fact table;
an estimation module configured to estimate frequency statistics corresponding to each column of a join result between the fact table and the dimension table by generating a statistical view comprising a join of the generated frequency statistics table and the dimension table by using the generated frequency statistics table to simulate the fact table;
a population module configured to populate the statistical view with the estimated frequency statistics; and
an output module configured to make available the estimated frequency statistics to a query optimizer.
Dependent claims: 6
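As a rough illustration of the claimed idea of joining a frequency statistics table with the dimension table in place of the fact table, the following sketch uses Python's built-in `sqlite3` module. It is an assumption-laden toy, not the patented implementation: the table names `store_dim` and `sales_freq`, the view name `sales_store_stats`, and all data are hypothetical, and an ordinary SQLite view stands in for the statistical view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension table.
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, region TEXT);
    INSERT INTO store_dim VALUES (1, 'east'), (2, 'west'), (3, 'east'), (4, 'north');

    -- Frequency statistics table: one row per distinct join-column value of
    -- the fact table, with the number of fact rows holding that value.
    -- In a real system these counts would come from the catalog.
    CREATE TABLE sales_freq (store_id INTEGER, row_count INTEGER);
    INSERT INTO sales_freq VALUES (1, 500), (2, 300), (3, 200);

    -- The view joins the small frequency table (simulating the fact table)
    -- with the dimension table and sums the counts per dimension value.
    CREATE VIEW sales_store_stats AS
        SELECT d.region AS region, SUM(f.row_count) AS estimated_rows
        FROM sales_freq AS f
        JOIN store_dim AS d ON d.store_id = f.store_id
        GROUP BY d.region;
""")

# Estimated frequency of each region value in the fact/dimension join.
region_stats = dict(
    conn.execute("SELECT region, estimated_rows FROM sales_store_stats")
)
estimated_join_size = sum(region_stats.values())
print(region_stats, estimated_join_size)
```

Because the frequency table holds one row per distinct key rather than one row per fact row, the join touches far fewer rows than a join against the fact table would; an optimizer could then read the estimated counts from the view.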
Specification