Group-by size result estimation
First Claim
1. A method for estimating a result size of a Group-By operation comprising:
- receiving, by at least one computing device, the Group-By operation comprising inputs referencing database tables stored on a computer readable medium;
calculating, by the at least one computing device, a cumulative selectivity in the Group-By operation by aggregating a normalized selectivity of a first column and a normalized selectivity of a second column,wherein the first column and the second column are referenced in the inputs of the Group-By operation,wherein the normalized selectivity for the first column is calculated using a catalogued selectivity for the first column, a size of a largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the first column resides, wherein the catalogued selectivity for the first column is a statistical value, representing a degree of uniqueness of the first column, stored in catalog tables different and distinct from the database tables,wherein the normalized selectivity for the second column is calculated using a catalogued selectivity for the second column, the size of the largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the second column resides, wherein the catalogued selectivity for the second column is a statistical value, representing a degree of uniqueness of the second column, stored in the catalog tables, andwherein a size of a particular table is a quantity of rows in the particular table; and
allocating, by the at least one computing device, memory for the Group-By operation based upon a memory requirement responsive to the cumulative selectivity.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for accurately estimating a result size of a Group-By operation in a relational database. The estimate utilizes the probability of union of the columns involved in the operation, as well as the relative cardinality of each column with respect to the other columns in the operation. In addition, the estimate incorporates the use of table filters when indicated such that table filters are applied prior to determining the size of the tables in the operation, as well as including equivalent columns into the list of columns that are a part of the Group-By operation. Accordingly, the estimate of the result size of the operation includes influencing factors that provide an accurate estimation of system memory requirements.
16 Citations
12 Claims
-
1. A method for estimating a result size of a Group-By operation comprising:
-
receiving, by at least one computing device, the Group-By operation comprising inputs referencing database tables stored on a computer readable medium; calculating, by the at least one computing device, a cumulative selectivity in the Group-By operation by aggregating a normalized selectivity of a first column and a normalized selectivity of a second column, wherein the first column and the second column are referenced in the inputs of the Group-By operation, wherein the normalized selectivity for the first column is calculated using a catalogued selectivity for the first column, a size of a largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the first column resides, wherein the catalogued selectivity for the first column is a statistical value, representing a degree of uniqueness of the first column, stored in catalog tables different and distinct from the database tables, wherein the normalized selectivity for the second column is calculated using a catalogued selectivity for the second column, the size of the largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the second column resides, wherein the catalogued selectivity for the second column is a statistical value, representing a degree of uniqueness of the second column, stored in the catalog tables, and wherein a size of a particular table is a quantity of rows in the particular table; and allocating, by the at least one computing device, memory for the Group-By operation based upon a memory requirement responsive to the cumulative selectivity. - View Dependent Claims (2, 3, 4)
-
-
5. A system for estimating a result size of a Group-By operation comprising:
a memory configured to store modules comprising; a first module configured to organize database tables stored on a computer readable medium with data records maintained in the database tables; a second module configured to calculate a cumulative selectivity in the Group-By operation by aggregating a normalized selectivity of a first column and a normalized selectivity of a second column, wherein the first column and the second column are referenced in the inputs of the Group By operation, wherein the normalized selectivity for the first column is calculated using a catalogued selectivity for the first column, a size of a largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the first column resides, wherein the catalogued selectivity for the first column is a statistical value, representing a degree of uniqueness of the first column, stored in catalog tables different and distinct from the database tables, wherein the normalized selectivity for the second column is calculated using a catalogued selectivity for the second column, the size of the largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the second column resides, wherein the catalogued selectivity for the second column is a statistical value, representing a degree of uniqueness of the second column, stored in the catalog tables, and wherein a size of a particular table is a quantity of rows in the particular table; and a third module configured to allocate memory for the Group-By operation based upon a memory requirement responsive to the cumulative selectivity; and one or more processors implemented at least partially in hardware configured to process the modules. - View Dependent Claims (6, 7, 8)
-
9. A non-transitory computer-readable storage medium having computer-executable instructions stored thereon that, in response to being executed by a computing device, cause the computing device to perform operations for estimating a result size of a Group-By operation, the operations comprising:
-
receiving the Group-By operation comprising inputs referencing database tables stored on a computer readable medium; calculating a cumulative selectivity in the Group-By operation by aggregating a normalized selectivity of a first column and a normalized selectivity of a second column, wherein the first column and the second column are referenced in the inputs of the Group By operation, wherein the normalized selectivity for the first column is calculated using a catalogued selectivity for the first column, a size of a largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the first column resides, wherein the catalogued selectivity for the first column is a statistical value, representing a degree of uniqueness of the first column, stored in catalog tables different and distinct from the database tables, wherein the normalized selectivity for the second column is calculated using a catalogued selectivity for the second column, the size of the largest table of the database tables referenced by the inputs of the Group-By operation, and a size of a database table in which the second column resides, wherein the catalogued selectivity for the second column is a statistical value, representing a degree of uniqueness of the second column, stored in the catalog tables, and wherein a size of a particular table is a quantity of rows in the particular table; and allocating memory for the Group-By operation based upon a memory requirement responsive to the cumulative selectivity. - View Dependent Claims (10, 11, 12)
-
Specification