Evaluation of rollups with distinct aggregates by using sequence of sorts and partitioning by measures
First Claim
1. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
- for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods are provided for efficiently evaluating database queries including a rollup operator and a distinct aggregate function. Using a sequence of sorts, duplicate record elimination performed on previous sorts at lower, or finer, levels of the rollup operator is taken advantage of by performing subsequent sorts on the preceding sort. Hence, when moving from one rollup level to the next higher level, there are fewer data records to sort with respect to the relevant grouping columns for that level, and thus also fewer duplicate data records to eliminate for purposes of computing the distinct aggregate. Using parallel evaluation, processing of aggregate functions is split among different processing slaves, and the measure of an aggregate function is included as a partitioning key when sending data from one data flow operation to the next data flow operation of a query execution plan. Using parallel evaluation for a query that includes two or more aggregate functions, a measure code corresponding with each aggregate function and associated measure values are included as partitioning keys for enhanced load balancing and parallelization.
-
Citations
28 Claims
-
1. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
-
for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records. - View Dependent Claims (2, 3, 4, 5)
defining a first data flow operation of a query evaluation plan as a step of eliminating duplicates in the field specified by the aggregation argument from the one or more base tables;
processing by two or more producer processing slaves the first data flow operation;
partitioning results from each of the two or more producer processing slaves;
defining a second data flow operation of the query evaluation plan as a step of eliminating duplicates in the field specified by the aggregation argument for each level of the rollup operator using a sequence of sorts, wherein a first duplicate elimination is performed on a result produced from the first data flow operation and each other duplicate elimination is performed on a result produced from a preceding sort;
a step of aggregating on each sort corresponding to each level of the rollup operator according to the aggregate function;
processing by two or more consumer processing slaves the second data flow operation based on partitioned results; and
defining a third data flow operation of the query evaluation plan as a step of aggregating for each level of the rollup operator by combining results produced by the step of aggregating from the second data flow operation.
-
-
3. The method of claim 2 wherein the step of partitioning results is based on the aggregation argument of the aggregate function.
-
4. The method of claim 2 comprising the step of:
partitioning results from each of the two or more consumer processing slaves based on the one or more grouping field keys specified by the rollup operator and on a grouping identifier.
-
5. The method of claim 2 wherein the query comprises two or more aggregate functions and wherein the step of partitioning results is based on the aggregation arguments of the two or more aggregate functions and values in fields corresponding to the aggregation arguments.
-
6. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
-
for the lowest level of grouping, eliminating one or more records that are duplicates of a record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument;
aggregating, from the distinct sort, the data in the aggregation argument field according to the aggregate function, producing a distinct aggregated sort;
outputting the distinct aggregated sort according to the query; and
for each level of grouping other than the lowest level, iterating the steps of eliminating, aggregating, and outputting on the distinct sort of the previous level.
-
-
7. A method for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
-
for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15)
assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds.
-
-
9. The method of claim 8 wherein the step of eliminating records comprises sorting the set of expanded records at least in part on the measure code.
-
10. The method of claim 7 wherein a query execution plan is divided into two or more data flow operations, each of two or more processing slaves are assigned part of the processing associated with at least one of the data flow operations and the two or more processing slaves process, in parallel, the associated data flow operation, and the processing comprises:
-
assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds;
partitioning results from each of two or more first processing slaves that process a first data flow operation, wherein partitioning is based on the measure codes;
assigning to each of two or more second processing slaves the processing associated with a second data flow operation; and
processing by the two or more second processing slaves the second data flow operation based on partitioned results from the two or more first processing slaves.
-
-
11. The method of claim 10, wherein the step of partitioning results is based additionally on the values in a field specified by the aggregation argument corresponding to each of the two or more aggregate functions.
-
12. The method of claim 10, further comprising the steps of:
-
partitioning results from the two or more second processing slaves processing the second data flow operation based on the one or more grouping field keys of the rollup operator;
defining a third data flow operation for generating output rows by combining outputs for each of the levels of grouping from the two or more second processing slaves; and
processing the third data flow operation based on partitioned results from the two or more second processing slaves.
-
-
13. The method of claim 12, wherein the step of partitioning results from the two or more second processing slaves is based on a bit-vector that uniquely identifies each result grouping specified by the one or more grouping field keys corresponding to the rollup grouping function.
-
14. The method of claim 7 wherein the step of computing comprises:
-
for the lowest level of grouping, for each of the two or more aggregate functions, aggregating data from the distinct sort that is in a field specified by the aggregation argument corresponding to the particular aggregate function, producing distinct aggregated sorts;
outputting the distinct aggregated sorts according to the query; and
for each level of grouping other than the lowest level, iterating the steps of aggregating and outputting on the distinct sorts of the previous level.
-
-
15. The method of claim 14 wherein the step of generating a set of expanded records comprises:
assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds.
-
16. A computer-readable medium carrying one or more sequences of instructions for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
-
for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records. - View Dependent Claims (17, 18, 19, 20)
defining a first data flow operation of a query evaluation plan as a step of eliminating duplicates in the field specified by the aggregation argument from the one or more base tables;
processing by two or more first processing slaves the first data flow operation;
partitioning results from each of the two or more first processing slaves;
defining a second data flow operation of the query evaluation plan asa step of eliminating duplicates in the field specified by the aggregation argument for each level of the rollup operator using a sequence of sorts, wherein a first duplicate elimination is performed on a result produced from the first data flow operation and each other duplicate elimination is performed on a result produced from a preceding sort;
a step of aggregating on each sort corresponding to each level of the rollup operator according to the aggregate function;
processing by two or more second processing slaves the second data flow operation based on partitioned results; and
defining a third data flow operation of the query evaluation plan as a step of aggregating for each level of the rollup operator by combining results produced by the step of aggregating from the second data flow operation.
-
-
18. The computer-readable medium of claim 17 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results by partitioning results based on the aggregation argument of the aggregate function.
-
19. The computer-readable medium of claim 17 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step:
partitioning results from each of the two or more consumer processing slaves based on the one or more grouping field keys specified by the rollup operator and on a grouping identifier.
-
20. The computer-readable medium of claim 17 wherein the query comprises two or more aggregate functions, and wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results based on the aggregation arguments of the two or more aggregate functions and values in fields corresponding with the aggregation arguments.
-
21. A computer-readable medium carrying one or more sequences of instructions for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
-
for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator. - View Dependent Claims (22, 23, 24, 25, 26)
perform the step of generating a set of expanded records by assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds; and
perform the step of eliminating records by sorting the set of expanded records at least in part on the measure code.
-
-
23. The computer-readable medium of claim 21 wherein a query execution plan for evaluating the database query is divided into two or more data flow operations, each of two or more processing slaves are assigned part of the processing associated with one of the data flow operations and the two or more processing slaves process, in parallel, the associated data flow operation, and wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
-
assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds;
partitioning results from each of two or more first processing slaves that process the first data flow operation, wherein partitioning is based on the measure codes;
assigning to each of two or more second processing slaves the processing associated with the second data flow operation; and
processing by the two or more second processing slaves the second data flow operation based on partitioned results from the two or more first processing slaves.
-
-
24. The computer-readable medium of claim 23, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results based additionally on the values in a field specified by the aggregation argument corresponding to each of the two or more aggregate functions.
-
25. The computer-readable medium of claim 23 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
-
partitioning results from the two or more consumer processing slaves processing the second data flow operation based on the one or more grouping field keys of the rollup operator;
defining a third data flow operation for generating output rows by combining outputs for each of the levels of grouping from the two or more consumer processing slaves; and
processing the third data flow operation based on partitioned results from the two or more consumer processing slaves.
-
-
26. The computer-readable medium of claim 25 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results from the two or more consumer processing slaves based on a bit-vector that uniquely identifies each result grouping specified by the one or more grouping field keys corresponding to the rollup grouping function.
-
27. A computer apparatus comprising:
-
a memory; and
one or more processors coupled to the memory and configured to execute one or more sequence of instructions for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for groupings of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of;
for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.
-
-
28. A computer apparatus comprising:
-
a memory; and
one or more processors coupled to the memory and configured to execute one or more sequence of instructions for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for groupings of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of;
for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator.
-
Specification