Evaluation of rollups with distinct aggregates by using sequence of sorts and partitioning by measures

US 6,775,682 B1
Filed: 02/26/2002
Issued: 08/10/2004
Est. Priority Date: 02/26/2002
Status: Active Grant

First Claim

Patent Images

1. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:

for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and

for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Methods are provided for efficiently evaluating database queries including a rollup operator and a distinct aggregate function. Using a sequence of sorts, duplicate record elimination performed on previous sorts at lower, or finer, levels of the rollup operator is taken advantage of by performing subsequent sorts on the preceding sort. Hence, when moving from one rollup level to the next higher level, there are fewer data records to sort with respect to the relevant grouping columns for that level, and thus also fewer duplicate data records to eliminate for purposes of computing the distinct aggregate. Using parallel evaluation, processing of aggregate functions is split among different processing slaves, and the measure of an aggregate function is included as a partitioning key when sending data from one data flow operation to the next data flow operation of a query execution plan. Using parallel evaluation for a query that includes two or more aggregate functions, a measure code corresponding with each aggregate function and associated measure values are included as partitioning keys for enhanced load balancing and parallelization.

Citations

28 Claims

1. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
- for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
  
  for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The method of claim 1 wherein the steps of generating outputs are performed using a parallel evaluation process, further comprising the steps of:
3. The method of claim 2 wherein the step of partitioning results is based on the aggregation argument of the aggregate function.
4. The method of claim 2 comprising the step of:
- partitioning results from each of the two or more consumer processing slaves based on the one or more grouping field keys specified by the rollup operator and on a grouping identifier.
5. The method of claim 2 wherein the query comprises two or more aggregate functions and wherein the step of partitioning results is based on the aggregation arguments of the two or more aggregate functions and values in fields corresponding to the aggregation arguments.

6. A method for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
- for the lowest level of grouping, eliminating one or more records that are duplicates of a record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument;
  
  aggregating, from the distinct sort, the data in the aggregation argument field according to the aggregate function, producing a distinct aggregated sort;
  
  outputting the distinct aggregated sort according to the query; and
  
  for each level of grouping other than the lowest level, iterating the steps of eliminating, aggregating, and outputting on the distinct sort of the previous level.

7. A method for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, the method comprising the steps of:
- for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
  
  eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
  
  computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator.
- View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15)
- - 8. The method of claim 7 wherein the step of generating a set of expanded records comprises:
9. The method of claim 8 wherein the step of eliminating records comprises sorting the set of expanded records at least in part on the measure code.
10. The method of claim 7 wherein a query execution plan is divided into two or more data flow operations, each of two or more processing slaves are assigned part of the processing associated with at least one of the data flow operations and the two or more processing slaves process, in parallel, the associated data flow operation, and the processing comprises:
- assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds;
  
  partitioning results from each of two or more first processing slaves that process a first data flow operation, wherein partitioning is based on the measure codes;
  
  assigning to each of two or more second processing slaves the processing associated with a second data flow operation; and
  
  processing by the two or more second processing slaves the second data flow operation based on partitioned results from the two or more first processing slaves.
11. The method of claim 10, wherein the step of partitioning results is based additionally on the values in a field specified by the aggregation argument corresponding to each of the two or more aggregate functions.
12. The method of claim 10, further comprising the steps of:
- partitioning results from the two or more second processing slaves processing the second data flow operation based on the one or more grouping field keys of the rollup operator;
  
  defining a third data flow operation for generating output rows by combining outputs for each of the levels of grouping from the two or more second processing slaves; and
  
  processing the third data flow operation based on partitioned results from the two or more second processing slaves.
13. The method of claim 12, wherein the step of partitioning results from the two or more second processing slaves is based on a bit-vector that uniquely identifies each result grouping specified by the one or more grouping field keys corresponding to the rollup grouping function.
14. The method of claim 7 wherein the step of computing comprises:
- for the lowest level of grouping, for each of the two or more aggregate functions, aggregating data from the distinct sort that is in a field specified by the aggregation argument corresponding to the particular aggregate function, producing distinct aggregated sorts;
  
  outputting the distinct aggregated sorts according to the query; and
  
  for each level of grouping other than the lowest level, iterating the steps of aggregating and outputting on the distinct sorts of the previous level.
15. The method of claim 14 wherein the step of generating a set of expanded records comprises:
- assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds.

16. A computer-readable medium carrying one or more sequences of instructions for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
  
  for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.
- View Dependent Claims (17, 18, 19, 20)
- - 17. The computer-readable medium of claim 16 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps using a parallel evaluation process, comprising the steps of:
18. The computer-readable medium of claim 17 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results by partitioning results based on the aggregation argument of the aggregate function.
19. The computer-readable medium of claim 17 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step:
- partitioning results from each of the two or more consumer processing slaves based on the one or more grouping field keys specified by the rollup operator and on a grouping identifier.
20. The computer-readable medium of claim 17 wherein the query comprises two or more aggregate functions, and wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results based on the aggregation arguments of the two or more aggregate functions and values in fields corresponding with the aggregation arguments.

21. A computer-readable medium carrying one or more sequences of instructions for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for grouping of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
  
  eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
  
  computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator.
- View Dependent Claims (22, 23, 24, 25, 26)
- - 22. The computer-readable medium of claim 21 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to:
23. The computer-readable medium of claim 21 wherein a query execution plan for evaluating the database query is divided into two or more data flow operations, each of two or more processing slaves are assigned part of the processing associated with one of the data flow operations and the two or more processing slaves process, in parallel, the associated data flow operation, and wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- assigning to each expanded record a measure code associated with the aggregate function to which the expanded record corresponds;
  
  partitioning results from each of two or more first processing slaves that process the first data flow operation, wherein partitioning is based on the measure codes;
  
  assigning to each of two or more second processing slaves the processing associated with the second data flow operation; and
  
  processing by the two or more second processing slaves the second data flow operation based on partitioned results from the two or more first processing slaves.
24. The computer-readable medium of claim 23, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results based additionally on the values in a field specified by the aggregation argument corresponding to each of the two or more aggregate functions.
25. The computer-readable medium of claim 23 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- partitioning results from the two or more consumer processing slaves processing the second data flow operation based on the one or more grouping field keys of the rollup operator;
  
  defining a third data flow operation for generating output rows by combining outputs for each of the levels of grouping from the two or more consumer processing slaves; and
  
  processing the third data flow operation based on partitioned results from the two or more consumer processing slaves.
26. The computer-readable medium of claim 25 wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the step of partitioning results from the two or more consumer processing slaves based on a bit-vector that uniquely identifies each result grouping specified by the one or more grouping field keys corresponding to the rollup grouping function.

27. A computer apparatus comprising:
- a memory; and
  
  one or more processors coupled to the memory and configured to execute one or more sequence of instructions for evaluating a database query that includes an aggregate function specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for groupings of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of;
  
  for the lowest level of grouping, generating output by performing the aggregation on data retrieved from one or more base tables; and
  
  for each level of grouping other than the lowest level, generating output by performing the aggregation on a distinct sorted set of records from the previous level, wherein the distinct sorted set of records includes only unique records.

28. A computer apparatus comprising:
- a memory; and
  
  one or more processors coupled to the memory and configured to execute one or more sequence of instructions for evaluating a database query that includes two or more aggregate functions specifying aggregation of data in a field, specified by an aggregation argument, from distinct records and that includes a rollup operator specifying one or more grouping field keys for groupings of results, wherein a result group consisting of data grouped by all fields specified by the grouping field keys is the lowest level of grouping and a result group consisting of data grouped by no fields specified by the grouping field keys is the highest level of grouping, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of;
  
  for records of one or more tables that are a subject of the query, generating a set of expanded records by expanding each record into a plurality of records wherein each expanded record of the set of expanded records corresponds to a particular aggregate function;
  
  eliminating from the set of expanded records one or more records that are duplicates of an expanded record with respective values in one or more respective grouping fields and aggregation argument field, producing a distinct sort, wherein the grouping fields are specified by the grouping field keys and the aggregation argument field is specified by the aggregation argument; and
  
  computing from the distinct sort the two or more aggregate functions with results grouped according to the rollup operator.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
Oracle International Corporation (Oracle Corporation)
Inventors
Gupta, Abhinav, Witkowski, Andrew, Ballamkonda, Srikanth
Primary Examiner(s)
Mizrahi, Diane D.

Application Number

US10/084,642
Time in Patent Office

896 Days
Field of Search

707/102, 707/1, 707/2, 707/3, 707/4, 707/5
US Class Current

1/1
CPC Class Codes

G06F 16/24556 Aggregation; Duplicate elim...

Y10S 707/99943 Generating database or data...

Evaluation of rollups with distinct aggregates by using sequence of sorts and partitioning by measures

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

Citations

28 Claims

Specification

Solutions

Use Cases

Quick Links

Evaluation of rollups with distinct aggregates by using sequence of sorts and partitioning by measures

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

28 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links