DYNAMIC PARALLEL AGGREGATION WITH HYBRID BATCH FLUSHING
Abstract
A method, apparatus, and system for dynamic parallel aggregation with hybrid batch flushing are provided. Record sources of an aggregation operator in a query execution plan may dynamically aggregate using the same aggregation operator. The dynamic aggregation creates a batch of aggregation records from an input source, which are then used to aggregate further records from the input source. If a record from the input source is not matched to an aggregation record in the batch, then the record is passed to the next operator. In this manner, records are aggregated ahead of time at a record source to reduce the number of records passed between operators, reducing the impact of network I/O between nodes of a parallel processing system. By adjusting the contents of the batch according to aggregation performance monitored during run-time, hybrid batch flushing can be implemented to adapt to changing data patterns and skewed values.
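The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a SUM aggregation over (key, value) records, a Python dict as the batch, and a hypothetical `batch_capacity` parameter that bounds how many aggregation records the batch may hold. The function names `second_operator` and `first_operator` are chosen here only to mirror the claim language. Run-time adjustment of the batch contents (hybrid batch flushing) is not modeled.

```python
def second_operator(records, batch_capacity):
    """Upstream (record source) operator: pre-aggregates into a bounded batch.

    Assumption: the aggregation is SUM, keyed by the first tuple element.
    """
    batch = {}  # aggregation records, keyed by grouping key
    for key, value in records:
        if key in batch:
            # The record matches an aggregation record in the batch.
            batch[key] += value
        elif len(batch) < batch_capacity:
            # The batch is still being built: start a new aggregation record.
            batch[key] = value
        else:
            # No match and the batch is full: pass the record downstream as-is.
            yield key, value
    # End of input: flush the partially aggregated records downstream.
    yield from batch.items()


def first_operator(incoming):
    """Downstream operator: applies the same SUM aggregation to everything
    it receives, both raw records and partial aggregates."""
    totals = {}
    for key, value in incoming:
        totals[key] = totals.get(key, 0) + value
    return totals
```

With six input records over keys `a`, `b`, and `c` and a `batch_capacity` of 2, the upstream operator absorbs the repeated `a` and `b` records into the batch and forwards only the unmatched `c` records plus the two flushed partial sums, so four records cross between operators instead of six. Because both operators apply the same aggregation, the final totals are unchanged.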
20 Claims
1. A method, comprising steps of:
executing a query execution plan, said query execution plan including a first aggregation operator and a second aggregation operator that is a record source for said first aggregation operator, wherein said first aggregation operator and said second aggregation operator perform the same aggregation operation, wherein executing the query execution plan comprises executing the second aggregation operator, wherein executing the second aggregation operator comprises:
generating an initial version of a batch of aggregation records;
after generating said initial version of said batch of aggregation records, determining whether a first record may be aggregated with any aggregation record in said batch of aggregation records; and
in response to determining that said first record may not be aggregated with any aggregation record of said batch of aggregation records, sending said first record to said first aggregation operator;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of:
executing a query execution plan, said query execution plan including a first aggregation operator and a second aggregation operator that is a record source for said first aggregation operator, wherein said first aggregation operator and said second aggregation operator perform the same aggregation operation, wherein executing the query execution plan comprises executing the second aggregation operator, wherein executing the second aggregation operator comprises:
generating an initial version of a batch of aggregation records;
after generating said initial version of said batch of aggregation records, determining whether a first record may be aggregated with any aggregation record in said batch of aggregation records; and
in response to determining that said first record may not be aggregated with any aggregation record of said batch of aggregation records, sending said first record to said first aggregation operator.
Dependent claims: 14, 15, 16, 17, 18
19. A database system comprising:
a network;
a query execution plan including a first aggregation operator and a second aggregation operator that is a record source for said first aggregation operator, wherein said first aggregation operator and said second aggregation operator perform the same aggregation operation, wherein executing the query execution plan comprises executing the second aggregation operator;
a first plurality of computing devices executing the first aggregation operator;
a second plurality of computing devices executing the second aggregation operator comprising:
generating an initial version of a batch of aggregation records;
after generating said initial version of said batch of aggregation records, determining whether a first record may be aggregated with any aggregation record in said batch of aggregation records; and
in response to determining that said first record may not be aggregated with any aggregation record of said batch of aggregation records, sending said first record to said first aggregation operator over said network.
Dependent claims: 20
Specification