Memory-constrained aggregation using intra-operator pipelining
First Claim
Patent Images
1. A computer implemented method for aggregating tuples, comprising:
- separating, by at least one processor, the tuples into a plurality of partitions using a hash function;
first processing, by the at least one processor, a portion of the tuples associated with a partition in the plurality of partitions based on the separating;
reserving, by the at least one processor, a first block for the partition from a free list based on a minimum number of blocks in the free list to perform a second processing and a third inserting;
first inserting, by the at least one processor, the processed portion of the tuples into the first block;
second inserting, by the at least one processor the first block into a buffer associated with the partition;
performing the second processing, by the at least one processor, the first block in the buffer; and
performing the third inserting, by the at least one processor, the processed first block into the free list,wherein at least one of the separating, first processing, first inserting, second inserting, second processing, and third inserting is performed by one or more computers.
1 Assignment
0 Petitions
Accused Products
Abstract
Disclosed herein are system, method, and computer program product embodiments for constraining the amount of memory used during data aggregation. An embodiment operates by separating input data into a plurality of partitions. The embodiment then inserts portions of the input data into blocks from a free list at a given level of a pipeline. The embodiment then inserts the blocks into buffers for processing at a subsequent level of the pipeline. The embodiment processes the inserted blocks at the subsequent level of the pipeline and concatenates the intermediate results into a final aggregate result.
32 Citations
20 Claims
-
1. A computer implemented method for aggregating tuples, comprising:
-
separating, by at least one processor, the tuples into a plurality of partitions using a hash function; first processing, by the at least one processor, a portion of the tuples associated with a partition in the plurality of partitions based on the separating; reserving, by the at least one processor, a first block for the partition from a free list based on a minimum number of blocks in the free list to perform a second processing and a third inserting; first inserting, by the at least one processor, the processed portion of the tuples into the first block; second inserting, by the at least one processor the first block into a buffer associated with the partition; performing the second processing, by the at least one processor, the first block in the buffer; and performing the third inserting, by the at least one processor, the processed first block into the free list, wherein at least one of the separating, first processing, first inserting, second inserting, second processing, and third inserting is performed by one or more computers. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A system, comprising:
-
a memory; and at least one processor coupled to the memory and configured to; separate tuples into a plurality of partitions using a hash function; first process a portion of the tuples associated with a partition in the plurality of partitions based on the separation; reserve a first block for the partition from a free list based on a minimum number of blocks in the free list to perform a second process and a third insert; first insert the processed portion of the tuples into the first block; second insert the first block into a buffer associated with the partition; perform the second process the first block in the buffer; and perform the third insert the processed first block into the free list. - View Dependent Claims (11, 12, 13, 14, 15)
-
-
16. A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising:
-
separating tuples into a plurality of partitions using a hash function; first processing a portion of the tuples associated with a partition in the plurality of partitions based on the separating; reserving a first block for the partition from a free list based on a minimum number of blocks in the free list to perform a second processing and a third inserting; first inserting the processed portion of the tuples into the first block; second inserting the first block into a buffer associated with the partition; performing the second processing the first block in the buffer; and performing the third inserting the processed first block into the free list. - View Dependent Claims (17, 18, 19, 20)
-
Specification