Support for sharing computation between aggregations in a data stream management system
First Claim
1. A method implemented in a computer of processing a plurality of streams of data, the method comprising:
- processing the plurality of streams, to execute thereon a plurality of continuous queries based on a global plan;
during the processing, receiving a command to create a first aggregation and a first identification of a set of instructions comprising a function to be executed to perform the first aggregation;
during the processing, receiving a second identification of a second aggregation to be used by the first aggregation;
during the processing, creating in a memory of the computer, a first structure comprising the first identification and the second identification;
during the processing, receiving a new continuous query to be executed using the first aggregation;
during the processing, based on the first structure, creating in the memory an operator comprising at least one second structure, the second structure comprising a first field to hold a reference to the instance, and at least one additional field corresponding to at least one argument of the first aggregation;
during the processing, building in the memory a graph representing a plurality of aggregations as nodes;
wherein the plurality of aggregations comprises said first aggregation and said second aggregation;
wherein a plurality of directed edges are included in the graph, between each node and a group of nodes whose results are used by said each node;
during the processing, performing a sort on the graph, and using a result of the sort to store, in the memory, a temporal order for executing the plurality of aggregations;
during the processing, modifying the global plan in the memory by adding thereto the operator based on the temporal order, thereby to obtain a modified plan in the memory; and
altering the processing, to cause execution of the new continuous query in addition to the plurality of continuous queries, based on the modified plan in the memory;
during execution of the new continuous query creating in the memory an instance of the set of instructions;
invoking the function in the instance to process a tuple of the data received in a message, wherein the function is identified based at least on a type of the message; and
releasing in the memory, space occupied by the instance, in response to a predetermined condition being met.
1 Assignment
0 Petitions
Accused Products
Abstract
A computer is programmed to process a continuous query that is known to perform a new aggregation on one or more stream(s) of data, using one or more other aggregations on the stream(s). The computer creates an operator to execute the continuous query, and schedules the operator for execution in a specific order. In several embodiments, the computer determines the order based on dependency of the new aggregation on other aggregation(s), and on the order of performance of the other aggregation(s). The new aggregation is scheduled for performance after performance of each of the other aggregations. The computer is further programmed to pass results of the other aggregations to the new aggregation, by execution of a predetermined function. Support for use of the other aggregations results within the new aggregation eliminates redundant computation of the other aggregations within the new aggregation. The new aggregation may be user defined or built-in.
105 Citations
5 Claims
-
1. A method implemented in a computer of processing a plurality of streams of data, the method comprising:
-
processing the plurality of streams, to execute thereon a plurality of continuous queries based on a global plan; during the processing, receiving a command to create a first aggregation and a first identification of a set of instructions comprising a function to be executed to perform the first aggregation; during the processing, receiving a second identification of a second aggregation to be used by the first aggregation; during the processing, creating in a memory of the computer, a first structure comprising the first identification and the second identification; during the processing, receiving a new continuous query to be executed using the first aggregation; during the processing, based on the first structure, creating in the memory an operator comprising at least one second structure, the second structure comprising a first field to hold a reference to the instance, and at least one additional field corresponding to at least one argument of the first aggregation; during the processing, building in the memory a graph representing a plurality of aggregations as nodes; wherein the plurality of aggregations comprises said first aggregation and said second aggregation; wherein a plurality of directed edges are included in the graph, between each node and a group of nodes whose results are used by said each node; during the processing, performing a sort on the graph, and using a result of the sort to store, in the memory, a temporal order for executing the plurality of aggregations; during the processing, modifying the global plan in the memory by adding thereto the operator based on the temporal order, thereby to obtain a modified plan in the memory; and altering the processing, to cause execution of the new continuous query in addition to the plurality of continuous queries, based on the modified plan in the memory; during execution of the new continuous query creating in the memory an instance of the set of instructions; invoking the function in the instance to process a tuple of the data received in a message, wherein the function is identified based at least on a type of the message; and releasing in the memory, space occupied by the instance, in response to a predetermined condition being met. - View Dependent Claims (2, 3, 4, 5)
-
Specification