System and Method for Scalable Processing of Multi-Way Data Stream Correlations
First Claim
Patent Images
1. A computer implemented method for processing multi-way stream correlations, the computer implemented method comprising:
- receiving a set of input data streams;
aligning tuples of each input data stream of the set of input data streams with tuples of a master stream according to timestamps, to form aligned tuples;
splitting the set of input data streams into a plurality of sets of segments, wherein each set of segments of the plurality of sets of segments corresponds to a predetermined amount of time, wherein each segment of a set of segments of the plurality of sets of segments comprises the aligned tuples that arrived within the predetermined amount of time; and
sending each set of segments of the plurality of sets of segments to a different host in a set of hosts for join processing, wherein the join processing is performed on the aligned tuples that arrived within the predetermined amount of time.
0 Assignments
0 Petitions
Accused Products
Abstract
A computer implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
119 Citations
30 Claims
-
1. A computer implemented method for processing multi-way stream correlations, the computer implemented method comprising:
-
receiving a set of input data streams; aligning tuples of each input data stream of the set of input data streams with tuples of a master stream according to timestamps, to form aligned tuples; splitting the set of input data streams into a plurality of sets of segments, wherein each set of segments of the plurality of sets of segments corresponds to a predetermined amount of time, wherein each segment of a set of segments of the plurality of sets of segments comprises the aligned tuples that arrived within the predetermined amount of time; and sending each set of segments of the plurality of sets of segments to a different host in a set of hosts for join processing, wherein the join processing is performed on the aligned tuples that arrived within the predetermined amount of time. - View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
2. (canceled)
-
14-20. -20. (canceled)
-
21. A computer program product, in a computer readable storage medium, for processing multi-way stream correlations, the computer program product comprising:
-
computer usable program code stored in the computer readable storage medium, wherein the computer usable program code is adapted to cause a processor in a computer to perform steps comprising; receiving a set of input data streams, wherein each input data stream of the set of input data streams has a variable rate of data flow; selecting an input data stream of the set of input data streams with a highest rate of data flow to be designated as a master stream, wherein the designation of the master stream changes based on the variable rate of data flow of each input data stream of the set of input data streams; aligning tuples of each input data stream of the set of input data streams with tuples of the master stream according to timestamps, to form aligned tuples; splitting, continuously, the set of input data streams into a plurality of sets of segments, wherein each set of segments of the plurality of sets of segments comprises one segment from each input data stream of the set of input data streams, wherein each set of segments of the plurality of sets of segments corresponds to a predetermined amount of time, wherein each segment of a set of segments of the plurality of sets of segments comprises the aligned tuples that arrived within the predetermined amount of time; and sending each set of segments of the plurality of sets of segments to a different host in a set of hosts for join processing, wherein the join processing is performed on the aligned tuples that arrived within the predetermined amount of time. - View Dependent Claims (22)
-
-
23. A computer program product, in a computer readable storage medium, for processing multi-way stream correlations, the computer program product comprising:
-
computer usable program code stored in the computer readable storage medium, wherein the computer usable program code is adapted to cause a processor in a computer to perform steps comprising; receiving a set of input data streams, wherein each input data stream of the set of input data streams has a variable rate of data flow; selecting an input data stream of the set of input data streams with a highest rate of data flow to be designated as a master stream, wherein the designation of the master stream changes based on the variable rate of data flow of each input data stream of the set of input data streams; aligning tuples of each input data stream of the set of input data streams with tuples of the master stream according to timestamps, to form aligned tuples; splitting, continuously, the set of input data streams into a set of segments, wherein each segment of the set of segments comprises the aligned tuples that arrived within a predetermined amount of time for a specific input data stream; determining a routing path for a segment of the set of segments based on routing paths of previous segments that comprised tuples aligned with the aligned tuples in the segment, wherein the routing path for the segment denotes a set of hosts that the segment needs to be sent to in order to produce a join result, wherein a multi-way stream join computation is split into multiple smaller join operator computations that are executed on the set of hosts; storing the routing path for the segment in a data structure; adding the routing path to the segment; and sending each segment of the set of segments to the set of hosts in the routing path for join processing, wherein intermediate join results of the join result are routed across the set of hosts, and wherein the aligned tuples of each input data stream of the set of input data streams are routed separately. - View Dependent Claims (24, 25, 26)
-
-
27. An apparatus for processing multi-way stream correlations, the apparatus comprising:
-
a processor, and instructions stored in a memory, wherein the instructions are adapted to cause the processor to perform a plurality of steps comprising; receiving a set of input data streams, wherein each input data stream of the set of input data streams has a variable rate of data flow; selecting an input data stream of the set of input data streams with a highest rate of data flow to be designated as a master stream, wherein the designation of the master stream changes based on the variable rate of data flow of each input data stream of the set of input data streams; aligning tuples of each input data stream of the set of input data streams with tuples of the master stream according to timestamps, to form aligned tuples; splitting, continuously, the set of input data streams into a set of segments, wherein each segment of the set of segments comprises the aligned tuples that arrived within a predetermined amount of time for a specific input data stream; determining a routing path for a segment of the set of segments based on routing paths of previous segments that comprised tuples aligned with the aligned tuples in the segment, wherein the routing path for the segment denotes a set of hosts that the segment needs to be sent to in order to produce a join result, wherein a multi-way stream join computation is split into multiple smaller join operator computations that are executed on the set of hosts; storing the routing path for the segment in a data structure; adding the routing path to the segment; and sending each segment of the set of segments to the set of hosts in the routing path for join processing, wherein intermediate join results of the join result are routed across the set of hosts, and wherein the aligned tuples of each input data stream of the set of input data streams are routed separately. - View Dependent Claims (28, 29, 30)
-
Specification