Continuous flow compute point based data processing
First Claim
1. A method for initiating processing of blocks of data within at least one flow of data input to a graph having a plurality of process stages, including at least one subscriber process stage, at least one publisher process stage, and optionally at least one intermediate process stage, the method including:
- generating a compute point indicator in response to a trigger event;
propagating the compute point indicator through the graph, from each subscriber process stage through any intermediate process stages to each publisher process stage, as part of the flow of data;
for each process stage, processing a current block of data associated with the process stage in response to receipt by the process stage of at least one compute point indicator from an immediately previous process stage associated with the process stage.
4 Assignments
0 Petitions
Accused Products
Abstract
A data processing system and method that provides two processes, checkpointing and compute point propagation, and permits a continuous flow of data processing by allowing each process to (1) return to normal operation after checkpointing or (2) respond to receipt of a compute point indicator, independently of the time required by other processes for similar responsive actions. Checkpointing makes use of a command message from a checkpoint processor that sequentially propagates through a process stage from data sources through processes to data sinks, triggering each process to checkpoint its state and then pass on a checkpointing message to connected “downstream” processes. A compute point indicator marks blocks of records that should be processed as a group within each process. A compute point indicator is triggered and sequentially propagates through a process stage from data sources through processes to data sinks without external control. Compute point indicators also effectively self-synchronize multiple data flows without external control. Use of compute point indicators rather than checkpoints avoids the time delay that saving state imposes, while permitting a continuous flow of data processing, including outputting results.
-
Citations
27 Claims
-
1. A method for initiating processing of blocks of data within at least one flow of data input to a graph having a plurality of process stages, including at least one subscriber process stage, at least one publisher process stage, and optionally at least one intermediate process stage, the method including:
-
generating a compute point indicator in response to a trigger event;
propagating the compute point indicator through the graph, from each subscriber process stage through any intermediate process stages to each publisher process stage, as part of the flow of data;
for each process stage, processing a current block of data associated with the process stage in response to receipt by the process stage of at least one compute point indicator from an immediately previous process stage associated with the process stage. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computer program, stored on a computer-readable medium, for initiating processing of blocks of data within at least one flow of data input to a graph having a plurality of process stages, including at least one subscriber process stage, at least one publisher process stage, and optionally at least one intermediate process stage, the computer program comprising instructions for causing a computer to:
-
generate a compute point indicator in response to a trigger event;
propagate the compute point indicator through the graph, from each subscriber process stage through any intermediate process stages to each publisher process stage, as part of the flow of data;
for each process stage, process a current block of data associated with the process stage in response to receipt by the process stage of at least one compute point indicator from an immediately previous process stage associated with the process stage. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A system for initiating processing of blocks of data within at least one flow of data input to a graph having a plurality of process stages, including at least one subscriber process stage, at least one publisher process stage, and optionally at least one intermediate process stage, the system including:
-
means for generating a compute point indicator in response to a trigger event;
means for propagating the compute point indicator through the graph, from each subscriber process stage through any intermediate process stages to each publisher process stage, as part of the flow of data;
for each process stage, means for processing a current block of data associated with the process stage in response to receipt by the process stage of at least one compute point indicator from an immediately previous process stage associated with the process stage. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
-
Specification