Checkpointing in distributed streaming platform for real-time applications
First Claim
1. A method, comprising:
- receiving a data stream for an application running on a distributed streaming platform over a networked cluster of servers;
converting the data into a plurality of data tuples structured according to a schema;
repeatedly emitting a specified number of the data tuples as a streaming window, which is separated from other streaming windows by a leading control tuple associated with an ordinal identifier for the streaming window and by a trailing control tuple associated with the same ordinal identifier, wherein the streaming window is an atomic sequence of tuples that is associated with a recovery policy; and
emitting a checkpointing tuple following the trailing control tuple after a specified number of streaming windows, wherein the checkpointing tuple causes checkpointing of an instance of an operator for the application when the checkpointing tuple is received by the instance, wherein each of the operations is executed by one or more processors in real time or near real time rather than offline.
1 Assignment
0 Petitions
Accused Products
Abstract
Software for a distributed streaming platform receives a data stream for an application running on a distributed streaming platform over a networked cluster of servers. The software converts the data into a plurality of data tuples structured according to a schema. And the software repeatedly emits a specified number of the data tuples as a streaming window, which is separated from other streaming windows by a leading control tuple associated with an ordinal identifier for the streaming window and by a trailing control tuple associated with the same ordinal identifier. Then the software emits a checkpointing tuple following the trailing control tuple after a specified number of streaming windows. The checkpointing tuple causes checkpointing of an instance of an operator for the application when the checkpointing tuple is received by the instance.
-
Citations
20 Claims
-
1. A method, comprising:
-
receiving a data stream for an application running on a distributed streaming platform over a networked cluster of servers; converting the data into a plurality of data tuples structured according to a schema; repeatedly emitting a specified number of the data tuples as a streaming window, which is separated from other streaming windows by a leading control tuple associated with an ordinal identifier for the streaming window and by a trailing control tuple associated with the same ordinal identifier, wherein the streaming window is an atomic sequence of tuples that is associated with a recovery policy; and emitting a checkpointing tuple following the trailing control tuple after a specified number of streaming windows, wherein the checkpointing tuple causes checkpointing of an instance of an operator for the application when the checkpointing tuple is received by the instance, wherein each of the operations is executed by one or more processors in real time or near real time rather than offline. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. One or more computer-readable media persistently storing one or more programs, wherein the one or more programs, when executed, instructs one or more processors to perform the following operations:
-
receive a data stream for an application running on a distributed streaming platform over a networked cluster of servers; convert the data into a plurality of data tuples structured according to a schema; repeatedly emit a specified number of the data tuples as a streaming window, which is separated from other streaming windows by a leading control tuple associated with an ordinal identifier for the streaming window and by a trailing control tuple associated with the same ordinal identifier, wherein the streaming window is an atomic sequence of tuples that is associated with a recovery policy; and emit a checkpointing tuple following the trailing control tuple after a specified number of streaming windows, wherein the checkpointing tuple causes checkpointing of an instance of an operator for the application when the checkpointing tuple is received by the instance, wherein each of the operations is executed in real time or near real time rather than offline. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A method, comprising:
-
receiving a data stream for an application running on a distributed streaming platform over a networked cluster of servers, wherein the application uses one or more stream modes from the group of stream modes consisting of in-line, in-node, and in-rack; converting the data into a plurality of data tuples structured according to a schema; repeatedly emitting a specified number of the data tuples as a streaming window, which is separated from other streaming windows by a leading control tuple associated with an ordinal identifier for the streaming window and by a trailing control tuple associated with the same ordinal identifier, wherein the streaming window is an atomic sequence of tuples that is associated with a recovery policy; and emitting a checkpointing tuple following the trailing control tuple after a specified number of streaming windows, wherein the checkpointing tuple causes checkpointing of an instance of an operator for the application when the checkpointing tuple is received by the instance, wherein each of the operations is executed by one or more processors in real time or near real time rather than offline. - View Dependent Claims (20)
-
Specification