RECOVERY STRATEGY FOR A STREAM PROCESSING SYSTEM
First Claim
1. A method of providing a fault tolerance strategy for a stream processing system, the method including:
defining a computing grid that consumes data from a message bus as numerous batches, wherein the message bus queues events from one or more near real-time (NRT) data streams in numerous partitions and each event logged in a given partition is assigned a unique event offset;
before processing, in the computing grid, a first batch from a first partition, identifying a current event offset and an end event offset in the first partition and locking the first batch to include events logged between the current event offset and the end event offset;
persisting the current event offset and the end event offset;
detecting failed processing of the first batch and responding by reprocessing the first batch in the computing grid, wherein, between the processing and the reprocessing, the end event offset is subject to change caused by the message bus queuing new events in the first partition; and
restricting the reprocessing of the first batch to events logged between the current event offset and the end event offset, thereby preventing inclusion of the new events in the first batch.
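The offset-locking steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: `Partition`, `OffsetStore`, `lock_batch`, and `count_events` are hypothetical names, the message bus is replaced by an in-memory list whose indices stand in for event offsets, and the batch range is assumed to be start-inclusive and end-exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical in-memory stand-in for one message bus partition."""
    events: list = field(default_factory=list)

    def append(self, event):
        # The list index of an event serves as its unique event offset.
        self.events.append(event)

    def latest_offset(self):
        return len(self.events)

    def read(self, start, end):
        # Assumption: start is inclusive, end is exclusive.
        return self.events[start:end]


class OffsetStore:
    """Hypothetical persistence layer; a real system would use durable storage."""

    def __init__(self):
        self._saved = {}

    def save(self, batch_id, start, end):
        self._saved[batch_id] = (start, end)

    def load(self, batch_id):
        return self._saved[batch_id]


def lock_batch(partition, store, batch_id, current_offset):
    """Fix the batch boundaries before processing and persist them."""
    end_offset = partition.latest_offset()
    store.save(batch_id, current_offset, end_offset)
    return current_offset, end_offset


def count_events(partition, store, batch_id):
    """Process (or reprocess) a batch using only its persisted offsets."""
    start, end = store.load(batch_id)
    return len(partition.read(start, end))


partition = Partition()
store = OffsetStore()
for event in ["a", "b", "c"]:
    partition.append(event)

lock_batch(partition, store, "batch-1", current_offset=0)
first_play = count_events(partition, store, "batch-1")

# New events are queued between the first play and the replay.
partition.append("d")
partition.append("e")

replay = count_events(partition, store, "batch-1")
# A replay that re-reads up to the latest offset would pick up the new events.
naive_replay = len(partition.read(0, partition.latest_offset()))

print(first_play, replay, naive_replay)  # 3 3 5
```

In this sketch the replay sees the same three events as the first play because it reads the persisted offsets, whereas re-reading up to the partition's latest offset would count five.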
Abstract
The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of batch size of a given batch during its replay after a processing failure. This problem results in over-count when the input during replay is not a superset of the input fed at the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
51 Citations
20 Claims
1. A method of providing a fault tolerance strategy for a stream processing system, the method including:
defining a computing grid that consumes data from a message bus as numerous batches, wherein the message bus queues events from one or more near real-time (NRT) data streams in numerous partitions and each event logged in a given partition is assigned a unique event offset;
before processing, in the computing grid, a first batch from a first partition, identifying a current event offset and an end event offset in the first partition and locking the first batch to include events logged between the current event offset and the end event offset;
persisting the current event offset and the end event offset;
detecting failed processing of the first batch and responding by reprocessing the first batch in the computing grid, wherein, between the processing and the reprocessing, the end event offset is subject to change caused by the message bus queuing new events in the first partition; and
restricting the reprocessing of the first batch to events logged between the current event offset and the end event offset, thereby preventing inclusion of the new events in the first batch.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of providing a fault tolerance strategy for a stream processing system, the method including:
defining a computing grid that consumes data from a message bus as numerous batches, wherein the message bus queues events from one or more near real-time (NRT) data streams in numerous partitions and each event logged in a given partition is assigned a unique event offset;
persisting, in a batch-oriented log segment, keys produced by merging input from events in a first batch from a first partition with key values obtained from an external data store for the events along with corresponding current event offsets for event-key value pairs; and
detecting failed processing of the first batch, wherein, between the processing and reprocessing, the external data source key values are subject to change, and responding by;
reprocessing the first batch in the computing grid and evaluating the persisted results of keys in the batch-oriented log segment;
comparing the reprocessing results with the evaluating results to determine changes in the external data source key values between the processing and the reprocessing; and
reporting the changes between the processing and the reprocessing for use in subsequent processes.
Dependent claims: 12, 13, 14, 19, 20
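The log-segment comparison recited in claim 11 can be illustrated with a minimal sketch. It rests on stated assumptions rather than on the patented implementation: `process_batch` and `diff_against_log` are hypothetical names, the external data store is a plain dictionary, and the batch-oriented log segment is a dictionary keyed by event offset.

```python
def process_batch(events, external_store, start_offset):
    """Merge each event with the key value currently held by the external store.

    Returns a mapping of event offset -> (event, key value), which this sketch
    uses as the batch-oriented log segment.
    """
    results = {}
    for i, event in enumerate(events):
        offset = start_offset + i
        # .get() returns None when the key has disappeared from the store.
        results[offset] = (event, external_store.get(event))
    return results


def diff_against_log(log_segment, replay_results):
    """Compare replay results with the persisted log segment, offset by offset."""
    changes = []
    for offset, (event, logged_value) in sorted(log_segment.items()):
        _, replayed_value = replay_results[offset]
        if replayed_value != logged_value:
            changes.append((offset, event, logged_value, replayed_value))
    return changes


external_store = {"user-1": "acme", "user-2": "globex"}
events = ["user-1", "user-2", "user-1"]

# First play: persist the merged keys along with their event offsets.
log_segment = process_batch(events, external_store, start_offset=100)

# The external store changes between the first play and the replay:
# one key value is updated and one key disappears.
external_store["user-2"] = "initech"
del external_store["user-1"]

replay_results = process_batch(events, external_store, start_offset=100)
changes = diff_against_log(log_segment, replay_results)

for change in changes:
    print(change)
```

Each reported tuple holds the offset, the event, the key value seen at first play, and the key value seen at replay, so a downstream step could correct counters for keys that changed or disappeared.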
15. A non-transitory computer readable storage medium impressed with computer program instructions to provide a recovery strategy for a stream processing system, the instructions, when executed on a processor, implement a method comprising:
defining a computing grid that consumes data from a message bus as numerous batches, wherein the message bus queues events from one or more near real-time (NRT) data streams in numerous partitions and each event logged in a given partition is assigned a unique event offset;
persisting, in a batch-oriented log segment, keys produced by merging input from events in a first batch from a first partition with key values obtained from an external data store for the events along with corresponding current event offsets for event-key value pairs; and
detecting failed processing of the first batch, wherein, between the processing and reprocessing, the external data source key values are subject to change, and responding by;
reprocessing the first batch in the computing grid and evaluating the persisted results of keys in the batch-oriented log segment;
comparing the reprocessing results with the evaluating results to determine changes in the external data source key values between the processing and the reprocessing; and
reporting the changes between the processing and the reprocessing for use in subsequent processes.
Dependent claims: 16, 17, 18
Specification