RECOVERING A FAILURE IN A DATA PROCESSING SYSTEM
First Claim
1. A method of recovering a failure in a data processing system comprising:
at a source node, recording input/output channels and segment numbers for all tuples received in a window;
processing each input tuple to derive a number of output tuples, each output tuple comprising the recorded input/output channels and segment numbers; and
if a failure occurs at a target node;
restore a last window state of the target node; and
request a number of tuples from a current window of the target node up to a current tuple to be resent from a source node based on the input/output channels and segment numbers recorded at the source node.
Abstract
A technique of recovering a failure in a data processing system comprises restoring a checkpointed state in a last window, and resending all the input messages received at the second node during the failed window boundary.
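The recovery flow described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names (`StreamTuple`, `SourceNode`, `TargetNode`), the in-memory replay log, the fixed window size, and the running-sum state are all assumptions made for illustration. The sketch also assumes the segment number of the last tuple delivered before the failure (`up_to`) is known when recovery starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    channel: str   # hypothetical channel identifier
    segment: int   # per-channel sequence number assigned by the source
    value: int


class SourceNode:
    """Tags each emitted tuple with channel and segment number and logs it."""

    def __init__(self, channel):
        self.channel = channel
        self.next_segment = 0
        self.log = []  # assumption: replay log kept in memory

    def emit(self, value):
        t = StreamTuple(self.channel, self.next_segment, value)
        self.next_segment += 1
        self.log.append(t)
        return t

    def resend(self, first_segment, last_segment):
        # Return logged tuples whose segment numbers fall in the inclusive range.
        return [t for t in self.log
                if first_segment <= t.segment <= last_segment]


class TargetNode:
    """Keeps a running sum and checkpoints it once per window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.state = 0
        self.checkpoint = (0, -1)  # (state, last segment covered)

    def process(self, t):
        self.state += t.value
        # A window closes every `window_size` tuples; checkpoint at the boundary.
        if (t.segment + 1) % self.window_size == 0:
            self.checkpoint = (self.state, t.segment)

    def recover(self, source, up_to):
        # Restore the last window state, then replay the current window.
        self.state, covered = self.checkpoint
        for t in source.resend(covered + 1, up_to):
            self.process(t)


source = SourceNode("src->tgt")
target = TargetNode(window_size=3)
for v in range(1, 8):          # segments 0..6 carry values 1..7
    target.process(source.emit(v))

target.state = None            # simulate losing in-memory state in a failure
target.recover(source, up_to=6)
```

In this sketch only the tuples after the last checkpointed window boundary are replayed, so the cost of recovery is bounded by the window size rather than by the length of the whole stream.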
16 Claims
1. A method of recovering a failure in a data processing system comprising:
at a source node, recording input/output channels and segment numbers for all tuples received in a window;
processing each input tuple to derive a number of output tuples, each output tuple comprising the recorded input/output channels and segment numbers; and
if a failure occurs at a target node;
restore a last window state of the target node; and
request a number of tuples from a current window of the target node up to a current tuple to be resent from a source node based on the input/output channels and segment numbers recorded at the source node.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for processing data, comprising:
a processor; and
a memory communicatively coupled to the processor, in which the processor, executing computer usable program code;
checkpoints a number of states and a number of output messages once per window;
emits the output tasks to a second node; and
if one of the output tasks fails at the second node;
restores the checkpointed state in a last window; and
resends all the input messages received at the second node during the failed window boundary based on input/output channels and segment numbers recorded at the first node.
Dependent claims: 11, 12, 13
14. A computer program product for recovering a failure in a data processing system, the computer program product comprising:
a computer readable storage medium comprising computer usable program code embodied therewith, the computer usable program code comprising;
computer usable program code to, when executed by a processor, receive a number of input tasks at a first node of a data processing system, the input tasks comprising a number of input tuples;
computer usable program code to, when executed by the processor, for each of the number of input tuples, derive a number of output tuples for a number of output tasks;
computer usable program code to, when executed by the processor, generate a number of states for a number of the output tasks;
computer usable program code to, when executed by the processor, checkpoint the states and a number of output messages once per window;
computer usable program code to, when executed by the processor, emit the output tasks to a second node; and
if one of the output tasks fails;
computer usable program code to, when executed by the processor, restore a checkpointed state in a last window boundary; and
computer usable program code to, when executed by the processor, resend all the input messages received at the second node during the failed window boundary based on input/output channels and segment numbers recorded at the source node and appended to the emitted output tasks.
Dependent claims: 15, 16
Specification