CONTAINMENT AND RECOVERY OF SOFTWARE EXCEPTIONS IN INTERACTING, REPLICATED-STATE-MACHINE-BASED FAULT-TOLERANT COMPONENTS
Abstract
A method, system and article of manufacture are disclosed for error recovery in a replicated state machine. A batch of inputs is input to the machine, and the machine uses a multitude of components for processing those inputs. During this processing, one of the components generates an exception. The method comprises the steps of: after the exception, rolling the state machine back to a defined point in the operation of the machine; preemptively failing the component that generated the exception; re-executing the input batch in the state machine; and handling any failure of that component during the re-executing step using a defined error handling procedure. The rolling, preemptively failing, re-executing and handling steps are repeated until the input batch runs to completion without generating any exception in any of the components that are not preemptively failed.
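The recovery loop described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `run_batch_with_recovery`, `ComponentException` and the `"skipped"` log are hypothetical, the rollback point is assumed to be a deep copy of the state taken before the batch, and the "defined error handling procedure" is assumed to be simply recording that a call to a preemptively failed component was skipped.

```python
import copy


class ComponentException(Exception):
    """Hypothetical wrapper that records which component raised."""

    def __init__(self, component):
        super().__init__(f"exception in component {component!r}")
        self.component = component


def run_batch_with_recovery(state, batch, components):
    """Run `batch` through `components`, retrying after each exception.

    `components` maps a name to a callable(state, item) that mutates state.
    Returns (final_state, names_of_preemptively_failed_components).
    """
    checkpoint = copy.deepcopy(state)  # assumed "defined point" to roll back to
    failed = set()                     # components that are preemptively failed
    while True:
        working = copy.deepcopy(checkpoint)  # roll back to the checkpoint
        try:
            for item in batch:               # re-execute the whole batch
                for name, handler in components.items():
                    if name in failed:
                        # Assumed error handling procedure: log the skipped call.
                        working.setdefault("skipped", []).append((name, item))
                        continue
                    try:
                        handler(working, item)
                    except Exception as exc:
                        raise ComponentException(name) from exc
            return working, failed           # batch completed without exception
        except ComponentException as exc:
            failed.add(exc.component)        # preemptively fail it, then retry
```

In this sketch a failed component is never invoked again, so every retry adds a new name to `failed` and the loop ends after at most `len(components)` retries. Because each retry starts from the same checkpoint with the same set of failed components, replicas that process the same batch would be expected to reach the same final state.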
20 Claims
1. A method of error recovery in a replicated state machine, wherein, at a defined time in an operation of the machine, a batch of inputs are input to the machine, and the machine uses a multitude of components for processing said inputs, and wherein during said processing, one of said components generates an exception, the method comprising the steps of:
after the exception, rolling the state machine back to a defined point in the operation of the machine;
preemptively failing said one of the components;
re-executing the batch of inputs in the state machine;
handling any failure, during said re-executing step, of said one of the components using a defined error handling procedure; and
repeating the rolling, preemptively failing, re-executing and handling steps until the input batch runs to completion without generating any exception in any of the components that are not pre-emptively failed.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. An error recovery system in a replicated state machine, wherein, at a defined time in an operation of the machine, a batch of inputs are input to the machine, and the machine uses a multitude of components for processing said inputs, and wherein during said processing, one of said components generates an exception, the error recovery system comprising one or more processor units configured for:
after the exception, rolling the state machine back to a defined point in the operation of the machine;
preemptively failing said one of the components;
re-executing the batch of inputs in the state machine;
handling any failure, during said re-executing step, of said one of the components using a defined error handling procedure; and
repeating the rolling, preemptively failing, re-executing and handling steps until the input batch runs to completion without generating any exception in any of the components that are not preemptively failed.
(Dependent claims: 12, 13, 14, 15)
16. An article of manufacture comprising:
at least one computer usable medium having computer readable program code logic to execute a machine instruction in a processing unit for error recovery in a replicated state machine, wherein, at a defined time in an operation of the machine, a batch of inputs are input to the machine, and the machine uses a multitude of components for processing said inputs, and wherein during said processing, one of said components generates an exception, said computer readable program code logic, when executing, performing the following steps:
after the exception, rolling the state machine back to a defined point in the operation of the machine;
preemptively failing said one of the components;
re-executing the batch of inputs in the state machine;
handling any failure, during said re-executing step, of said one of the components using a defined error handling procedure; and
repeating the rolling, preemptively failing, re-executing and handling steps until the input batch runs to completion without generating any exception in any of the components that are not preemptively failed.
(Dependent claims: 17, 18, 19, 20)
Specification