Fault tolerance and failover using active copy-cat
Abstract
Fault tolerant operation is disclosed for a primary instance, such as a process, thread, application, processor, etc., using an active copy-cat instance, a.k.a. backup instance, that mirrors operations in the primary instance, but only after those operations have successfully completed in the primary instance. Fault tolerant logic monitors inputs and outputs of the primary instance and gates those inputs to the backup instance once a given input has been processed. The outputs of the backup instance are then compared with the outputs of the primary instance to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup instance to take over for the primary instance in a fault situation wherein the primary and backup instances are loosely coupled, i.e., they need not be aware that they are operating in a fault tolerant environment.
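The gating and comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: `CopyCatGate`, `Counter`, and `submit` are hypothetical names introduced here, and the sketch assumes both instances are deterministic, so that identical inputs yield identical outputs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class Counter:
    """Hypothetical stateful instance: keeps a running total of its inputs."""

    def __init__(self) -> None:
        self.total = 0

    def process(self, amount: int) -> int:
        self.total += amount
        return self.total


@dataclass
class CopyCatGate:
    """Hypothetical fault tolerant logic placed in front of both instances."""

    primary: Callable[[int], int]
    backup: Callable[[int], int]
    mismatches: List[Tuple[int, int, int]] = field(default_factory=list)

    def submit(self, transaction: int) -> int:
        # The primary instance processes the input first.
        primary_out = self.primary(transaction)
        # Only after the primary has completed is the input gated to the backup.
        backup_out = self.backup(transaction)
        # Compare the two outputs and record any divergence.
        if backup_out != primary_out:
            self.mismatches.append((transaction, primary_out, backup_out))
        return primary_out
```

Neither `Counter` knows it is being mirrored, which reflects the loose coupling the abstract describes. For example, `CopyCatGate(Counter().process, Counter().process)` keeps two independent counters in step, and `mismatches` stays empty as long as their outputs agree.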
21 Claims
1. A computer-implemented method of providing fault tolerant operation to a primary instance, the method comprising:
providing a backup instance to which a copy of a first transaction transmitted to the primary instance is forwarded, the backup instance operative to process the copy of the first transaction and generate a first backup result based thereon, the first backup result being transmitted as a response to the first transaction when it has been determined, subsequent to the transmission of the first transaction to the primary instance, that the primary instance is unlikely to transmit a first primary result based on the first transaction, and, based thereon, the primary instance has been prevented from completing an external operation upon which the transmission of the first primary result by the primary instance is dependent; and
sending data indicative of a constraint violation to the primary instance in response to an attempt by the primary instance to complete the first transaction, the constraint violation forcing the primary instance into a failure state.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for providing fault tolerance to a primary instance, the system comprising:
a backup instance comprising a first processor operative to duplicate operation of the primary instance and to which a copy of a first transaction, transmitted to the primary instance, is forwarded, the first processor further operative to process the copy of the first transaction and generate a first backup result based thereon and transmit the first backup result as a response to the first transaction and further in response to an indication by a fault detector coupled therewith that, subsequent to the transmission of the first transaction to the primary instance, that the primary instance is unlikely to transmit a first primary result based on the first transaction, and, based thereon, the primary instance has been prevented from completing an external operation upon which the transmission of the first primary result by the primary instance is dependent; and
a second processor coupled with the primary instance, wherein the second processor is operative to send data indicative of a constraint violation to the primary instance in response to an attempt by the primary instance to complete the first transaction, the constraint violation forcing the primary instance into a failure state.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system for providing fault tolerance to a primary instance, the system comprising:
a backup instance comprising;
a first processor; and
a first non-transitory memory coupled to the first processor;
wherein the first processor is operative to duplicate at least some operation of the primary instance and to which a copy of a first transaction transmitted to a primary instance is forwarded as a result of execution of first logic stored in the non-transitory memory by the first processor, the backup instance operative to process the copy of the first transaction and generate a first backup result based thereon and transmit the first backup result as a response to the first transaction and further in response to an indication provided as a result of execution of second logic stored in the non-transitory memory and executable by the first processor to determine that, subsequent to the transmission of the first transaction to the primary instance, that the primary instance is unlikely to transmit a first primary result based on the first transaction, and, based thereon, the primary instance has been prevented from completing an external operation upon which the transmission of the first primary result by the primary instance is dependent; and
a second processor coupled with the primary instance, wherein the second processor is operative to send data indicative of a constraint violation to the primary instance in response to an attempt by the primary instance to complete the first transaction, the constraint violation forcing the primary instance into a failure state.
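The failover sequence recited in the independent claims can be sketched as follows. This is an illustrative reading only, not the claimed implementation: `ExternalStore`, `Primary`, `fail_over`, and `ConstraintViolation` are hypothetical names, the external operation is assumed to be a write to a shared store, and the fault detector's decision that the primary is unlikely to respond is assumed to have been made already (for example, by a timeout).

```python
class ConstraintViolation(Exception):
    """Stand-in for the 'data indicative of a constraint violation'."""


class ExternalStore:
    """Hypothetical external resource the primary must write to before replying."""

    def __init__(self) -> None:
        self.fenced = set()
        self.committed = {}

    def fence(self, txn_id: str) -> None:
        # Prevent any later commit for this transaction.
        self.fenced.add(txn_id)

    def commit(self, txn_id: str, result: str) -> None:
        if txn_id in self.fenced:
            raise ConstraintViolation(txn_id)
        self.committed[txn_id] = result


class Primary:
    """Hypothetical primary instance, unaware that it is being backed up."""

    def __init__(self, store: ExternalStore) -> None:
        self.store = store
        self.failed = False

    def complete(self, txn_id: str, payload: str):
        if self.failed:
            return None
        result = payload.upper()
        try:
            # The reply depends on this external operation succeeding.
            self.store.commit(txn_id, result)
        except ConstraintViolation:
            # The violation forces the primary into a failure state.
            self.failed = True
            return None
        return result


def fail_over(store: ExternalStore, txn_id: str, payload: str, backup) -> str:
    """Assumed to run after the primary is judged unlikely to respond."""
    store.fence(txn_id)     # first block the primary's external operation
    return backup(payload)  # then answer with the backup result
```

Fencing before replying means a slow primary that wakes up later cannot also complete the transaction: its commit attempt is rejected, and it treats the rejection as a fault instead of sending a second, conflicting response.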
Specification