System, method and apparatus for error correction in multi-processor systems
Abstract
This disclosure provides apparatus, methods and systems for error correction in multiprocessor systems. Some implementations include a plurality of computing modules, each including a processor that holds processor state data. In some implementations, each computing module also includes a memory. Upon receiving a signal to perform a partial re-synchronization, a hash of each processor's state data may be computed. In some embodiments, a hash of at least a portion of each computing module's memory data may also be computed. The hashes are then compared to identify the majority hash and any minority hashes. When a minority hash is identified, the computing module that produced it may receive new processor state data from one of the computing modules that produced the majority hash.
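The hashing and voting described in the abstract can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function and represents each module's processor state and memory portion as byte strings; the abstract specifies neither, and the names `module_hash` and `classify` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def module_hash(proc_state: bytes, memory: bytes = b"") -> str:
    """Hash a module's processor state and, optionally, a portion of its memory.

    SHA-256 is an assumption. Each field is length-prefixed so that different
    (state, memory) splits of the same bytes cannot produce the same digest.
    """
    h = hashlib.sha256()
    for part in (proc_state, memory):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def classify(hashes: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split module IDs into the majority group and any minority modules."""
    majority_hash, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(hashes):
        # Without a strict majority, voting cannot identify the correct state.
        raise RuntimeError("no majority hash")
    majority = [m for m, h in hashes.items() if h == majority_hash]
    minority = [m for m, h in hashes.items() if h != majority_hash]
    return majority, minority


if __name__ == "__main__":
    states = {"A": b"s1", "B": b"s1", "C": b"s2"}
    hashes = {m: module_hash(s) for m, s in states.items()}
    print(classify(hashes))  # (['A', 'B'], ['C'])
```

Here module C produced the minority hash, so under the scheme above it would receive new processor state data from A or B. Comparing short digests rather than full state keeps the comparison cheap regardless of how large the saved state is.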
30 Claims
1. A method of synchronizing the state of a plurality of computing modules in an electronic system, each computing module having a processor, comprising:
saving at least a portion of processor state data for each of the plurality of computing modules;
hashing at least the portion of the saved processor state data for each of the plurality of computing modules;
comparing the processor hashes for the processor state data;
determining a majority of computing modules having the same processor state data and at least one minority of computing modules having different processor state data;
re-synchronizing the plurality of computing modules if the majority of computing modules are determined to have the same processor state data, and the minority of computing modules are determined to have different processor state data, wherein re-synchronizing comprises:
sending the saved processor state data from a first majority computing module to a first minority computing module;
confirming that the state data of a second majority computing module is the same as the saved state data from the first majority computing module or the saved state data of the first minority computing module after sending the saved processor state data from the first majority computing module to the minority computing module;
flagging an error if the state data is not the same; and
restoring the processor state data of a majority computing module based on the majority computing module's saved processor state data in response to completion of the resynchronization.
(Dependent claims: 2-14.)
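The sequence in claim 1 can be sketched end to end in Python. This is a simplified model under stated assumptions, not the patented implementation: processor state is a flat byte string, SHA-256 stands in for the unspecified hash, the hypothetical `transfer` parameter models the state copy so that a corrupted transfer can be simulated, and the confirmation compares the second majority module's saved state against the copy the minority module received, which is one of the two comparisons the claim allows.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Module:
    name: str
    state: bytes        # live processor state (hypothetical flat byte image)
    saved: bytes = b""  # snapshot produced by the saving step


def synchronize(modules: Dict[str, Module],
                transfer: Callable[[bytes], bytes] = lambda b: b) -> List[str]:
    """Run the claimed steps; return names of minority modules flagged with an error."""
    # Save the processor state of every module.
    for m in modules.values():
        m.saved = m.state
    # Hash the saved state, compare the hashes, and find majority and minority.
    digests = {n: hashlib.sha256(m.saved).hexdigest() for n, m in modules.items()}
    top, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(modules):
        raise RuntimeError("no majority: cannot re-synchronize by voting")
    majority = [n for n, d in digests.items() if d == top]
    minority = [n for n, d in digests.items() if d != top]

    flagged: List[str] = []
    first = modules[majority[0]]
    second = modules[majority[1]] if len(majority) > 1 else None
    for name in minority:
        target = modules[name]
        # Send the first majority module's saved state to the minority module.
        target.saved = transfer(first.saved)
        # Confirm that the second majority module's saved state matches the received copy.
        if second is not None and second.saved != target.saved:
            flagged.append(name)         # flag an error; keep the old state
        else:
            target.state = target.saved  # minority module adopts the majority state
    # After re-synchronization completes, restore each majority module from its own snapshot.
    for name in majority:
        modules[name].state = modules[name].saved
    return flagged
```

Calling `synchronize(mods, transfer=lambda b: b + b"!")` on modules holding `b"good"`, `b"good"` and `b"bad"` simulates a corrupted copy: the confirmation fails and the minority module is flagged rather than silently adopting damaged state.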
15. A fault tolerant computing apparatus, comprising:
a plurality of computing modules, wherein each computing module comprises a hardware processor having processor state data;
a re-synchronization module configured to save at least a portion of processor state data for each of the plurality of computing modules;
a hashing module configured to generate hash values of at least the saved processor state data;
a fault tolerant checking unit configured to receive the plurality of hash values and re-synchronizing the plurality of computing modules if a majority of computing modules are determined to have the same processor state data, and a minority of computing modules are determined to have different processor state data, and wherein the re-synchronization module is further configured to:
send the saved processor state data from a first majority computing module to a first minority computing module;
confirm that the state data of a second majority computing module is the same as the saved state data from the first majority computing module or the saved state data of the first minority computing module after sending the saved processor state data from the first majority computing module to the minority computing module;
flag an error if the state data is not the same; and
restore the processor state data of a majority computing module based on the majority computing module's saved processor state data in response to completion of the resynchronization.
(Dependent claims: 16-24.)
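Claim 15 divides the work among separate components. The Python sketch below, with hypothetical class and method names, illustrates one consequence of that split: the fault tolerant checking unit receives only hash values, never raw processor state. It also assumes the hashing module digests the saved state in chunks (for example, one register bank at a time); this works because SHA-256 over successive `update` calls equals SHA-256 over their concatenation, provided every module uses the same chunk layout.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class HashingModule:
    """Hypothetical hashing module: digests saved state chunk by chunk."""

    def digest(self, chunks: Iterable[bytes]) -> bytes:
        h = hashlib.sha256()
        for chunk in chunks:  # chunk layout must be identical on every module
            h.update(chunk)
        return h.digest()


class FaultTolerantCheckingUnit:
    """Hypothetical checking unit: votes on hash values only."""

    def decide(self, digests: Dict[str, bytes]) -> Optional[Tuple[List[str], List[str]]]:
        """Return (majority, minority) if re-synchronization is needed, else None."""
        top, votes = Counter(digests.values()).most_common(1)[0]
        if votes * 2 <= len(digests):
            raise RuntimeError("no majority among hash values")
        majority = sorted(n for n, d in digests.items() if d == top)
        minority = sorted(n for n, d in digests.items() if d != top)
        return (majority, minority) if minority else None


if __name__ == "__main__":
    hm = HashingModule()
    digests = {
        "A": hm.digest([b"regs", b"pc"]),
        "B": hm.digest([b"regs", b"pc"]),
        "C": hm.digest([b"regs", b"pX"]),
    }
    print(FaultTolerantCheckingUnit().decide(digests))  # (['A', 'B'], ['C'])
```

In this split, the re-synchronization module would act on the returned minority list by copying saved state from a majority module, as in claim 1.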
25. A fault tolerant computing apparatus, comprising:
a plurality of computing modules, wherein each computing module comprises a processor having processor state data;
means for saving at least a portion of processor state data for each of the plurality of computing modules;
means for hashing configured to generate hash values of at least the saved processor state data;
means for comparing the plurality of hash values;
means for determining a majority of computing modules having the same processor state data and at least one minority of computing modules having different processor state data; and
means for resynchronizing the plurality of computing modules based on the determining, wherein the means for resynchronizing is configured to:
send the saved processor state data from a first majority computing module to a first minority computing module;
confirm that the state data of a second majority computing module is the same as the saved state data from the first majority computing module or the saved state data of the first minority computing module after sending the saved processor state data from the first majority computing module to the minority computing module, and flag an error if the state data is not the same; and
means for restoring the processor state data of a majority computing module based on the majority computing module's saved processor state data in response to completion of the resynchronization.
(Dependent claims: 26-28.)
29. A non-transitory, computer readable storage medium having instructions stored thereon that cause a processing circuit to perform a method comprising:
saving at least a portion of processor state data for each of the plurality of computing modules;
hashing at least the saved processor state data for each of a plurality of computing modules;
comparing the processor hashes for the processor state data; and
determining a majority of computing modules having the same processor state data and at least one minority of computing modules having different processor state data;
re-synchronizing the plurality of computing modules based at least on the determining, wherein re-synchronizing comprises:
sending processor state data from a first majority computing module to a first minority computing module;
confirming that the state data of a second majority computing module is the same as the saved state data from the first majority computing module or the saved state data of the first minority computing module after sending the saved processor state data from the first majority computing module to the minority computing module, and flagging an error if the state data is not the same; and
restoring the processor state data of a majority computing module based on the majority computing module's saved processor state data in response to completion of the resynchronization.
(Dependent claim: 30.)
Specification