High integrity recovery from multi-bit data failures
Abstract
Methods and systems for enabling a computing platform to recover quickly from transient multi-bit data failures within a run-time data memory array, in a manner that is transparent to software applications executing on the computing platform. A fault-tolerant digital computing system is provided that utilizes parallel processing lanes in a lockstep architecture. Each processing lane includes error detectors configured to detect multi-bit data errors in that lane's memory arrays. Upon detection of a multi-bit data failure, an interrupt is generated; control logic software responds to the interrupt and corrects the data errors in the memory array of each processing lane.
82 Citations
15 Claims
1. A fault-tolerant digital computing system comprising:
a processor;
a first memory array and a second memory array, wherein each memory array is configured to store data across one or more memory devices;
a databus coupling the processor to each of the memory arrays;
an error detector connected to the processor and the memory arrays on the databus for receiving the data from one of the memory arrays;
a comparator connected to the error detector, the comparator configured to compare each bit of the data from one of the memory arrays to each bit of the corresponding data from the other memory array; and
a control logic module connected to the processor and the memory arrays on the databus, the control logic module configured to correct any errors in the data.
Dependent claims: 2, 3, 4, 5
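The bit-by-bit comparison recited for the comparator in claim 1 can be illustrated in software. The following is only a minimal sketch, not the patented hardware: the function name `differing_bits` and the representation of each memory array's data as a `bytes` value are assumptions made for illustration.

```python
def differing_bits(first: bytes, second: bytes) -> list:
    """Return the positions of every bit that differs between two copies.

    Hypothetical helper: models the comparator as an XOR over equal-length
    byte strings. Bit 0 is the least significant bit of byte 0.
    """
    if len(first) != len(second):
        raise ValueError("copies must be the same length")
    positions = []
    for index, (a, b) in enumerate(zip(first, second)):
        delta = a ^ b  # a set bit in delta marks a disagreement
        for bit in range(8):
            if delta & (1 << bit):
                positions.append(index * 8 + bit)
    return positions
```

A result with more than one entry corresponds to a multi-bit disagreement between the two arrays. Note that the comparison alone only shows that the copies differ; deciding which copy is faulty is the role of the error detector.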
6. A method of detecting and correcting multi-bit data failures in data for a fault-tolerant digital computer system, comprising the steps of:
generating a first copy of the data at a first processor;
providing the first copy of the data to a first error detector;
generating a second copy of the data at a second processor;
providing the second copy of the data to a second error detector;
performing a bit-by-bit cross comparison of the first and second copies of the data;
detecting a multi-bit data fault with the first or second copy of the data;
providing an interrupt to each of the first and second processors;
storing a fault-free copy of the data in a dedicated memory location, wherein the fault-free copy is created from the first or second copy of the data; and
correcting, in response to the interrupt, the first or second copy of the data using the fault-free data copy.
Dependent claims: 7, 8, 9
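The steps of claim 6 can be walked through in a small software model. This is a hedged sketch under several assumptions that are not taken from the patent: each error detector is modeled as a CRC-32 check (`zlib.crc32`) recorded when the data was written, the interrupt is modeled as an ordinary function call, and the names `Lane`, `make_lane`, and `recover` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lane:
    """One processing lane's copy of the data plus its recorded check value."""
    data: bytearray
    crc: int  # assumption: CRC-32 stands in for the lane's error detector

    def detector_ok(self) -> bool:
        return zlib.crc32(bytes(self.data)) == self.crc


def make_lane(payload: bytes) -> Lane:
    return Lane(bytearray(payload), zlib.crc32(payload))


def recover(first: Lane, second: Lane) -> Optional[bytes]:
    """Cross-compare two copies and repair a faulty one.

    Returns the fault-free copy that was used for correction, or None
    when no fault was detected.
    """
    copies_match = first.data == second.data  # bit-by-bit cross comparison
    if copies_match and first.detector_ok() and second.detector_ok():
        return None
    # A fault was detected: this point models the interrupt to both processors.
    good = next((lane for lane in (first, second) if lane.detector_ok()), None)
    if good is None:
        raise RuntimeError("both copies failed their error check")
    fault_free = bytes(good.data)  # models the dedicated memory location
    for lane in (first, second):
        lane.data[:] = fault_free  # correct each copy from the fault-free one
        lane.crc = zlib.crc32(fault_free)
    return fault_free
```

For example, flipping two bits in the second lane's copy and calling `recover(first, second)` restores that copy from the first lane; a subsequent call returns `None` because the copies agree again. The sketch raises an error when both copies fail their checks, a case the claim text quoted here does not address.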
10. A system for detecting and correcting multi-bit data failures in data, the system having a first processing lane and a second processing lane, the system comprising:
a first processor associated with the first processing lane;
a second processor associated with the second processing lane;
a first memory array connected to the first processor on a databus, the first memory array configured to store a first copy of the data;
a second memory array connected to the second processor on the databus, the second memory array configured to store a second copy of the data;
a first error detector connected to the first processor and to the first and second memory arrays on the databus for receiving the first and second copies of the data;
a second error detector connected to the second processor and to the first and second memory arrays on the databus for receiving the first and second copies of the data;
a first comparator connected to the first error detector, the first comparator configured to compare each bit of the first copy of the data to each bit of the second copy of the data;
a second comparator connected to the second error detector, the second comparator configured to compare each bit of the second copy of the data to each bit of the first copy of the data;
a first control logic module connected to the first processor and to the first memory array on the databus, the first control logic module configured to correct any errors in the first copy of the data; and
a second control logic module connected to the second processor and to the second memory array on the databus, the second control logic module configured to correct any errors in the second copy of the data.
Dependent claims: 11, 12, 13, 14, 15
Specification