Computer system that tolerates transient errors and method for management in a system of this type
Abstract
The present invention concerns a computer system that tolerates transient errors, made up of a processing unit, including:
- at least two processing units (50, 51), each one including:
  - a microprocessor (54, 57),
  - a memory (53, 56) protected by a device generating and controlling a code for the detection and correction of errors,
  - a device (55, 58) for monitoring memory accesses,
- a centralised control device (52) for the processing units and for inputs/outputs, including:
  - macro-synchronisation means,
  - data comparison/vote means,
  - correction demand means,
  - decision-making means,
  - means allowing the inputs/outputs to be made,
- links (60, 61) respectively linking each processing unit to the device (52) for controlling the processing units and the inputs/outputs.
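The role of the centralised control device (52) described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names `ProcessingUnit` and `ControlDevice`, the method names, and the policy of withholding the output on any disagreement are assumptions made for illustration only. The sketch compares the values produced by the processing units at a macro-synchronisation point and, on a mismatch, sends a correction demand to every unit.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for one processing unit (50 or 51)."""
    name: str
    correction_demands: int = 0

    def request_correction(self) -> None:
        # A real unit would restart from its saved recovery context;
        # this sketch only counts the demands it receives.
        self.correction_demands += 1


@dataclass
class ControlDevice:
    """Hypothetical model of the centralised control device (52)."""
    units: list

    def compare(self, outputs: dict):
        """Compare the values produced at one macro-synchronisation point.

        `outputs` maps a unit name to the value that unit produced.
        Returns the agreed value, or None if the units disagree.
        """
        values = list(outputs.values())
        if all(v == values[0] for v in values):
            return values[0]  # agreement: the output may be released
        # Assumed policy: any disagreement is treated as an error and the
        # correction demand is sent to all units, not only the suspect one.
        for unit in self.units:
            unit.request_correction()
        return None


# Usage: two units agree once, then disagree once.
pu50, pu51 = ProcessingUnit("PU50"), ProcessingUnit("PU51")
control = ControlDevice([pu50, pu51])
agreed = control.compare({"PU50": 42, "PU51": 42})
rejected = control.compare({"PU50": 42, "PU51": 43})
```

With only two units a mismatch can be detected but the faulty unit cannot be identified by comparison alone, which is one reason this sketch sends the demand to both.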
14 Claims
1. Computer system tolerating transient errors made up by a processing unit, characterised by the fact that it includes:
- at least two processing units (50, 51) with each one including:
  - a microprocessor (54, 57),
  - a memory (53, 56) protected by a device generating and controlling a code for the detection and correction of errors,
  - a device (55, 58) for monitoring memory accesses, mainly including:
    - means for segmentation of the memory and the verification of the access rights to each segment (53, 56),
    - means for specific protection of the memory segments (53, 56) allocated to saving the recovery context,
    - means for generating a correction demand signal to the device (52) for controlling the processing units and the inputs/outputs,
- a centralised control device (52) for the processing units and for inputs/outputs, including:
  - macro-synchronisation means for the processing units (50, 51),
  - comparison/vote means for the data generated by the processing units (50, 51),
  - correction demand means,
  - decision-making means arising from the memory access watch devices (55, 58),
  - means for decision-making so as to initialise a correction phase in the event of an error and means allowing the demand to be transmitted simultaneously to all the processing units (50, 51),
  - means allowing the inputs/outputs to be made,
- some links (60, 61) respectively linking each processing unit to the processing units and inputs/outputs control device (52).
Dependent claims: 2, 3, 4
5. Process to make a computer system tolerant to transient faults, made up by a processing unit, characterised by the fact that it allows:
- identical software applications to be run simultaneously on at least two processing units (50, 51) independently and asynchronously, and complying with the following functioning:
  - the transient errors affecting the memory (53, 56) in the processing units (50, 51) are detected and corrected thanks to the use of a detection and correction code stored in the memory associated to a software scanning task,
  - the proper functioning of the microprocessor (54, 57) of the processing units (50, 51) is verified thanks to a segmentation of the memory associated to monitoring of the memory accesses which ensures that the microprocessor really holds the access rights for the current segment of the memory (53, 56),
  - the memory segments allocated to saving the recovery context are extremely secure thanks to specific monitoring of the memory accesses so as to ensure that a faulty microprocessor (54, 57) cannot generate any error in these critical zones,
  - a correction demand is transmitted to the control function for the processing units and the inputs/outputs in the event of a violation of the access rights,
- the following operations to be centralised in the control function for the processing units and the inputs/outputs:
  - macro-synchronisation of the different simultaneous executions of the software,
  - comparison/vote of all the data generated by the different executions of the software,
  - reception of the correction demands arising from the memory access watch functions following an error detection,
  - when an error is detected, whatever its source may be, decision-making in order to initialise a correction phase and transmission of this demand simultaneously to the different executions of the software,
  - performing the inputs/outputs upon demand from the software applications,
- the interface to be made between the software programs being executed simultaneously and the control function for the processing units and the inputs/outputs.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14
Specification