Method and system for automatic fault detection and recovery in a data processing system
First Claim
1. A method of automatic fault detection and recovery in a data processing system, said data processing system comprising a central control unit, a plurality of system units, a bus, and a plurality of associated coupler units connecting said plurality of system units and said central control unit to said bus, wherein said central control unit comprises a watchdog circuit and a primary processor unit coupled to a reconfiguration module, and each of said plurality of coupler units further includes a corresponding, redundant backup coupler, said method comprising the steps of:
providing a second processor in parallel with said primary processor unit, said second processor not being in service while said primary processor unit is in service;
providing a backup reconfiguration module for said reconfiguration module, said backup reconfiguration module not being in service while said reconfiguration module is in service;
providing a backup bus for said bus;
providing at least one backup system unit for said plurality of system units;
coupling said reconfiguration module to said plurality of system units independently of said bus;
monitoring the operation of said central control unit and said plurality of system units;
upon detection of a fault, said watchdog circuit generating an alarm signal indicative of which component has malfunctioned within said data processing system;
classifying a fault type and severity based on said generated alarm signal; and
selecting one of a plurality of fault recovery options based on the type and severity of said detected fault.
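The last three steps of claim 1 (alarm, classification, choice of recovery option) can be illustrated with a minimal Python sketch. The claim does not say how type and severity are derived. This sketch assumes the type is simply the failed component named in the alarm, and that severity depends on a hypothetical `recurring` flag. The names `Alarm`, `Component`, `Severity`, `classify`, `select_recovery` and `BACKUP_ACTION` are illustrative only. The recovery options pair each component with the backup the claim provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Component(Enum):
    # Components for which claim 1 provides a backup (coupler backups are in the preamble).
    PROCESSOR = "processor"
    RECONFIGURATION_MODULE = "reconfiguration module"
    BUS = "bus"
    COUPLER = "coupler"
    SYSTEM_UNIT = "system unit"


class Severity(Enum):
    TRANSIENT = 1  # hypothetical: first occurrence, a reset may suffice
    PERMANENT = 2  # hypothetical: recurring, switch to the backup


@dataclass(frozen=True)
class Alarm:
    """Hypothetical watchdog alarm: which component malfunctioned, and which unit."""
    component: Component
    unit_id: int
    recurring: bool  # hypothetical flag: the same component already faulted recently


# Hypothetical recovery option per component, using the backups listed in claim 1.
BACKUP_ACTION = {
    Component.PROCESSOR: "bring second processor into service",
    Component.RECONFIGURATION_MODULE: "bring backup reconfiguration module into service",
    Component.BUS: "switch to backup bus",
    Component.COUPLER: "switch to redundant backup coupler",
    Component.SYSTEM_UNIT: "switch to backup system unit",
}


def classify(alarm: Alarm) -> Tuple[Component, Severity]:
    """Derive the fault type (failed component) and severity from the alarm signal."""
    severity = Severity.PERMANENT if alarm.recurring else Severity.TRANSIENT
    return alarm.component, severity


def select_recovery(alarm: Alarm) -> str:
    """Select one of several recovery options from the fault type and severity."""
    component, severity = classify(alarm)
    if severity is Severity.TRANSIENT:
        return f"reset {component.value} {alarm.unit_id}"
    return BACKUP_ACTION[component]
```

For example, `select_recovery(Alarm(Component.BUS, 0, recurring=True))` selects the backup bus. A first, non-recurring coupler fault only resets that coupler. A table lookup keeps the component-to-backup mapping in one place, so adding a component type does not touch the selection logic.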
Abstract
In the event of a failure of a processor module (PMI) forming part of a data processing system, the processor module is first turned off and then on again (III.1). Preferably, complete reconfiguration of the module (III.2), in which it is replaced by another available cold-redundant processor module, is commanded only if at least one further fault (the number of further faults can be chosen at will) is detected within a given time (Tmax) of the first fault.
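The restart-first policy in the abstract can be sketched as a small state machine. The first fault triggers a power cycle, and reconfiguration is commanded only once a chosen number of further faults arrives within Tmax of that first fault. The class and method names are hypothetical. Two behaviours are assumptions, not taken from the text: a fault inside the window but below the threshold triggers another power cycle, and the window restarts after a reconfiguration.

```python
from typing import Optional


class ProcessorFaultPolicy:
    """Hypothetical restart-first policy: power-cycle on a fault, reconfigure on repeats.

    t_max: window length (Tmax), measured from the first fault.
    extra_faults: number of further faults within Tmax that triggers complete
        reconfiguration (the abstract says this number can be chosen at will).
    """

    def __init__(self, t_max: float, extra_faults: int = 1) -> None:
        if extra_faults < 1:
            raise ValueError("extra_faults must be at least 1")
        self.t_max = t_max
        self.extra_faults = extra_faults
        self.first_fault_time: Optional[float] = None
        self.faults_since_first = 0

    def on_fault(self, t: float) -> str:
        """Return the action for a fault detected at time t."""
        if self.first_fault_time is None or t - self.first_fault_time > self.t_max:
            # No open window, or Tmax has elapsed: treat this as a first fault.
            self.first_fault_time = t
            self.faults_since_first = 0
            return "power_cycle"
        self.faults_since_first += 1
        if self.faults_since_first >= self.extra_faults:
            # Enough faults within Tmax: replace the module with a cold-redundant
            # spare and start afresh with the replacement (assumption).
            self.first_fault_time = None
            self.faults_since_first = 0
            return "reconfigure"
        return "power_cycle"  # assumption: below the threshold, restart again
```

With `ProcessorFaultPolicy(t_max=10.0)`, faults at t = 0 and t = 5 yield a power cycle and then a reconfiguration. Faults at t = 0 and t = 20 yield two power cycles, because the second fault falls outside Tmax. Counting only within a window anchored at the first fault reflects the abstract's intent: isolated transient faults never consume a spare module.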
52 Citations
29 Claims
28. A centralized data processing system located in a hostile environment comprising:
a plurality of processor control modules;
a plurality of buses;
a plurality of system units including at least one backup system unit;
a plurality of couplers including a plurality of backup couplers which connect said plurality of system units to said plurality of processor control modules via one of said plurality of buses; and
a plurality of reconfiguration modules for replacing said plurality of couplers with said plurality of backup couplers independent of said plurality of processor control modules, each reconfiguration module of said plurality of reconfiguration modules comprising:
  a data transfer bus;
  a self-test unit connected to said data transfer bus;
  a processor module reconfiguration unit connected to said self-test unit;
  a first test verification unit connected to said processor module reconfiguration unit;
  a coupler reconfiguration unit connected to said data transfer bus;
  a second test verification unit connected to said coupler reconfiguration unit;
  an interface module connected to said data transfer bus; and
  an output means connected to each reconfiguration module of said plurality of reconfiguration modules and each coupler of said plurality of couplers.
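The pairing in claim 28 of each reconfiguration unit with its own test verification unit can be sketched in Python as "command, then read back". The claim describes hardware, and nothing here is taken from the specification. The two hooks `select_backup` (standing in for the output means) and `read_status` (a status readback) are hypothetical. The self-test unit, interface module and data transfer bus are not modelled. The module acts on the coupler directly, matching the claim's requirement that couplers be replaced independently of the processor control modules.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReconfigurationModule:
    """Sketch of one reconfiguration module; both hardware hooks are hypothetical."""

    # Stands in for the output means: put the backup of `unit` into service.
    select_backup: Callable[[str], None]
    # Hypothetical status readback: True if the backup of `unit` reports in service.
    read_status: Callable[[str], bool]
    log: List[str] = field(default_factory=list)

    def _swap_and_verify(self, kind: str, unit: str) -> bool:
        self.select_backup(unit)     # reconfiguration unit issues the command
        ok = self.read_status(unit)  # test verification unit checks that it took effect
        outcome = "backup in service" if ok else "switch not verified"
        self.log.append(f"{kind} {unit}: {outcome}")
        return ok

    def reconfigure_processor(self, unit: str) -> bool:
        # Processor module reconfiguration unit with the first test verification unit.
        return self._swap_and_verify("processor", unit)

    def reconfigure_coupler(self, unit: str) -> bool:
        # Coupler reconfiguration unit with the second test verification unit.
        return self._swap_and_verify("coupler", unit)
```

As a usage example, `in_service = set()` can simulate the hardware: pass `in_service.add` as `select_backup` and `lambda u: u in in_service` as `read_status`, then call `reconfigure_coupler("C3")`. Verifying by readback rather than assuming success lets a stuck switch surface as a failed verification instead of a silent fault.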
Specification