Distributed fault isolation and recovery system and method
First Claim
1. In a distributed processing system of the type including a plurality of modules, a subsystem for isolating faults within said system and recovering said system to optimized operation, said subsystem comprising:
at least some of said modules being active fault recovery modules including fault detecting means for initializing a fault check routine and sensing faults within said system including faults within a respective module;
voting means associated with each of said active modules for placing a vote during each said fault check routine in response to a detected fault;
collective vote determining means for recording the votes of said active modules after each said fault check routine;
means for cooperatively intercoupling each of said voting means and said collective vote determining means; and
recovery sequence initializing means associated with each active module for initializing a fault isolation and recovery sequence in response to a predetermined number of consecutive collective votes exceeding a predetermined value.
Abstract
There is disclosed a system and a method for isolating faults and recovering a distributed system of the type including a plurality of modules to optimized operation. At least some of the modules are active fault recovery modules and include fault detecting means for initializing a fault check routine and sensing faults within the distributed system. Voting means are associated with each active module for placing a vote during each fault check routine in response to a detected fault. Collective vote determining means record the votes of the active modules after each fault check routine and recovery sequence initializing means initializes a fault isolation and recovery sequence in response to a given number of consecutive collective votes exceeding a predetermined value.
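The triggering condition summarized above can be illustrated with a minimal sketch. It assumes each active module's vote is a boolean, that the collective vote is the count of fault votes in one fault check routine, and that the counter resets once recovery is triggered; the names `RecoveryTrigger`, `vote_threshold`, and `required_consecutive` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecoveryTrigger:
    """Illustrative tracker of collective votes across fault check routines."""

    vote_threshold: int        # hypothetical: collective vote must exceed this value
    required_consecutive: int  # hypothetical: consecutive exceedances needed
    consecutive: int = 0       # current run of collective votes above the threshold

    def record_round(self, votes: list[bool]) -> bool:
        """Record one fault check routine; return True if recovery should start."""
        # Assumption: the collective vote is the number of modules voting "fault".
        collective_vote = sum(votes)
        if collective_vote > self.vote_threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0  # the run of exceedances is broken
        if self.consecutive >= self.required_consecutive:
            self.consecutive = 0  # assumption: reset after triggering recovery
            return True
        return False


# Three active modules voting over three fault check routines.
trigger = RecoveryTrigger(vote_threshold=1, required_consecutive=2)
rounds = [
    [True, False, False],  # collective vote 1: does not exceed the threshold
    [True, True, False],   # collective vote 2: first exceedance
    [True, True, True],    # collective vote 3: second consecutive exceedance
]
results = [trigger.record_round(votes) for votes in rounds]
```

Requiring several consecutive exceedances, rather than acting on a single collective vote, keeps a transient fault seen in one routine from starting a recovery sequence.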
29 Claims
1. In a distributed processing system of the type including a plurality of modules, a subsystem for isolating faults within said system and recovering said system to optimized operation, said subsystem comprising:
at least some of said modules being active fault recovery modules including fault detecting means for initializing a fault check routine and sensing faults within said system including faults within a respective module;
voting means associated with each of said active modules for placing a vote during each said fault check routine in response to a detected fault;
collective vote determining means for recording the votes of said active modules after each said fault check routine;
means for cooperatively intercoupling each of said voting means and said collective vote determining means; and
recovery sequence initializing means associated with each active module for initializing a fault isolation and recovery sequence in response to a predetermined number of consecutive collective votes exceeding a predetermined value.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A method of isolating faults and recovering a distributed processing system including a plurality of modules to optimized operation, said method comprising the steps of:
periodically detecting for faults within said system at given ones of said modules;
generating a vote at those given ones of said modules detecting a fault within said system, including a fault detected at a respective module;
collecting said votes such that a collective vote is generated from the respective votes; and
initializing a recovery sequence when a predetermined consecutive number of said collective votes exceed a predetermined value.
Dependent Claims: 21, 22, 23, 24, 25, 26, 27, 28, 29
Specification