Fault-tolerant computer system with /CONFIG filesystem
Abstract
A fault-tolerant computer system employs a pseudo-filesystem to dynamically manage the hardware components. A directory which appears as a standard, hierarchical directory in this filesystem contains a file for each component; each file maps to either a hardware component or a software module. The pseudo-filesystem hierarchy is determined during system initialization and is automatically updated whenever the software or hardware configuration changes. The pseudo-filesystem, called the /config filesystem herein, is implemented as a Unix filesystem in the Unix filesystem switch.
This pseudo-filesystem method may be implemented in a fault-tolerant, redundant computer system configuration having multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The multiple CPUs are loosely synchronized, for example by detecting events such as memory references and stalling any CPU that is ahead of the others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references are voted at the three separate ports of each of the memory modules.
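The loose synchronization described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the hardware mechanism of the patent: the function name `run_loosely_synchronized`, the per-tick `speeds` model of clock skew, and the convention that any operation starting with "mem" is a synchronization event are all hypothetical.

```python
def run_loosely_synchronized(stream, speeds):
    """Simulate CPUs executing the same instruction stream at different speeds.

    stream: list of operation names; an operation starting with "mem" is
            treated as a memory reference, i.e. a synchronization event.
    speeds: instructions each CPU may retire per tick (hypothetical skew).
    Returns a list of (tick, op) pairs, one per released memory reference.
    """
    if min(speeds) < 1:
        raise ValueError("every CPU must retire at least one instruction per tick")

    pc = [0] * len(speeds)            # per-CPU position in the stream
    stalled = [False] * len(speeds)   # True while a CPU waits at a memory reference
    released = []
    tick = 0

    while any(p < len(stream) for p in pc):
        tick += 1
        for cpu, speed in enumerate(speeds):
            for _ in range(speed):
                if stalled[cpu] or pc[cpu] >= len(stream):
                    break
                if stream[pc[cpu]].startswith("mem"):
                    # A CPU that is ahead stalls here until the others arrive.
                    stalled[cpu] = True
                else:
                    pc[cpu] += 1
        if all(stalled):
            # Every CPU has reached the same memory reference: release it.
            released.append((tick, stream[pc[0]]))
            pc = [p + 1 for p in pc]
            stalled = [False] * len(speeds)
    return released


if __name__ == "__main__":
    ops = ["add", "mem load", "add", "add", "mem store"]
    print(run_loosely_synchronized(ops, [1, 2, 3]))
```

In this model the fastest CPU reaches each memory reference first and waits; the reference is released only on the tick when the slowest CPU arrives, so all CPUs perform it together.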
211 Citations
18 Claims
1. A method of operating a computer system, the computer system including multiple units including at least one central processing unit (CPU), at least one memory unit, and at least one input/output (I/O) unit, comprising the steps of:
- creating a filesystem within the computer system, the filesystem having a directory tree which has an entry for each of said multiple units;
- removing at least one of said multiple units from said computer system while maintaining the filesystem within the computer system, and correspondingly deleting the filesystem entry for any removed unit; and
- adding at least one new unit to replace said at least one of said multiple units that was removed from said computer system while said computer system is continuing to operate, and adding at least one new entry to said filesystem, said at least one new entry corresponding to said at least one new unit.
Dependent claims: 2, 3, 4
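The three steps of claim 1 (create a directory tree with one entry per unit, delete the entry when a unit is removed, add an entry when a replacement is installed) can be sketched with an in-memory tree. This is a minimal illustration, not the Unix filesystem-switch implementation the abstract refers to; the `ConfigTree` and `Unit` names, the `serial` field, and the entry paths such as `/config/cpu/0` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str    # "cpu", "memory", or "io"
    serial: str  # hypothetical identifier for the physical unit


class ConfigTree:
    """In-memory stand-in for a directory tree with one entry per unit."""

    def __init__(self):
        self.root = {}  # directories are dicts, leaf entries are Unit objects

    def _parent(self, path, create=False):
        *dirs, name = [p for p in path.split("/") if p]
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {}) if create else node[d]
        return node, name

    def add(self, path, unit):
        """Add an entry for a newly installed unit."""
        parent, name = self._parent(path, create=True)
        if name in parent:
            raise FileExistsError(path)
        parent[name] = unit

    def remove(self, path):
        """Delete the entry for a removed unit; the rest of the tree stays intact."""
        parent, name = self._parent(path)
        return parent.pop(name)

    def listing(self):
        """Return the paths of all unit entries in sorted order."""
        out = []

        def walk(node, prefix):
            for name in sorted(node):
                child = node[name]
                path = f"{prefix}/{name}"
                if isinstance(child, dict):
                    walk(child, path)
                else:
                    out.append(path)

        walk(self.root, "")
        return out


if __name__ == "__main__":
    tree = ConfigTree()
    tree.add("/config/cpu/0", Unit("cpu", "C-001"))
    tree.add("/config/memory/0", Unit("memory", "M-001"))
    tree.add("/config/io/0", Unit("io", "I-001"))
    tree.remove("/config/memory/0")                         # unit pulled
    tree.add("/config/memory/0", Unit("memory", "M-002"))   # replacement installed
    print(tree.listing())
```

The tree object persists across the removal and the replacement, which mirrors the claim's requirement that the filesystem is maintained while units come and go.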
5. A method of operating a computer system, comprising the steps of:
- executing a same instruction stream by a plurality of central processing (CPU) units;
- accessing by said CPU units a plurality of memory units each storing an identical copy of data, and a plurality of redundant input/output (I/O) units;
- creating a filesystem within the computer system, the filesystem having a directory tree with an entry for each of said CPU units, each of said memory units and each of said I/O units;
- removing at least one of said CPU units, or at least one of said memory units, or at least one of said I/O units from said system while maintaining the filesystem within the computer system, and correspondingly deleting the filesystem entry for any removed unit; and
- adding at least one new unit to replace said at least one of said CPU units, memory units, or I/O units that was removed from said system while said CPU units are continuing to execute said instruction stream, and adding at least one new entry to said filesystem, said at least one new entry corresponding to said at least one new unit.
Dependent claims: 6, 7, 8, 9
10. A computer system comprising:
a) first, second and third central processing unit (CPU) units each having an address range and each executing a same instruction stream, each of said CPU units having a separate memory access port, wherein a failed one of said first, second and third CPU units is placed off-line and a remaining two of said first, second and third CPU units continue to execute said same instruction stream;
b) first and second memory units having identical address spaces within the address range of said CPU units for storing duplicative data to be accessed by said CPU units, each of said first and second memory units having first, second and third input/output ports coupled to said memory access ports of said first, second and third CPU units, respectively, wherein a failed one of said first and second memory units is placed off-line and a remaining one of said first and second memory units continues to be accessed by said CPU units; and
c) a filesystem storing a directory having an entry for each one of said CPU units and memory units which is currently operating.
Dependent claims: 11, 12, 13, 14, 15, 16
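Claim 10 gives each memory unit three ports, one per CPU, and the abstract states that memory references are voted at those ports. The sketch below shows one way such a vote could work. It assumes a strict-majority policy, which this section does not spell out, and the `vote` function and the tuple encoding of a memory reference are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(requests: Sequence[Optional[Hashable]]) -> Tuple[Optional[Hashable], List[int]]:
    """Vote the memory references presented at a memory unit's ports.

    requests: one entry per port; None marks a port whose CPU is offline.
    Returns (agreed_reference, dissenting_port_indices). When no strict
    majority exists, the agreed reference is None and every live port
    is reported as dissenting.
    """
    live = [(i, r) for i, r in enumerate(requests) if r is not None]
    if not live:
        return None, []

    counts = Counter(r for _, r in live)
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(live):
        # No strict majority among the live ports (assumed failure case).
        return None, [i for i, _ in live]

    dissent = [i for i, r in live if r != value]
    return value, dissent


if __name__ == "__main__":
    good = ("write", 0x1000, 42)   # (operation, address, data), hypothetical encoding
    bad = ("write", 0x1000, 43)
    print(vote([good, bad, good]))
```

A dissenting port index identifies the CPU that could then be placed off-line, leaving the remaining two CPUs to continue as claim 10 a) describes.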
17. A fault-tolerant computer system, comprising:
a) first, second and third central processing unit (CPU) units having similar interfaces and capable of executing an identical instruction set, said first, second and third CPUs executing a same instruction stream, wherein a failed one of said first, second and third CPUs is placed off-line and a remaining two of said first, second and third CPUs continue to execute said same instruction stream;
b) first and second memory units having similar interfaces, said first and second memory units storing a same data, wherein a failed one of said first and second memory units is placed off-line;
c) busses coupling each of the first, second and third CPU units individually to each of said first and second memory units wherein said first, second and third CPU units access said first and second memory units via the busses separately and in duplicate;
d) a first input/output bus coupled to said first memory unit and a second input/output bus coupled to said second memory unit;
e) a first input/output processor coupled to both said first and second input/output busses, and a second input/output processor coupled to both said first and second input/output busses;
f) a filesystem having a directory with corresponding entries for each of said CPU units, each of said memory units, and each of said input/output processors which is currently operating.
Dependent claims: 18
Specification