Fault tolerant data processing system
Abstract
The functions of two virtual operating systems (e.g., S/370 VM, VSE or IX370 and S/88 OS) are merged into one physical system. Partner pairs of S/88 processors run the S/88 OS and handle the fault tolerant and single system image aspects of the system. One or more partner pairs of S/370 processors are coupled to corresponding S/88 processors directly and through the S/88 bus. Each S/370 processor is allocated from 1 to 16 megabytes of contiguous storage from the S/88 main storage. Each S/370 virtual operating system thinks its memory allocation starts at address 0, and it manages its memory through normal S/370 dynamic memory allocation and paging techniques. The S/370 is limit checked to prevent the S/370 from accessing S/88 memory space. The S/88 Operating System is the master over all system hardware and I/O devices. The S/88 processors access the S/370 address space in direct response to a S/88 application program so that the S/88 may move I/O data into the S/370 I/O buffers and process the S/370 I/O operations. The S/88 and S/370 peer processor pairs execute their respective Operating Systems in a single system environment without significant rewriting of either operating system. Neither operating system is aware of the other operating system nor the other processor pairs.
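The storage arrangement described in the abstract can be illustrated with a minimal sketch. It assumes a simple base-plus-offset relocation: the abstract says only that each S/370 allocation is contiguous, appears to start at address 0, and is limit checked, so the relocation scheme and the names `S370StorageArea`, `translate`, and `StorageLimitError` are hypothetical and not taken from the patent.

```python
MB = 1024 * 1024


class StorageLimitError(Exception):
    """Raised when an S/370 address falls outside its allocated area."""


class S370StorageArea:
    """Hypothetical model of one S/370 allocation inside S/88 main storage."""

    def __init__(self, base: int, size: int):
        # The abstract gives a range of 1 to 16 megabytes per S/370 processor.
        if not (1 * MB <= size <= 16 * MB):
            raise ValueError("size must be between 1 and 16 megabytes")
        self.base = base  # start of the contiguous area in S/88 storage (assumed)
        self.size = size

    def translate(self, s370_addr: int) -> int:
        """Map an S/370 address (which starts at 0) to an S/88 storage address."""
        # Limit check: keeps the S/370 out of S/88 memory space.
        if not (0 <= s370_addr < self.size):
            raise StorageLimitError(f"address {s370_addr:#x} is outside the area")
        return self.base + s370_addr


# Usage: a 4 MB area placed at the 32 MB mark of S/88 storage (values are arbitrary).
area = S370StorageArea(base=32 * MB, size=4 * MB)
first_byte = area.translate(0)
```

In this sketch the S/370 side only ever sees addresses from 0 up to its allocation size, which matches the abstract's statement that the S/370 operating system believes its memory starts at address 0.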
15 Claims
1. In a fault tolerant data processing system of the type in which a first plurality of processing units greater than two concurrently perform identical operations under program control, each processor unit being coupled to hardware including fault tolerant I/O devices and storage, in which the processor states are periodically compared for determining the presence of an error in the processor units, and in which fault tolerant operations are continued so long as two of the units whose states can be compared are error free, in combination therewith
an additional plurality of processing units, one for each processing unit in the first plurality, having an architecture different than that of said first plurality of units, said additional units performing identical operations concurrently with each other under program control,
means periodically comparing the states of the additional plurality of units to detect errors in the additional plurality of units,
means including an application program executing on said first plurality of units for uncoupling each unit in the first plurality from its respective hardware, and coupling each unit in the first plurality to a respective unit in the additional plurality,
means controlled by said first plurality of units while uncoupled from its respective hardware, and by said application program, for passing I/O commands and data directly from the additional plurality of units to respective units in the first plurality,
means controlled by said first plurality of units and the application program for converting said commands and data to commands executable by and data useable by said first plurality of units to permit the first plurality of fault tolerant processing units to act as I/O controllers for the additional plurality of processing units, and
means associated with said first plurality of processing units responsive to said periodic comparisons for removing one of the processing units in the additional plurality along with its respective processing units in the first plurality from service upon detection of an error in either the one of the processing units in the additional plurality or its respective unit in the first plurality, permitting continued fault tolerant operation of the other processing units in the first and respective additional pluralities of processing units so long as two units each of said first and respective additional pluralities of units are error free.
3. Data processing apparatus comprising
a fault tolerant data processing system of the type in which a first pair of fault tolerant processors of one architecture perform identical operations concurrently under control of a first operating system, in which a second partner pair of fault tolerant processors of said one architecture perform said identical operations concurrent with the first pair under said first operating system control, in which the processor pairs are coupled to duplicated identical system buses and to paired fault tolerant I/O devices and paired fault tolerant main storage units via said buses for the transfer of identical data between the paired processors, I/O devices and main storage units under said operating system control, in which signals applied to the system buses by each pair of processors are periodically compared for error detection and in which means responsive to the detection of an error removes from service an error causing pair of processors to permit continued operation of the system with the remaining processor pair,
an additional first pair and an additional partner pair of processors having an architecture differing from said one architecture, coupled to said system buses, and performing identical operations concurrently with each other under control of a second operating system,
additional means periodically comparing signals applied to the system buses by each additional pair of processors for detecting errors;
means including the processors in the first pair and second partner pair for passing I/O commands and data from the processors in the additional first pair and additional partner pair to respective processors in the first pair and second partner pair in a manner indiscernible to the first operating system;
means converting said commands and data to commands executable by and data useable by said processors of the first pair and second partner pair to permit the first pair of processors and the second partner pair of processors to act as I/O controllers for respective ones of the additional first pair of processors and the additional partner pair of processors; and
means associated with the processors of said first pair and said second partner pair selectively removing from service one pair of said additional pairs of processors along with its respective pair of processors in the first pair or second partner pair when an error is detected in signals applied by either said one pair of said additional pairs of processors or its respective pair of processors in the first pair or second partner pair to the system buses and continuing operation of the other pairs, thereby rendering said additional pairs of processors fault tolerant.
5. In a fault tolerant data processing system of the type in which a first pair of fault tolerant processors of one architecture perform identical operations concurrently under program control, in which a second partner pair of fault tolerant processors of said one architecture perform said identical operations concurrent with the first pair under said program control, in which the processor pairs are coupled by associated hardware to duplicated identical system buses and to paired fault tolerant I/O devices and paired fault tolerant main storage units via said buses for the transfer of identical data between the paired processors, I/O devices and main storage units, in which signals applied to the system buses by each pair of processors are periodically compared for error detection and in which means responsive to the detection of an error removes from service an error causing pair of processors to permit continued operation of the system with the other processor pair, in combination therewith
an additional first pair and an additional partner pair of processors having a different architecture, coupled to said system buses, and performing identical operations concurrently with each other under program control,
additional means periodically comparing signals applied to the system buses by each additional pair of processors for detecting errors;
means including an application program in said first pair and said second partner pair of processors for uncoupling the first and second partner pairs from their associated hardware;
means controlled by the processors in the first pair and second partner pair while uncoupled from their associated hardware, and said application program, for passing I/O commands and data from the processors in the additional first pair and additional partner pair to respective processors in the first pair and second partner pair;
means controlled by said first pair and second partner pair of processors and said application program for converting said commands and data to commands executable by and data useable by said processors of the first pair and second partner pair to permit the first pair of processors and the second partner pair of processors to act as I/O controllers for respective ones of the additional first pair of processors and the additional partner pair of processors; and
means associated with the processors of said first pair and said second partner pair selectively removing from service one pair of said additional pairs of processors along with its respective pair of processors in the first pair or second partner pair when an error is detected in signals applied to the system buses by either said one pair of said additional pairs of processors or its respective pair of processors in the first pair or second partner pair and continuing operation of the other additional pair and its respective pair in the first pair or second partner pair.
7. Data processing apparatus comprising
a fault tolerant data processing system of the type in which a first pair of processors perform identical operations under control of programs having a first instruction architecture, in which the processor pair is coupled to hardware including paired fault tolerant I/O devices and paired fault tolerant main storage units via duplicated system buses identical to each other for transferring identical data between the paired I/O devices and main storage units and between the paired processors and main storage units, selected conditions in the processors being periodically compared with each other for error detection, and in which means responsive to the detection of error in either processor electrically uncouples the pair of processors from the system,
an additional pair of processors operating under control of programs having a second instruction architecture differing from said first architecture, the additional processors performing identical operations concurrently with each other under program control;
means tightly coupling each additional processor with a respective processor of the first pair of processors;
means including an application program in said first pair of processors for uncoupling the first pair of processors from said hardware;
means controlled by said application program and processors in the first pair while uncoupled from said hardware, for passing I/O commands and data via said coupling means between processors in the first and additional pairs;
means controlled by said first pair of processors and an application program for converting I/O commands transferred from the additional pair of processors to the first pair of processors to commands executable by the first pair of processors to permit the first pair of processors to act as I/O controllers for the additional pair of processors;
means periodically comparing selected processors in the additional pair for detecting errors; and
means associated with the processors of said first pair effective when an error is detected in either the first pair of processors or in the additional pair of processors for removing both pairs of processors from service.
9. Data processing apparatus comprising
a fault tolerant data processing system of the type in which a first pair of processors perform identical operations under control of programs having a first instruction architecture, in which a second partner pair of processors perform said identical operations concurrent with the first pair under control of said programs, in which the processor pairs are coupled by associated hardware to paired fault tolerant I/O devices and paired fault tolerant main storage units via duplicated system buses for transferring identical data between the paired I/O devices and main storage units and between the paired processors and main storage units, in which selected conditions in each pair of processors and their associated hardware are periodically compared for error detection and in which means responsive to detection of error in either pair of processors and their respective hardware removes that pair of processors from service to permit continued error free operation of the system with the other pair of processors,
an additional first pair of processors, associated with said first pair of processors, operating under a second instruction architecture differing from said first architecture, and an additional partner pair of processors associated with said second partner pair of processors, operating under said second architecture, both additional pairs and processors within each additional pair performing identical operations concurrently with each other under program control;
means periodically comparing selected processor conditions in each additional pair for detecting errors;
means coupling each processor of said additional first and additional partner processor pairs with a respective processor of said first and second partner pairs of processors;
means including an application program adapted to run on said first pair and said second partner pair of processors for uncoupling each of said first pair and said second partner pair of processors from its respective associated hardware;
means controlled by each processor in the first pair and second partner pair and by said application program, while a processor is uncoupled from its respective associated hardware, for passing I/O commands and data via said coupling means between the uncoupled processor and its respective processor in the additional processor pairs;
means controlled by said first and second partner pairs and said application program for converting said I/O commands and data passed to said first and second partner pairs of processors to commands executable by and data useable by said first and second partner pairs of processors to permit the first pair of processors and the second partner pair of processors to act as I/O controllers for respective ones of the additional first pair of processors and the additional partner pair of processors; and
said removing means being effective when an error is detected by said periodic comparing means in one of said additional pairs of processors for removing said one additional pair of processors and its respective pair in the first and second partner pairs from service in order to permit continued operation of the other additional pair and its respective pair in the first and second partner pairs.
11. Data processing apparatus comprising
a fault tolerant data processing system of the type in which a first pair of processors perform identical operations under control of a first operating system, in which the processor pair is coupled to paired fault tolerant I/O devices and paired fault tolerant main storage units via duplicated system buses identical to each other for transferring identical data between the paired I/O devices, main storage units and processors, selected processor conditions being periodically compared with each other for detecting errors in signals transferred between the processors and buses, and in which means responsive to detection of an error in either processor removes the pair of processors from service,
an additional pair of processors operating under control of a second operating system differing from said first operating system, the additional processors performing identical operations concurrently with each other under program control;
means periodically comparing selected conditions in the additional pair for detecting errors;
means for passing I/O commands and data from processors in the additional pair to processors in the first pair in a manner indiscernible by the first operating system to permit the first pair of processors to act as I/O controllers for the additional pair of processors, and
said removing means associated with the processors of said first pair being responsive to said periodic comparisons for selectively removing both pairs of processors from service upon the detection of an error in the first pair or the additional pair of processors.
13. In a fault tolerant data processing system of the type in which a first pair of processors perform identical operations under control of a first operating system, in which a second partner pair of processors perform said identical operations concurrent with the first pair under control of said operating system, in which the processor pairs are coupled to fault tolerant I/O devices and fault tolerant main storage units via duplicated system buses for transferring identical data between the I/O devices, main storage units and processors, in which selected conditions in each pair of processors are periodically compared for detecting errors in signals passed between the processors and buses and in which means responsive to detection of error in either pair of processors removes that pair of processors from service and continues operation of the system with the other pair of processors, the improvement comprising
an additional first pair of processors, associated with said first pair of processors, operating under a second operating system different from said first operating system,
an additional partner pair of processors, associated with said second partner pair of processors, operating under said second operating system, both additional pairs, and processors within each additional pair, performing identical operations concurrently with each other under program control;
means linking each processor of said additional first and partner processor pairs with a respective processor of said first and second partner pairs of processors;
means for passing I/O commands and data via said linking means between respective processors in the additional processor pairs and the first pair and second partner pair in a manner indiscernible by the first operating system,
means for converting I/O commands and data passed from the additional processor pairs to said first and second partner pairs of processors to commands executable by and data useable by said first and second partner pairs of processors to permit processors of the first and second partner pairs of processors to act as I/O controllers for respective ones of the additional pairs of processors;
means periodically comparing selected processor conditions in each additional pair for detecting errors; and
means responsive to said periodic comparisons selectively removing from service one of said additional pairs of processors along with its associated pair in the first and second partner pairs upon detection of error in either said one of said additional pairs of processors or its associated pair in the first and second partner pairs and continuing operation of the system with the other additional pair and its associated pair in the first and second partner pairs.
14. In a System/88 (S/88) fault tolerant data processing system in which a first pair of S/88 processors perform identical operations concurrently under program control, in which a second partner pair of S/88 processors perform said identical operations concurrent with the first pair under said program control, in which the S/88 processor pairs are coupled to hardware including paired S/88 fault tolerant I/O devices and paired S/88 fault tolerant main storage units via duplicated system buses for the transfer of identical data between the paired processors, I/O devices and main storage units under control of a S/88 operating system, in which the processor states in each pair are periodically compared for error detection and in which means responsive to the detection of an error removes an error-causing pair of processors from service and continues operation of the system with the other processor pair, in combination therewith
a first pair of System/370 (S/370) processors and an additional partner pair of S/370 processors, coupled to said system buses, and performing identical operations concurrently with each other under control of a S/370 operating system;
means including a S/88 application program running on said S/88 processors for uncoupling the S/88 processors from said hardware;
means controlled by the S/88 processors while uncoupled from said hardware, and said application program, for passing S/370 I/O commands and data from each S/370 processor to a respective S/88 processor in a manner indiscernible to the S/88 operating system;
means controlled by S/88 processors and said S/88 application program for converting said S/370 I/O commands to S/88 commands to permit the S/88 processors to act as I/O controllers for respective ones of the S/370 processors;
means periodically comparing the processor states in each pair of S/370 processors for detecting errors; and
means associated with the S/88 processors responsive to said periodic comparisons for selectively removing from service one pair of S/370 processors and its respective pair of S/88 processors if an error is detected in said one pair of S/370 processors or its respective pair of S/88 processors and continuing system operation with the other pairs of S/370 and S/88 processors.
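The compare-and-remove behavior recited in the final element of claim 14 can be illustrated with a minimal sketch. It is a software model only: the claims describe hardware means, and the names `ProcessorPair`, `LinkedSet`, and `check_and_remove` are hypothetical and not taken from the patent. Processor state is assumed, for illustration, to be any comparable value.

```python
from dataclasses import dataclass


@dataclass
class ProcessorPair:
    """Two processors performing identical operations; states should match."""
    name: str
    state_a: int
    state_b: int

    def matches(self) -> bool:
        # Periodic comparison of the processor states within the pair.
        return self.state_a == self.state_b


@dataclass
class LinkedSet:
    """One S/370 pair together with its respective S/88 pair (assumed model)."""
    s88: ProcessorPair
    s370: ProcessorPair
    in_service: bool = True


def check_and_remove(sets: list[LinkedSet]) -> list[LinkedSet]:
    """Remove any linked set in which either pair shows a mismatch.

    Returns the sets still in service, which continue system operation.
    """
    for linked in sets:
        if linked.in_service and not (linked.s88.matches() and linked.s370.matches()):
            # An error in either pair takes both pairs out of service together.
            linked.in_service = False
    return [linked for linked in sets if linked.in_service]


# Usage: the first S/370 pair disagrees, so it and its S/88 pair are removed,
# and the partner set carries on.
first = LinkedSet(ProcessorPair("S/88 first", 7, 7), ProcessorPair("S/370 first", 7, 9))
partner = LinkedSet(ProcessorPair("S/88 partner", 7, 7), ProcessorPair("S/370 partner", 7, 7))
remaining = check_and_remove([first, partner])
```

The point of the sketch is the coupling: a fault in a S/370 pair removes its S/88 pair as well, and a fault in a S/88 pair removes its S/370 pair, while the other linked pairs keep running.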
Specification