Fault-tolerant multiprocessor system
Abstract
In a multiprocessor system interconnected by a bus structure that provides communication and information transfers between the processor modules of the system, each processor periodically broadcasts a control message to all the other processors of the system. A processor module that does not receive the control message from a sending processor module assumes that the sending processor module has failed and takes over the task of the failed processor module.
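The failure-detection rule in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `HeartbeatMonitor`, the module names `cpu0` and `cpu1`, and the 2-second timeout are all hypothetical, and timestamps are passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks when a control message last arrived from each peer module."""
    timeout: float  # hypothetical "predetermined period", in seconds
    last_seen: dict = field(default_factory=dict)

    def record(self, peer: str, now: float) -> None:
        # Called whenever a control message from `peer` is received.
        self.last_seen[peer] = now

    def failed_peers(self, now: float) -> list:
        # A peer is presumed failed once its last message is older than the timeout.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=2.0)
monitor.record("cpu0", now=0.0)
monitor.record("cpu1", now=0.0)
monitor.record("cpu1", now=1.5)   # cpu1 keeps sending, cpu0 goes silent
print(monitor.failed_peers(now=3.0))  # prints ['cpu0']
```

In this sketch each module would run its own monitor, so every surviving module reaches the same conclusion independently.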
198 Citations
11 Claims
1. A multiprocessor system, comprising:
at least three separate processor modules, each processor module including a central processing unit, a main memory and an input/output channel;
at least two interprocessor buses, each bus coupling the separate processor modules to one another to transfer signals and data between the processor modules;
at least two device controllers operable to control data transfer between the processor modules and at least one related peripheral device connected to each device controller, each device controller including at least two separate ports;
a plurality of input/output buses separated from the interprocessor buses, each input/output bus being disposed between the input/output channel of a corresponding processor module and a one of the ports of the device controller so as to connect each device controller with at least two of the processor modules;
means in each processor module for transferring through a one of the interprocessor buses to an associated processor module information associated with an application program resident in the processor modules;
means in each processor module for sending a predetermined control signal to each other processor module through a one of the interprocessor buses;
means in each processor module for detecting when the predetermined control signal has not been received from the other processor module within a predetermined period and determining therefrom that the other processor module has failed;
means in each processor module responsive to the detection of the failure of receipt of the predetermined control signal within the predetermined period for initiating execution of a copy of said application program in the detecting processor module to thereby cause the detecting processor module to take over the work of the determined failed processor module; and
means in each processor module for controlling the related device controller through the input/output bus to control the related peripheral device so as to provide each of the peripheral devices with simultaneous operations.
2. A multiprocessor system comprising:
at least two separate processor modules, each processor module including a central processing unit, a main memory and an input/output channel;
interprocessor bus means connecting the separate processor modules to transfer signals and data therebetween;
at least one device controller between the processor modules and peripheral devices to control data transfer between each processor module and a related peripheral device, the device controller including at least two separate ports;
a plurality of input/output buses, separate from the interprocessor bus means, each input/output bus being disposed between the input/output channel of a processor module and a respective port of the device controller so as to connect one device controller with at least two processor modules;
means in at least first and second processor modules for transferring information from the first processor module to the second processor module through the interprocessor bus means, which information is associated with a program resident in the first processor module;
means in the first processor module for sending a predetermined control signal to the second processor module through the interprocessor bus means;
means in the second processor module for receiving the predetermined control signal from the first processor module, and for detecting, when the predetermined control signal has not been received from the first processor module within a predetermined period, that the first processor module has failed; and
means in the second processor module responsive to detection of the failure of the first processor module to initiate execution of a copy of said program, utilizing the transferred information, and to cause the second processor module to take over the work of the first processor module.
4. A multiprocessor system comprising a first and at least a second separate processor module, each processor module comprising a central processing unit, a main memory and an input/output channel;
interprocessor bus means connected to the separate processor modules to transfer information between the processor modules;
at least one device controller adapted to be connected to peripheral devices to control data transfer between each processor module and the peripheral devices, the device controller including at least two separate ports, each port being adapted for connection to a processor module;
a plurality of input/output buses separate from the interprocessor bus means, each input/output bus being disposed between the input/output channel of a corresponding processor module and an associated one of the ports of the device controller so as to connect one device controller with at least two processor modules;
means in the first and second processor modules for transferring information from the first processor module to the second processor module through the interprocessor bus means, which information is associated with a program resident in the first processor module;
means in the first and second processor modules for sending a predetermined control signal from the one processor module to the other processor module through the interprocessor bus means;
means in each of the first and second processor modules for receiving the predetermined control signal from the other processor module and for detecting, when the predetermined control signal has not been received from the other processor module within a predetermined period, that the other processor module has failed; and
means in each of the first and second processor modules responsive to detection of the failure of the other processor module to initiate execution of a copy of said program, utilizing the transferred information, and to take over the work of the failed other processor module.
5. A multiprocessor system, comprising:
a first and at least a second separate processor module, each processor module including a central processing unit;
interprocessor bus means connected to the separate processor modules to transfer information between the processor modules;
means in the first and second processor modules for transferring information from the first processor module to the second processor module through the interprocessor bus means, which information is associated with an active program resident in the first and a copy of said active program that is resident in at least the second processor module;
means in the first and second processor modules for sending through the interprocessor bus means, at intervals, from each processor module to each other processor module, a predetermined control signal indicative of continued operation of the sending processor;
means in each of the first and second processor modules for receiving the predetermined control signal from the sending processor module and for detecting, when the predetermined control signal has not been received from the sending processor module within a predetermined period, that the sending processor module has failed; and
means in the first and second processor modules responsive to detection of the failure of the sending processor module for informing said copy of said active program that the sending processor module has failed, and to initiate execution of said copy of said active program, utilizing the transferred information, and to take over the work of the failed sending processor module.
7. A method for providing fault tolerant operation of a multiprocessor computer system wherein the multiprocessor computer system comprises a first and at least a second separate processor module and each processor module includes a central processing unit, comprising the steps of:
connecting together the separate processor modules to transfer information between the processor modules;
transferring information from the first to the second processor module, which information is associated with an active program resident in the first and an inactive copy of said active program that is resident in at least the second processor module;
sending, at intervals, from each processor module to each other processor module, a predetermined control signal indicative of continued operation of the sending processor module;
receiving at each of the first and second processor modules, the predetermined control signal from the other processor module;
detecting, when the predetermined control signal has not been received from the first processor module within a predetermined period, that the first processor module has failed; and
initiating, responsive to detection of the failure of the first processor module, execution of said inactive copy of said active program in the second processor module utilizing the transferred information to thereby cause the second processor module to take over the work of the failed first processor module.
9. A multiprocessor system comprising at least three separate processor modules, each processor module including a central processing unit, a main memory and an input/output channel,
at least two interprocessor buses, each bus connected to the separate processor modules to transfer signals and data between the processor modules;
at least two device controllers between the processor modules and peripheral devices to control data transfer between a processor module and a related peripheral device, each device controller including at least two separate ports each coupled to corresponding ones of the processor modules separately;
a plurality of input/output buses, separate from the interprocessor buses, each input/output bus being disposed between the input/output channel of a processor module and a respective port of a device controller so as to connect each device controller with at least two processor modules;
means in at least first and second processor modules for transferring information from the first processor module to the second processor module through one of the interprocessor buses, which information is associated with a program resident in a first and a copy of said program resident in at least a second processor module;
means in each processor module for sending a predetermined control signal from each processor module to the other processor modules through an interprocessor bus;
means in at least two of the processor modules for receiving the predetermined control signal from each other and for detecting, when the predetermined control signal has not been received from the first processor module of the two within a predetermined period, that the first processor module of the two has failed; and
means in the at least two of the processor modules responsive to detection of the failure of the first processor module of the two for informing said copy of said program in the second processor module of the two that the first has failed, thereby to initiate execution of said copy of said program by the second processor module of the two, utilizing the transferred information, and to cause the second to take over the work of the failed first processor module.
10. A multiprocessor system, comprising:
a plurality of separate processor modules, each processor module including means for formulating messages;
interprocessor bus means connected to the separate processor modules to transfer the messages between the processor modules;
means in each of the processor modules for transferring certain ones of the messages from such processor module to at least another of the processor modules through the interprocessor bus means, which certain ones of the messages contain information associated with an active program being executed in such processor module for an inactive copy of said active program that is resident in the another of the processor modules;
means in each of the processor modules for sending through the interprocessor bus means, at intervals, from each processor module to each other processor module, a control message containing information indicative of continued operation of the sending processor;
means in each of the processor modules for receiving the control message from the sending processor module and for detecting, when the control message has not been received from the sending processor module within a predetermined period, that the sending processor module has failed; and
means in each of the processor modules responsive to detection of the failure of the sending processor module for informing said inactive copy of said active program that the sending processor module has failed, and utilizing the transferred information to initiate execution of said inactive copy of said active program to take over the work of the failed sending processor module.
11. A multiprocessor system, comprising:
a first and at least a second separate processor module, each processor module including a central processing unit;
interprocessor bus means connecting the separate processor modules to one another for transferring information therebetween in the form of messages;
means in the first and second processor modules for transferring messages of a first type from the first processor module to the second processor module through the interprocessor bus means, which first type messages contain information associated with an active program being executed in the first processor module for an inactive copy of said active program that is resident in at least the second processor module;
means in the first and second processor modules for sending through the interprocessor bus means, at intervals, from each processor module to each other processor module, a message of a second type that is indicative of continued operation of the sending processor;
means in each of the first and second processor modules for receiving the second type message from the sending processor module and for detecting, when the second type message has not been received from the sending processor module within a predetermined period, that the sending processor module has failed; and
means in each of the first and second processor modules responsive to detection of the failure of the sending processor module for initiating execution of said inactive copy of said active program, utilizing the transferred information to take over the work of the failed sending processor module.
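The interplay of the two message types recited in the claims above (state transferred to an inactive copy, then takeover once failure is detected) can be sketched as follows. This is only an illustration under stated assumptions: the class `BackupProcess`, its method names, and the `next_record` field are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class BackupProcess:
    """Hypothetical inactive copy of a program, kept current by transferred state."""
    state: dict = field(default_factory=dict)
    active: bool = False

    def on_checkpoint(self, update: dict) -> None:
        # First-type message: information associated with the active program.
        self.state.update(update)

    def on_peer_failed(self) -> dict:
        # Failure detected: begin executing from the last transferred state.
        self.active = True
        return dict(self.state)


backup = BackupProcess()
backup.on_checkpoint({"next_record": 41})
backup.on_checkpoint({"next_record": 42})
resume_from = backup.on_peer_failed()
print(backup.active, resume_from)  # prints True {'next_record': 42}
```

The point of the sketch is that the backup resumes from the most recently transferred information rather than restarting the work from the beginning.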
Specification