Fault-tolerant multi-computer system
Abstract
A Fault-Tolerant Multi-Computer System for control applications is disclosed. The system has a plurality of Computers (10a-10n), each having an assigned set of tasks which it is capable of executing. No one Computer in the system acts as a master and no one Computer executes all of the tasks. Communication between the Computers is by individual communication links (16, 18, 20) over which each Computer sends information directly to all other Computers in the system. Each Computer comprises an Applications Computer (100) and an Operations Controller (200). The Operations Controller receives messages over the communication links and selects, from the assigned tasks, the tasks to be performed by the associated Applications Computer. Each Operations Controller includes a fault handler which checks the messages received from the other Computers. The fault handlers send and receive error messages, over the communication links, to assist in the identification of a faulty Computer. Subsequent messages from the Computers deemed to be faulty are ignored, and the tasks assigned to the faulty Computer are executed by alternate Computers in the system.
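The redundancy described in the abstract, where tasks assigned to a faulty Computer are picked up by alternate Computers, can be illustrated with a minimal sketch. It is not taken from the patent: the task names, the assignment table `ASSIGNED`, and the helper functions `candidates` and `uncovered_tasks` are all hypothetical, and the sketch assumes only that each task appears in the assigned set of more than one Computer.

```python
from typing import Dict, List, Set

# Hypothetical assignment table: computer id -> tasks it may execute,
# listed highest priority first. Every task appears under two computers.
ASSIGNED: Dict[str, List[str]] = {
    "10a": ["read_sensors", "control_law"],
    "10b": ["control_law", "drive_actuator"],
    "10c": ["drive_actuator", "read_sensors"],
}


def candidates(task: str, assigned: Dict[str, List[str]], faulty: Set[str]) -> List[str]:
    """Return the non-faulty computers whose assigned set contains the task."""
    return sorted(
        computer
        for computer, tasks in assigned.items()
        if task in tasks and computer not in faulty
    )


def uncovered_tasks(assigned: Dict[str, List[str]], faulty: Set[str]) -> Set[str]:
    """Return the tasks that no healthy computer is able to execute."""
    all_tasks = {task for tasks in assigned.values() for task in tasks}
    return {task for task in all_tasks if not candidates(task, assigned, faulty)}
```

With this table, marking "10a" as faulty still leaves every task with an alternate Computer; a task becomes uncovered only when every Computer that holds it has been excluded.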
52 Claims
1. A fault-tolerant multi-computer system architecture, responsive to intercomputer messages and to inputs from external sources for executing a predetermined set of tasks to produce an output to at least one external device, comprising:
a plurality of computers for collectively executing the predetermined set of tasks in a coordinated manner to produce outputs to the at least one external device in response to the inputs from the external sources and the intercomputer messages, each of said computers having an assigned subset of the tasks which it is capable of selecting and executing in a predetermined order of priority, each task in said predetermined set of tasks being included in more than one of said assigned subset so that each task is capable of being selected and executed by more than one computer; and
a like plurality of communication links, one associated with each computer, each communication link transmitting only the intercomputer messages sent by the associated computer to all of the computers which require any message generated by the associated computer; and
wherein each of said plurality of computers comprises;
operations controller means for controlling the operation of its own computer in coordination with like operations controllers in the other computers, each operations controller including;
receiver means for receiving intercomputer messages,
fault handler means for checking said intercomputer messages to detect the faulty operation of any computer in the system, and to exclude from further processing the messages received from faulty computers,
scheduler means responsive to the receipt of all the data variables for the execution of at least one of its assigned tasks for selecting from its assigned subset the tasks to be executed,
task communicator means responsive to the task selected by the scheduler means for assembling the data variables required for the execution of the selected task and
transmitter means responsive to said fault handler means, said scheduler means and said task communicator means for sending intercomputer messages to all of the computers in the system, said messages containing an identification of the faulty computers, identification of the tasks it has selected, and the values of the data variables resulting from the execution of the selected tasks required for the execution of a subsequent task; and
applications computer means for executing the tasks selected by said scheduler means using the data assembled by said task communicator means.
Dependent claims: 2 through 32.
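The interplay of the scheduler means and the task communicator means recited in claim 1 can be sketched as follows. This is an illustrative reading only, not the claimed implementation: the `Task` and `Scheduler` classes, their method names, and the use of floating-point data values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    inputs: Set[str]  # data variables that must be received before the task can run


@dataclass
class Scheduler:
    assigned: List[Task]  # this computer's assigned subset, highest priority first
    received: Dict[str, float] = field(default_factory=dict)
    faulty: Set[str] = field(default_factory=set)

    def on_data_value(self, sender: str, variable: str, value: float) -> None:
        """Record a data value unless its sender has been excluded as faulty."""
        if sender in self.faulty:
            return
        self.received[variable] = value

    def select(self) -> Optional[Task]:
        """Return the highest-priority assigned task whose inputs have all arrived."""
        for task in self.assigned:
            if task.inputs.issubset(self.received):
                return task
        return None

    def assemble(self, task: Task) -> Dict[str, float]:
        """Gather the data variables the selected task needs (task communicator role)."""
        return {variable: self.received[variable] for variable in sorted(task.inputs)}
```

In this sketch a task is selectable only once every one of its input variables has been received, and values from excluded senders never reach the table, which mirrors the ordering of the fault handler and scheduler in the claim.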
33. A method for controlling the operation of each computer in a fault tolerant multiple computer system wherein the system includes a communication network whereby each computer can send error, data value, task selection and task completed/started messages to every other computer in the system and each computer has an operations controller and applications computer and wherein each computer further has a set of tasks it is capable of selecting and executing, comprising the steps of:
checking for error with said operations controller all messages received by each computer from all the computers in the system to generate an error signal identifying each computer which sent the message containing an error;
sending error messages to all of the other computers in response to said error signals identifying as a faulty computer each computer which sent a message containing an error;
recording in said operations controller as faulty each computer which sent a message containing an error in response to said error signals and each computer identified in error messages received from a predetermined number of other computers to generate a fault status table;
discarding all messages containing errors and messages received from computers recorded as faulty in said fault status table;
recording in a status table contained within said operations controller the status information contained in the task data value, task selection and task completed/started messages which were not discarded, said status table listing said tasks and their associated status information in their order of execution priority;
detecting when all the information required for the execution of any task is recorded in said status table to generate a dispatch signal;
recording as unselected in said status table, each task which was previously recorded as selected by a computer which has subsequently been recorded as faulty in said fault status table;
generating by said operations controller a dispatch signal to signify the tasks unselected are ready for execution;
selecting by said operations controller from said status table the highest priority task ready for execution and not selected by another computer in response to said dispatch signal;
recording in said operations controller the selected task as selected in said task status table;
sending to all of the computers a task selected message containing the identity of the task selected;
generating in said operations controller a release task signal containing the identity of the selected task in response to the computer signifying it has completed the execution of a preceding task;
recording in a data values table contained within said operations controller the value of the data variable contained in each non-discarded data value message received from all the computers;
communicating from said data values table to the applications computer the value of the data variables required for the execution of the selected task in response to said release task signal;
executing in the applications computer the selected task using the communicated values of the data variables to generate values for new data variables;
sending to all of the computers by said operations controller, data value messages containing the values of the new data variables received from the applications computer; and
sending to all of the computers by said operations controller a task completed/started message identifying the computer, the task completed, and the new task started by the identified computer after the execution of the task is completed.
Dependent claims: 34 through 52.
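Two steps of the method of claim 33 lend themselves to a short sketch: building the fault status table from local checks plus error messages from a predetermined number of other computers, and selecting the highest-priority ready task after releasing tasks held by faulty computers. The code is a hedged illustration under stated assumptions, not the patented method: `FaultStatusTable`, `select_task`, the `threshold` parameter, and the choice to ignore reports from already-faulty reporters are all hypothetical.

```python
from typing import Dict, List, Optional, Set


class FaultStatusTable:
    def __init__(self, threshold: int) -> None:
        # Assumed stand-in for the "predetermined number" of other computers.
        self.threshold = threshold
        self.accusers: Dict[str, Set[str]] = {}
        self.faulty: Set[str] = set()

    def local_error(self, sender: str) -> None:
        """A message from `sender` failed this computer's own checks."""
        self.faulty.add(sender)

    def error_message(self, reporter: str, accused: str) -> None:
        """Count distinct healthy reporters; record `accused` at the threshold."""
        if reporter in self.faulty:
            return  # assumption: reports from faulty computers are discarded
        self.accusers.setdefault(accused, set()).add(reporter)
        if len(self.accusers[accused]) >= self.threshold:
            self.faulty.add(accused)


def select_task(
    priority_order: List[str],
    ready: Set[str],
    selected_by: Dict[str, str],
    faulty: Set[str],
) -> Optional[str]:
    """Unselect tasks held by faulty computers, then pick the best ready task."""
    for task, owner in list(selected_by.items()):
        if owner in faulty:
            del selected_by[task]  # record the task as unselected again
    for task in priority_order:
        if task in ready and task not in selected_by:
            return task
    return None
```

Counting distinct reporters rather than raw messages keeps a single computer from marking another as faulty by repeating the same accusation; whether the patent counts reports this way is not stated in the claim.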
Specification