Operations controller for a fault-tolerant multiple computer system
First Claim
1. In a fault tolerant multiple computer system wherein the multiple computer system is capable of executing, in a coordinated manner, a predetermined set of tasks in response to inputs from external sources to produce an output to at least one external device, and wherein each computer includes an application computer capable of executing a subset of the predetermined set of tasks, and wherein the subset of tasks for each computer is different, and contains at least one predetermined input/output task for inputting new data from an external source or outputting data to the at least one external device, an operations controller for each computer in the system for selecting the tasks to be executed by the associated applications computer and for sending messages to every other computer;
said messages including task selected messages identifying the tasks it has selected, task data value messages containing the values of the data variables resulting from the tasks executed by its applications computer, and error messages containing the identification of the computers which sent messages having detected errors, each operations controller comprising:
fault handler means responsive to messages received from all the computers in the system for checking each message received to identify faulty messages, to record as faulty in a fault state table the computers sending messages identified as being faulty, and to record as faulty in said same fault state table each computer identified as being faulty in said error messages received from a predetermined number of other computers, for passing on for further processing the error-free messages received from non-faulty computers and for sending said error messages to the other computers identifying each computer it has recorded to be faulty in said fault state table;
scheduler means storing said subset of tasks in their order of priority and their execution status in response to the error-free task selected and data value messages passed by the fault handler for selecting for execution from said stored subset of tasks said input/output task at predetermined intervals and for selecting in the interval between the selection of said input/output tasks the highest priority unselected task ready for execution, for generating a dispatch signal identifying the task selected for execution, and for sending said task selected messages to all of the computers identifying the tasks it has selected for execution; and
task communicator means interfacing said scheduler means and the associated applications computer for storing the values of the data variables contained in the error-free data value messages passed by the fault handler, for assembling the values of the data variables required for the execution of the selected task in response to said dispatch signal, for sending to the applications computer the assembled values of the data variables in response to the applications computer signifying it has completed the previous task, for sending to all of the computers data value messages containing the values of the data variables resulting from the execution of the selected task by the associated applications computer and for receiving from said external sources and sending to said at least one external device the data values resulting from the execution of said input/output task.
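The fault handler element above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name, the message dictionary layout, the `is_valid` callback, and the `threshold` parameter (standing in for the claim's "predetermined number") are all assumptions made for illustration. The sketch treats an accused computer as faulty once `threshold` distinct computers have accused it.

```python
from collections import defaultdict


class FaultHandler:
    """Illustrative fault state table with accusation counting (hypothetical API)."""

    def __init__(self, threshold):
        self.threshold = threshold          # assumed "predetermined number"
        self.faulty = set()                 # fault state table
        self.accusers = defaultdict(set)    # accused computer -> accusing computers

    def receive(self, sender, message, is_valid):
        """Return the message if it is passed on for further processing, else None."""
        if sender in self.faulty:
            return None                     # ignore computers already recorded as faulty
        if not is_valid(message):
            self.faulty.add(sender)         # the sender of a faulty message is recorded
            return None
        if message.get("type") == "error":
            for accused in message["faulty"]:
                self.accusers[accused].add(sender)
                if len(self.accusers[accused]) >= self.threshold:
                    self.faulty.add(accused)
        return message

    def error_message(self):
        """Build the error message sent to the other computers."""
        return {"type": "error", "faulty": sorted(self.faulty)}
```

Counting distinct accusers in a set, rather than counting messages, keeps one computer from pushing another over the threshold by repeating the same accusation.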
Abstract
An operations controller for each computer in a multiple computer system is disclosed. Each operations controller controls the operations of its associated computer, so that all of the computers cooperate to perform system functions in a fault-tolerant manner. Each operations controller comprises a fault handler (204), a scheduler (206), a task communicator (208), a transmitter (212) which sends messages to all the other computers in the system, and the requisite receivers (202) which receive messages from them. The fault handler (204) checks each message received and decides which computers are operating correctly and which are faulty. The scheduler (206) selects each task its own computer will execute from the tasks assigned to that computer. The task communicator (208) assembles the data values required for the execution of the selected task and forwards this data to the computer for execution. The operations controller sends messages to all of the other computers in the system, informing them of which computers it deems to be faulty, the tasks it selects to execute, when it starts and completes the execution of each selected task, and the data variable values produced by the execution of each task.
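As a rough illustration of the task communicator's role described in the abstract, the following Python sketch stores received data values, assembles the inputs for a dispatched task, and hands them over only when the applications computer reports that the previous task is complete. The class, method names, and message layout are hypothetical and chosen for illustration; the source does not specify them.

```python
class TaskCommunicator:
    """Illustrative data-value store and input assembler (hypothetical API)."""

    def __init__(self, task_inputs):
        self.task_inputs = task_inputs   # assumed map: task name -> required variable names
        self.data_values = {}            # values taken from error-free data value messages
        self.pending = None              # assembled inputs awaiting hand-off

    def store(self, values):
        """Record data variable values passed on by the fault handler."""
        self.data_values.update(values)

    def on_dispatch(self, task_name):
        """Assemble the values required by the task named in the dispatch signal."""
        names = self.task_inputs[task_name]
        self.pending = (task_name, {n: self.data_values[n] for n in names})

    def on_task_completed(self, outputs):
        """Called when the applications computer finishes the previous task.

        Returns the data value message to send to all computers and the
        assembled inputs for the next task (or None if nothing is pending).
        """
        message = {"type": "data", "values": dict(outputs)}
        next_inputs, self.pending = self.pending, None
        return message, next_inputs
```

For example, after `store({"x": 1, "y": 2})` and `on_dispatch("t1")` for a task needing `x` and `y`, the next call to `on_task_completed` returns the outgoing data value message together with `("t1", {"x": 1, "y": 2})`.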
35 Claims
1. In a fault tolerant multiple computer system wherein the multiple computer system is capable of executing, in a coordinated manner, a predetermined set of tasks in response to inputs from external sources to produce an output to at least one external device, and wherein each computer includes an application computer capable of executing a subset of the predetermined set of tasks, and wherein the subset of tasks for each computer is different, and contains at least one predetermined input/output task for inputting new data from an external source or outputting data to the at least one external device, an operations controller for each computer in the system for selecting the tasks to be executed by the associated applications computer and for sending messages to every other computer;
said messages including task selected messages identifying the tasks it has selected, task data value messages containing the values of the data variables resulting from the tasks executed by its applications computer, and error messages containing the identification of the computers which sent messages having detected errors, each operations controller comprising:
fault handler means responsive to messages received from all the computers in the system for checking each message received to identify faulty messages, to record as faulty in a fault state table the computers sending messages identified as being faulty, and to record as faulty in said same fault state table each computer identified as being faulty in said error messages received from a predetermined number of other computers, for passing on for further processing the error-free messages received from non-faulty computers and for sending said error messages to the other computers identifying each computer it has recorded to be faulty in said fault state table;
scheduler means storing said subset of tasks in their order of priority and their execution status in response to the error-free task selected and data value messages passed by the fault handler for selecting for execution from said stored subset of tasks said input/output task at predetermined intervals and for selecting in the interval between the selection of said input/output tasks the highest priority unselected task ready for execution, for generating a dispatch signal identifying the task selected for execution, and for sending said task selected messages to all of the computers identifying the tasks it has selected for execution; and
task communicator means interfacing said scheduler means and the associated applications computer for storing the values of the data variables contained in the error-free data value messages passed by the fault handler, for assembling the values of the data variables required for the execution of the selected task in response to said dispatch signal, for sending to the applications computer the assembled values of the data variables in response to the applications computer signifying it has completed the previous task, for sending to all of the computers data value messages containing the values of the data variables resulting from the execution of the selected task by the associated applications computer and for receiving from said external sources and sending to said at least one external device the data values resulting from the execution of said input/output task.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
31. A method for controlling a plurality of computers to execute, in a coordinated fault tolerant manner, a predetermined set of tasks in response to inputs from external sources to produce an output to at least one external device, and wherein each computer includes an application computer capable of executing a subset of the predetermined set of tasks, and wherein the subset of tasks for each computer is different and includes at least one input/output task for receiving data variables from said external sources and outputting data variables to said at least one external device, and wherein said plurality of computers sends messages to every other computer including task selected messages identifying the tasks it has selected, task data value messages containing the values of the data variables resulting from the executed tasks, and error messages containing the identification of the computers which sent messages having detected errors, said method comprising the steps of:
checking all the messages received from said plurality of computers to detect any errors in said messages and to generate error signals identifying each computer which sent a message having a detected error;
counting the number of computers which sent said error messages identifying each computer which sent a message having a detected error;
comparing said number with a predetermined number to generate said error signal identifying each computer which sent a message having an error detected by a number of computers greater than said predetermined number;
recording as faulty the identity of each computer identified in said error signals to generate a fault status table storing the identity of each computer which sent a message containing an error;
discarding all messages received from computers recorded as faulty in said fault status table;
storing by each computer its own subset of tasks in a task status table, said task status table listing each task of said subset in its order of execution priority;
recording in said task status table the receipt of each data variable required for the execution of each task in response to said task data value messages;
setting a task ready indicator signifying the task is ready for execution in response to the recording in said task status table of all data variables required for the execution of that task;
setting in said task status table task selected indicators signifying the task has been selected by another computer in response to said task selected message received from other computers, said task selected indicators further identifying the computer that selected the task;
selecting for execution from said task status table said at least one input/output task at predetermined intervals;
selecting for execution from said task status table in the interval between the selection of said input and output tasks the highest priority unselected task having its task ready indicator set;
setting the task selected indicator associated with its own computer in response to the selection of a task;
generating a dispatch signal identifying the selected task in response to selecting said task;
sending to all of the computers said task selected message identifying the selected task in response to selecting said task;
storing the values of the data variables input in response to said input/output tasks and the data variables contained in said data value messages to generate a data values table;
assembling the values of the data variables stored in the data values table for the selected task in response to said dispatch signal to generate a task input table;
forwarding to the applications computer the values of the data variables stored in said task input table when the applications computer signifies it has completed a preceding task;
storing the values of the data variables generated by the applications computer in the execution of the selected task;
sending to all of the computers the values of the data variables stored in the task output table in response to the applications computer signifying it has completed the execution of the task; and
repetitively executing all of the above steps.
Dependent Claims (32, 33, 34, 35)
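The task status table and selection steps of the method can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: the `Task` and `Scheduler` names, the tick-based `io_period` (standing in for the "predetermined intervals"), the convention that a lower number means higher priority, and the reading of "unselected" as "not yet selected by this computer" are all choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int                  # assumption: lower number = higher priority
    needs: set                     # data variables required for execution
    received: set = field(default_factory=set)      # variables recorded so far
    selected_by: set = field(default_factory=set)   # computers that selected this task
    is_io: bool = False

    @property
    def ready(self):
        """Task ready indicator: all required data variables have been recorded."""
        return self.needs <= self.received


class Scheduler:
    def __init__(self, own_id, tasks, io_period):
        self.own_id = own_id
        self.tasks = sorted(tasks, key=lambda t: t.priority)  # task status table
        self.io_period = io_period   # assumed: select an I/O task every io_period ticks
        self.tick = 0

    def on_data_value(self, variable):
        """Record receipt of a data variable for every task that requires it."""
        for t in self.tasks:
            if variable in t.needs:
                t.received.add(variable)

    def on_task_selected(self, computer_id, task_name):
        """Set the task selected indicator for the computer that selected the task."""
        for t in self.tasks:
            if t.name == task_name:
                t.selected_by.add(computer_id)

    def select(self):
        """Return the name of the task to dispatch, or None if nothing is eligible."""
        self.tick += 1
        if self.tick % self.io_period == 0:
            candidates = [t for t in self.tasks if t.is_io]
        else:
            candidates = [t for t in self.tasks
                          if not t.is_io and t.ready
                          and self.own_id not in t.selected_by]
        if not candidates:
            return None
        chosen = candidates[0]                 # table is kept in priority order
        chosen.selected_by.add(self.own_id)    # own task selected indicator
        return chosen.name
```

Keeping the table sorted by priority means selection reduces to taking the first eligible entry; the returned name plays the part of the dispatch signal and would also be broadcast in a task selected message.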
Specification