
Operations controller for a fault tolerant multiple node processing system

  • US 4,914,657 A
  • Filed: 04/15/1987
  • Issued: 04/03/1990
  • Est. Priority Date: 04/15/1987
  • Status: Expired due to Term
First Claim

1. In a fault tolerant multiple node processing system wherein each node has an applications processor for executing a predetermined set of tasks, wherein each task in said predetermined set of tasks is included in the predetermined set of tasks of at least one other node in the processing system and an operations controller for establishing and maintaining its own node in synchronization with every other node in the system, for controlling the operation of its own node, and for selecting from an active task list the tasks to be executed by its own application processor in coordination with all of the other nodes in the system through the exchange of inter-node messages with all of the other nodes in the system, said active task list containing a selected subset of said predetermined set of tasks, the operations controller comprising:

  • a transmitter for transmitting all of the inter-node messages generated by its own operations controller to all the nodes in the system including its own node over a private communication link, said transmitter having an arbitrator for deciding the order in which said inter-node messages are to be transmitted when two or more messages are ready for transmission at the same time;

    a plurality of receivers, each receiver associated with a respective one of said multiple nodes and only receiving messages from that node;

    a message checker connected to said plurality of receivers for checking each received message for physical and logical errors to generate an internal error report containing an error status byte identifying each detected error, said message checker polling each of said receivers to unload the received messages in a repetitive sequence;

    a voter subsystem having a voter for voting on the content of all error free messages having a value produced by the execution of the same task in said at least one other node to generate a voted value and a deviance checker for generating an internal error report containing a deviance vector identifying each node which sent a message used in the generation of said voted value whose value differed from the voted value by more than a predetermined deviance value;

    a fault tolerator connected to said message checker, said voter subsystem and said transmitter for passing all error free messages received from said message checker to said voter subsystem, for generating an inter-node error message containing all of said error reports accumulated by all the subsystems which is sent to all of the nodes in the system by said transmitter, for generating a base penalty count message containing a base penalty count for each node in the system based on the number of errors detected and the severity of the detected errors identified in said internal error reports which is sent to all of the nodes in the system by said transmitter, for globally verifying the base penalty count for each node through the exchange of inter-node base penalty count messages, and for generating a system state vector identifying each node whose base penalty count exceeds a predetermined exclusion threshold;

    a task scheduler connected to said fault tolerator for selecting the next task to be executed by the node's own applications processor from said active task list, for replicating the scheduling of other nodes in the system, for maintaining a global data base in the scheduling and execution of tasks by each node through the exchange of task completed/started messages received from the fault tolerator, and for generating an error report identifying each node whose scheduling process differs from the scheduling process replicated for that node, said task scheduler further having means to reconfigure said active task list in response to said system state vector received from the fault tolerator indicating a change in the number of non-excluded nodes;

    a data memory;

    a task communicator connected to said voter subsystem, said data memory, said task scheduler, the transmitter and the applications processor for storing said voted values received from said voter subsystem in said data memory, for passing the identity of the task selected by the task scheduler to the applications processor, for extracting from said data memory the voted values required for the execution of the selected task and passing them to the applications processor, for generating said task completed/started messages identifying the task just completed and the new task started by the applications processor which is transmitted to all the nodes by said transmitter, and for generating inter-node data value messages containing the data values generated by the applications processor in the execution of the selected tasks which are also transmitted to all the nodes by said transmitter; and

    a synchronizer connected to said message checker, said task scheduler and said transmitter for synchronizing the operation of its own node with all of the other non-faulty nodes in the system through the exchange of inter-node time-dependent messages, said synchronizer generating a time-dependent message which is transmitted by said transmitter to all the nodes in the system, storing a time stamp signifying the local time at which each time-dependent message is received from said message checker, and correcting the synchronization of said task scheduler of its own node based on the difference between the time stamp of its own time-dependent message and a voted time stamp derived from the time stamps for all the time-dependent messages received within a predetermined time window.
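
The voting, deviance checking, exclusion, and synchronization steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how the voted value is computed, so the median used here is an assumption, as are all function names, the dictionary layout keyed by node identifier, and the sign convention of the clock correction.

```python
from statistics import median


def vote(values):
    """Return a voted value for one task.

    `values` maps node id -> value taken from error-free messages.
    ASSUMPTION: the median stands in for the claim's unspecified voter.
    """
    return median(values.values())


def deviance_vector(values, voted, max_deviance):
    """Flag each node whose value differs from the voted value by more
    than the predetermined deviance value."""
    return {node: abs(v - voted) > max_deviance for node, v in values.items()}


def system_state_vector(penalty_counts, exclusion_threshold):
    """Flag each node whose base penalty count exceeds the exclusion
    threshold."""
    return {node: count > exclusion_threshold
            for node, count in penalty_counts.items()}


def sync_correction(own_stamp, window_stamps):
    """Difference between a voted time stamp and the node's own stamp.

    `window_stamps` holds the local time stamps of the time-dependent
    messages received within the time window.
    ASSUMPTION: median vote, and the sign (voted minus own) is a
    convention chosen for this sketch.
    """
    return median(window_stamps) - own_stamp


# Hypothetical data for three nodes A, B and C.
values = {"A": 10.0, "B": 10.2, "C": 17.0}
voted = vote(values)
flags = deviance_vector(values, voted, max_deviance=0.5)
excluded = system_state_vector({"A": 0, "B": 3, "C": 12}, exclusion_threshold=10)
correction = sync_correction(100, [98, 101, 103])
```

With this hypothetical data, node C is the only node flagged by the deviance check and the only node marked for exclusion. In the claimed system the base penalty counts would first be verified globally through the exchange of base penalty count messages, a step this sketch omits.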
