Apparatus for redundant operation of modules in a multiprocessing system
Abstract
A number of intelligent nodes (bus-interface units, BIUs, and memory-control units, MCUs) are provided in a matrix composed of processor buses (105) with corresponding error-reporting and control lines (106), and memory buses (107) with corresponding error-reporting and control lines (108). Each node (100, 101, 102, 103) has means for logging errors and reporting errors on the error-report lines (106, 108). Processor modules (110) and memory modules (112) are each connected to a node which controls access to a common memory bus (107). Each node includes means (a married bit 170 and a shadow bit 172) for marrying modules in pairs such that each module in the pair tracks the operations directed to the module pair, and each module in the pair alternates with the other module in the handling of requests or replies. Each node registers the ID of its partner node in a spouse ID register. Comparison logic (162, 164) in each node resets the married bit upon the condition that the node ID in an error-report message (identifying the node at which the error occurred) is equal to the ID stored in the spouse ID register, thus identifying the spouse node (the partner of the node in which the comparison logic is located) as the source of the error. Resetting the married bit splits the primary/shadow pair apart, so that the error-free module takes over and ceases to alternate with its partner.
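The alternation and takeover described in this abstract can be illustrated with a small software model. The Python sketch below is a hypothetical simplification, not the disclosed hardware: the names `Module`, `observe`, and `run`, the integer `state` standing in for module state, and the assumption that roles swap after every single transaction (rather than some other predetermined number) are all illustrative choices.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical model of one member of a primary/shadow pair."""
    name: str
    shadow: bool                 # shadow bit: False marks the primary
    married: bool = True         # married bit: True while the pair is intact
    state: int = 0               # stand-in for the module's internal state
    served: list = field(default_factory=list)  # transactions it answered

    def is_active(self, txn: int) -> bool:
        if not self.married:
            # Split pair: the surviving module answers every access.
            return True
        # Married pair (assumed one-transaction alternation): the primary
        # is active on even transactions, including the first (txn 0),
        # and the shadow is active on odd ones.
        return (txn % 2 == 1) == self.shadow

    def observe(self, txn: int, data: int) -> None:
        # Both members receive every transaction directed to the pair, so
        # the passive module tracks the active one; only the active
        # module is recorded as answering.
        self.state += data
        if self.is_active(txn):
            self.served.append(txn)


def run(modules, start: int, payloads) -> int:
    """Present each payload to every listed module; return next txn number."""
    txn = start
    for data in payloads:
        for m in modules:
            m.observe(txn, data)
        txn += 1
    return txn


primary = Module("P", shadow=False)
shadow = Module("S", shadow=True)
nxt = run([primary, shadow], 0, [5, 7, 1, 3])
print(primary.served, shadow.served, primary.state == shadow.state)
# prints: [0, 2] [1, 3] True

# The shadow is found faulty: resetting the primary's married bit splits
# the pair, and the primary now handles all remaining traffic alone.
primary.married = False
run([primary], nxt, [2, 4])
print(primary.served)
# prints: [0, 2, 4, 5]
```

Because both modules see every transaction, their states stay equal while the pair is married; after the split, the survivor's `served` list shows it answering every access.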
7 Claims
1. For use in a data processing system in which a number of intelligent nodes are provided in a matrix composed of processor buses with corresponding error-reporting and control lines, and memory buses with corresponding error-reporting and control lines, wherein each node has means for logging errors and reporting errors on said error-report lines, wherein a primary module is connected to a primary node which controls access to a common memory bus, and wherein a shadow module is connected to a shadow node which controls access to said common memory bus, the combination in each of said primary and shadow nodes comprising:
shadow bit means settable to a first state and to a second state;
first logic means connected to said processor bus lines for receiving data destined for said common memory bus, said first logic means including means for setting said shadow bit means and said married bit means;
married bit means operative when set to a first state for marrying said primary module and said shadow module in a primary/shadow pair such that each module in said primary/shadow pair is activated to receive data directed to said primary/shadow pair; and
second logic means responsive to said shadow bit means and to said married bit means for causing said primary module to be active for at least the first bus transaction and passive for other predetermined bus transactions upon the condition that said shadow bit means is set to said first state and said married bit is set to said first state, said second logic means including means for causing said primary module to be passive for the first bus transaction and active for other predetermined bus transactions upon the condition that said shadow bit means is set to said second state and said married bit is set to said second state, such that each module in said primary/shadow pair alternates with the other module in the handling of a predetermined number of memory bus transactions.
6. In a data processing system including,
a number of bus-interface unit (BIU) nodes and memory-control unit (MCU) nodes and in which a switching matrix provides electrical interconnections between horizontal memory buses and vertical ACD buses connected in said matrix by means of said BIU nodes located at the intersections of said memory and ACD buses, said memory-control unit (MCU) nodes being connected to said memory buses, means for detecting an error occurring in a particular node;
an error-reporting matrix including horizontal Bus Error Report Lines (BERLs) and vertical Module Error Report Lines (MERLs), said BERLs being associated with said memory buses such that all BIU and MCU nodes sharing a memory bus are connected with a pair of BERLs, said MERLs being associated with said ACD buses such that all nodes sharing an ACD bus are connected with a MERL; and
error-reporting means in said particular node connected to said means for detecting an error, said error-reporting means including means for receiving error messages transmitted over at least said one BERL, and means for reporting error messages over at least said one BERL, said error messages identifying the type of error and the location (ID) at which the error was detected,
a recovery mechanism in said particular node comprising:
permanent error determining means operative upon the condition that an error recurs, for causing said error-reporting means in said particular node to propagate a permanent-error error-report message, said error message identifying the type of error and the location (ID) at which the permanent error was detected;
error-report logging means in said particular node connected to at least one of said error-report lines, for logging received error-report messages propagated to said particular node;
first registering means in said node for registering the ID of a resource with which said node is paired to provide a redundant resource;
means connected to said first registering means for comparing said location ID in said received error-report message with said registered ID of said resource; and
means responsive to said comparison means for activating said node upon the condition that said received error-report message identifies, as a faulty resource, said resource with which said node is paired, to thereby cause said node to become active and take over operation for said faulty resource.
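The recovery mechanism of claim 6 (recurrence-based permanent-error determination, logging of received reports, and comparison of the reported location ID with the registered partner ID) can be sketched in Python as follows. This is an illustrative assumption rather than the claimed circuitry: `ErrorReport`, `RecoveryNode`, `detect`, `receive`, the string error types, and the rule that only a permanent-error report triggers takeover are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class ErrorReport(NamedTuple):
    error_type: str   # "transient" or "permanent" (assumed encoding)
    location_id: int  # ID of the node at which the error was detected


@dataclass
class RecoveryNode:
    """Hypothetical node backing up a paired, redundant resource."""
    node_id: int
    paired_id: int                              # first registering means
    active: bool = False
    log: list = field(default_factory=list)     # error-report log
    seen: set = field(default_factory=set)      # errors already observed

    def detect(self, error_kind: str) -> ErrorReport:
        # Permanent-error determination: an error that recurs is
        # reported as permanent rather than transient.
        permanent = error_kind in self.seen
        self.seen.add(error_kind)
        return ErrorReport("permanent" if permanent else "transient", self.node_id)

    def receive(self, report: ErrorReport) -> None:
        self.log.append(report)
        # Compare the reported location ID with the registered partner ID
        # and take over only when the partner is identified as faulty.
        if report.error_type == "permanent" and report.location_id == self.paired_id:
            self.active = True


backup = RecoveryNode(node_id=3, paired_id=4)
faulty = RecoveryNode(node_id=4, paired_id=3)

backup.receive(faulty.detect("parity"))
print(backup.active)  # prints: False (first occurrence treated as transient)
backup.receive(faulty.detect("parity"))
print(backup.active)  # prints: True (recurrence reported as permanent)
```

Treating the first occurrence as transient mirrors the claim's condition that the error recurs before a permanent-error message is propagated.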
7. In a data processing system including,
a number of bus-interface unit (BIU) nodes and memory-control unit (MCU) nodes and in which a switching matrix provides electrical interconnections between horizontal memory buses and vertical ACD buses connected in said matrix by means of said BIU nodes located at the intersections of said memory and ACD buses, said memory-control unit (MCU) nodes being connected to said memory buses, means for detecting an error in a particular node;
an error-reporting matrix including horizontal Bus Error Report Lines (BERLs) and vertical Module Error Report Lines (MERLs), said BERLs being associated with said memory buses such that all BIU and MCU nodes sharing a memory bus are connected to a BERL, said MERLs being associated with said ACD buses such that all nodes sharing an ACD bus are connected to a MERL; and
error-reporting means in said particular node connected to said means for detecting an error, said error-reporting means including means for receiving error messages transmitted over at least said one BERL, said error messages identifying an error type and identifying the location (module ID) at which the error was detected,
a mechanism in said particular node for allowing two identical modules to duplicate the operation of each other, so that if one module fails the other module can be substituted therefor, comprising:
permanent error determining means operative upon the condition that a permanent error type occurs, for causing said error-reporting means in said particular node to propagate a permanent-error type error-report message, said error message identifying the type of error and the location (module ID) at which the permanent error was detected;
error-report logging means in said particular node connected to at least one of said error-report lines, for logging received error-report messages propagated to said particular node, including permanent error-type messages propagated by another node;
a shadow register, initially set in said particular node to thereby designate said particular node as the primary node in a primary/shadow pair of nodes comprised of said particular node and said other node;
married bit means for storing a married bit, said married bit means being initially set to thereby marry said particular node and said other node as a primary/shadow pair such that each module in said primary/shadow pair is activated to receive data directed to said primary/shadow pair;
a logical ID register for registering the physical identification of said particular node;
a spouse ID register for registering the physical identification of said other node; and
means connected to said error-report logging means, to said logical ID register, and to said spouse ID register for comparing said module ID in said error-report message with said spouse ID in said spouse ID register and with said physical ID in said logical ID register, said comparing means including means for resetting said married bit upon the condition that said module ID and said spouse ID match, such that during normal operation the primary/shadow pair operates as a single logical module to provide a complete and current backup for each other, said primary module in the pair being the active module and said other module being passive, initially, for a first bus transaction, whereby by tracking the active module the passive module is able to maintain exactly the same state information as the active module, the roles of active and passive module being interchanged after each bus transaction, and whereby resetting the married bit will cause a module to be active on every access instead of alternating with its partner, so that said active module will mask the failure of its spouse;
said comparing means further including means for disabling said node upon the condition that said module ID in said error-report message and said logical ID match, thus identifying said node as the source of the error.
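The comparison logic of claim 7 checks the module ID in a logged permanent-error report against both the spouse ID register and the logical ID register. A minimal Python sketch of that decision follows; `PairNode`, `on_permanent_error`, the `enabled` flag standing in for the disabling means, and the node IDs 0x12 and 0x13 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairNode:
    """Hypothetical register set for the comparison logic of claim 7."""
    logical_id: int        # physical ID of this node (logical ID register)
    spouse_id: int         # physical ID of the partner (spouse ID register)
    shadow: bool           # shadow register: False designates the primary
    married: bool = True   # married bit, initially set
    enabled: bool = True   # cleared when this node disables itself

    def on_permanent_error(self, module_id: int) -> None:
        if module_id == self.spouse_id:
            # The spouse is the source: reset the married bit so this node
            # is active on every access and masks the spouse's failure.
            self.married = False
        elif module_id == self.logical_id:
            # This node is the source of the error: disable it.
            self.enabled = False


primary = PairNode(logical_id=0x12, spouse_id=0x13, shadow=False)
shadow = PairNode(logical_id=0x13, spouse_id=0x12, shadow=True)

# Both nodes log the same permanent-error report naming node 0x13.
for node in (primary, shadow):
    node.on_permanent_error(0x13)

print(primary.married, primary.enabled)  # prints: False True
print(shadow.married, shadow.enabled)    # prints: True False
```

A single report therefore produces complementary actions: the healthy node leaves the marriage and serves all accesses, while the faulty node takes itself off line.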
Specification