Apparatus of fault-handling in a multiprocessing system
Abstract
A number of intelligent crossbar switches (100) are provided in a matrix of orthogonal lines interconnecting processor (110) and memory control unit (MCU) modules (112). The matrix is composed of processor buses (105) and corresponding error-reporting lines (106); and memory buses (107) with corresponding error-reporting lines (108). At the intersection of these lines is a crossbar switch node (100). The crossbar switches function to pass memory requests from a processor to a memory module attached to an MCU node and to pass any data associated with the requests. The system is organized into confinement areas at the boundaries of which are positioned error-detection mechanisms to deal with information flow occurring across area boundaries. Each crossbar switch and MCU node has means for the logging and signaling of errors to other nodes. Means are provided to reconfigure the system to reroute traffic around the confinement area at fault and to restart system operation, possibly in a degraded mode.
10 Claims
1. In a data processing system in which a switching matrix provides electrical interconnections between horizontal MACD buses and vertical ACD buses connected in said matrix by means of nodes, a fault-handling mechanism comprising:
- an error-reporting matrix including horizontal Bus Error Report Lines (BERLs) and vertical Module Error Report Lines (MERLs), said BERLs being associated with said MACD buses such that all nodes sharing an MACD bus are connected with a BERL, said MERLs being associated with said ACD buses such that all nodes sharing an ACD bus are connected with a MERL; and
- error-reporting means connected at the intersection of one of said MERLs and one of said BERLs, said error-reporting means including receiving means connected to said MERL for receiving first error messages, said first error messages being transmitted over said one MERL, said error-reporting means further including propagating means connected to said receiving means and said one BERL, responsive to said receiving means for propagating second error messages over said one BERL to other error-reporting means located at said other nodes in said matrix.
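As an informal illustration of the reporting path recited in claim 1, the following Python sketch models a small matrix in which a first error message sent over one MERL (a column) is rebroadcast by each receiving node over its own BERL (a row). The names `Node`, `Matrix`, and `report_on_merl` are hypothetical and appear nowhere in the patent; the sketch assumes a simple software model and says nothing about the actual hardware signaling.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node at a (row, column) intersection."""
    row: int  # index of the MACD bus / BERL shared by this node
    col: int  # index of the ACD bus / MERL shared by this node
    received: list = field(default_factory=list)  # (line, message) pairs seen


class Matrix:
    """Hypothetical error-reporting matrix of rows x cols nodes."""

    def __init__(self, rows, cols):
        self.nodes = {(r, c): Node(r, c)
                      for r in range(rows) for c in range(cols)}

    def report_on_merl(self, col, message):
        """Send a first message over one MERL, then rebroadcast over BERLs."""
        # Every node sharing the MERL (same column) receives the first message.
        receivers = [n for n in self.nodes.values() if n.col == col]
        for node in receivers:
            node.received.append(("MERL", message))
        # Each receiver propagates a second message over its own BERL
        # (same row) to the other nodes on that row.
        for node in receivers:
            for other in self.nodes.values():
                if other.row == node.row and other is not node:
                    other.received.append(("BERL", message))
```

In this model, calling `Matrix(2, 3).report_on_merl(1, "parity error")` delivers the message to every node: the two nodes in column 1 see it on the MERL, and the remaining nodes see it on their BERL.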
7. In a data processing system in which a switching matrix provides electrical interconnections between horizontal MACD buses and vertical ACD buses connected in said matrix by means of nodes, and in which a fault-handling mechanism provides an error-reporting matrix including horizontal Bus Error Report Lines (BERLs) and vertical Module Error Report Lines (MERLs), said BERLs being associated with said MACD buses such that all nodes sharing an MACD bus are connected with a BERL so that errors occurring in error-confinement areas may be rebroadcast over one of said BERLs, said MERLs being associated with said ACD buses such that all nodes sharing an ACD bus are connected with a MERL so that errors occurring in error-confinement areas may be reported over a corresponding one of said MERLs;
- error-report log means connected at the intersection of one of said MERLs and one of said BERLs comprising:
holding means connected to said MERLs and said BERLs for holding information from a most recent error-report message;
logging means connected to said holding means for logging error reports, said logging means including means for uniquely identifying said error reports so that said error reports are distinguishable as between the single occurrence of an error, and repeated occurrences of the same error; and
logic means connected to said holding means and said logging means for immediately updating said logging means upon the condition that an error-report message is received, said logic means including an error-count means, and means connected to said error-count means for incrementing said error-count means upon the condition that the single occurrence of an error is reported.
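The log behavior recited in claim 7 can be sketched in Python under one possible reading: each error report carries an identifying tag, the most recent report is always held, and the error count advances only when a report has not been seen before. The class names, the `sequence` field, and the use of a set are all assumptions made for illustration; the claim does not specify how reports are uniquely identified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical error-report message."""
    node_id: int      # assumed: node that detected the error
    error_type: str   # assumed: kind of error, e.g. "parity"
    sequence: int     # assumed tag that makes each occurrence unique


class ErrorReportLog:
    """Hypothetical model of the holding, logging, and counting logic."""

    def __init__(self):
        self.latest: Optional[ErrorReport] = None  # holding means
        self.error_count = 0                       # error-count means
        self._seen = set()                         # identities already logged

    def receive(self, report: ErrorReport) -> None:
        # The most recent report is always held, repeated or not.
        self.latest = report
        # Only a report not seen before counts as a new occurrence.
        if report not in self._seen:
            self._seen.add(report)
            self.error_count += 1
```

Under these assumptions, delivering the same `ErrorReport` twice leaves `error_count` at 1, while a report with a different `sequence` value raises it to 2.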
9. In a data processing system in which a switching matrix provides electrical interconnections between horizontal MACD buses and vertical ACD buses connected in said matrix by means of nodes, and in which a fault-handling mechanism provides an error-reporting matrix including horizontal Bus Error Report Lines (BERLs) and vertical Module Error Report Lines (MERLs), said BERLs being associated with said MACD buses such that all nodes sharing an MACD bus are connected with a BERL so that errors occurring in error-confinement areas may be reported over one of said BERLs, said MERLs being associated with said ACD buses such that all nodes sharing an ACD bus are connected with a MERL so that errors occurring in error-confinement areas may be rebroadcast over one of said MERLs;
- error-reporting logic testing means at one node connected at the intersection of one of said MERLs and one of said BERLs comprising:
test command receiving means connected to said MACD bus for receiving a test command; said test command including means which identifies said one node as a node (tested node) to be tested, said test command being comprised of a read access request for read data from said one node, said read access request being directed to said one node (tested node) from another node (testing node) over said MACD bus; and
test logic means connected to said test command receiving means and to said MACD bus, responsive to said test command receiving means, for returning said read data to said testing node; said test logic means including means for causing a known error condition to occur in said read data, thereby forcing an error at one of said confinement areas.
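The test mechanism of claim 9 can be illustrated with a short Python sketch in which a tested node answers a read request and, when asked, deliberately corrupts the returned data so that the testing node's checker must flag it. The use of a single even-parity bit as the "known error condition" is an assumption chosen for simplicity, and `NodeUnderTest`, `parity_bit`, and `check_read` are hypothetical names, not terms from the patent.

```python
def parity_bit(data: int) -> int:
    """Even-parity bit: 1 when the data word has an odd number of 1 bits."""
    return bin(data).count("1") % 2


class NodeUnderTest:
    """Hypothetical tested node that can force a known error in read data."""

    def __init__(self, node_id: int, stored: int):
        self.node_id = node_id
        self.stored = stored

    def read(self, target_id: int, force_error: bool = False):
        """Answer a read access request addressed to this node."""
        if target_id != self.node_id:
            return None  # request identifies some other node
        p = parity_bit(self.stored)
        if force_error:
            p ^= 1  # flip the parity bit to create the known error
        return self.stored, p


def check_read(word: int, p: int) -> bool:
    """Checker on the testing node's side: True when parity is consistent."""
    return parity_bit(word) == p
```

A normal read passes `check_read`, whereas a read issued with `force_error=True` fails it, which is the point of the test: it exercises the error-detection and reporting path without a real fault.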
Specification