Critical adapter local error handling
Abstract
Adapters, which provide message communications capabilities in a multinode data processing network, are provided with a mechanism for indicating critical errors from which recovery may ultimately be possible. Error handling capabilities are incorporated which operate both globally and locally to ensure, to the greatest extent possible, that applications running on the network are not prematurely terminated and that the node with the error-affected adapter is not prematurely removed from its connectivity with the other nodes within its network group.
6 Claims
1. A method for handling errors in adapters used for communication in a data processing network having at least two nodes connected through a switch, said error handling method comprising the steps of:
detecting a nonpermanent error condition, within an adapter connected to one of said nodes, from which recovery is possible from within the node connected to said adapter;
suspending communications from within the node with the adapter affected by said error condition;
disabling communication between said affected adapter and said switch so as to provide an indication to at least one other node in said network that communication with said affected adapter is at least temporarily suspended so as to effectively cause suspension of, but not termination of, applications running on said at least one other node in said network;
performing recovery operations, at said affected node, to restore operation of said affected adapter, based on said detected error condition, said recovery including enablement of said disabled communication; and
resuming communication with said affected adapter upon enablement of said disabled communication.
2. A method for handling adapter errors in a multinode data processing network in which node-to-node communication is at least partially handled by adapters connected to said nodes, said adapters operating to pass messages from said nodes through a switch which links the nodes in said network, said error handling method comprising the steps of:
detecting a nonpermanent error condition, within an adapter connected to one of said nodes, from which recovery is possible from within the node connected to said error affected adapter;
suspending communication from the node connected to said affected adapter;
disabling communication between said affected adapter and said switch so as to provide an indication to at least one other node in said network that communication with said affected adapter is at least temporarily suspended, so as to effectively cause suspension of, but not termination of, applications running on said at least one other node in said network;
performing recovery operations, at said affected node, to restore operation of said affected adapter, based on said detected error condition, said recovery including enablement of said disabled communication;
terminating said running applications on nonaffected nodes in said network upon a determination that reestablishment of communication with said affected adapter is taking too long; and
otherwise maintaining said running applications and restoring communication with said affected node after performance of said recovery operations.
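The affected-node side of claim 1 reduces to a fixed sequence: suspend, disable the adapter-switch link, recover according to the detected error, re-enable the link, resume. The following is a minimal Python sketch of that ordering, not the patented implementation. `Switch`, `Adapter`, `Node`, `handle_adapter_error`, and the error code `"fifo_overflow"` are all hypothetical, and the sketch assumes an error counts as "nonpermanent" exactly when a local recovery routine is registered for it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Switch:
    """Hypothetical switch model: adapter id -> whether its link is enabled."""
    links: Dict[str, bool] = field(default_factory=dict)

    def set_link(self, adapter_id: str, enabled: bool) -> None:
        self.links[adapter_id] = enabled

    def is_reachable(self, adapter_id: str) -> bool:
        # Other nodes see a disabled link as a temporary outage of that adapter.
        return self.links.get(adapter_id, False)


@dataclass
class Adapter:
    adapter_id: str
    error: Optional[str] = None  # hypothetical error code; None means healthy


# Hypothetical recovery routine: repairs the adapter for one error code.
Recovery = Callable[[Adapter], None]


class Node:
    def __init__(self, adapter: Adapter, switch: Switch) -> None:
        self.adapter = adapter
        self.switch = switch
        self.sending = True
        self.log: List[str] = []
        switch.set_link(adapter.adapter_id, True)

    def handle_adapter_error(self, recoveries: Dict[str, Recovery]) -> bool:
        """Run the claim-1 sequence; return True if communication resumed."""
        err = self.adapter.error
        # Assumption: only errors with a registered local recovery are nonpermanent.
        if err is None or err not in recoveries:
            return False
        self.sending = False                                  # suspend from within the node
        self.log.append("suspended")
        self.switch.set_link(self.adapter.adapter_id, False)  # signal peers via the switch
        self.log.append("link disabled")
        recoveries[err](self.adapter)                         # recovery chosen by error type
        self.adapter.error = None
        self.log.append(f"recovered {err}")
        self.switch.set_link(self.adapter.adapter_id, True)   # recovery ends by re-enabling
        self.log.append("link enabled")
        self.sending = True                                   # resume once link is back
        self.log.append("resumed")
        return True


if __name__ == "__main__":
    switch = Switch()
    node = Node(Adapter("adapter0", error="fifo_overflow"), switch)
    # The recovery routine can observe that peers currently see the adapter as down.
    node.handle_adapter_error({"fifo_overflow": lambda a: print(switch.is_reachable(a.adapter_id))})
    print(node.log)
```

Disabling the link before recovery, rather than simply going silent, is what lets the other nodes distinguish a temporary suspension from a failure, which is the point of the claimed "indication to at least one other node".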
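Claim 2 adds the policy on the nonaffected nodes: a disabled link suspends their applications instead of terminating them, and termination happens only if reestablishing communication takes too long. The sketch below models that as a small state machine. `PeerWatch`, `observe`, and `AppState` are illustrative names, and `timeout_s` is a hypothetical stand-in for "taking too long", since the claim gives no threshold.

```python
from enum import Enum, auto
from typing import Optional


class AppState(Enum):
    RUNNING = auto()
    SUSPENDED = auto()
    TERMINATED = auto()


class PeerWatch:
    """Policy on a nonaffected node for applications using one remote adapter."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s           # hypothetical "too long" threshold
        self.state = AppState.RUNNING
        self.down_since: Optional[float] = None

    def observe(self, link_enabled: bool, now: float) -> AppState:
        """Update the policy from the remote adapter's link status at time `now`."""
        if self.state is AppState.TERMINATED:
            return self.state                # termination is final
        if not link_enabled:
            if self.down_since is None:
                # Outage just began: suspend the applications, do not kill them.
                self.down_since = now
                self.state = AppState.SUSPENDED
            elif now - self.down_since > self.timeout_s:
                # Reestablishment is taking too long: terminate.
                self.state = AppState.TERMINATED
        else:
            # Link re-enabled after remote recovery: keep the applications and resume.
            self.down_since = None
            self.state = AppState.RUNNING
        return self.state
```

Passing `now` in explicitly keeps the policy deterministic and easy to test; a real node would more likely drive `observe` from a timer or a link-status event delivered by the switch.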
Specification