External storage
Abstract
In an external storage, I/O processing continues without intervention by the user or the host system when a controller fails. When a failure occurs in a controller, the host system 10 recognizes the failure of the controller. Before the failure is reported to the user and the application so that the job is stopped, the substitutive controller reads from a shared memory the SCSI-ID held by the SCSI port of the failed controller, registers that SCSI-ID to its own SCSI port, and uses its port address resetting facility 45 to erase the SCSI-ID held by the SCSI port of the failed controller. Because the SCSI-ID specified when an I/O request is issued is transferred between the controllers, neither the user nor the host system needs to alter the route on which I/O requests are issued. Moreover, the transfer can be carried out while the host system has not yet recognized the error.
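The SCSI-ID handover described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `Port`, `Controller`, `take_over`, and `responder`, the dictionary used as shared memory, and the ID values 0 and 1 are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """A controller's SCSI port and the set of SCSI-IDs it answers to."""
    ids: set = field(default_factory=set)


@dataclass
class Controller:
    name: str
    port: Port
    failed: bool = False


def take_over(substitute, failed, shared_memory):
    """Move the failed controller's SCSI-ID onto the substitute's port."""
    # Read the failed controller's SCSI-ID from the shared memory.
    scsi_id = shared_memory[failed.name]
    # Register it on the substitute's own port, next to its existing ID.
    substitute.port.ids.add(scsi_id)
    # Erase the ID from the failed port (the role of the resetting facility).
    failed.port.ids.discard(scsi_id)
    return scsi_id


def responder(controllers, target_id):
    """Return the name of the controller whose port answers target_id."""
    for controller in controllers:
        if target_id in controller.port.ids:
            return controller.name
    return None


# Hypothetical setup: two controllers with SCSI-IDs 0 and 1.
shared_memory = {"ctl_a": 0, "ctl_b": 1}
ctl_a = Controller("ctl_a", Port({0}))
ctl_b = Controller("ctl_b", Port({1}))

ctl_a.failed = True
take_over(ctl_b, ctl_a, shared_memory)
```

After the handover, a request addressed to ID 0 is answered by `ctl_b`, so in this model the host keeps using the same target ID and does not change its request route.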
73 Citations
21 Claims
1. A failure recovery method for use in a data processing system including at least one host system, a plurality of controllers, and an interface cable connecting said host system to said controllers in a daisy chain, said controllers respectively including therein I/O ports being connected to said interface cable and having mutually different IDs, an I/O device being controlled by a group of at least two controllers, the method comprising the steps of:
detecting, when a failure is detected in a controller of said group, a utilization state of said interface cable by a controller as a substitutive unit of a failed controller of said group;
deciding, according to the utilization state of said interface cable, a state of reception by said failed controller of an I/O request from said host system;
suppressing by a substitutive controller, when the I/O request is not yet received by said failed controller as a result of the decision, reception of the I/O request by said failed controller;
adding an ID of an I/O port related to said failed controller to an I/O port of said substitutive controller; and
resetting the I/O port related to said failed controller; and
adding by said substitutive controller, when the I/O request is already received by said failed controller as a result of the decision, the ID of said I/O port related to said failed controller to the I/O port of said substitutive controller and resetting the I/O port related to said failed controller before said host system recognizes a permanent error in said failed controller.
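The two branches of claim 1 can be sketched as follows. This is a simplified illustration, not the claimed implementation: the claim does not say how the utilization state of the interface cable maps to "request received", so the `BusState` values and the `recover` function below are hypothetical assumptions.

```python
from enum import Enum


class BusState(Enum):
    # Assumed encoding of the cable's utilization state (not from the claim).
    FREE = "free"            # no request has reached the failed port yet
    HELD_BY_FAILED = "held"  # the failed port already accepted a request


def recover(bus_state, substitute_ids, failed_ids, failed_id):
    """Run one recovery and return the ordered list of actions taken."""
    actions = []
    # Decide the reception state from the observed bus state.
    request_received = bus_state is BusState.HELD_BY_FAILED
    if not request_received:
        # Keep the failed controller from accepting a new request.
        actions.append("suppress_reception")
    # In both branches the failed port's ID is added to the substitute port.
    substitute_ids.add(failed_id)
    actions.append("add_id")
    # In both branches the failed port is reset, which drops its ID.
    failed_ids.discard(failed_id)
    actions.append("reset_failed_port")
    return actions
```

In the already-received branch the claim additionally requires that these steps finish before the host system recognizes a permanent error; that timing constraint is not modeled here.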
5. A data processing system, comprising:
at least one host system;
a plurality of controllers; and
an interface cable connecting said host system to said controllers in a daisy chain, said controllers respectively including therein I/O ports being connected to said interface cable and having mutually different IDs;
an I/O device being commonly controlled by a group of at least two controllers; and
a shared memory being commonly accessed from said group, each of controllers in said group including a microprocessor, the microprocessor in each of said controllers including:
means for detecting a failure in a controller of said group according to contents of said shared memory;
means for detecting a utilization state of said interface cable via an I/O port;
means for deciding, according to the utilization state of said interface cable, a state of reception by said failed controller of an I/O request from said host system;
means for suppressing, when the I/O request is not yet received by said failed controller as a result of the decision, reception of the I/O request by said failed controller;
adding an ID of the I/O port related to said failed controller to an I/O port of a controller of its own; and
indicating to reset the I/O port related to said failed controller; and
means for adding, when the I/O request is already received by said failed controller as a result of the decision, the ID of the I/O port related to said failed controller to the I/O port of the controller of its own; and
indicating to reset the I/O port related to said failed controller before said host system recognizes a permanent error in said failed controller.
9. An external storage for use in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, a normal controller detects the failure, references a port address of a failed controller, receives control information of said failed controller, and adds control information to the port address thereof.
11. An external storage in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, a normal controller detects the failure, references a port address of a failed controller, receives control information of said failed controller, and adds the control information to the port address thereof, a controller having a port address resetting facility for resetting the port address of said failed controller and erasing an ID thereof in such a manner that the controller resets the port address of said failed controller, that said failed controller does not respond to subsequent I/O requests from said host system, and that said normal controller having received the port address responds to the I/O requests.
16. An external storage in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system, wherein:
at occurrence of a failure in a controller excepting at least one controller, a failed controller recognizes the failure thereof and enters a wait state without executing a control operation thereof in at least a period of time equal to time in which said normal controller conducts a transfer process of control information of said failed controller and addition of a port address;
after said normal controller which recognized the failure finishes the transfer and addition processes, said failed controller erases the port address of said failed controller; and
said normal controller which received the port address of said failed controller responds to a subsequent I/O request issued from said host system since the port address of said failed controller is already erased.
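The ordering in claim 16 (the failed controller waits, the normal controller transfers and adds, then the failed controller erases its own port address) can be sketched with two threads and an event. Everything here is an illustrative assumption: the `ports` and `control_info` dictionaries, the `takeover_done` event, and the 5-second timeout are hypothetical and not taken from the patent.

```python
import threading

log = []
# Hypothetical state: port addresses and pending work per controller.
ports = {"failed": {0}, "normal": {1}}
control_info = {"failed": {"pending": ["req-1"]}, "normal": {"pending": []}}
takeover_done = threading.Event()


def failed_controller():
    # Wait state: perform no control operation until the normal controller
    # has finished its transfer and addition steps (or the timeout expires).
    takeover_done.wait(timeout=5)
    # Only then erase this controller's own port address.
    ports["failed"].clear()
    log.append("failed: erase port address")


def normal_controller():
    # Transfer the failed controller's control information.
    control_info["normal"]["pending"].extend(control_info["failed"]["pending"])
    log.append("normal: transfer control information")
    # Add the failed controller's port address to this controller's port.
    ports["normal"] |= ports["failed"]
    log.append("normal: add port address")
    # Signal that the failed controller may now erase its address.
    takeover_done.set()


t_failed = threading.Thread(target=failed_controller)
t_normal = threading.Thread(target=normal_controller)
t_failed.start()
t_normal.start()
t_failed.join()
t_normal.join()
```

Because the erase step runs only after the event is set, the port address is never absent from both controllers at once in this model, and subsequent requests to address 0 reach the normal controller.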
21. A host system and an external storage connected by an interface cable in a configuration including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and said storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, said normal controller detects the failure, references the port address of the failed controller, receives control information of said failed controller, and adds the control information to the port address thereof, said host system having a function that in a state in which a controller having received an I/O request issued from the host system cannot respond thereto due to occurrence of a failure in the controller, said host system monitors an I/O completion report from the controller, issues again the I/O request to said failed controller after lapse of the predetermined monitoring period, executes a recovery process including a resetting operation, recognizes a permanent error when the controller does not respond to the recovery process, and notifies the error to the application, and said normal controller completing an operation including the reference, transfer, and additional port address processes before the permanent error is recognized, thereby preventing a report of the permanent error to an application of said host system.
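The timing condition in claim 21 can be illustrated with a simple model in which the host's monitoring period, reissued request, and recovery process run back to back before a permanent error is recognized. This is a rough sketch under that assumption; the function names and all durations are hypothetical, and the claim itself gives no numeric values.

```python
def permanent_error_deadline(monitor_s, retry_s, recovery_s):
    """Seconds from the original request until the host declares a
    permanent error, assuming the three phases run one after another."""
    return monitor_s + retry_s + recovery_s


def error_reaches_application(takeover_s, monitor_s, retry_s, recovery_s):
    """True if the takeover finishes too late to hide the failure."""
    deadline = permanent_error_deadline(monitor_s, retry_s, recovery_s)
    # The normal controller must finish strictly before the deadline.
    return takeover_s >= deadline


# Hypothetical host timings, in seconds.
deadline = permanent_error_deadline(monitor_s=30, retry_s=30, recovery_s=10)
hidden = not error_reaches_application(5, 30, 30, 10)
```

With these assumed numbers, a takeover that completes in 5 seconds finishes well inside the 70-second window, so the reissued request or the recovery process is answered by the normal controller and no permanent error is reported to the application.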
Specification