Method and means for utilizing device long busy response for resolving detected anomalies at the lowest level in a hierarchical, demand/response storage management subsystem

US 5,968,182 A
Filed: 05/12/1997
Issued: 10/19/1999
Est. Priority Date: 05/12/1997
Status: Expired due to Term

Abstract

A method and means within a hierarchical, demand/response DASD subsystem of the passive fault management type in which, upon the occurrence of fault, error, or erasure, a long device busy signal of finite duration is provided to a host CPU. Any DASD storage device subject to the anomaly is isolated from any host inquiry during this interval. These measures permit retry or other recovery procedures to be implemented transparent to the host and the executing application. This avoids premature declarations of faults, errors, or erasures and consequent host application aborts and other catastrophic measures. If the detected anomaly is not resolved within the allotted time, then other data recovery procedures can be invoked including device reset, the status reported to the host, and the next request processed.

6 Claims

1. A method for detecting and correcting a defective operating state or condition of a hierarchical demand/responsive storage subsystem of the passive fault management type attaching a host CPU, said subsystem including a plurality of cyclic, tracked storage devices, an interrupt-driven, task-switched control logic, and means responsive to the control logic for forming at least one path of a set of paths coupling the host to at least one device, said host enqueuing one or more read and write requests to said subsystem, said subsystem control logic responsively interpreting each request and establishing a path to an addressed storage device, comprising the steps at the subsystem of:
- (a) detecting an anomaly in the read back or staging of data from the device and executing a retry of the counterpart request by active or passive querying of said addressed device;
  
  (b) in the event that the detected anomaly persists, presenting a long busy status signal to the host CPU by the control logic, said long busy signal being an indication that the counterpart request has yet to be completed by the subsystem;
  
  (c) inhibiting host access to the device by the control logic for no more than a predetermined time interval;
  
  (d) ascertaining whether the inhibited device has returned to an operational state, and (1) in the event the anomaly is resolved, setting an attention interrupt in the control logic by the device and terminating the device long busy signal in the host CPU by the control logic, and (2) in the event that the time interval has been exceeded and the anomaly is not resolved, invoking one or more data recovery procedures including resetting the device by the control logic; and
  
  (e) reporting status to the host CPU.

2. The method according to claim 1, wherein the step of ascertaining whether the inhibited device has returned to an operational state includes executing at least one step selected from the set of steps consisting of polling device status by the control logic, setting of an interrupt in the control logic by the device, and exceeding the predetermined (recovery) time.

3. The method according to claim 1, wherein the step of detecting an anomaly and retrying the request includes the steps of ascertaining whether at least two of the failure-independent paths to the device are operable and invoking step (b) where only one such path is ascertained as available.

4. The method according to claim 1, wherein the step of inhibiting access to the device includes the steps of suspending execution of any new read and write requests and pinning the destaging of any data to the device.
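
Steps (a) through (e) of claim 1, together with the polling option of claim 2, can be read as a bounded recovery loop. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `Device`, `ControlLogic`, `fails_n_times`, the event strings, the retry count, and the interval values are all hypothetical and do not appear in the patent. Lifting the inhibit after a reset is likewise an assumption, since the claims only say that status is then reported.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Device:
    """Hypothetical stand-in for one storage device behind the control logic."""
    read_ok: Callable[[], bool]   # returns True when read back / staging succeeds
    resets: int = 0

    def reset(self) -> None:
        self.resets += 1


@dataclass
class ControlLogic:
    device: Device
    max_retries: int = 2              # assumed; the claims give no retry count
    long_busy_interval: float = 0.05  # assumed "predetermined time interval" (s)
    poll_period: float = 0.01         # assumed polling cadence (claim 2)
    inhibited: bool = False
    events: List[str] = field(default_factory=list)

    def handle_read(self) -> str:
        # (a) detect an anomaly and retry the counterpart request
        for _ in range(1 + self.max_retries):
            if self.device.read_ok():
                return self._report("ok")
        # (b) anomaly persists: present long busy status to the host
        self.events.append("long_busy_on")
        # (c) inhibit host access for no more than the interval
        self.inhibited = True
        deadline = time.monotonic() + self.long_busy_interval
        recovered = False
        # (d) poll the device until it recovers or the interval is exceeded
        while time.monotonic() < deadline:
            if self.device.read_ok():
                recovered = True
                break
            time.sleep(self.poll_period)
        if recovered:
            # (d)(1) attention interrupt, then long busy is terminated
            self.events += ["attention_interrupt", "long_busy_off"]
        else:
            # (d)(2) interval exceeded: recovery procedure including a reset
            self.device.reset()
            self.events.append("device_reset")
        self.inhibited = False  # assumption: inhibit ends in both branches
        # (e) report status to the host
        return self._report("recovered" if recovered else "reset")

    def _report(self, status: str) -> str:
        self.events.append(f"status:{status}")
        return status


def fails_n_times(n: int) -> Callable[[], bool]:
    """Build a read_ok callable that fails n times and then succeeds."""
    remaining = [n]

    def read_ok() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return read_ok
```

In this sketch a device built with `fails_n_times(4)` exhausts the initial attempt and two retries, enters the long busy window, and recovers on a later poll without a reset; a device that never recovers is reset once the interval expires.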

5. In a hierarchical demand/response storage subsystem of the passive fault management type, said subsystem being responsive to read and write requests from a host CPU for establishing access to at least one of a plurality of cyclic, multitracked storage devices over one path selected from a set of at least two failure-independent paths terminating in said device, said subsystem including means for detecting and correcting a defective operating state or condition in the subsystem or attached devices, whereby said detecting and correcting means further comprise:
- means for detecting an anomaly in the read back or staging of a binary data stream from a device and for retrying said read back or staging;
  
  means for ascertaining whether only one path to the device is operable, whether the anomaly persists after retry and, if so, for presenting a long busy status to the host CPU;
  
  means for inhibiting host access to the device for up to a predetermined time interval;
  
  means for terminating the long busy status in the host CPU responsive to an attention interrupt from the device indicative that the inhibited device has returned to an operational state and the anomaly has been resolved;
  
  means responsive to the time interval having been exceeded and the nonresolution of the anomaly for invoking one or more data recovery procedures including resetting the device; and
  
  means for reporting the current status of the device to the host.
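
The path test in claim 5 (and in dependent claim 3) gates the long busy status on two conditions: the anomaly persists after retry, and only one path to the device is operable. A minimal sketch of that decision follows. The `Action` names and the function `next_action` are hypothetical, and the branches for two or more operable paths and for no operable path are assumptions, since the claims do not state what happens in those cases.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    COMPLETE = "complete"                  # retry cleared the anomaly
    LONG_BUSY = "long_busy"                # claimed case: one path, anomaly persists
    RETRY_OTHER_PATH = "retry_other_path"  # assumption, not stated in the claims
    NO_PATH = "no_path"                    # assumption, not stated in the claims


def next_action(path_operable: Sequence[bool], anomaly_persists: bool) -> Action:
    """Choose the next step after a retry, given per-path operability flags."""
    if not anomaly_persists:
        return Action.COMPLETE
    operable = sum(1 for ok in path_operable if ok)
    if operable == 1:
        # Only one operable path remains, so present long busy to the host.
        return Action.LONG_BUSY
    if operable >= 2:
        # Assumed behavior: another failure-independent path can still be tried.
        return Action.RETRY_OTHER_PATH
    return Action.NO_PATH
```

For example, `next_action([True, False], anomaly_persists=True)` selects `Action.LONG_BUSY`, matching the single-path condition recited in the claim.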

6. An article of manufacture comprising a machine-readable memory having stored therein indicia of a plurality of processor-executable control program steps for detecting and correcting a defective operating state or condition of a hierarchical demand/responsive storage subsystem of the passive fault management type attaching a host CPU, said subsystem including a plurality of cyclic, tracked storage devices, an interrupt-driven, task-switched control logic, and means responsive to the control logic for forming at least one path of a set of paths coupling the host to at least one device, said host enqueuing one or more read and write requests to said subsystem, said subsystem control logic responsively interpreting each request and establishing a path to an addressed storage device, said plurality indicia of control program steps executable at the subsystem include:
- (a) indicia of a control program step for detecting an anomaly in the read back or staging of data from the device and executing a retry of the counterpart request by active or passive querying of said addressed device;
  
  (b) indicia of a control program step in the event that the detected anomaly persists for presenting a long busy status signal to the host CPU by the control logic, said long busy signal being an indication that the counterpart request has yet to be completed by the subsystem;
  
  (c) indicia of a control program step for inhibiting host access to the device by the control logic for no more than a predetermined time interval;
  
  (d) indicia of a control program step for ascertaining whether the inhibited device has returned to an operational state, and (1) in the event the anomaly is resolved, for setting an attention interrupt in the control logic by the device and for terminating the device long busy signal in the host CPU by the control logic, and (2) in the event that the time interval has been exceeded and the anomaly is not resolved, for invoking one or more data recovery procedures including resetting the device by the control logic; and
  
  (e) indicia of a control program step for reporting status to the host CPU.

Current Assignee: GlobalFoundries, Inc.
Original Assignee: International Business Machines Corporation
Inventors: Liu, Julia; Sherman, William G. II; Chen, James C.; Ng, Chan Y.
Primary Examiner(s): Beausoliel, Jr., Robert W.
Assistant Examiner(s): Nguyen, Andy
Application Number: US08/854,441
Time in Patent Office: 890 Days
Field of Search: 395/182.03, 395/182.04, 395/182.05, 395/182.06, 395/183.18, 395/185.01, 395/185.07, 395/846, 395/858, 395/185.1, 395/182.09, 395/183.01, 395/183.17, 371/37.7, 371/40.13, 371/40.15, 364/728.02
US Class Current: 714/5.11
CPC Class Codes: G06F 11/1435 using file system or storag...
