Method, apparatus, and program for detecting sequential and distributed path errors in MPIO
First Claim
1. A method for detecting errors in a device path, the method comprising:
setting a time span for a time window based on a time to process a successful input/output command;
starting the time window;
responsive to the time window ending, determining whether at least one input/output error occurs on a device path during the time window;
responsive to one or more input/output errors occurring on the device path during the time window, incrementing an error count by one;
determining whether the error count reaches a predetermined limit; and
responsive to the error count reaching the predetermined limit, disabling the device path.
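The steps of claim 1 can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the class `PathErrorMonitor`, its method names, and the `window_factor` multiplier are hypothetical. The claim says only that the time span is "based on" the time to process a successful command, so scaling that time by a constant is an assumption. The sketch exposes `time_span` for a caller's timer and leaves the actual clock out, so window boundaries are signalled explicitly.

```python
class PathErrorMonitor:
    """Illustrative windowed error counter for one device path (hypothetical names)."""

    def __init__(self, successful_io_time, error_limit, window_factor=2.0):
        # Assumption: the window length is a multiple of a successful I/O's time.
        self.time_span = successful_io_time * window_factor
        self.error_limit = error_limit      # the "predetermined limit"
        self.error_count = 0
        self.path_enabled = True
        self.window_open = False
        self.errors_in_window = 0

    def start_window(self):
        """Begin a new time window with no errors recorded yet."""
        self.window_open = True
        self.errors_in_window = 0

    def record_io_error(self):
        """Note an I/O error on the path; ignored when no window is open."""
        if self.window_open:
            self.errors_in_window += 1

    def end_window(self):
        """Close the window and apply the per-window counting rule."""
        self.window_open = False
        if self.errors_in_window > 0:
            # One increment per window, however many errors it contained.
            self.error_count += 1
        if self.error_count >= self.error_limit:
            self.path_enabled = False
        return self.path_enabled


# Usage: three errors in one window still add only one to the error count.
monitor = PathErrorMonitor(successful_io_time=0.5, error_limit=2)
monitor.start_window()
for _ in range(3):
    monitor.record_io_error()
monitor.end_window()          # error_count is 1, path stays enabled
monitor.start_window()
monitor.record_io_error()
monitor.end_window()          # error_count is 2, path is disabled
```

Counting windows rather than individual errors means a burst of failures from a single transient event weighs the same as one failure, which is the property the claim's "incrementing an error count by one" describes.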
Abstract
An error detection mechanism is provided for detecting sequential and distributed errors in a device I/O stream. The sensitivity of the error detection is user definable. The result of the error detection is fed back into the path management software, which may use the error information to decide whether a device path should be disabled. The error detection mechanism sets a time span for a time window and counts the number of errors that occur during the time window. Each time a time window ends with at least one error, the sequential error count and the distributed error count are incremented. However, if an I/O returns without an error, the sequential error count is cleared. If the sequential error count reaches a predetermined limit, the path is disabled. After a predetermined number of time windows, if the distributed error count reaches a predetermined limit, the path is disabled.
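The two counters described in the abstract can be sketched as follows. This is a hedged illustration, not the patent's implementation: `SeqDistDetector` and its field names are hypothetical. Two points the abstract leaves open are filled in by assumption and marked in the comments: every ended window (with or without an error) counts toward the predetermined number of windows, and the distributed count restarts after each such cycle.

```python
from dataclasses import dataclass


@dataclass
class SeqDistDetector:
    """Illustrative sequential/distributed error detector (hypothetical names)."""
    seq_limit: int            # limit for back-to-back error windows
    dist_limit: int           # limit for error windows spread over a cycle
    windows_per_cycle: int    # the "predetermined number of time windows"
    seq_count: int = 0
    dist_count: int = 0
    windows_seen: int = 0
    path_enabled: bool = True

    def io_succeeded(self) -> None:
        """An I/O returned without error: clear the sequential count only."""
        self.seq_count = 0

    def window_ended(self, had_error: bool) -> bool:
        """Apply the end-of-window rules; return whether the path is still enabled."""
        # Assumption: windows without errors also count toward the cycle length.
        self.windows_seen += 1
        if had_error:
            self.seq_count += 1
            self.dist_count += 1
        if self.seq_count >= self.seq_limit:
            self.path_enabled = False
        if self.windows_seen >= self.windows_per_cycle:
            if self.dist_count >= self.dist_limit:
                self.path_enabled = False
            # Assumption: the distributed count restarts for the next cycle.
            self.dist_count = 0
            self.windows_seen = 0
        return self.path_enabled


# Usage: errors interleaved with successes never trip the sequential limit,
# but four error windows within a ten-window cycle trip the distributed limit.
detector = SeqDistDetector(seq_limit=3, dist_limit=4, windows_per_cycle=10)
for i in range(10):
    detector.window_ended(had_error=i in (0, 3, 5, 8))
    detector.io_succeeded()
```

Keeping the two counts separate lets the sequential limit catch a path that has failed outright, while the distributed limit catches a path that fails intermittently between successful I/Os.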
26 Claims
1. A method for detecting errors in a device path, the method comprising:
setting a time span for a time window based on a time to process a successful input/output command;
starting the time window;
responsive to the time window ending, determining whether at least one input/output error occurs on a device path during the time window;
responsive to one or more input/output errors occurring on the device path during the time window, incrementing an error count by one;
determining whether the error count reaches a predetermined limit; and
responsive to the error count reaching the predetermined limit, disabling the device path.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 24
13. An apparatus for detecting errors in a device path, the apparatus comprising:
means for setting a time span for a time window based on a time to process a successful input/output command;
means for starting the time window;
means, responsive to the time window ending, for determining whether one or more input/output errors occur on a device path during a time window;
means, responsive to one or more input/output errors occurring on the device path during the time window, for incrementing an error count;
means for determining whether the error count reaches a predetermined limit; and
means, responsive to the error count reaching the predetermined limit, for disabling the device path.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 25
23. A computer program product, in a computer readable medium, for detecting errors in a device path, the computer program product comprising:
instructions for setting a time span for a time window based on a time to process a successful input/output command;
instructions for starting the time window;
instructions, responsive to the time window ending, for determining whether at least one input/output error occurs on a device path during the time window;
instructions, responsive to one or more input/output errors occurring on the device path during the time window, for incrementing an error count by one;
instructions for determining whether the error count reaches a predetermined limit; and
instructions, responsive to the error count reaching the predetermined limit, for disabling the device path.
Dependent claims: 26
Specification