Recovery from transitory storage area network component failures
Abstract
Lun communications between a storage server and a storage subsystem for a particular lun are assigned both a current path and an alternate path. Lun communications use the current path unless the current path is determined to be faulty. Path errors may result in the storage server determining a path to be faulty. If the current path for a lun communication is determined to be faulty, then the lun communications will be sent through the alternate path so long as the alternate path is determined to be reliable. Over time, a path previously determined to be faulty may recover and be used again for lun communications.
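The path choice described in the abstract can be sketched as follows. This is only an illustrative sketch: the names `Path`, `LunRoute`, and `choose_path` are hypothetical, the `faulty` flag stands in for whatever fault determination the storage server makes, and returning `None` when both paths are faulty is an assumption, since the abstract does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    name: str
    faulty: bool = False  # assumed to be set by separate fault-detection logic


@dataclass
class LunRoute:
    """Hypothetical per-lun record holding a current and an alternate path."""
    current: Path
    alternate: Path


def choose_path(route: LunRoute) -> Optional[Path]:
    """Pick the path for the next lun communication."""
    # Use the current path unless it has been determined to be faulty.
    if not route.current.faulty:
        return route.current
    # Otherwise use the alternate path, but only if it is reliable.
    if not route.alternate.faulty:
        return route.alternate
    # Assumption: no usable path is reported as None.
    return None
```

Because the choice is re-evaluated on every call, a current path whose `faulty` flag is later cleared is picked up again automatically, which matches the abstract's statement that a recovered path may be used again.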
24 Claims
1. A mass data storage system comprising a storage server computer, a data storage subsystem, and a plurality of communication paths connecting the storage server computer to the data storage subsystem over which communications are conducted between the storage server computer and the data storage subsystem, each communication path subject to multiple different types of errors each having a specific severity which negatively affects the communications over that communication path;
and wherein the storage server computer is operative to:
detect each type of error which occurs on each communication path;
count each detected error for each communication path on a continuous ongoing basis to establish an accumulated count value equal to the number of each type of detected error for each communication path;
decrement the accumulated count value for each type of detected error for each communication path by a predetermined amount at periodic intervals to establish a decremented accumulated count value until the decremented accumulated count value for each type of error for each communication path reaches a zero value;
attribute a weight value for each type of detected error related to the severity of the type of detected error;
calculate a weighted error value for each communication path by multiplying the weight value for each type of error by the decremented accumulated count value for each type of error and adding the results of such multiplications for each communication path;
establish a first fault threshold of weighted error values indicative of unreliable communications over each communication path;
establish a second fault threshold of weighted error values indicative of reliable communications over each communication path, the second fault threshold being less than the first fault threshold;
cease use of any one communication path for communications when the weighted error value for that one communication path exceeds or is equal to the first fault threshold; and
resume use of the one communication path when the weighted error value for that one communication path is less than or equal to the second fault threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
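The operations recited in claim 1 can be illustrated with a small sketch for a single communication path. It is not the patented implementation: the class name `PathMonitor`, the method names, the error-type labels, and all numeric values are assumptions chosen for illustration.

```python
from collections import Counter


class PathMonitor:
    """Illustrative per-path error tracker with two thresholds (hysteresis)."""

    def __init__(self, weights, fault_threshold, recover_threshold, decrement=1):
        if recover_threshold >= fault_threshold:
            raise ValueError("second threshold must be less than the first")
        self.weights = dict(weights)      # weight per error type (severity)
        self.fault_threshold = fault_threshold      # first fault threshold
        self.recover_threshold = recover_threshold  # second fault threshold
        self.decrement = decrement        # predetermined decrement amount
        self.counts = Counter()           # accumulated count per error type
        self.in_use = True

    def record_error(self, error_type):
        """Count one detected error of the given type."""
        if error_type not in self.weights:
            raise KeyError(f"no weight defined for {error_type!r}")
        self.counts[error_type] += 1
        self._update_state()

    def tick(self):
        """Periodic decrement of every count, never going below zero."""
        for error_type in list(self.counts):
            self.counts[error_type] = max(0, self.counts[error_type] - self.decrement)
        self._update_state()

    def weighted_error(self):
        """Sum over error types of weight times (decremented) count."""
        return sum(self.weights[t] * c for t, c in self.counts.items())

    def _update_state(self):
        value = self.weighted_error()
        if self.in_use and value >= self.fault_threshold:
            self.in_use = False   # cease use of the path
        elif not self.in_use and value <= self.recover_threshold:
            self.in_use = True    # resume use of the path
```

With hypothetical weights `{"timeout": 5, "crc": 1}`, a first threshold of 10, and a second threshold of 3, two timeouts take the path out of use; one `tick()` lowers the weighted value to 5, which is still above the second threshold, and a second `tick()` lowers it to 0, at which point the path is used again. The gap between the two thresholds keeps a path from flapping in and out of service around a single cutoff.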
10. A method of determining the use of a plurality of communication paths connecting a storage server computer to a data storage subsystem for communications between the storage server computer and the data storage subsystem, each communication path subject to multiple different types of errors each having a specific severity which negatively affects the communications over that communication path, comprising:
detecting each type of error which occurs on each communication path;
attributing a weight value for each type of detected error related to the severity of the type of detected error;
counting each detected error for each communication path on a continuous ongoing basis to establish an accumulated count value equal to the number of each type of detected error for each communication path;
decrementing the accumulated count value for each type of detected error for each communication path by a predetermined amount at periodic intervals to establish a decremented accumulated count value until the decremented accumulated count value for each type of error for each communication path reaches a zero decremented accumulated count value;
calculating a weighted error value for each communication path by multiplying the weight value for each type of error by the decremented accumulated count value for each type of error and adding the results of such multiplications for each communication path;
establishing a first fault threshold of weighted error values indicative of unreliable communications over each communication path;
establishing a second fault threshold of weighted error values indicative of reliable communications over each communication path, the second fault threshold being less than the first fault threshold;
ceasing use of any one communication path for communications when the weighted error value for that one communication path equals or exceeds the first fault threshold; and
resuming use of the one communication path when the weighted error value for that one communication path is less than or equal to the second fault threshold.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
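One consequence of the periodic decrementing step in claim 10 is that a path which stops producing errors returns to service after a bounded number of intervals. The sketch below estimates that number. It assumes no new errors arrive, non-negative weights, and a non-negative second threshold; the function name `intervals_until_recovery` and the sample values are hypothetical.

```python
def intervals_until_recovery(counts, weights, decrement, recover_threshold):
    """Count decrement intervals until the weighted error value is at or
    below the second fault threshold, assuming no further errors occur."""
    if decrement <= 0:
        raise ValueError("decrement must be positive")
    if recover_threshold < 0:
        raise ValueError("recover_threshold must be non-negative")

    counts = dict(counts)  # work on a copy; the caller's counts are untouched
    intervals = 0
    while sum(weights[t] * c for t, c in counts.items()) > recover_threshold:
        # One periodic interval: lower every count, stopping at zero.
        counts = {t: max(0, c - decrement) for t, c in counts.items()}
        intervals += 1
    return intervals


# Hypothetical values: 4 timeouts (weight 5) and 6 CRC errors (weight 1).
# Weighted value goes 26 -> 14 -> 2 with a decrement of 2 per interval.
print(intervals_until_recovery(
    counts={"timeout": 4, "crc": 6},
    weights={"timeout": 5, "crc": 1},
    decrement=2,
    recover_threshold=3,
))  # prints 2
```

The loop always terminates because every count eventually reaches zero, which gives a weighted value of zero.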
19. A mass data storage system comprising a storage server computer, a data storage subsystem, and a plurality of communication paths connecting the storage server computer and the data storage subsystem over which communications are conducted between the storage server computer and the data storage subsystem, each communication path subject to multiple different types of errors each having a specific severity which negatively affects the communications over that communication path, and wherein the storage server computer is operative to:
calculate a weighted error value for each communication path by multiplying a weight value for each type of error that occurs on each communication path by a count value for each error type and adding the results of all such multiplications for each communication path;
establish first and second fault thresholds of weighted error values indicative of unreliable and reliable communications over each communication path, respectively, the second fault threshold being less than the first fault threshold;
cease use of any one communication path for communications when the weighted error value for the one communication path exceeds or is equal to the first fault threshold; and
resume use of the one communication path when the weighted error value for the one communication path is less than or equal to the second fault threshold.
Dependent claims: 20, 21
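The calculation and threshold tests in claim 19 can be written compactly. The symbols are introduced here for illustration and do not appear in the claims: $E$ is the set of error types, $w_t$ the weight value for type $t$, $c_{p,t}$ the count value for type $t$ on path $p$, $W_p$ the weighted error value, and $T_1$, $T_2$ the first and second fault thresholds.

```latex
\begin{align*}
W_p &= \sum_{t \in E} w_t \, c_{p,t}, \qquad T_2 < T_1 \\
\text{cease use of path } p &\iff W_p \ge T_1 \\
\text{resume use of path } p &\iff W_p \le T_2
\end{align*}
```

As a worked example with made-up numbers, take two error types with weights $w = (5, 1)$, counts $c_p = (2, 1)$, $T_1 = 10$, and $T_2 = 3$. Then $W_p = 5 \cdot 2 + 1 \cdot 1 = 11 \ge T_1$, so use of the path ceases. If the counts later become $(0, 1)$, then $W_p = 1 \le T_2$ and use of the path resumes.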
22. A method of determining the use of a plurality of communication paths connecting a storage server computer to a data storage subsystem for communications between the storage server computer and the data storage subsystem, each communication path subject to multiple different types of errors each having a specific severity which negatively affects the communications over that communication path, comprising:
calculating a weighted error value for each communication path by multiplying a weight value for each type of error that occurs on each communication path by a count value for each error type and adding the results of all such multiplications for each communication path;
establishing first and second fault thresholds of weighted error values indicative of unreliable and reliable communications over each communication path, respectively, the second fault threshold being less than the first fault threshold;
ceasing use of any one communication path for communications when the weighted error value for the one communication path exceeds or is equal to the first fault threshold; and
resuming use of the one communication path when the weighted error value for the one communication path is less than or equal to the second fault threshold.
Dependent claims: 23, 24
Specification