System and method for rapid fault isolation in a storage area network
Abstract
A fault region identification system adapted for use in a network, such as a storage area network (SAN), includes logic and/or program modules configured to identify errors that occur in the transmission of command, data and response packets between at least one host, switches and target devices on the network. The system maintains a count at each of a plurality of packet-receiving components of the network, the count indicating a number of CRC or other errors that have been detected by each component. The error counts are stored with the time of detection. The system alters the EOF (end-of-frame) delimiter for each packet for which an error was counted such that other components ignore that packet, that is, they do not increment their error counts for that packet. Link segments adjacent to single- or multiple-device components of the network are identified as fault regions, based upon the error counts of those components.
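The count-and-mark behavior described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Packet` and `Component` classes, the `EOF_NORMAL` and `EOF_IGNORE` labels, and the `send` helper are hypothetical names chosen for illustration and do not come from the patent. A boolean flag stands in for the CRC check.

```python
from dataclasses import dataclass, field
import time

# Hypothetical delimiter labels used only in this sketch.
EOF_NORMAL = "EOF_NORMAL"
EOF_IGNORE = "EOF_IGNORE"  # marks a packet whose error has already been counted


@dataclass
class Packet:
    crc_ok: bool = True
    eof: str = EOF_NORMAL


@dataclass
class Component:
    name: str
    errors: list = field(default_factory=list)  # one detection timestamp per error

    def receive(self, packet: Packet) -> None:
        # A packet already marked upstream is ignored for counting purposes.
        if packet.eof == EOF_IGNORE:
            return
        if not packet.crc_ok:
            self.errors.append(time.time())  # count is stored with time of detection
            packet.eof = EOF_IGNORE          # downstream components will not count it


def send(path, packet, corrupt_before=None):
    """Pass a packet along a path, corrupting it on the link entering `corrupt_before`."""
    for comp in path:
        if comp.name == corrupt_before:
            packet.crc_ok = False
        comp.receive(packet)


path = [Component("switch1"), Component("switch2"), Component("target")]
send(path, Packet(), corrupt_before="switch2")
counts = {c.name: len(c.errors) for c in path}
print(counts)
```

In this run only `switch2` records an error, because it is the first component to receive the corrupted packet and it marks the packet before forwarding. That localizes the fault to a link segment adjacent to `switch2`.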
21 Claims
1. A fault isolation system adapted for use in a computer network having a host in communication with a storage device configured to store information relating to a plurality of detected errors that occur in packets transmitted over the network, the system having a plurality of program modules configured to execute on at least one processor, the program modules including:
an error detection module configured to identify respective components within the network at which each of the plurality of detected errors occurs;
an error count module configured to increment an error count at each identified component where an error has occurred;
a packet ignore module configured to alter a given packet for which an error has been detected to indicate to components other than the identified component not to increment their error counts for the given packet; and
a link segment identification module configured to identify at least one link segment coupled to each identified component at which an error count is incremented.
15. A method for identifying a fault region in a network having a host processor-based system in communication with at least one target device and a plurality of switches, including the steps of:
identifying at least one error relating to transmission of a packet on the network;
generating an error count relating to the identified error and corresponding to a first component at which the error was identified; and
generating an indicator relating to the packet configured to inhibit other components from generating error counts relating to the identified error.
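The three steps of the method in claim 15 can be sketched with a real CRC check from Python's standard `zlib` module. This is an illustrative sketch, not the claimed implementation: the frame layout (a dictionary with `payload`, `crc`, and `ignore` keys), the `make_frame` and `process` functions, and the component names are all assumptions, and a boolean `ignore` field stands in for the modified delimiter.

```python
import zlib
from collections import Counter


def make_frame(payload: bytes) -> dict:
    """Build a hypothetical frame carrying its own CRC-32 and an ignore indicator."""
    return {"payload": payload, "crc": zlib.crc32(payload), "ignore": False}


def process(frame: dict, component: str, error_counts: Counter) -> bool:
    """Apply the method at one component; return True if an error was counted here."""
    if frame["ignore"]:
        return False  # an upstream component already counted this error
    if zlib.crc32(frame["payload"]) == frame["crc"]:
        return False  # step 1: no transmission error identified
    error_counts[component] += 1  # step 2: count the error at this component
    frame["ignore"] = True        # step 3: indicator inhibits later counting
    return True


error_counts = Counter()
frame = make_frame(b"read block 7")

first = process(frame, "switch_a", error_counts)   # frame is still intact
frame["payload"] = b"read block 9"                 # simulate corruption on a link
second = process(frame, "switch_b", error_counts)  # first component to see the error
third = process(frame, "target", error_counts)     # sees the indicator and skips

print(dict(error_counts))
```

Only `switch_b` ends up with a count, even though `target` also receives the corrupted payload.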
18. A computer program product stored on a computer-usable medium, comprising a computer-readable program configured to cause a computer to control execution of an application to identify a fault region associated with at least one of a plurality of detected errors in a network, the computer-readable program including:
an error identification module configured to identify at least one error relating to transmission of an error packet on the network;
an error count module configured to maintain an error count relating to at least one error identified at each of a plurality of components on the network;
a packet delimiter module configured to modify a packet for which an error is detected at a first component of the network to inhibit other components from generating error counts relating to the identified error; and
a fault region detection module configured to identify at least one link segment adjacent the first component as the fault region.
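The fault region detection module of claim 18 can be pictured as a lookup over the network topology. The sketch below assumes the topology is available as a list of link segments, each a pair of component names, and that per-component error counts have already been collected. The `fault_regions` function, its `threshold` parameter, and the sample topology are hypothetical and are not taken from the patent.

```python
def fault_regions(links, error_counts, threshold=1):
    """Return link segments adjacent to any component whose count meets the threshold."""
    suspects = {name for name, count in error_counts.items() if count >= threshold}
    # A link segment is a candidate fault region if either endpoint is a suspect.
    return [link for link in links if suspects & set(link)]


# Hypothetical topology: host -> switch1 -> switch2 -> target.
links = [("host", "switch1"), ("switch1", "switch2"), ("switch2", "target")]
error_counts = {"switch1": 0, "switch2": 3, "target": 0}

regions = fault_regions(links, error_counts)
print(regions)
```

With these sample counts, both segments touching `switch2` are reported. Narrowing the result to a single segment would need extra information, such as the direction in which the errored packets were travelling, which this sketch does not model.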
20. A computer network, including:
a host including a processor and a host bus adapter;
error identification logic configured to identify at least one error relating to transmission of an error packet on the network;
error count logic configured to maintain an error count relating to at least one error identified at each of a plurality of components on the network;
packet delimiter logic configured to modify a packet for which an error is detected at a first component of the network to inhibit other components from generating error counts relating to the identified error; and
fault region detection logic configured to identify at least one link segment adjacent the first component as the fault region.
Specification