Method and system for end-to-end problem determination and fault isolation for storage area networks
Abstract
A method and system for problem determination and fault isolation in a storage area network (SAN) are provided. A complex configuration of multi-vendor host systems, FC switches, and storage peripherals is connected in a SAN via a communications architecture (CA). A communications architecture element (CAE) is a network-connected device that has successfully registered with a communications architecture manager (CAM) on a host computer via a network service protocol, and the CAM contains problem determination (PD) functionality for the SAN and maintains a SAN PD information table (SPDIT). The CA comprises all network-connected elements capable of communicating information stored in the SPDIT. The CAM uses a SAN topology map and the SPDIT to create a SAN diagnostic table (SDT). A failing component in a particular device may generate errors that cause devices along the same network connection path to generate errors. As the CAM receives error packets or error messages, the errors are stored in the SDT, and each error is analyzed by temporally and spatially comparing the error with other errors in the SDT. If a CAE is determined to be a candidate for generating the error, then the CAE is reported for replacement if possible.
129 Citations
28 Claims
-
1. A method for processing errors within a storage area network (SAN), the method comprising the computer implemented steps of:
-
generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with a communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determining (PD) functionality for the SAN and maintains the SPDIT;
generating a SAN problem determination information table (SPDIT); and
generating a SAN diagnostic table (SDT) using the SAN topology map and the SPDIT.
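The three generating steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `build_topology_map`, `build_sdt`, and `SdtEntry` are hypothetical, and the meaning of a table cell (1 for a direct link between two CAEs, 0 otherwise) is an assumption, since the claim only requires that each row and each column be uniquely mapped to a CAE.

```python
from dataclasses import dataclass, field


@dataclass
class SdtEntry:
    """Hypothetical SDT row: a CAE's neighbours, its SPDIT record, and its errors."""
    neighbours: list
    spdit_record: dict
    errors: list = field(default_factory=list)


def build_topology_map(cae_ids, links):
    """Build a square table in which row i and column i both map to cae_ids[i].

    Assumption: a cell holds 1 when the two CAEs are directly linked.
    """
    index = {cae: i for i, cae in enumerate(cae_ids)}
    table = [[0] * len(cae_ids) for _ in cae_ids]
    for a, b in links:
        table[index[a]][index[b]] = 1
        table[index[b]][index[a]] = 1
    return table


def build_sdt(cae_ids, topology, spdit):
    """Combine the topology map and the SPDIT into one diagnostic table."""
    sdt = {}
    for i, cae in enumerate(cae_ids):
        neighbours = [cae_ids[j] for j, cell in enumerate(topology[i]) if cell]
        sdt[cae] = SdtEntry(neighbours=neighbours, spdit_record=spdit.get(cae, {}))
    return sdt


# Example data is invented for illustration only.
caes = ["host1", "switch1", "disk1"]
topology = build_topology_map(caes, [("host1", "switch1"), ("switch1", "disk1")])
spdit = {"switch1": {"vendor": "ExampleVendor"}}
sdt = build_sdt(caes, topology, spdit)
```

Keying the SDT by CAE keeps each element's neighbours and product information next to the errors later received for it, which is one plausible way to support the comparisons described in the abstract.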
product vendor information;
product identifier information;
information concerning a type of communication link supported by the product or element; and/or
information concerning a type of error information to be reported by the product or element.
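The four kinds of information listed above could be held in one record per product or element. The sketch below is only an assumption about layout: `SpditEntry`, its field names, and the sample values are hypothetical, and every field is optional because the list is joined by "and/or". The `supports_rlir` helper reflects the RLIR support flag mentioned in claim 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpditEntry:
    """Hypothetical SPDIT record for one product or element."""
    vendor: Optional[str] = None           # product vendor information
    product_id: Optional[str] = None       # product identifier information
    link_type: Optional[str] = None        # type of communication link supported
    error_info_type: Optional[str] = None  # type of error information reported


def supports_rlir(entry: SpditEntry) -> bool:
    """Return True when the record says the element reports RLIR error information."""
    return entry.error_info_type == "RLIR"


# Sample values are invented for illustration only.
spdit = {
    "switch1": SpditEntry(
        vendor="ExampleVendor",
        product_id="SW-0001",
        link_type="in-band",
        error_info_type="RLIR",
    ),
    "disk1": SpditEntry(vendor="ExampleVendor"),
}
```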
-
-
4. The method of claim 3 wherein the type of error information indicates whether the product or element supports Extended Link Services (ELS) Registered Link Incident Record (RLIR).
-
5. The method of claim 3 wherein the SDT stores information from the SAN topology map and errors received by the CAM from CAEs.
-
6. A method for processing errors within a storage area network (SAN), the method comprising the computer-implemented steps of:
-
receiving an error message at a communication architecture manager (CAM), wherein the CAM comprises problem determination (PD) functionality for the SAN, wherein a CAM maintains a SAN PD information table (SPDIT), and wherein a communication architecture (CA) managed by the CAM comprises all network-connected elements capable of communicating information stored in the SPDIT;
generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with the communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT; and
processing the error message using a real-time diagnostic algorithm (RDA).
a plurality of storage devices connected to the network; and
a plurality of host computers connected to the network, wherein at least one of the plurality of host computers comprises a CAM;
wherein at least some of the error messages are generated by at least some of the plurality of storage devices and host computers.
-
-
9. The method of claim 6 further comprising:
-
map;
generating a SAN diagnostic table (SDT) using the SAN topology map and the SPDIT.
-
-
10. The method of claim 9 further comprising:
-
analyzing the received error message using a temporal correlation window (TCW) value to temporally constrain fault isolation determination while searching for temporally-related error messages previously received by the CAM and stored within the SDT; and
analyzing the received error message using a spatial correlation path data structure (SCP) to spatially constrain fault isolation determination while searching for spatially-related error messages previously received by the CAM and stored within the SDT.
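The two constraints in claim 10 can be read as filters over the errors already stored in the SDT. The following sketch is one possible interpretation, not the patent's algorithm: `ErrorRecord` and `correlated_errors` are hypothetical names, the TCW is assumed to be a number of seconds, and the SCP is assumed to be a list of CAE identifiers along one connection path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    cae_id: str       # CAE that reported the error
    timestamp: float  # arrival time at the CAM, in seconds (assumed unit)
    kind: str         # error type label (hypothetical)


def correlated_errors(new_error, stored_errors, tcw_seconds, scp):
    """Return earlier errors that fall inside the TCW and lie on the SCP."""
    path = set(scp)
    return [
        e for e in stored_errors
        # temporal constraint: received no more than tcw_seconds before new_error
        if 0 <= new_error.timestamp - e.timestamp <= tcw_seconds
        # spatial constraint: reported by a CAE on the same connection path
        and e.cae_id in path
    ]


# Invented example: only the disk1 error is both recent and on the path.
stored = [
    ErrorRecord("disk1", 100.0, "link"),
    ErrorRecord("switch1", 40.0, "link"),
    ErrorRecord("host2", 101.0, "timeout"),
]
new = ErrorRecord("host1", 103.0, "timeout")
related = correlated_errors(new, stored, 10.0, ["host1", "switch1", "disk1"])
```

Applying both filters shrinks the set of errors that must be compared, which is the sense in which the window and the path "constrain" fault isolation.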
-
-
11. The method of claim 10 further comprising:
analyzing the received error message using error severity weightings according to a type of error indicated by the received error message.
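Claim 11 does not say how severity weightings are combined, so the sketch below assumes the simplest scheme: each error type has a numeric weight, and weights are summed per CAE to rank fault candidates. The weight values, error type labels, and function names are all hypothetical.

```python
# Hypothetical weights: a higher value means a more severe error type.
SEVERITY_WEIGHTS = {"link_failure": 5.0, "timeout": 2.0, "crc": 1.0}


def score_candidates(errors, weights, default_weight=1.0):
    """Sum severity weights per CAE over (cae_id, error_type) pairs."""
    scores = {}
    for cae_id, kind in errors:
        scores[cae_id] = scores.get(cae_id, 0.0) + weights.get(kind, default_weight)
    return scores


def most_likely_candidate(errors, weights):
    """Return the CAE with the highest summed weight, or None if there are no errors."""
    scores = score_candidates(errors, weights)
    return max(scores, key=scores.get) if scores else None


# Invented example errors.
errors = [
    ("switch1", "link_failure"),
    ("host1", "timeout"),
    ("host1", "crc"),
    ("disk1", "timeout"),
]
scores = score_candidates(errors, SEVERITY_WEIGHTS)
candidate = most_likely_candidate(errors, SEVERITY_WEIGHTS)
```

With these assumed weights, a single link failure on `switch1` outweighs two lesser errors on `host1`, so `switch1` is ranked first.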
-
12. A data processing system for communicating error information in a storage area network (SAN), the data processing system comprising:
-
a network comprising in-band Fibre Channel communication links and out-of-band communication links, wherein the network supports a communications architecture (CA);
a plurality of storage devices connected to the network;
a plurality of host computers connected to the network, wherein at least one of the plurality of host computers comprises a communications architecture manager (CAM) containing problem determination (PD) functionality, wherein a CAM maintains a SAN PD information table (SPDIT), and wherein the CA comprises all network-connected elements capable of communicating information stored in the SPDIT, and wherein the at least one of the plurality of host computer systems includes a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with the communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT.
a plurality of CAMs, wherein the CA comprises a primary CAM and one or more secondary CAMs, wherein a secondary CAM operates redundantly for a primary CAM.
-
-
14. The data processing system of claim 12 wherein the CA further comprises one or more CA elements (CAEs) and one or more CA non-participants (CANs), wherein a CAE is a network-connected device that has successfully registered with a CAM via a network service protocol, and wherein a CAN is a network-connected device that has not registered with a CAM yet known to be present via a SAN topology discovery process.
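The CAE/CAN distinction in claim 14 amounts to comparing two sets: devices found by topology discovery and devices that have registered with a CAM. A minimal sketch follows, with the hypothetical function name `classify_devices` and invented device names.

```python
def classify_devices(discovered, registered):
    """Split devices into CAEs (registered) and CANs (discovered but unregistered)."""
    discovered = set(discovered)
    registered = set(registered)
    caes = sorted(registered)               # registered with a CAM
    cans = sorted(discovered - registered)  # known only via topology discovery
    return caes, cans


# Invented example: tape1 was discovered but never registered.
caes, cans = classify_devices(
    discovered=["host1", "switch1", "disk1", "tape1"],
    registered=["host1", "switch1", "disk1"],
)
```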
-
15. The data processing system of claim 12 wherein the in-band Fibre Channel communication links and the out-of-band communication links are provided by a single, physical communication link.
-
16. A data processing system for processing errors within a storage area network (SAN), the data processing system comprising:
-
first generating means for generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with a communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT;
second generating means for generating a SAN problem determination information table (SPDIT); and
third generating means for generating a SAN diagnostic table (SDT) using the SAN topology map and the SPDIT.
-
-
21. A data processing system for processing errors within a storage area network (SAN), the data processing system comprising:
-
receiving means for receiving an error message at a communication architecture manager (CAM), wherein the CAM comprises problem determination (PD) functionality for the SAN, wherein a CAM maintains a SAN PD information table (SPDIT), and wherein a communication architecture (CA) managed by the CAM comprises all network-connected elements capable of communicating information stored in the SPDIT;
generating means for generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with the communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT; and
processing means for processing the error message using a real-time diagnostic algorithm (RDA).
a plurality of storage devices connected to the network; and
a plurality of host computers connected to the network, wherein at least one of the plurality of host computers comprises a CAM;
wherein at least some of the error messages are generated by at least some of the plurality of storage devices and host computers.
-
-
24. The data processing system of claim 21 further comprising:
generating means for generating a SAN diagnostic table (SDT) using the SAN topology map and the SPDIT.
-
25. The data processing system of claim 24 further comprising:
-
first analyzing means for analyzing the received error message using a temporal correlation window (TCW) value to temporally constrain fault isolation determination while searching for temporally-related error messages previously received by the CAM and stored within the SDT; and
second analyzing means for analyzing the received error message using a spatial correlation path data structure (SCP) to spatially constrain fault isolation determination while searching for spatially-related error messages previously received by the CAM and stored within the SDT.
-
-
26. The data processing system of claim 25 further comprising:
third analyzing means for analyzing the received error message using error severity weightings according to a type of error indicated by the received error message.
-
27. A computer program product in a computer-readable medium for use in a data processing system for processing errors within a storage area network (SAN), the computer program product comprising:
-
first instructions for generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with a communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT;
second instructions for generating a SAN problem determination information table (SPDIT); and
third instructions for generating a SAN diagnostic table (SDT) using the SAN topology map and the SPDIT.
-
-
28. A computer program product in a computer-readable medium for use in a data processing system for processing errors within a storage area network (SAN), the computer program product comprising:
-
first instructions for receiving an error message at a communication architecture manager (CAM), wherein the CAM comprises problem determination (PD) functionality for the SAN, wherein a CAM maintains a SAN PD information table (SPDIT), and wherein a communication architecture (CA) managed by the CAM comprises all network-connected elements capable of communicating information stored in the SPDIT;
second instructions for generating a SAN topology map comprising a table in which each row of the SAN topology table is uniquely mapped to a communication architecture element (CAE) and each column of the SAN topology table is uniquely mapped to the CAE, wherein the CAE is a network-connected device that has successfully registered with the communications architecture manager (CAM) via a network service protocol, wherein the CAM contains problem determination (PD) functionality for the SAN and maintains the SPDIT; and
third instructions for processing the error message using a real-time diagnostic algorithm (RDA).
-
Specification