Creating environmental snapshots of storage device failure events
Abstract
A storage device failure in a computer storage system can be analyzed by the storage system by examining relevant information about the storage device and its environment. Information about the storage device is collected in real-time and stored; this is an on-going process such that some information is continuously available. The information can include information relating to the storage device, such as input/output related information, and information relating to a storage shelf where the storage device is located, such as a status of adjacent storage devices on the shelf. All of the relevant information is analyzed to determine a reason for the storage device failure. Optionally, additional information may be collected and analyzed by the storage system to help determine the reason for the storage device failure. The analysis and supporting information can be stored in a log and/or presented to a storage system administrator to view.
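The ongoing collection described in the abstract can be illustrated with a minimal sketch that keeps a bounded history of I/O completion times for one device and derives the average and maximum I/O time from it. The class name `IoTimeTracker`, the `window` size, and the millisecond unit are assumptions made for illustration; the source does not specify how the information is stored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IoTimeTracker:
    """Bounded history of I/O completion times for one storage device.

    Hypothetical helper: the name, window size, and units are assumptions.
    """
    window: int = 1000  # number of most recent samples to retain (assumed)
    samples: deque = field(default_factory=deque)

    def record(self, io_time_ms: float) -> None:
        """Store one completed I/O time, discarding the oldest if full."""
        self.samples.append(io_time_ms)
        if len(self.samples) > self.window:
            self.samples.popleft()

    def average(self) -> float:
        """Average I/O time over the retained samples (0.0 if none)."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def maximum(self) -> float:
        """Maximum I/O time over the retained samples (0.0 if none)."""
        return max(self.samples, default=0.0)
```

Because the history is bounded, some recent information is always available at the moment a failure is detected, without the store growing indefinitely.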
359 Citations
20 Claims
1. A method for analyzing a storage device failure in a computer storage system, comprising the steps of:
continuously collecting information about the storage device including at least one of an average input/output (I/O) time of the storage device or a maximum I/O time of the storage device;
storing the collected information;
polling shelf log data from a shelf controller of a storage shelf containing the storage device when an I/O error occurs, wherein the I/O error is time-correlated with one or more errors of the shelf log data;
determining whether the storage device has failed;
analyzing the stored collected information and the shelf log data by the computer storage system to determine a reason for the storage device failure;
wherein if the computer storage system cannot determine the reason for the storage device failure based on the stored collected information, the method further comprising:
gathering additional information about the storage device failure; and
said analyzing step analyzing the stored collected information and the gathered additional information by the computer storage system to determine the reason for the storage device failure, the gathered additional information including information about the storage shelf on which the storage device is located and information about adjacent storage devices on the storage shelf.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
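One way to read the time-correlation element of claim 1 is as a window match between the timestamp of the I/O error and the timestamps of shelf log entries. The sketch below assumes that reading; `ShelfLogEntry`, `correlated_shelf_errors`, and the 5-second default tolerance are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShelfLogEntry:
    """One error record polled from a shelf controller (hypothetical shape)."""
    timestamp: float  # seconds since an arbitrary epoch
    component: str    # e.g. a power supply or fan identifier (assumed)
    message: str


def correlated_shelf_errors(
    io_error_time: float,
    shelf_log: List[ShelfLogEntry],
    tolerance_s: float = 5.0,  # assumed correlation window
) -> List[ShelfLogEntry]:
    """Return shelf log entries within tolerance_s of the I/O error time."""
    return [
        entry for entry in shelf_log
        if abs(entry.timestamp - io_error_time) <= tolerance_s
    ]
```

A symmetric window is used here because a shelf fault could be logged slightly before or slightly after the I/O error it contributes to; a real system might choose an asymmetric window instead.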
11. A system for analyzing a storage device failure in a storage system, comprising:
a storage shelf including a shelf controller and a storage device; and
a storage device driver in communication with said storage device, said storage device driver configured to:
continuously collect information regarding said storage device and a storage shelf on which said storage device is located, wherein the information regarding said storage device includes at least one of an average input/output (I/O) time of the storage device or a maximum I/O time of the storage device;
poll shelf log data from the shelf controller of the storage shelf when an I/O error occurs, wherein the I/O error is time-correlated with one or more errors of the shelf log data;
analyze the collected information and the shelf log data to determine a reason for the storage device failure;
wherein if the system cannot determine the reason for the storage device failure based on the collected information, said storage device driver is configured to:
gather additional information about the storage device failure; and
analyze the collected information and the gathered additional information by the system to determine the reason for the storage device failure, the gathered additional information including information about the storage shelf on which the storage device is located and information about adjacent storage devices on the storage shelf.
Dependent Claims (12, 13, 14, 15, 16)
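The two-stage analysis in claim 11 (analyze what has already been collected, and gather more only if no reason is found) can be sketched as below. Every rule, key name, threshold, and status string here is a hypothetical placeholder chosen for illustration; the claims do not say how a reason is derived.

```python
from typing import Callable, Dict, List, Optional

SLOW_IO_MS = 500.0  # hypothetical threshold, not from the source


def first_pass(collected: Dict, shelf_errors: List[str]) -> Optional[str]:
    """Try to explain the failure from already-collected data (assumed rules)."""
    if shelf_errors:
        return "shelf error near the I/O error: " + shelf_errors[0]
    if collected.get("max_io_time_ms", 0.0) > SLOW_IO_MS:
        return "device I/O times exceeded the slow-I/O threshold"
    return None


def second_pass(extra: Dict) -> Optional[str]:
    """Use shelf and adjacent-device information gathered on demand."""
    adjacent = extra.get("adjacent_devices", {})
    failing = sorted(dev for dev, status in adjacent.items() if status != "ok")
    if failing:
        return "adjacent devices also unhealthy: " + ", ".join(failing)
    shelf_status = extra.get("shelf_status", "ok")
    if shelf_status != "ok":
        return "storage shelf reports status " + shelf_status
    return None


def analyze_failure(
    collected: Dict,
    shelf_errors: List[str],
    gather_additional: Callable[[], Dict],
) -> str:
    """Return a reason string, gathering extra data only when needed."""
    reason = first_pass(collected, shelf_errors)
    if reason is None:
        reason = second_pass(gather_additional())
    return reason or "undetermined"
```

Passing `gather_additional` as a callable keeps the potentially expensive extra collection from running unless the first pass comes back empty.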
17. A non-transitory computer-readable storage medium storing a set of instructions for execution by a general purpose computer to analyze failure of a storage device in a computer storage system, the set of instructions comprising:
a collecting code segment for collecting information about the storage device and at least one other component of the computer storage system, wherein:
the information about the storage device includes an average input/output (I/O) time of the storage device or a maximum I/O time of the storage device; and
the at least one other component is not another storage device;
a storing code segment for storing the collected information;
a polling code segment for polling shelf log data from a shelf controller of a storage shelf containing the storage device when an I/O error occurs, wherein the I/O error is time-correlated with one or more errors of the shelf log data;
a determining code segment for determining whether the storage device has failed;
an analyzing code segment for analyzing the stored collected information and the shelf log data to determine a reason for the storage device failure;
a code segment for if the computer cannot determine the reason for the storage device failure based on the stored collected information:
gathering additional information about the storage device failure; and
analyzing the stored collected information and the gathered additional information by the computer to determine the reason for the storage device failure, the gathered additional information including information about the storage shelf on which the storage device is located and information about adjacent storage devices on the storage shelf.
Dependent Claims (18, 19, 20)
Specification