Method and system for reliability analysis of disk drive failures
Abstract
A method and system for performing reliability analysis of disk drive failure mechanisms is provided. The information for performing the analysis is obtained in accordance with the invention from a database generated from identification information stored about individual drives, and drive families, that are deployed in the field. An error database stores error codes that are issued by a disk drive upon a particular event. These error codes are reported to a storage system administrator and recorded in the error database. The disk drive information and the error codes are mapped, and error codes are translated into failure mechanisms for a particular drive family. An analysis is performed whereby a hazard rate plot is provided for either all failure indicators or selected failure indicators or subpopulations for a particular drive family over a given time.
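The hazard rate mentioned in the abstract can be illustrated with a simple life-table estimate: failures observed in an age interval, divided by the drives still at risk at the start of that interval and by the interval width. This is a minimal sketch only. The abstract does not say which estimator is used, and the function name, parameters, and sample ages below are hypothetical.

```python
from collections import Counter


def hazard_rate_by_interval(failure_ages, survivor_ages, width):
    """Life-table hazard estimate per age interval (assumed estimator).

    failure_ages:  ages (e.g. power-on hours) at which drives failed
    survivor_ages: current ages of drives that have not failed
    width:         interval width, in the same unit as the ages
    Returns a list of (interval_start, failures per drive per unit time).
    """
    all_ages = list(failure_ages) + list(survivor_ages)
    if not all_ages:
        return []
    n_bins = int(max(all_ages) // width) + 1
    failures = Counter(int(a // width) for a in failure_ages)
    # Every drive leaves the at-risk set in the interval containing its age,
    # whether it failed there or is simply no older than that yet.
    exits = Counter(int(a // width) for a in all_ages)
    at_risk = len(all_ages)
    rates = []
    for i in range(n_bins):
        rate = failures[i] / (at_risk * width) if at_risk else 0.0
        rates.append((i * width, rate))
        at_risk -= exits[i]
    return rates


# Hypothetical data: 3 failed drives and 7 drives still running at 400 hours.
rates = hazard_rate_by_interval([100, 250, 260], [400] * 7, width=100)
for start, rate in rates:
    print(f"{start:>4} h  {rate:.5f}")
```

Plotting the returned pairs against age gives a curve of the kind the abstract describes. Restricting `failure_ages` to drives that reported one selected failure indicator gives the per-indicator variant.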
29 Claims
1. A method of performing reliability analysis in disk drives in an installation base comprising the steps of:
recording identification information about each disk drive in a family of disk drives in the installation base;
recording error codes that are generated by the disk drives operating in the installation base;
creating a reliability database including mapping each recorded error code with a particular drive family;
adding to said reliability database further information from a drive manufacturer about failure mechanisms associated with one or more error codes; and
retrieving one or more error codes reported for a disk drive family during a time period, and matching the retrieved error codes with one or more failure mechanisms for that drive family.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
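The steps of claim 1 can be sketched as a small in-memory data structure. This is an illustration under stated assumptions, not the claimed implementation: the class and method names, the drive family label, and the error codes are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Drive:
    serial: str
    family: str


@dataclass(frozen=True)
class ErrorEvent:
    serial: str
    code: str
    reported: date


class ReliabilityDB:
    """Hypothetical in-memory stand-in for the reliability database."""

    def __init__(self):
        self.drives = {}      # serial -> Drive (identification information)
        self.events = []      # (family, ErrorEvent): code mapped to a family
        self.mechanisms = {}  # (family, code) -> manufacturer's mechanism

    def record_drive(self, drive):
        self.drives[drive.serial] = drive

    def record_error(self, event):
        # Map the recorded error code to the family of the reporting drive.
        family = self.drives[event.serial].family
        self.events.append((family, event))

    def add_mechanism(self, family, code, mechanism):
        self.mechanisms[(family, code)] = mechanism

    def failure_mechanisms(self, family, start, end):
        """Return {code: mechanism} for codes reported in [start, end]."""
        matched = {}
        for fam, ev in self.events:
            if fam == family and start <= ev.reported <= end:
                matched[ev.code] = self.mechanisms.get((fam, ev.code), "unknown")
        return matched


db = ReliabilityDB()
db.record_drive(Drive("S1", "FAM-A"))
db.record_error(ErrorEvent("S1", "E01", date(2024, 1, 15)))
db.add_mechanism("FAM-A", "E01", "media defect")
print(db.failure_mechanisms("FAM-A", date(2024, 1, 1), date(2024, 3, 31)))
```

Keying the mechanism table on the (family, code) pair reflects the claim's wording that codes are matched to failure mechanisms "for that drive family", so the same code may mean different things in different families.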
9. A method of performing reliability analysis in disk drives in an installation base comprising the steps of:
recording identification information about each disk drive in a family of disk drives in the installation base;
recording error codes that are generated by the disk drives operating in the installation base;
creating a reliability database including mapping each recorded error code with a particular drive family;
adding to said reliability database further information from a drive manufacturer about failure mechanisms associated with one or more error codes;
retrieving one or more error codes reported for a disk drive family during a time period, and matching the retrieved error codes with one or more failure mechanisms for that drive family; and
dividing a disk drive family into subpopulations based upon predetermined conditions or events using disk drives that have undergone an upgrade or a new firmware version as a subpopulation.
10. A method of performing reliability analysis in disk drives in an installation base comprising the steps of:
recording identification information about each disk drive in a family of disk drives in the installation base;
recording error codes that are generated by the disk drives operating in the installation base;
creating a reliability database including mapping each recorded error code with a particular drive family;
adding to said reliability database further information from a drive manufacturer about failure mechanisms associated with one or more error codes;
retrieving one or more error codes reported for a disk drive family during a time period, and matching the retrieved error codes with one or more failure mechanisms for that drive family; and
dividing a disk drive family into subpopulations based upon predetermined conditions or events using disk drives that are operating under a new or upgraded operating system software version as a subpopulation.
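The subpopulation step in claims 9 and 10 amounts to grouping the drives of one family by an attribute such as firmware or operating system version, then comparing failures across the groups. The sketch below assumes each drive is a plain dictionary. The field names (`serial`, `firmware`) and the version labels are hypothetical.

```python
from collections import defaultdict


def split_subpopulations(drives, key):
    """Group drive serial numbers by the value of one attribute."""
    groups = defaultdict(list)
    for drive in drives:
        groups[drive[key]].append(drive["serial"])
    return dict(groups)


def failure_fraction(groups, failed_serials):
    """Fraction of drives in each subpopulation that reported a failure."""
    failed = set(failed_serials)
    return {
        name: sum(serial in failed for serial in serials) / len(serials)
        for name, serials in groups.items()
    }


# Hypothetical family of three drives, two firmware versions.
family = [
    {"serial": "S1", "firmware": "FW1"},
    {"serial": "S2", "firmware": "FW1"},
    {"serial": "S3", "firmware": "FW2"},
]
by_firmware = split_subpopulations(family, "firmware")
print(failure_fraction(by_firmware, ["S2"]))
```

Passing a different `key` (for example an operating system version field) produces the claim 10 variant without changing the grouping logic.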
11. A system configured to perform reliability analysis on disk drives, comprising:
(A) a reliability database configured to store:
(i) information about each drive in a particular drive family;
(ii) error information including a failure mechanism associated with each error code from each drive family;
(iii) a mapping between individual error codes received, and drives within a drive family; and
(B) a reliability analysis utility configured to extract data from said reliability database and to construct one or more hazard rate plots of failure indicators associated with particular drive families.
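One possible layout for the three kinds of stored data in claim 11, with an extraction query of the sort a reliability analysis utility might run, is sketched here using Python's built-in `sqlite3` module. The table names, column names, and the `age_hours` field are assumptions made for illustration. The claim does not prescribe a schema, and this sketch stops at extraction rather than plotting.

```python
import sqlite3

# Hypothetical schema: (i) drive information, (ii) error information with a
# failure mechanism per family and code, (iii) received codes mapped to drives.
SCHEMA = """
CREATE TABLE drive (
    serial TEXT PRIMARY KEY,
    family TEXT NOT NULL
);
CREATE TABLE error_info (
    family    TEXT NOT NULL,
    code      TEXT NOT NULL,
    mechanism TEXT NOT NULL,
    PRIMARY KEY (family, code)
);
CREATE TABLE error_event (
    serial    TEXT NOT NULL REFERENCES drive(serial),
    code      TEXT NOT NULL,
    age_hours REAL NOT NULL
);
"""


def open_db():
    """Create an empty in-memory reliability database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def indicator_counts(conn, family):
    """Count reported events per failure mechanism for one drive family."""
    rows = conn.execute(
        """
        SELECT i.mechanism, COUNT(*)
        FROM error_event AS e
        JOIN drive AS d ON d.serial = e.serial
        JOIN error_info AS i ON i.family = d.family AND i.code = e.code
        WHERE d.family = ?
        GROUP BY i.mechanism
        ORDER BY i.mechanism
        """,
        (family,),
    ).fetchall()
    return dict(rows)
```

The `age_hours` column is included so that extracted events could feed a hazard rate calculation; events whose code has no entry in `error_info` are dropped by the inner join, which may or may not be the desired behavior.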
12. A system adapted to perform reliability analysis on disk drives, comprising:
(A) a storage system adapted to execute an autosupport utility that is adapted to forward error code information generated about disk drives associated with the storage system to a storage system administrator;
(B) a reliability analysis utility adapted to receive and parse said error code information and to perform a reliability analysis thereupon; and
(C) a graphical user interface adapted to display a representation of said reliability analysis.
Dependent claims: 13, 14, 15, 16, 17
18. A system configured to perform reliability analysis on disk drive failure indicators comprising:
means for reporting error codes that are generated by disk drives in said field installation;
means for extracting information about an individual disk drive family and error codes reported on drives in that drive family over a time period;
means for analyzing failure indicators associated with said reported error codes about said individual drive family including means for generating a hazard rate plot of one or more failure indicators over a time period.
19. A computer readable medium for performing reliability analysis on disk drive failure indicators, including program instructions for performing the steps of:
generating a reliability database containing information regarding disk drive identity information and error log information stored in an associated computer;
selecting a particular drive family to study;
extracting from said reliability database one or more error codes representing failure indicators about a particular drive or drive family which have been received over a time period.
Dependent claims: 20, 21, 22
23. A method of performing reliability analysis in disk drive failure indicators comprising:
generating a reliability database containing information regarding disk drive identity information and error log information stored in an associated computer;
selecting a particular drive family to study from the reliability database; and
extracting from said reliability database one or more error codes representing failure indicators about the particular drive family which have been received over a time period.
Dependent claims: 24, 25
26. A system of performing reliability analysis in disk drive failure indicators comprising:
means for generating a reliability database containing information regarding disk drive identity information and error log information stored in an associated computer;
means for selecting a particular drive family to study from the reliability database; and
means for extracting from said reliability database one or more error codes representing failure indicators about the particular drive family which have been received over a time period.
Dependent claims: 27, 28
29. A system of performing reliability analysis in disk drive failure indicators comprising:
a reliability database containing information regarding disk drive identity information and error log information stored in an associated computer;
a reliability analysis utility configured to select a particular drive family to study from the reliability database; and
the reliability analysis utility further configured to extract from the reliability database one or more error codes representing failure indicators about the particular drive family which have been received over a time period.
Specification