Method for dynamically modeling medium error evolution to predict disk failure
Abstract
A method for predicting disk failures in a RAID environment is provided. A data collection center receives hard disk status information from one or more sets of hard disks in a storage system. For each of the sets of hard disks, the data collection center calculates a transitional probability that a hard disk will fail within a predetermined period of time based on the hard disk status information, and generates a first risk profile for the hard disk based on the calculated transitional probability. The data collection center then generates a second risk profile for a set of hard disks based on two or more of the first risk profiles, and compares the second risk profiles of the sets of hard disks to determine which of the sets of hard disks has a highest probability of failing within the predetermined period of time.
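The abstract does not say how the per-disk (first) risk profiles are combined into a per-set (second) risk profile. As a minimal sketch only, the Python below assumes that disks fail independently and that a set is "at risk" if at least one of its disks fails within the period. The names `set_risk`, `riskiest_set`, and the sample group names and probabilities are hypothetical and do not come from the patent.

```python
from math import prod


def set_risk(disk_risks):
    """Combine per-disk failure probabilities into one set-level probability.

    Assumption (not from the patent): disks fail independently, and the set
    counts as failing if at least one disk fails within the period.
    """
    return 1.0 - prod(1.0 - p for p in disk_risks)


def riskiest_set(first_profiles):
    """Build a second risk profile per set and return the highest-risk set."""
    second_profiles = {
        name: set_risk(risks) for name, risks in first_profiles.items()
    }
    worst = max(second_profiles, key=second_profiles.get)
    return worst, second_profiles


# Hypothetical first risk profiles: per-disk failure probabilities per set.
first_profiles = {
    "raid_group_a": [0.01, 0.02, 0.01],
    "raid_group_b": [0.10, 0.05],
}

worst, second_profiles = riskiest_set(first_profiles)
print(worst, second_profiles)
```

A real RAID group tolerates some number of failures, so a closer model would estimate the probability that failures exceed the remaining redundancy; the "at least one failure" rule is used here only to keep the sketch short.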
20 Claims
1. A computer-implemented method for predicting disk failures in a redundant array of independent disks (RAID) environment, the method comprising:
respectively receiving hard disk status information from each set of a plurality of sets of hard disks in a storage system, wherein the hard disk status information comprises a plurality of states within the set, a number of hard disks within the set, and an indicator of how many hard disks have failed within the set;
for each set of the plurality of sets of hard disks,
  calculating a transitional probability that a hard disk will fail within a predetermined period of time based on the respective hard disk status information, wherein calculating the transitional probability comprises:
    querying a medium error history of hard disks within the set,
    using the medium error history to identify which of the hard disks has experienced at least one medium error,
    counting a number of transitions of each of the hard disks having the at least one medium error according to different transition types, each of the transition types representing a specific transition from one of the plurality of states to another of the plurality of states, wherein the one state and the other state are of same state or different states, and
    identifying a number of transition types based on the counting, and
  generating a first risk profile for the hard disk based on the calculated transitional probability;
generating a second risk profile for a set of hard disks based on two or more of the first risk profiles; and
determining which of the plurality of sets of hard disks has a highest probability of failing within the predetermined period of time based on the second risk profile.
Dependent claims: 2, 3, 4, 5, 6, 7
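The transition-counting step recited in claim 1 can be illustrated with a short Python sketch. It is not the patented implementation. It assumes each disk's medium error history has already been reduced to a sequence of discrete states sampled at regular intervals, and it estimates the probability of moving from a given state to a failed state as a simple relative frequency. The state names (`clean`, `few_errors`, `many_errors`, `failed`), the function names, and the sample histories are all hypothetical.

```python
from collections import Counter

# Hypothetical states that indicate a disk has seen at least one medium error.
ERROR_STATES = {"few_errors", "many_errors"}


def count_transitions(histories):
    """Count (from_state, to_state) transitions across disks with medium errors.

    Same-state pairs such as (few_errors, few_errors) count as transitions too.
    """
    counts = Counter()
    for states in histories.values():
        # Skip disks that never experienced a medium error.
        if not any(s in ERROR_STATES for s in states):
            continue
        counts.update(zip(states, states[1:]))
    return counts


def failure_probability(counts, current_state, failed_state="failed"):
    """Relative-frequency estimate of moving from current_state to failed_state."""
    total = sum(n for (src, _), n in counts.items() if src == current_state)
    if total == 0:
        return 0.0  # no observations for this state
    return counts[(current_state, failed_state)] / total


# Hypothetical per-disk state histories, one entry per observation interval.
histories = {
    "disk0": ["clean", "few_errors", "few_errors", "many_errors", "failed"],
    "disk1": ["clean", "few_errors", "many_errors", "many_errors"],
    "disk2": ["clean", "clean", "clean"],  # no medium errors, so it is skipped
}

counts = count_transitions(histories)
num_transition_types = len(counts)  # distinct transition types observed
p_fail = failure_probability(counts, "many_errors")
print(num_transition_types, p_fail)
```

In this sketch a per-disk first risk profile would simply be the estimated failure probability for the disk's current state; the claims leave the exact form of the profile open.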
8. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for predicting disk failures in RAID environment, the operations comprising:
respectively receiving hard disk status information from each set of a plurality of sets of hard disks in a storage system, wherein the hard disk status information comprises a plurality of states within the set, a number of hard disks within the set, and an indicator of how many hard disks have failed within the set;
for each set of the plurality of sets of hard disks,
  calculating a transitional probability that a hard disk will fail within a predetermined period of time based on the respective hard disk status information, wherein calculating the transitional probability comprises:
    querying a medium error history of hard disks within the set,
    using the medium error history to identify which of the hard disks has experienced at least one medium error,
    counting a number of transitions of each of the hard disks having the at least one medium error according to different transition types, each of the transition types representing a specific transition from one of the plurality of states to another of the plurality of states, wherein the one state and the other state are of same state or different states, and
    identifying a number of transition types based on the counting, and
  generating a first risk profile for the hard disk based on the calculated transitional probability;
generating a second risk profile for a set of hard disks based on two or more of the first risk profiles; and
determining which of the sets of hard disks has a highest probability of failing within the predetermined period of time based on the second risk profile.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system, comprising:
a processor; and
a memory coupled to the processor for storing instructions, which when executed from the memory, cause the processor to perform operations for predicting disk failures in RAID environment, the operations including:
respectively receiving hard disk status information from each set of a plurality of sets of hard disks in a storage system, wherein the hard disk status information comprises a plurality of states within the set, a number of hard disks within the set, and an indicator of how many hard disks have failed within the set;
for each set of the plurality of sets of hard disks,
  calculating a transitional probability that a hard disk will fail within a predetermined period of time based on the respective hard disk status information, wherein calculating the transitional probability comprises:
    querying a medium error history of hard disks within the set,
    using the medium error history to identify which of the hard disks has experienced at least one medium error,
    counting a number of transitions of each of the hard disks having the at least one medium error according to different transition types, each of the transition types representing a specific transition from one of the plurality of states to another of the plurality of states, wherein the one state and the other state are of same state or different states, and
    identifying a number of transition types based on the counting, and
  generating a first risk profile for the hard disk based on the calculated transitional probability;
generating a second risk profile for a set of hard disks based on two or more of the first risk profiles; and
determining which of the sets of hard disks has a highest probability of failing within the predetermined period of time based on the second risk profile.
Dependent claims: 16, 17, 18, 19, 20
Specification