Proactively resilvering a striped disk array in advance of a predicted disk drive failure
First Claim
1. A method for proactively resilvering a disk array when a disk drive in the array is determined to have an elevated risk of failure, comprising:
- receiving time-series signals associated with the disk array during operation of the disk array;
- analyzing the received time-series signals to identify at-risk disk drives that have an elevated risk of failure by:
  - using an inferential model trained on previously received time-series signals associated with the disk array to generate estimated values for the time-series signals based on correlations among the time-series signals,
  - performing a pairwise differencing operation between actual values and the estimated values for the time-series signals to produce residuals, and
  - performing a sequential probability ratio test (SPRT) on the residuals to identify one or more at-risk disk drives that have an elevated risk of failure; and
- when one or more disk drives are identified as being at-risk, performing a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
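The three analysis steps in the claim (estimate, difference, SPRT) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the claim does not name a specific inferential model, so a deliberately crude peer-average estimator stands in for it, and the function names (`train`, `residuals`, `sprt_alarm`), the synthetic `telemetry` generator, the drive names, and the `shift`, `alpha`, and `beta` values are all hypothetical. The SPRT shown is Wald's test for a positive mean shift in Gaussian residuals; a fuller detector would likely also test negative shifts and variance changes.

```python
import math
from statistics import mean, pstdev


def train(history):
    """Learn each signal's typical offset from the average of its peers.

    history maps a signal name to readings taken while the array was healthy.
    Returns {name: (mean_offset, residual_sigma)}.
    """
    length = len(next(iter(history.values())))
    model = {}
    for name in history:
        peers = [k for k in history if k != name]
        diffs = [history[name][t] - mean(history[k][t] for k in peers)
                 for t in range(length)]
        model[name] = (mean(diffs), pstdev(diffs))
    return model


def residuals(model, observed, name):
    """Pairwise difference: actual minus estimated value at each time step."""
    peers = [k for k in observed if k != name]
    offset = model[name][0]
    return [observed[name][t] - (mean(observed[k][t] for k in peers) + offset)
            for t in range(len(observed[name]))]


def sprt_alarm(res, sigma, shift, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean 0 versus H1: mean = shift (Gaussian, known sigma).

    Returns the index of the first sample at which H1 is accepted, or None.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 (degraded)
    lower = math.log(beta / (1 - alpha))   # accept H0 (healthy)
    llr = 0.0
    for i, r in enumerate(res):
        llr += (shift / sigma ** 2) * (r - shift / 2)  # log-likelihood ratio step
        if llr >= upper:
            return i
        if llr <= lower:
            llr = 0.0  # healthy decision reached; restart the test
    return None


def telemetry(start, stop, drift_on=None):
    """Synthetic, correlated readings for three hypothetical drives."""
    base = {"disk0": 40.0, "disk1": 41.0, "disk2": 39.0}
    wobble = {"disk0": 3, "disk1": 5, "disk2": 7}
    return {
        name: [base[name] + math.sin(t) + 0.1 * math.cos(wobble[name] * t)
               + (0.1 * (t - start) if name == drift_on else 0.0)
               for t in range(start, stop)]
        for name in base
    }


model = train(telemetry(0, 50))                      # healthy training window
live = telemetry(50, 100, drift_on="disk2")          # disk2 slowly drifts upward
alarms = {name: sprt_alarm(residuals(model, live, name), model[name][1], shift=1.0)
          for name in live}
at_risk = [name for name, index in alarms.items() if index is not None]
print(at_risk)
```

In this toy setup only the drifting drive trips the test. Because the peer-average estimator lets a fault in one signal leak into the estimates of the others, it is a weak substitute for a real multivariate model and is used here only to keep the sketch short.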
Abstract
The disclosed embodiments provide a system that proactively resilvers a disk array when a disk drive in the array is determined to have an elevated risk of failure. The system receives time-series signals associated with the disk array during operation of the disk array. Next, the system analyzes the time-series signals to identify at-risk disk drives that have an elevated risk of failure. If one or more disk drives are identified as being at-risk, the system performs a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
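The last step of the abstract, resilvering in the background while the array keeps serving I/O from the at-risk drive, can be sketched as follows. This is a toy under stated assumptions, not the disclosed implementation: `ToyArray`, its methods, and the drive names are hypothetical, drives are plain in-memory lists, and the copy reads directly from the at-risk drive onto a spare rather than reconstructing data the way a real striped array would.

```python
import threading


class ToyArray:
    """In-memory stand-in for a disk array: each drive is a list of blocks."""

    def __init__(self, drives, spare_name):
        self.drives = drives              # {drive name: [block, ...]}
        self.spare_name = spare_name
        self.lock = threading.Lock()
        self.copied = 0                   # blocks resilvered so far

    def read(self, drive, index):
        # Foreground I/O keeps using the at-risk drive during the copy.
        with self.lock:
            return self.drives[drive][index]

    def resilver(self, at_risk):
        # Copy block by block, releasing the lock between blocks so that
        # foreground reads can interleave with the background copy.
        for index in range(len(self.drives[at_risk])):
            with self.lock:
                self.drives[self.spare_name][index] = self.drives[at_risk][index]
                self.copied += 1

    def start_proactive_resilver(self, at_risk):
        worker = threading.Thread(target=self.resilver, args=(at_risk,), daemon=True)
        worker.start()
        return worker


array = ToyArray({"disk0": list(range(1000)), "spare": [None] * 1000}, "spare")
worker = array.start_proactive_resilver("disk0")
served = [array.read("disk0", i) for i in range(0, 1000, 100)]  # foreground reads
worker.join()
```

The per-block lock is one simple way to let reads proceed during the copy; a real system would also have to handle writes that land on blocks already copied.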
17 Claims
1. A method for proactively resilvering a disk array when a disk drive in the array is determined to have an elevated risk of failure, comprising:
- receiving time-series signals associated with the disk array during operation of the disk array;
- analyzing the received time-series signals to identify at-risk disk drives that have an elevated risk of failure by:
  - using an inferential model trained on previously received time-series signals associated with the disk array to generate estimated values for the time-series signals based on correlations among the time-series signals,
  - performing a pairwise differencing operation between actual values and the estimated values for the time-series signals to produce residuals, and
  - performing a sequential probability ratio test (SPRT) on the residuals to identify one or more at-risk disk drives that have an elevated risk of failure; and
- when one or more disk drives are identified as being at-risk, performing a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory, computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for proactively resilvering a disk array when a disk drive in the array is determined to have an elevated risk of failure, the method comprising:
- receiving time-series signals associated with the disk array during operation of the disk array;
- analyzing the received time-series signals to identify at-risk disk drives that have an elevated risk of failure by:
  - using an inferential model trained on previously received time-series signals associated with the disk array to generate estimated values for the time-series signals based on correlations among the time-series signals,
  - performing a pairwise differencing operation between actual values and the estimated values for the time-series signals to produce residuals, and
  - performing a sequential probability ratio test (SPRT) on the residuals to identify one or more at-risk disk drives that have an elevated risk of failure; and
- when one or more disk drives are identified as being at-risk, performing a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system that proactively resilvers a disk array when a disk drive in the array is determined to have an elevated risk of failure, comprising:
- at least one processor and at least one associated memory; and
- a proactive resilvering mechanism that executes on the at least one processor, wherein during operation, the proactive resilvering mechanism:
  - receives time-series signals associated with the disk array during operation of the disk array;
  - analyzes the received time-series signals to identify at-risk disk drives that have an elevated risk of failure by:
    - using an inferential model trained on previously received time-series signals associated with the disk array to generate estimated values for the time-series signals based on correlations among the time-series signals,
    - performing a pairwise differencing operation between actual values and the estimated values for the time-series signals to produce residuals, and
    - performing a sequential probability ratio test (SPRT) on the residuals to identify one or more at-risk disk drives that have an elevated risk of failure; and
  - when one or more disk drives are identified as being at-risk, performs a proactive resilvering operation on the disk array using a background process while the disk array continues to operate using the at-risk disk drives.
Dependent claims: 16, 17
Specification