System and method for avoiding storage failures in a storage array system
2 Assignments
0 Petitions
Abstract
A storage array system with improved reliability. A plurality of data storage devices store data, a spare storage device replaces one of the data storage devices, and a control unit controls I/O operations of the data storage devices and the spare storage device. The control unit includes means for storing a history of self-recovered errors of each of the data storage devices, means for calculating an error rate of each data storage device on the basis of the error history, means for judging from the error rate whether preventive maintenance of each data storage device is necessary, and means for executing the preventive maintenance. At intervals of a predetermined access size, the storage array system judges whether each data storage device needs to be replaced, based on the error rate, the inclination between the error rates of two adjacent intervals, and the total number of errors of the data storage devices. Furthermore, the storage array system temporarily reuses a data storage device that has been judged to need replacement. Accordingly, the storage array system can avoid disk failures that lead to data loss.
78 Citations
21 Claims

1. A storage array system, comprising:
a plurality of data storage devices for storing data; and
a control unit for controlling input and/or output operations of the plurality of data storage devices;
wherein the control unit includes:
means for storing a history of self-recovered errors for each of the plurality of data storage devices;
means for calculating an error rate for a specified interval of each of the plurality of data storage devices based on the history of errors; and
means for judging a reliability of operation of each of the plurality of data storage devices from the error rate for the specified interval.

2. […] means for detecting a number of errors and an access size of each one of the plurality of data storage devices respectively as part of the history of errors for each input and/or output operation.
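
The per-I/O detection in claim 2, together with the totals that claim 3 keeps in non-volatile memory, can be sketched as a small history record. This is a minimal Python sketch, not the patented implementation: the class name `ErrorHistory`, its fields, and the assumption that each I/O reports its access size in bytes plus a count of self-recovered errors are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorHistory:
    """Hypothetical per-device history of self-recovered errors."""

    total_errors: int = 0        # errors seen over the device's lifetime
    total_access_size: int = 0   # bytes accessed over the device's lifetime
    # Cumulative access size (in bytes) reached when each error was detected.
    error_positions: list[int] = field(default_factory=list)

    def record_io(self, access_size: int, recovered_errors: int) -> None:
        """Fold one I/O operation's access size and error count into the history."""
        if access_size < 0 or recovered_errors < 0:
            raise ValueError("access size and error count must be non-negative")
        self.total_access_size += access_size
        self.total_errors += recovered_errors
        # All errors from this I/O are stamped with the access size reached
        # at the end of the operation (a simplifying assumption).
        self.error_positions.extend([self.total_access_size] * recovered_errors)


h = ErrorHistory()
h.record_io(access_size=4096, recovered_errors=0)
h.record_io(access_size=8192, recovered_errors=2)
print(h.total_errors, h.total_access_size, h.error_positions)
# prints: 2 12288 [12288, 12288]
```

Recording the access position of each error, rather than only a running count, is what later allows the error rate to be recomputed for any interval.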

3. The storage array system of claim 2, wherein:
the storing means further includes means for determining a total number of errors, a total access size, and an access size when each error is detected, for each of the plurality of data storage devices, and a non-volatile memory for storing, for each of the plurality of data storage devices, the total number of errors, the total access size, and the access size when each error is detected; and
wherein the calculating means calculates the error rate at intervals of a predetermined access size from the total number of errors, the total access size, and the access size when each error is detected.
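
Claim 3 computes the error rate at intervals of a predetermined access size from the stored totals and the access size at which each error was detected. One way to do this is sketched below, assuming errors are bucketed by the cumulative access size at which they were seen and that only fully completed intervals are rated; `interval_error_rates` and its arguments are hypothetical names.

```python
def interval_error_rates(
    error_positions: list[int], total_access_size: int, interval: int
) -> list[float]:
    """Error rate (errors per byte) for each complete interval of `interval` bytes."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    complete = total_access_size // interval  # intervals fully traversed so far
    counts = [0] * complete
    for pos in error_positions:
        # An error at cumulative size pos falls in interval k when
        # k * interval < pos <= (k + 1) * interval.
        k = (pos - 1) // interval
        if 0 <= k < complete:
            counts[k] += 1
    return [c / interval for c in counts]


print(interval_error_rates([50, 120, 180, 310], total_access_size=350, interval=100))
# prints: [0.01, 0.02, 0.0]
```

The error at position 310 belongs to the still-incomplete fourth interval (300, 350], so under this assumption it is not rated until that interval is filled.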

4. The storage array system of claim 3, wherein:
the non-volatile memory further stores a threshold value of error rate; and
the judging means judges the reliability of operation of each of the plurality of data storage devices by comparing the calculated error rate and the threshold value of error rate.
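
Claim 4 judges reliability by comparing the calculated error rate with a stored threshold. A minimal sketch follows; the function name and the policy of comparing only the most recent complete interval (so that an old burst of errors does not keep a device flagged indefinitely) are hypothetical.

```python
def judge_by_rate(rates: list[float], rate_threshold: float) -> bool:
    """Return True when the device should be treated as unreliable."""
    if not rates:
        return False  # no complete interval yet, so nothing to judge
    # Hypothetical policy: compare only the latest complete interval.
    return rates[-1] > rate_threshold


print(judge_by_rate([0.01, 0.02], rate_threshold=0.015))  # prints: True
print(judge_by_rate([0.02, 0.01], rate_threshold=0.015))  # prints: False
```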

5. The storage array system of claim 3, wherein:
the calculating means further calculates two inclinations of two error rates of adjacent intervals;
the non-volatile memory further stores a threshold value of inclination of two error rates of adjacent intervals; and
the judging means judges the reliability of operation of each of the plurality of data storage devices from the threshold value and the inclinations.
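
For claim 5, the inclination between the error rates of adjacent intervals measures how quickly errors are accelerating. The sketch below assumes equal-width intervals, so the inclination is taken as the plain difference `r[k] - r[k-1]` (dividing by the interval width would only rescale the threshold), and it reads "two inclinations" as the last two slopes both having to exceed the threshold. Both readings, and the function names, are assumptions.

```python
def inclinations(rates: list[float]) -> list[float]:
    """Change in error rate between each pair of adjacent, equal-width intervals."""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


def judge_by_inclination(rates: list[float], slope_threshold: float) -> bool:
    """Flag the device when the last two inclinations both exceed the threshold."""
    slopes = inclinations(rates)
    if len(slopes) < 2:
        return False  # need at least three intervals for two inclinations
    return slopes[-2] > slope_threshold and slopes[-1] > slope_threshold


print(judge_by_inclination([0.01, 0.02, 0.04], slope_threshold=0.005))  # prints: True
print(judge_by_inclination([0.03, 0.02, 0.05], slope_threshold=0.005))  # prints: False
```

Requiring two consecutive rising slopes filters out a single noisy interval while still reacting before the absolute rate crosses its own threshold.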

6. A storage array system, comprising:
a plurality of data storage devices for storing data;
a spare storage device for replacing one of the plurality of data storage devices; and
a control unit for controlling input and/or output operations of the plurality of data storage devices and the spare storage device;
wherein the control unit includes:
means for storing a history of self-recovered errors for each of the plurality of data storage devices;
means for calculating an error rate for a specified interval for each of the plurality of data storage devices based on the history of errors;
means for judging a necessity to execute preventive maintenance of each of the plurality of data storage devices from the error rate for the specified interval; and
means for executing the preventive maintenance.

7. […] reproducing means for reproducing data from one of the plurality of data storage devices judged to be in need of the preventive maintenance on the spare storage device.

8. The storage array system of claim 7, further comprising:
a redundant storage device for storing back-up data created from a set of the data sequentially addressed; and
wherein the reproducing means regenerates the data from one of the plurality of data storage devices judged to be in need of the preventive maintenance on the spare storage device from the data stored on the remainder of the plurality of data storage devices and the back-up data.
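
Claims 7 and 8 reproduce the suspect device's data on the spare, regenerating it from the remaining data devices and the back-up data. The claims do not name the redundancy code; the sketch below assumes byte-wise XOR parity, as used in common RAID levels, where any single missing block equals the XOR of the parity block with all surviving data blocks. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reproduce_on_spare(
    stripes: list[tuple[list[bytes], bytes]], suspect: int
) -> list[bytes]:
    """Regenerate the suspect device's block in every stripe for the spare.

    Each stripe is (data blocks indexed by device, parity block). With
    P = D0 ^ D1 ^ ... ^ Dn-1, the suspect's block is P XOR the other blocks.
    """
    spare = []
    for data, parity in stripes:
        others = [blk for i, blk in enumerate(data) if i != suspect]
        spare.append(xor_blocks(others + [parity]))
    return spare


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
stripes = [(data, xor_blocks(data))]
print(reproduce_on_spare(stripes, suspect=1) == [data[1]])  # prints: True
```

Because the suspect device has so far reported only self-recovered errors, a simpler alternative under claim 7 would be to copy its blocks directly; regeneration avoids reading from the device that is already showing signs of wear.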

9. A method for controlling a storage array system, comprising the steps of:
storing data onto a plurality of data storage devices;
storing a history of self-recovered errors of each of the plurality of data storage devices;
calculating an error rate for a specified interval of each of the plurality of data storage devices based on the history of errors; and
judging a reliability of operation of each of the plurality of data storage devices from the error rate for the specified interval.

10. […] detecting a number of errors and an access size of each of the plurality of data storage devices as a part of the history of errors.

11. The method for controlling a storage array system of claim 10, wherein the calculating step includes the step of:
calculating the error rate by dividing the number of errors by the access size.
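
Claim 11's calculation, written out with hypothetical symbols (n_k for the number of self-recovered errors and s_k for the access size in interval k) and illustrative numbers:

```latex
r_k = \frac{n_k}{s_k}, \qquad
\text{e.g. } n_k = 6 \text{ errors},\; s_k = 3\ \text{GB}
\;\Rightarrow\; r_k = \frac{6}{3} = 2\ \text{errors per GB}.
```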

12. The method for controlling a storage array system of claim 10, further comprising the steps of:
determining a total number of errors, a total access size, and an access size when each error is detected, for each of the plurality of data storage devices;
storing the total number of errors, the total access size, and the access size when each error is detected onto a non-volatile memory; and
wherein the calculating step includes the step of calculating the error rate at intervals of a predetermined access size from the total number of errors, the total access size, and the access size when each error is detected.

13. The method for controlling a storage array system of claim 12, further comprising the step of:
storing a threshold value of error rate in the non-volatile memory; and
wherein the judging step includes the step of judging the reliability of operation of each of the plurality of data storage devices by comparing the calculated error rate and the threshold value of error rate.

14. The method for controlling a storage array system of claim 12, further comprising the steps of:
calculating two inclinations of two error rates of adjacent intervals;
storing a threshold value of inclination of two error rates of adjacent intervals in the non-volatile memory; and
wherein the judging step includes the step of judging the reliability of operation of each of the plurality of data storage devices from the threshold value and the inclinations.

15. A method for controlling a storage array system, comprising the steps of:
storing data onto a plurality of data storage devices;
storing a history of self-recovered errors of each of the plurality of data storage devices;
calculating an error rate for a specified interval for each one of the plurality of data storage devices based on the history of errors;
judging a necessity to execute preventive maintenance of each one of the plurality of data storage devices from the error rate for the specified interval; and
executing the preventive maintenance.

16. […] reproducing data from one of the plurality of data storage devices judged to be in need of the preventive maintenance on a spare storage device which replaces the one of the plurality of data storage devices.

17. The method for controlling a storage array system of claim 16, wherein the data storing step includes the steps of:
dividing the data into a plurality of data blocks;
addressing the plurality of data blocks in sequence;
creating a back-up data block from a set of the data blocks sequentially addressed;
storing the data blocks onto the plurality of data storage devices; and
storing the back-up data block onto a redundant storage device.
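
The data-storing steps of claim 17 (divide the data into blocks, address them in sequence, create one back-up block per sequentially addressed set, and store it on a redundant device) resemble a RAID-4 layout with a dedicated parity device. The sketch below works under that assumption; XOR parity, zero padding of the final stripe, and the name `stripe_with_parity` are all assumptions rather than details taken from the claims.

```python
from functools import reduce


def stripe_with_parity(
    data: bytes, n_devices: int, block_size: int
) -> tuple[list[list[bytes]], list[bytes]]:
    """Split data into sequentially addressed blocks, deal them across
    n_devices data devices, and build one XOR parity block per stripe."""
    if n_devices < 1 or block_size < 1:
        raise ValueError("n_devices and block_size must be positive")
    stripe_bytes = n_devices * block_size
    # Zero-pad so the data fills whole stripes (an assumption).
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    devices: list[list[bytes]] = [[] for _ in range(n_devices)]
    parity: list[bytes] = []
    for start in range(0, len(blocks), n_devices):
        stripe = blocks[start:start + n_devices]  # one sequentially addressed set
        for dev, blk in enumerate(stripe):
            devices[dev].append(blk)
        # Back-up block for the redundant device: XOR across the stripe.
        parity.append(bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripe)))
    return devices, parity


devices, parity = stripe_with_parity(b"ABCDEF", n_devices=2, block_size=2)
print(devices)  # prints: [[b'AB', b'EF'], [b'CD', b'\x00\x00']]
print(parity)   # prints: [b'\x02\x06', b'EF']
```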

18. The method for controlling a storage array system of claim 17, wherein the reproducing step includes the steps of:
regenerating, from the data stored on the remainder of the plurality of data storage devices and the redundant data, the data on one of the plurality of data storage devices judged to be in need of the preventive maintenance; and
storing the regenerated data onto the spare storage device.

19. The method for controlling a storage array system of claim 16, wherein the executing step includes the step of:
naming, after the reproducing step, the one of the plurality of data storage devices judged to be in need of the preventive maintenance as a new spare storage device for replacing another storage device.

20. The method for controlling a storage array system of claim 19, wherein the executing step includes the step of:
formatting, between the reproducing step and the naming step, a data structure of the one of the plurality of data storage devices judged to be in need of the preventive maintenance.
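
Claims 19 and 20 order the maintenance steps: reproduce the suspect device's data on the spare, format the suspect device, then name it as the new spare. The following sketch tracks only device roles; `reproduce` and `fmt` stand in for the real copy or regeneration and formatting operations and, like the `Role` enum and `run_preventive_maintenance`, are hypothetical.

```python
from enum import Enum
from typing import Callable


class Role(Enum):
    DATA = "data"
    SPARE = "spare"


def run_preventive_maintenance(
    roles: dict[str, Role],
    suspect: str,
    reproduce: Callable[[str, str], None],
    fmt: Callable[[str], None],
) -> dict[str, Role]:
    """Reproduce, then format, then rename; returns the new role map."""
    spare = next((name for name, role in roles.items() if role is Role.SPARE), None)
    if spare is None:
        raise RuntimeError("no spare storage device available")
    reproduce(suspect, spare)  # copy or regenerate the suspect's data on the spare
    fmt(suspect)               # claim 20: format between reproducing and naming
    new_roles = dict(roles)
    new_roles[spare] = Role.DATA      # the spare now serves the suspect's data
    new_roles[suspect] = Role.SPARE   # claim 19: the suspect becomes the new spare
    return new_roles


log = []
roles = {"d0": Role.DATA, "d1": Role.DATA, "s0": Role.SPARE}
after = run_preventive_maintenance(
    roles, "d1",
    reproduce=lambda src, dst: log.append(("reproduce", src, dst)),
    fmt=lambda dev: log.append(("format", dev)),
)
print(log)  # prints: [('reproduce', 'd1', 's0'), ('format', 'd1')]
```

Returning a new role map instead of mutating the old one keeps the previous assignment available if the reproduction step fails partway.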

21. The method for controlling a storage array system of claim 16, wherein the executing step includes the step of:
naming the one of the plurality of data storage devices judged to be in need of the preventive maintenance as a copy storage device which contains the same data as the spare storage device.
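
Claim 21 instead keeps the suspect device as a copy of the spare that replaced it, which matches the abstract's statement that the device is reused for the time being. The sketch below assumes devices can be modelled as maps from block address to data and that the copy is kept identical by mirroring every write; the class name `MirroredPair` and the read fallback are added assumptions, not something the claim states.

```python
class MirroredPair:
    """Spare device plus the replaced device kept as its copy (claim 21)."""

    def __init__(self, spare: dict[int, bytes], copy: dict[int, bytes]) -> None:
        self.spare = spare       # device that took over the suspect's role
        self.copy = copy         # the suspect, reused as a copy storage device
        self.spare_ok = True     # hypothetical health flag for the spare

    def write(self, lba: int, block: bytes) -> None:
        # Mirror every write so the copy keeps the same data as the spare.
        self.spare[lba] = block
        self.copy[lba] = block

    def read(self, lba: int) -> bytes:
        # Prefer the spare; fall back to the copy if the spare is marked failed.
        return self.spare[lba] if self.spare_ok else self.copy[lba]


pair = MirroredPair(spare={}, copy={})
pair.write(0, b"block-0")
pair.spare_ok = False
print(pair.read(0))  # prints: b'block-0'
```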
Specification