Recovery from errors in a redundant array of disk drives
Abstract
Fault tolerance in a redundant array of disk drives is degraded when error conditions exist in the array. Several methods for rebuilding the data of the array to remove the degradation are described, covering both rebuilds of entire disk drives and partial rebuilds of disk drives. All of the rebuild methods tend to reduce the negative effect of using array resources for the data rebuild. In a first method, rebuilding occurs during idle time of the array. In a second method, rebuilding is interleaved between current data-area access operations of the array at a rate that is inversely proportional to the activity level of the array. In a third method, data are rebuilt when the data area being accessed is one that needs rebuilding.
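The third method can be illustrated for a parity array. The Python sketch below is an illustration under stated assumptions, not the patented implementation: it assumes single XOR parity over equal-length blocks of one stripe (as in RAID 5), which the abstract does not specify, and the names `Stripe`, `read_block`, `needs_rebuild`, and `xor_blocks` are hypothetical. A read that lands on a block marked as needing rebuild reconstructs it from the surviving blocks and the parity block, stores the result, and clears the mark.

```python
from dataclasses import dataclass, field


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


@dataclass
class Stripe:
    """Hypothetical stripe model: data blocks plus one XOR parity block."""
    data: list[bytes]
    parity: bytes
    needs_rebuild: set[int] = field(default_factory=set)

    def read_block(self, index: int) -> bytes:
        """Normal read that first rebuilds the block if it is error-affected."""
        if index in self.needs_rebuild:
            survivors = [blk for i, blk in enumerate(self.data) if i != index]
            # With single XOR parity, a lost block is the XOR of all
            # surviving data blocks and the parity block.
            self.data[index] = xor_blocks(survivors + [self.parity])
            self.needs_rebuild.discard(index)
        return self.data[index]


# Example: lose block 1, then let an ordinary read repair it.
blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
stripe = Stripe(data=list(blocks), parity=xor_blocks(blocks))
stripe.data[1] = b"\x00\x00"
stripe.needs_rebuild.add(1)
assert stripe.read_block(1) == b"\x10\x20"
```

Because the repair rides on a read that was going to touch the stripe anyway, it adds little extra work; in this sketch, however, a block that is never read is never rebuilt.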
Claims
1. In a machine-effected method of rebuilding data in a redundant array of a plurality of disk drives which includes an error affected disk drive, including the machine-executed steps of:
detecting and indicating that a one of the disk drives is error affected;
measuring and indicating a rate of machine operations of the array;
establishing a rate of rebuilding data affected by the error affected disk drive which rate is predetermined inversely proportional to said measured and indicated rate of accesses;
intermediate predetermined ones of said accesses which is in said inverse proportion, rebuilding data in a predetermined one of the disk drives for replacing data in error.
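Claim 1 sets the rebuild rate in inverse proportion to the measured access rate and performs rebuilds between accesses. A minimal Python sketch of that arithmetic follows, assuming a proportionality constant `k` and a cap `max_rate`, both hypothetical tuning parameters not given in the claim. With `a` accesses per second and a rebuild rate of `r = k / a` rebuilds per second, one rebuild fits after every `a / r` accesses; `interleave` then places one rebuild of a pending data unit after each such gap.

```python
import math
from collections import deque
from typing import Iterable, Iterator


def rebuild_rate(access_rate: float, k: float, max_rate: float) -> float:
    """Rebuilds per second, inversely proportional to the access rate (capped)."""
    if access_rate <= 0:
        return max_rate  # no measured load: rebuild as fast as allowed
    return min(max_rate, k / access_rate)


def accesses_per_rebuild(access_rate: float, k: float, max_rate: float) -> int:
    """Number of accesses to let through before each interleaved rebuild."""
    r = rebuild_rate(access_rate, k, max_rate)
    # a accesses and r rebuilds per second -> one rebuild every a / r accesses.
    return max(1, math.ceil(access_rate / r))


def interleave(accesses: Iterable[str], pending: deque, gap: int) -> Iterator[str]:
    """Yield access operations, inserting one rebuild after every `gap` accesses."""
    for count, op in enumerate(accesses, start=1):
        yield op
        if pending and count % gap == 0:
            yield "rebuild " + pending.popleft()
```

Under this model the gap grows with the square of the access rate (a / r = a²/k), so rebuilding backs off quickly as the array gets busier, while the cap keeps a nearly idle array from demanding an unbounded rebuild rate.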
4. In a machine-effected method of automatically maintaining fault tolerance in a parity array of disk drives including the machine-executed steps of:
detecting and indicating a degradation of the fault tolerance of the parity array;
evaluating and indicating the current information handling activity of the parity array;
establishing a plurality of data rebuild methods for the parity array for removing the fault tolerance degradation from the parity array; and
analyzing the indicated current information handling activity of the parity array and selecting a one of the plurality of rebuild methods which effects a data rebuild without degrading performance of said current information handling activity more than a predetermined degradation level.
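Claim 4 selects, from several rebuild methods, one whose data rebuild keeps the impact on current activity within a predetermined degradation level. A minimal Python sketch, assuming each method can be given an estimated-degradation function of the current activity rate. The method names, speeds, and cost models are hypothetical placeholders, and "prefer the fastest eligible method" is an illustrative policy, not one stated in the claim.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RebuildMethod:
    name: str
    speed: float                           # hypothetical: data units rebuilt per second
    degradation: Callable[[float], float]  # activity rate -> estimated fractional slowdown


def select_method(methods: Sequence[RebuildMethod], activity: float,
                  limit: float) -> Optional[RebuildMethod]:
    """Return the fastest method whose estimated degradation stays within `limit`."""
    eligible = [m for m in methods if m.degradation(activity) <= limit]
    return max(eligible, key=lambda m: m.speed, default=None)


# Illustrative cost models only; real values would have to be measured.
METHODS = [
    RebuildMethod("idle", 100.0, lambda a: 0.0 if a == 0 else 1.0),
    RebuildMethod("interleaved", 20.0, lambda a: a / 1000),
    RebuildMethod("on-access", 1.0, lambda a: 0.01),
]
```

With a 5% limit, an idle array (activity 0) selects `idle`, moderate activity selects `interleaved`, and heavy activity falls back to `on-access`; if no method fits, the function returns `None` and no rebuild is started.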
... idle times.
10. In a machine-effected method of automatically maintaining fault tolerance in a fault tolerant parity array of disk drives including the machine-executed steps of:
indicating that fault tolerance of the parity array is degraded by a plurality of error-affected addressable data units of the parity array which respectively need data rebuilding to reestablish the fault tolerance;
performing a data area access operation to an addressable data unit in the parity array;
while performing the data area access operation, detecting and indicating that the data access operation is accessing a one of the error-affected addressable data units needing a data rebuild; and
rebuilding the addressable data unit being accessed.
11. Apparatus having a redundant array of disk devices, the improvement including, in combination:
rebuild need evaluation means for detecting and indicating a degradation in the redundant array including indicating a one of the disk drives needs to have data rebuilt to such one disk drive;
access rate means for measuring and indicating a rate of machine operations of said array;
rebuild rate means coupled to said evaluation means and to said access rate means for responding to said indicated rebuild need and to said indicated operations rate for establishing and indicating a predetermined rate of rebuilding for the array for recovering from said degradation of fault tolerance; and
rebuild means having a plurality of data rebuild effecting means and being coupled to said rebuild rate means and to said rebuild need means for effecting data rebuild in said one disk drive using a predetermined one of said plurality of data rebuild effecting means.
12. Apparatus having a redundant array of disk drives as set forth in claim 11, further including, in combination:
control means in the apparatus for controlling access to the disk drives in the redundant array and for detecting and indicating when the array is currently not being accessed for a data handling operation;
a first one of said rebuild effecting means being connected to said control means, to said rebuild rate means and to said rebuild need means for determining a rebuild can be scheduled and then activating the control means to give access to the redundant array to the first one of said rebuild effecting means for effecting a series of time-spaced-apart rebuild operations at said predetermined rate.
13. Apparatus having a redundant array of disk drives as set forth in claim 12, further including, in combination:
said rebuild rate means including means for indicating a plurality of rebuild rates, said plurality of rebuild rates increasing in rate values in an inverse proportion to said indicated operations rate and said rate values corresponding respectively to predetermined ranges of said indicated machine operations rates; and
predetermined rate means in said rebuild rate means for indicating said predetermined rate as one of said plurality of rebuild rates which corresponds to a current one of the indicated machine operations rates.
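Claim 13 maps predetermined ranges of the machine operations rate to a set of rebuild rates, with lower activity mapped to higher rebuild rates. One way to sketch this step-wise inverse relation in Python is a sorted threshold table searched with `bisect.bisect_right`; the thresholds, units, and rates below are invented for illustration and do not come from the patent.

```python
import bisect

# Hypothetical table: upper bounds (operations/s) of each activity range and the
# rebuild rate (data units/s) assigned to that range. Busier ranges get lower rates.
OPS_UPPER_BOUNDS = [10.0, 50.0, 200.0]  # ranges: [0,10) [10,50) [50,200) [200,inf)
REBUILD_RATES = [40.0, 20.0, 5.0, 1.0]

# One more rate than thresholds, and rates strictly decrease as activity rises.
assert len(REBUILD_RATES) == len(OPS_UPPER_BOUNDS) + 1
assert all(x > y for x, y in zip(REBUILD_RATES, REBUILD_RATES[1:]))


def rebuild_rate_for(ops_rate: float) -> float:
    """Return the rebuild rate for the range containing the current operations rate."""
    return REBUILD_RATES[bisect.bisect_right(OPS_UPPER_BOUNDS, ops_rate)]
```

A rate that lands exactly on a threshold belongs to the higher-activity range, because `bisect_right` places equal values to the right.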
... said detecting of no current access.
21. In a machine-effected method of automatically maintaining fault tolerance in a fault tolerant parity array of disk drives, the machine-executed steps of:
indicating that fault tolerance of the parity array is degraded by one or more addressable data units of the parity array which respectively need data rebuilding to reestablish the fault tolerance;
performing a data read operation in an addressable data unit in the parity array;
while accessing the parity array to perform the data read operation, detecting and indicating that one of the error-affected addressable data units needing a data rebuild is being accessed; and
... rebuild rate.
35. The controller of claim 34, wherein:
said evaluating means provides an indication when the parity array is in an idle state; and
another of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to continue the rebuilding of addressable error-affected data units upon completion of a rebuild of at least one error-affected data unit at said indicated data rebuild rate.
36. The controller of claim 34, wherein:
said evaluating means provides an indication when the parity array is in an idle state; and
another of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to effect a rebuilding of addressable error-affected data units in the parity array whenever the parity array is idle and the fault tolerance of the array is degraded.
37. The controller of claim 34, wherein:
another of said rebuild effecting means includes means responsive to said indication of fault tolerance degradation for detecting and indicating when an access mechanism of the parity array is accessing an error-affected data unit during a normal data access operation, and means responsive to said normal data access indication for causing said access mechanism to rebuild the accessed error-affected data unit accessed during such normal data access operation.
38. In a machine-effected method of automatically maintaining fault tolerance in a redundant array of disk drives, the machine-executed steps of:
(a) detecting and indicating a degradation of the fault tolerance of the array;
(b) evaluating and indicating the current information handling activity of the array; and
(c) following steps (a) and (b), analyzing the indicated current information handling activity of the array and selecting one of a plurality of available machine-executable rebuild methods which effects a data rebuild without degrading performance of said current information handling activity more than a predetermined degradation level.
39. The machine-effected method set forth in claim 38, further including the machine-executed steps of:
in said evaluating step (b), determining the rate of information handling activity and selecting a data rebuild rate in a predetermined inverse ratio to the determined rate of information handling activity; and
selecting a one of the plurality of rebuild methods as a variable rate rebuild method which effects data rebuilding at said selected rebuild rate of a predetermined number of addressable error-affected data units in the array.
40. The machine-effected method set forth in claim 39, further including the machine-executed steps of:
completing a data rebuild using said variable rebuild method;
detecting that the array is idle; and
continuing the data rebuilding of additional ones of the addressable error-affected data units so long as the array is idle.
41. The machine-effected method set forth in claim 38, further including the machine-executed steps of:
in said evaluating step, determining that the array is idle; and
selecting a one of the plurality of rebuild methods as an idle rebuild method to be selected whenever the array is idle and a rebuild need exists in the array.
42. The machine-effected method set forth in claim 38, further including the machine-executed steps of:
performing a data area access operation in the array;
while performing the data area access operation, performing said detecting and indicating step for detecting and indicating that the data access operation is accessing a one of the addressable data units needing a data rebuild; and
selecting a one of the plurality of rebuild methods so as to carry out a rebuild of at least the data unit being accessed.
43. The machine-effected method of claim 38, wherein:
said evaluating step (b) determines when access requests to the array are pending and when the array is idle; and
in step (c), data is rebuilt at a first rate when the array is not idle, and is rebuilt at a second rate which is greater than said first rate when the array is idle.
44. A controller for automatically maintaining fault tolerance in an array of disk drives having respective access mechanisms for reading and writing data in the form of addressable data units in the disk drives of the array, said controller comprising:
means for detecting a degradation of the fault tolerance of the array and providing an indication thereof;
means for evaluating the current information handling activity of the array and providing an indication of such activity;
means responsive to said fault tolerance degradation and information handling-activity indications for rendering operative one of a plurality of available rebuild-effecting means to cause at least one of the access mechanisms to effect a data rebuild without degrading performance of said current information handling activity more than a predetermined degradation level.
45. The controller of claim 44, wherein:
said evaluating means provides an indication of the rate of the current information handling activity of the array and establishes an indicated data rebuild rate for addressable error-affected data units in the array, which rebuild rate is inversely related to the indicated activity rate; and
one of said rebuild effecting means is responsive to said indicated data rebuild rate to cause at least one of said access mechanisms to effect a rebuilding of addressable error-affected data units at said indicated data rebuild rate.
46. The controller of claim 45, wherein:
said evaluating means provides an indication when the array is in an idle state; and
one of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to continue the rebuilding of addressable error-affected data units upon completion of a rebuild of at least one error-affected data unit at said indicated data rebuild rate.
47. The controller of claim 44, wherein:
said evaluating means provides an indication when the array is in an idle state; and
one of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to effect a rebuilding of addressable error-affected data units in the array whenever the array is idle and the fault tolerance of the array is degraded.
48. The controller of claim 44, wherein:
one of said rebuild effecting means includes means responsive to said indication of fault tolerance degradation for detecting and indicating when an access mechanism of the array is accessing an error-affected data unit during a normal data access operation, and means responsive to said normal data access indication for causing said access mechanism to rebuild the accessed error-affected data unit accessed during such normal data access operation.
49. The controller of claim 44, wherein:
said evaluation means provides an indication of whether or not the array is idle; and
at least one of said plurality of rebuild-effecting means causes at least one of the access mechanisms to effect a data rebuilding at a first rate when the array is not idle, and causes at least one of the access mechanisms to effect a data rebuild at a second rate higher than said first rate when the array is idle.
50. A data storage system, comprising:
a redundant array of disk drives each having respective access mechanisms for reading and writing data in the form of addressable data units stored on the disk drives of the array, and a controller for said disk drives, said controller comprising:
means for detecting a degradation of the fault tolerance of the array and providing an indication thereof;
means for evaluating the current information handling activity of the array and providing an indication of such activity;
means responsive to said fault tolerance degradation and information handling-activity indications for rendering operative one of a plurality of available rebuild-effecting means to cause at least one of the access mechanisms to effect a data rebuild without degrading performance of said current information handling activity more than a predetermined degradation level.
51. The data storage system of claim 50, wherein:
said evaluating means provides an indication of the rate of the current information handling activity of the array and establishes an indicated data rebuild rate for addressable error-affected data units in the array, which rebuild rate is inversely related to the indicated activity rate; and
one of said rebuild effecting means is responsive to said indicated data rebuild rate to cause at least one of said access mechanisms to effect a rebuilding of addressable error-affected data units at said indicated data rebuild rate.
52. The data storage system of claim 50, wherein:
said evaluating means provides an indication when the array is in an idle state; and
one of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to continue the rebuilding of addressable error-affected data units upon completion of a rebuild of at least one error-affected data unit at said indicated data rebuild rate.
53. The data storage system of claim 50, wherein:
said evaluating means provides an indication when the array is in an idle state; and
one of said rebuild effecting means is responsive to said idle state indication for causing at least one of the access mechanisms to effect a rebuilding of addressable error-affected data units in the parity array whenever the array is idle and the fault tolerance of the array is degraded.
54. The data storage system of claim 53, wherein:
the rebuilding of addressable error-affected data units when the array is idle is performed at an associated rate, and when the array is not idle, at least one of said rebuild-effecting means is rendered operative to cause at least one of the access mechanisms to effect data rebuilds at a rate lower than said associated rate.
55. The data storage system of claim 50, wherein:
one of said rebuild effecting means includes means responsive to said indication of fault tolerance degradation for detecting and indicating when an access mechanism of the array is accessing an error-affected data unit during a normal data access operation, and means responsive to said normal data access indication for causing said access mechanism to rebuild the accessed error-affected data unit accessed during such normal data access operation.
56. A controller for automatically maintaining fault tolerance in an array of disk drives having respective access mechanisms for reading and writing data in the form of addressable data units in the disk drives of the array, said controller comprising:
a fault tolerance degradation detector;
an information handling activity evaluator; and
a selector responsive to said fault tolerance degradation detector and said information handling activity evaluator, said selector rendering operative one of a plurality of available rebuild-effecting mechanisms to cause at least one of the access mechanisms to effect a data rebuild without degrading performance more than a predetermined degradation level.
57. The controller of claim 56, wherein:
said information handling activity evaluator establishes an indicated data rebuild rate for addressable error-affected data units in the array; and
one of said rebuild effecting mechanisms is responsive to said indicated data rebuild rate to cause at least one of said access mechanisms to effect a rebuilding of addressable error affected data units at said indicated data rebuild rate.
58. The controller of claim 56, wherein:
said information handling activity evaluator provides an indication when the array is in an idle state; and
one of said rebuild effecting mechanisms is rendered operative by said selector in response to said idle state indication to cause at least one of said access mechanisms to effect a rebuilding of addressable error affected data units whenever the array is idle and the fault tolerance of the array is degraded.
59. A data storage system comprising:
a redundant array of disk drives, each having a respective access mechanism for reading and writing data in the form of addressable data units in the disk drives of the array; and
a controller for said array, said controller automatically maintaining fault tolerance in said array, said controller comprising:
a fault tolerance degradation detector;
an information handling activity evaluator; and
a selector responsive to said fault tolerance degradation detector and said information handling activity evaluator, said selector rendering operative one of a plurality of available rebuild-effecting mechanisms to cause at least one of the access mechanisms to effect a data rebuild without degrading performance more than a predetermined degradation level.
60. The data storage system of claim 59, wherein:
said information handling activity evaluator establishes an indicated data rebuild rate for addressable error-affected data units in the array; and
one of said rebuild effecting mechanisms rendered operative by said selector is responsive to said indicated data rebuild rate to cause at least one of said access mechanisms to effect a rebuilding of addressable error affected data units at said indicated data rebuild rate.
61. The data storage system of claim 59, wherein:
said information handling activity evaluator provides an indication when the array is in an idle state; and
one of said rebuild effecting mechanisms is rendered operative by said selector in response to said idle state indication to cause at least one of said access mechanisms to effect a rebuilding of addressable error affected data units whenever the array is idle and the fault tolerance of the array is degraded.
62. In a machine-effected method of rebuilding data in a redundant array of disk drives which includes an error-affected disk drive, the machine-executed steps of:
detecting that one of the disk drives is error-affected;
measuring a rate of accesses to the disk drives; and
rebuilding data affected by the error affected disk drive at a rate which is inversely related to said measured rate of accesses, the rebuilding of data in the error-affected disk drive occurring during times intermediate of certain of said accesses.
63. An article of manufacture for use in a data storage system, which system includes a redundant array of disk drives of which one or more disk drives may become error-affected, an access mechanism for reading and writing data to the disk drives of the array, and processors for executing programs containing executable statements for controlling the reading and writing of data from and to the respective disk drives of the array,
said article of manufacture comprising a computer-readable storage medium having computer program code embodied therein that is capable of causing the system to perform the steps of:
detecting that one of the disk drives is error-affected;
measuring a rate of accesses to the disk drives; and
rebuilding data affected by the error-affected disk drive at a rate which is inversely related to said measured rate of accesses, the rebuilding of data in the error-affected disk drive occurring during times intermediate of certain of said accesses.
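Claims 62 and 63 rely on a measured rate of accesses to the disk drives. One common way to measure such a rate is to count accesses inside a sliding time window. The Python sketch below assumes that approach; the class `AccessRateMeter` and its window length are hypothetical, and timestamps are passed in explicitly (a real controller might read a monotonic clock instead).

```python
from collections import deque


class AccessRateMeter:
    """Sliding-window estimate of accesses per second (hypothetical helper)."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._times: deque[float] = deque()

    def record(self, t: float) -> None:
        """Record one access at time t (seconds, non-decreasing)."""
        self._times.append(t)
        self._evict(t)

    def rate(self, now: float) -> float:
        """Accesses per second over the window ending at `now`."""
        self._evict(now)
        return len(self._times) / self.window_s

    def _evict(self, now: float) -> None:
        # Drop accesses that fall outside the window (now - window_s, now].
        while self._times and self._times[0] <= now - self.window_s:
            self._times.popleft()
```

The resulting rate could then drive an inversely related rebuild rate; a longer window smooths bursts but reacts more slowly when the load changes.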
Specification