Data storage device in-situ self test, repair, and recovery
First Claim
1. A computer implemented method for performing a set of operations on a data storage device in a Redundant Array of Independent Disk (RAID) array, the computer implemented method comprising:
flagging, by an adapter, a data storage device as a suspect data storage device, wherein the flagging indicates a rejection due to an error;
suspending, by the adapter, the suspect data storage device from participation in the RAID array;
assigning, by the adapter, the suspect data storage device to a pool of data storage devices to be retested;
selecting, by the adapter, another data storage device from a pool of spare data storage devices, forming a selected data storage device;
rebuilding, by the adapter, contents of the suspect data storage device on the selected data storage device, forming a substitute data storage device;
assigning, by the adapter, the substitute data storage device to the RAID array;
invoking a diagnostic test on the suspect data storage device to produce a diagnostic result, wherein the diagnostic test runs in a background of the RAID array;
analyzing, by the adapter, the diagnostic result;
responsive to the diagnostic result not exceeding a threshold, repairing the suspect data storage device to form a repaired data storage device;
assigning, by the adapter, the repaired data storage device to the pool of spare data storage devices;
incrementing, by the adapter, a counter associated with the repaired data storage device, wherein the counter indicates a number of times the repaired data storage device has been repaired;
measuring, by the data storage device, a response interval for an expected communication from the adapter to form a response interval;
responsive to the response interval exceeding a predetermined threshold, generating a timeout error; and
wherein the error is the timeout error.
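The last three elements of the claim (measuring a response interval, comparing it with a predetermined threshold, and generating a timeout error) can be illustrated with a minimal sketch. The names `ResponseTimer` and `DeviceTimeoutError`, the use of seconds as the unit, and the injectable clock are assumptions made for illustration; the claim does not specify how the interval is measured.

```python
import time
from typing import Callable, Optional


class DeviceTimeoutError(Exception):
    """Hypothetical error raised when an expected communication is late."""


class ResponseTimer:
    """Measures how long a device has waited for an expected communication."""

    def __init__(self, threshold_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s   # the predetermined threshold (assumed seconds)
        self.clock = clock               # injectable so the timer can be tested
        self._started_at: Optional[float] = None

    def start(self) -> None:
        """Begin waiting for the expected communication from the adapter."""
        self._started_at = self.clock()

    def check(self) -> float:
        """Return the response interval, or raise if it exceeds the threshold."""
        if self._started_at is None:
            raise RuntimeError("start() must be called before check()")
        interval = self.clock() - self._started_at
        if interval > self.threshold_s:
            raise DeviceTimeoutError(
                f"response interval {interval:.3f}s exceeds {self.threshold_s:.3f}s"
            )
        return interval
```

In this sketch the raised `DeviceTimeoutError` plays the role of the error that causes the device to be flagged as suspect.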
Abstract
A method, apparatus, and computer program product for performing a set of operations on a data storage device is provided. A data storage device is flagged as suspect. The adapter suspends the suspect data storage device from participation in the RAID array, assigns the suspect data storage device to a pool of data storage devices to be retested, selects a data storage device from a pool of spare data storage devices, rebuilds contents of the suspect data storage device on the selected disk drive, assigns the substitute data storage device to the RAID array, invokes a diagnostic test on the suspect data storage device, and analyzes the diagnostic result. Responsive to the diagnostic result exceeding a threshold, the suspect data storage device is repaired. The adapter assigns the repaired data storage device to the pool of spare data storage devices and increments a counter of the repaired data storage device.
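The sequence of pool transitions described here can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the patented implementation: `Device`, `Adapter`, `replace_suspect`, and `retest_device` are hypothetical names, the rebuild is reduced to writing caller-supplied contents, and the repair condition follows the wording of claim 1 (repair when the diagnostic result does not exceed the threshold). What happens to a device whose result exceeds the threshold is not modeled; here it simply stays in the retest pool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # compare devices by identity, not by field values
class Device:
    name: str
    data: bytes = b""
    repair_count: int = 0  # number of times this device has been repaired


@dataclass
class Adapter:
    array: List[Device]                                   # active RAID members
    spares: List[Device]                                  # pool of spare devices
    retest: List[Device] = field(default_factory=list)    # pool to be retested

    def replace_suspect(self, suspect: Device, rebuilt_contents: bytes) -> Device:
        """Suspend a flagged device and rebuild its contents on a spare."""
        slot = self.array.index(suspect)
        self.array.remove(suspect)         # suspend from participation in the array
        self.retest.append(suspect)        # assign to the retest pool
        substitute = self.spares.pop(0)    # select a spare
        # Placeholder for a real rebuild from the array's redundancy.
        substitute.data = rebuilt_contents
        self.array.insert(slot, substitute)  # substitute joins the array
        return substitute

    def retest_device(self, suspect: Device,
                      diagnose: Callable[[Device], int],
                      repair: Callable[[Device], None],
                      threshold: int) -> bool:
        """Run the diagnostic; repair and return the device to the spare pool."""
        result = diagnose(suspect)
        if result > threshold:
            return False                   # assumption: device stays in retest pool
        repair(suspect)
        self.retest.remove(suspect)
        self.spares.append(suspect)        # repaired device becomes a spare
        suspect.repair_count += 1          # track how often it has been repaired
        return True
```

A caller would invoke `replace_suspect` immediately after flagging, so the array regains redundancy, and run `retest_device` later in the background.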
11 Claims
1. A computer implemented method for performing a set of operations on a data storage device in a Redundant Array of Independent Disk (RAID) array, the computer implemented method comprising:
flagging, by an adapter, a data storage device as a suspect data storage device, wherein the flagging indicates a rejection due to an error;
suspending, by the adapter, the suspect data storage device from participation in the RAID array;
assigning, by the adapter, the suspect data storage device to a pool of data storage devices to be retested;
selecting, by the adapter, another data storage device from a pool of spare data storage devices, forming a selected data storage device;
rebuilding, by the adapter, contents of the suspect data storage device on the selected data storage device, forming a substitute data storage device;
assigning, by the adapter, the substitute data storage device to the RAID array;
invoking a diagnostic test on the suspect data storage device to produce a diagnostic result, wherein the diagnostic test runs in a background of the RAID array;
analyzing, by the adapter, the diagnostic result;
responsive to the diagnostic result not exceeding a threshold, repairing the suspect data storage device to form a repaired data storage device;
assigning, by the adapter, the repaired data storage device to the pool of spare data storage devices;
incrementing, by the adapter, a counter associated with the repaired data storage device, wherein the counter indicates a number of times the repaired data storage device has been repaired;
measuring, by the data storage device, a response interval for an expected communication from the adapter to form a response interval;
responsive to the response interval exceeding a predetermined threshold, generating a timeout error; and
wherein the error is the timeout error.
View Dependent Claims (2, 3, 4)
5. A computer program product comprising:
a non-transitory computer usable medium including computer usable program code for performing a set of operations on a data storage device in a Redundant Array of Independent Disk (RAID) array, the computer program product including instructions adapted to cause a computer to perform the following steps;
flagging, by an adapter, a data storage device as a suspect data storage device, wherein the flagging indicates a rejection due to an error;
suspending, by the adapter, the suspect data storage device from participation in the RAID array;
assigning, by the adapter, the suspect data storage device to a pool of data storage devices to be retested;
selecting, by the adapter, another data storage device from a pool of spare data storage devices, forming a selected data storage device;
rebuilding, by the adapter, contents of the suspect data storage device on the selected data storage device, forming a substitute data storage device;
assigning, by the adapter, the substitute data storage device to the RAID array;
invoking a diagnostic test on the suspect data storage device to produce a diagnostic result, wherein the diagnostic test runs in a background of the RAID array;
analyzing, by the adapter, the diagnostic result;
responsive to the diagnostic result not exceeding a threshold, repairing the suspect data storage device to form a repaired data storage device;
assigning, by the adapter, the repaired data storage device to the pool of spare data storage devices;
incrementing, by the adapter, a counter associated with the repaired data storage device, wherein the counter indicates a number of times the repaired data storage device has been repaired;
computer usable program code for measuring, by the data storage device, a response interval for an expected communication from the adapter to form a response interval; and
computer usable program code for, responsive to the response interval exceeding a predetermined threshold, generating a timeout error; and
wherein the error is the timeout error.
View Dependent Claims (6, 7, 8)
9. A data processing system for performing a set of operations on a data storage device in a Redundant Array of Independent Disk (RAID) array, comprising:
a bus system;
a communications system connected to the bus system;
a memory connected to the bus system, wherein the memory includes a set of instructions; and
a processing unit connected to the bus system, wherein the processing unit executes the set of instructions to flag, by an adapter, a data storage device as a suspect data storage device, wherein the flagging indicates a rejection due to an error;
suspend, by the adapter, the suspect data storage device from participation in the RAID array;
assign, by the adapter, the suspect data storage device to a pool of data storage devices to be retested;
select, by the adapter, another data storage device from a pool of spare data storage devices, forming a selected data storage device;
rebuild, by the adapter, contents of the suspect data storage device on the selected data storage device, forming a substitute data storage device;
assign, by the adapter, the substitute data storage device to the RAID array;
invoke a diagnostic test on the suspect data storage device to produce a diagnostic result, wherein the diagnostic test runs in a background of the RAID array;
analyze, by the adapter, the diagnostic result;
responsive to the diagnostic result not exceeding a threshold, repair the suspect data storage device to form a repaired data storage device;
assign, by the adapter, the repaired data storage device to the pool of spare data storage devices;
increment, by the adapter, a counter associated with the repaired data storage device, wherein the counter indicates a number of times the repaired data storage device has been repaired;
measure, by the data storage device, a response interval for an expected communication from the adapter to form a response interval;
responsive to the response interval exceeding a predetermined threshold, generate a timeout error; and
wherein the error is the timeout error.
View Dependent Claims (10, 11)
Specification