Method and apparatus for efficient fault-tolerant disk drive replacement in RAID storage systems
Abstract
An apparatus and a method for improving the fault tolerance of storage systems by replacing disk drives that are about to fail are disclosed. The set of disk drives in a storage system is monitored to identify failing disk drives. A processing unit identifies a failing disk drive and selects a spare disk drive to replace it. The selected spare disk drive is powered on, and data from the failing disk drive is copied to it. A memory unit stores attributes and sensor data for the disk drives in the storage system, which the processing unit uses to identify a failing disk drive. Attributes for disk drives are obtained by using SMART (Self-Monitoring, Analysis and Reporting Technology), and sensor data is obtained from environmental sensors such as temperature and vibration sensors.
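As a rough illustration of the monitoring step described in the abstract, the following Python sketch flags a drive as potentially failing when a SMART attribute or an environmental sensor reading crosses a limit. The attribute name, the threshold values, and the `DriveStatus` type are assumptions made for illustration only; the abstract does not state which attributes or limits are used.

```python
from dataclasses import dataclass, field

# Hypothetical limits; the abstract does not give concrete thresholds.
MAX_REALLOCATED_SECTORS = 50
MAX_TEMPERATURE_C = 55.0
MAX_VIBRATION_G = 0.5


@dataclass
class DriveStatus:
    """Attributes and sensor data held for one drive (illustrative type)."""
    drive_id: str
    smart: dict = field(default_factory=dict)  # SMART attribute name -> raw value
    temperature_c: float = 0.0                 # from a temperature sensor
    vibration_g: float = 0.0                   # from a vibration sensor


def is_potentially_failing(status):
    """Return True if any monitored value exceeds its assumed limit."""
    if status.smart.get("reallocated_sectors", 0) > MAX_REALLOCATED_SECTORS:
        return True
    if status.temperature_c > MAX_TEMPERATURE_C:
        return True
    return status.vibration_g > MAX_VIBRATION_G


def find_failing_drives(statuses):
    """Return the IDs of drives that should be considered for replacement."""
    return [s.drive_id for s in statuses if is_potentially_failing(s)]
```

A simple threshold check is only one possible policy; a real drive replacement logic unit could combine attributes or track trends over time.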
32 Claims
1. An apparatus for improving fault tolerance of a storage system, the apparatus comprising:
a. a first set of disk drives;
b. a second set of disk drives, the second set of disk drives in power-off condition;
c. a processing unit, the processing unit comprising:
i. a drive replacement logic unit, the drive replacement logic unit identifying a potential failing disk drive from the first set of disk drives that has not failed yet;
ii. a drive control unit, the drive control unit receiving an indication from the drive replacement logic unit to replace the potential failing disk drive with a spare disk drive from the second set of disk drives, the drive control unit powering-on the spare disk drive before the potential failing disk drive fails to replace the potential failing disk drive; and
d. a data copying mechanism to copy data to the powered-on spare disk drive, the data being a copy of data stored on the potential failing disk, wherein data copying mechanism stores data received by the storage system to the potential failing disk drive and to the spare disk drive.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A processing unit for improving fault tolerance of a storage system, the storage system comprising a first set of disk drives storing data and a second set of disk drives, the processing unit comprising:
a. a drive replacement logic unit, the drive replacement logic unit identifying a potential failing disk drive from the first set of disk drives that has not failed yet;
b. a drive control unit, the drive control unit receiving an indication from the drive replacement logic unit to replace the potential failing disk drive with a spare disk drive from the second set of disk drives, the drive control unit powering-on the spare disk drive before the potential failing disk drive fails to replace the potential failing disk drive; and
c. a data copying mechanism to copy data to the powered-on spare disk drive, the data being a copy of data stored on the potential failing disk, wherein data copying mechanism stores data received by the storage system to the potential failing disk drive and to the spare disk drive.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
26. A method for improving fault tolerance of a storage system, the storage system comprising a first set of disk drives and a second set of disk drives in power-off condition, the method comprising the steps of:
monitoring the first set of disk drives to identify a potential failing disk drive from the first set of disk drives;
powering-on a spare disk drive from the second set of disk drives on receipt of signal before the potential failing disk drive fails to replace the potential failing disk drive from the first set of disk drives; and
copying data to the spare disk drive from the second set of disk drives, the data being a copy of data stored on the potential failing disk drive, wherein copying data further comprises storing data received by the storage system to the potential failing disk drive and to the spare disk drive.
Dependent claims: 27, 28, 29, 30, 31, 32
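The steps recited in claim 26 can be pictured with a minimal Python sketch: a spare that starts powered off is powered on while the potential failing drive is still working, existing data is copied across, and any data received in the meantime is stored on both drives. The `Drive` and `Replacement` classes and the in-memory block map are hypothetical stand-ins for illustration, not the claimed implementation.

```python
class Drive:
    """Toy model of a disk drive with a power state and a block map."""

    def __init__(self, drive_id, powered_on=False):
        self.drive_id = drive_id
        self.powered_on = powered_on
        self.blocks = {}  # block number -> data

    def write(self, block, data):
        if not self.powered_on:
            raise RuntimeError(f"{self.drive_id} is powered off")
        self.blocks[block] = data


class Replacement:
    """Replaces a potential failing drive with a spare before it fails."""

    def __init__(self, failing, spare):
        self.failing = failing
        self.spare = spare
        # Power on the spare while the failing drive is still operational.
        self.spare.powered_on = True

    def copy_existing(self):
        """Copy data already stored on the failing drive to the spare."""
        for block, data in list(self.failing.blocks.items()):
            self.spare.write(block, data)

    def write(self, block, data):
        """Store newly received data on both the failing drive and the spare."""
        self.failing.write(block, data)
        self.spare.write(block, data)
```

In this toy model, mirrored writes and the background copy always leave both drives with the same contents; a real controller would also have to order concurrent writes against the copy.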
Specification