Method and system for rapidly recovering data from a “dead” disk in a RAID disk group
Abstract
A method and system for rapidly recovering data from a failed disk in a RAID disk group are disclosed. According to one aspect of the present invention, a RAID-based storage system identifies a particular disk in a RAID disk group as a “dead” disk (e.g., incapable of servicing client-initiated requests in a timely manner). Accordingly, a spare disk is allocated to replace the “dead” disk and client-initiated read/write requests are directed to the spare disk for servicing. In addition, a disk-to-disk copy operation is initiated. Without overwriting valid data on the target disk with stale data from the “dead” disk, the disk-to-disk copy operation copies data from the “dead” disk to the target by directly reading data from the “dead” disk while reconstructing only the data that cannot be read directly from the “dead” disk.
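The copy behavior summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a single-parity layout in which an unreadable block equals the XOR of the corresponding blocks on the surviving disks, and it models the target disk as a dictionary. All names here (`xor_blocks`, `copy_dead_disk`, `read_dead`, `read_peers`, `client_written`) are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional, Set


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (assumed single-parity rebuild)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def copy_dead_disk(
    read_dead: Callable[[int], Optional[bytes]],  # returns None if unreadable
    read_peers: Callable[[int], List[bytes]],     # surviving data + parity blocks
    target: Dict[int, bytes],                     # target disk, block_no -> data
    client_written: Set[int],                     # blocks already holding newer data
    num_blocks: int,
) -> Dict[str, int]:
    """Copy every block once, preferring a direct read over reconstruction."""
    stats = {"direct": 0, "reconstructed": 0, "skipped": 0}
    for block_no in range(num_blocks):
        if block_no in client_written:
            # The target already holds valid data; the "dead" disk's copy is stale.
            stats["skipped"] += 1
            continue
        data = read_dead(block_no)
        if data is None:
            # Only blocks that cannot be read directly are reconstructed.
            data = xor_blocks(read_peers(block_no))
            stats["reconstructed"] += 1
        else:
            stats["direct"] += 1
        target[block_no] = data
    return stats
```

In this sketch the expensive reconstruction path runs only for blocks the "dead" disk fails to return, which is the reason the abstract gives for the recovery being rapid.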
26 Claims
1. A method comprising:
responsive to identifying a particular mass storage device in a redundancy group of mass storage devices as incapable of servicing client-initiated requests in a timely manner;
automatically allocating a spare mass storage device to replace the particular mass storage device in the redundancy group of mass storage devices;
generating a disk cookie for the spare mass storage device, the disk cookie used to generate a data validity tag for indicating the validity of data written to the spare mass storage device;
forwarding client-initiated write requests directed to the particular mass storage device to the spare mass storage device for servicing; and
initiating a device-to-device copy operation to systematically read data from the particular mass storage device and write the data to the spare mass storage device without overwriting data on the spare mass storage device with stale data from the particular mass storage device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A storage system comprising:
controller logic to automatically allocate a spare mass storage device to replace a particular mass storage device in a redundancy group of mass storage devices in response to identifying the particular mass storage device as incapable of servicing client-initiated access requests in a timely manner;
data validity tag generation logic to generate data validity tags to indicate the validity of data written to the spare mass storage device, the data validity tags based at least in part on a disk cookie associated with the spare mass storage device; and
read/write hardware logic to (i) forward client-initiated write requests directed to the particular mass storage device to the spare mass storage device for servicing, and (ii) initiate a device-to-device copy operation to systematically copy data from the particular mass storage device to the spare mass storage device without overwriting data on the spare mass storage device with stale data from the particular mass storage device.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A method for rapidly recovering data from a failing disk in a RAID disk group, the method comprising:
allocating a target disk, selected from one or more spare disks, to replace the failing disk in the RAID disk group;
generating a disk cookie for the target disk, the disk cookie used to generate a data validity tag for indicating the validity of data written to the target disk;
preventing the failing disk from servicing client-initiated access requests by forwarding client-initiated write requests to the target disk for servicing, and forwarding client-initiated read requests to the target disk for servicing only if the client-initiated read request is directed to data at a disk block of the failing disk that has been copied to a corresponding disk block of the target disk as part of a disk-to-disk copy operation; and
systematically copying data from the failing disk to the target disk, as part of a disk-to-disk copy operation, without overwriting valid data on the target disk.
Dependent claims: 22, 23, 24, 25
26. A machine-readable medium storing instructions that, when executed by the machine, cause the machine to:
automatically allocate a spare mass storage device to replace a particular mass storage device in a redundancy group of mass storage devices, the particular mass storage device having been identified as incapable of servicing client-initiated access requests in a timely manner;
generate a disk cookie for the spare mass storage device, the disk cookie used to generate a data validity tag for indicating the validity of data written to the spare mass storage device;
forward client-initiated write requests directed to the particular mass storage device to the spare mass storage device for servicing; and
initiate a device-to-device copy operation to systematically copy data from the particular mass storage device to the spare mass storage device without overwriting data on the spare mass storage device with stale data from the particular mass storage device.
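The independent claims share three steps: generating a disk cookie, tagging written data for validity, and routing requests while the copy runs. One possible way these could fit together is sketched below. The claims state only that the data validity tag is generated using the disk cookie; the specific choices here (a random 8-byte cookie, a truncated SHA-256 over the cookie and block number, a tag stored beside each block) are assumptions made for illustration, as are the names `make_tag`, `TargetDisk`, `client_write`, `copy_write`, and `read`. Serving reads of client-written blocks from the target disk is likewise an assumption of this sketch.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


def make_tag(cookie: bytes, block_no: int) -> bytes:
    """Derive a per-block validity tag from the disk cookie (assumed scheme)."""
    return hashlib.sha256(cookie + block_no.to_bytes(8, "big")).digest()[:8]


@dataclass
class TargetDisk:
    # A fresh cookie is generated when the spare is allocated as the target.
    cookie: bytes = field(default_factory=lambda: secrets.token_bytes(8))
    # block_no -> (data, tag); leftover blocks carry tags from an older cookie.
    blocks: Dict[int, Tuple[bytes, bytes]] = field(default_factory=dict)

    def is_valid(self, block_no: int) -> bool:
        """A block is valid only if its tag matches the current cookie."""
        entry = self.blocks.get(block_no)
        return entry is not None and entry[1] == make_tag(self.cookie, block_no)

    def client_write(self, block_no: int, data: bytes) -> None:
        """Client writes are forwarded here and always tagged as valid."""
        self.blocks[block_no] = (data, make_tag(self.cookie, block_no))

    def copy_write(self, block_no: int, data: bytes) -> bool:
        """Copy-operation write: refuse to overwrite a block that is already valid."""
        if self.is_valid(block_no):
            return False
        self.blocks[block_no] = (data, make_tag(self.cookie, block_no))
        return True

    def read(self, block_no: int, fallback: Callable[[int], bytes]) -> bytes:
        """Serve reads from the target only when the block is valid there."""
        if self.is_valid(block_no):
            return self.blocks[block_no][0]
        return fallback(block_no)  # e.g. reconstruction from the other disks
```

Under these assumptions, data left on the spare from an earlier use never passes the validity check, because its tag was derived from a different cookie, and a copy-operation write can never replace newer client data with stale data from the failing disk.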
Specification