Reducing concurrency bottlenecks while rebuilding a failed drive in a data storage system
First Claim
1. A method of providing RAID (Redundant Array of Independent Disks) data protection for a storage object in a data storage system, wherein the data storage system includes a storage processor and a set of physical drives communicably coupled to the storage processor, the method comprising:
- generating a RAID mapping table, wherein the RAID mapping table contains a plurality of RAID extents, wherein each RAID extent contained in the RAID mapping table indicates a plurality of drive extents for storing host data written to the storage object and related parity information, and wherein each drive extent comprises a contiguous region of non-volatile data storage in one of the physical drives;
- in response to detecting that one of the physical drives has failed, concurrently rebuilding RAID extents in a concurrent rebuild list, wherein each RAID extent in the concurrent rebuild list indicates a drive extent of the failed one of the physical drives, and wherein for each one of the RAID extents in the concurrent rebuild list rebuilding includes i) recovering host data previously stored in the drive extent of the failed one of the physical drives indicated by the RAID extent, and ii) writing the recovered host data to a spare drive extent allocated to the RAID extent;
- in response to detecting that rebuilding of one of the RAID extents in the concurrent rebuild list has completed, removing that one of the RAID extents from the concurrent rebuild list, and selecting a next RAID extent to replace the removed RAID extent in the concurrent rebuild list by i) forming a candidate set of RAID extents, wherein each RAID extent in the candidate set indicates a drive extent of the failed physical drive and has not been rebuilt, ii) calculating a relatedness score for each RAID extent in the candidate set with respect to the RAID extents remaining in the concurrent rebuild list, wherein the relatedness score indicates an amount of limitation with regard to concurrently rebuilding the RAID extent in combination with the RAID extents remaining in the concurrent rebuild list, and iii) selecting as the new RAID extent to replace the removed RAID extent in the concurrent rebuild list a RAID extent in the candidate set having a lowest relatedness score of the RAID extents in the candidate set.
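The selection step recited above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the RAID mapping table is modeled as a dictionary from a RAID extent id to (drive index, extent index) pairs, RAID extents already in the concurrent rebuild list are excluded from the candidate set, ties are broken by the lowest id, and `shared_drive_score` (which counts the surviving drives a candidate shares with the RAID extents remaining in the list) is a hypothetical stand-in for the relatedness score. All function names are illustrative.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical model: a drive extent is (drive_index, extent_index_on_that_drive).
DriveExtent = Tuple[int, int]
# RAID mapping table: RAID extent id -> the drive extents it indicates.
RaidMappingTable = Dict[int, List[DriveExtent]]
# Score function: (candidate id, ids remaining in the list) -> relatedness score.
ScoreFn = Callable[[int, List[int]], int]


def indicates_drive(extents: List[DriveExtent], drive: int) -> bool:
    """True if any drive extent of this RAID extent lies on `drive`."""
    return any(d == drive for d, _ in extents)


def select_next(table: RaidMappingTable, failed_drive: int, rebuilt: Set[int],
                rebuild_list: List[int], score: ScoreFn) -> Optional[int]:
    # i) Candidate set: RAID extents that indicate a drive extent of the failed
    #    drive and have not been rebuilt (extents already in the list are skipped).
    candidates = [rid for rid, extents in table.items()
                  if indicates_drive(extents, failed_drive)
                  and rid not in rebuilt and rid not in rebuild_list]
    if not candidates:
        return None
    # ii) + iii) Score every candidate against the remaining list and take the
    #    lowest score; ties go to the lowest RAID extent id (an arbitrary choice).
    return min(candidates, key=lambda rid: (score(rid, rebuild_list), rid))


def on_rebuild_complete(table: RaidMappingTable, failed_drive: int, rebuilt: Set[int],
                        rebuild_list: List[int], done: int, score: ScoreFn) -> Optional[int]:
    """Remove the finished RAID extent and refill its slot in the list."""
    rebuild_list.remove(done)
    rebuilt.add(done)
    nxt = select_next(table, failed_drive, rebuilt, rebuild_list, score)
    if nxt is not None:
        rebuild_list.append(nxt)
    return nxt


def shared_drive_score(table: RaidMappingTable, failed_drive: int) -> ScoreFn:
    """Hypothetical score: surviving drives a candidate shares with the list."""
    def surviving(rid: int) -> Set[int]:
        return {d for d, _ in table[rid] if d != failed_drive}

    def score(rid: int, rebuild_list: List[int]) -> int:
        return sum(len(surviving(rid) & surviving(other)) for other in rebuild_list)
    return score


table: RaidMappingTable = {
    0: [(0, 0), (1, 0), (2, 0)],
    1: [(0, 1), (1, 1), (3, 0)],
    2: [(0, 2), (2, 1), (3, 1)],
    3: [(0, 3), (4, 0), (5, 0)],
    4: [(1, 2), (2, 2), (3, 2)],  # does not use drive 0
}
rebuilt: Set[int] = set()
rebuild_list = [0, 1]  # two concurrent rebuild slots; drive 0 has failed
score = shared_drive_score(table, failed_drive=0)
print(on_rebuild_complete(table, 0, rebuilt, rebuild_list, done=0, score=score))  # 3
print(rebuild_list)  # [1, 3]
```

In this example, RAID extent 2 shares surviving drive 3 with RAID extent 1 (still in the list), while RAID extent 3 shares no surviving drive, so RAID extent 3 fills the freed slot. Passing the score as a function keeps the list-maintenance logic separate from however relatedness is actually measured.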
9 Assignments
0 Petitions
Abstract
A concurrent rebuild list indicates RAID extents to be concurrently rebuilt in response to a physical drive failure. When rebuilding of a RAID extent in the list completes, a next RAID extent to add to the list is selected that has a lowest relatedness score in a candidate set of RAID extents. The relatedness score indicates an amount of limitation with regard to concurrently rebuilding the candidate RAID extent in combination with the RAID extents remaining in the concurrent rebuild list. The relatedness score may be a sum of a weighted write score indicating limits on concurrent write operations when rebuilding a candidate RAID extent in combination with the RAID extents remaining in the concurrent rebuild list, and a read score indicating limits on concurrent read operations when rebuilding the candidate RAID extent in combination with the RAID extents remaining in the concurrent rebuild list.
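The scoring rule described in the abstract (a weighted write score plus a read score) might be sketched as below. The definitions of the two component scores are assumptions made for illustration: here a write conflict is counted whenever a candidate's spare drive extent sits on the same physical drive as the spare of a rebuild already in the list, and a read conflict is counted for each surviving drive that both rebuilds must read. `RebuildJob`, `WRITE_WEIGHT`, and the helper names are hypothetical, and the abstract does not give a value for the weight.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical weight on the write score; the abstract does not give a value.
WRITE_WEIGHT = 2


@dataclass(frozen=True)
class RebuildJob:
    raid_extent_id: int
    read_drives: FrozenSet[int]  # surviving drives read to recover the lost data
    spare_drive: int             # drive holding the spare drive extent being written


def write_score(candidate: RebuildJob, remaining: List[RebuildJob]) -> int:
    # Assumption: writes collide when two rebuilds target spare extents on the same drive.
    return sum(1 for job in remaining if job.spare_drive == candidate.spare_drive)


def read_score(candidate: RebuildJob, remaining: List[RebuildJob]) -> int:
    # Assumption: reads collide on every surviving drive that two rebuilds both read.
    return sum(len(candidate.read_drives & job.read_drives) for job in remaining)


def relatedness_score(candidate: RebuildJob, remaining: List[RebuildJob],
                      write_weight: int = WRITE_WEIGHT) -> int:
    # Sum of a weighted write score and a read score, as described in the abstract.
    return write_weight * write_score(candidate, remaining) + read_score(candidate, remaining)


remaining = [RebuildJob(1, frozenset({1, 3}), 4), RebuildJob(2, frozenset({2, 5}), 6)]
cand_a = RebuildJob(3, frozenset({1, 2}), 4)  # shares spare drive 4 and read drives 1, 2
cand_b = RebuildJob(4, frozenset({3, 5}), 7)  # shares read drives 3, 5 only
print(relatedness_score(cand_a, remaining))  # 2*1 + 2 = 4
print(relatedness_score(cand_b, remaining))  # 2*0 + 2 = 2
best = min([cand_a, cand_b], key=lambda c: relatedness_score(c, remaining))
print(best.raid_extent_id)  # 4
```

Weighting the write score more heavily than the read score in this sketch reflects one plausible design choice (contention on a single spare drive may be costlier than overlapping reads); the actual weighting used by the described system is not stated here.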
40 Citations
18 Claims
1. A method of providing RAID (Redundant Array of Independent Disks) data protection for a storage object in a data storage system, wherein the data storage system includes a storage processor and a set of physical drives communicably coupled to the storage processor, the method comprising:
- generating a RAID mapping table, wherein the RAID mapping table contains a plurality of RAID extents, wherein each RAID extent contained in the RAID mapping table indicates a plurality of drive extents for storing host data written to the storage object and related parity information, and wherein each drive extent comprises a contiguous region of non-volatile data storage in one of the physical drives;
- in response to detecting that one of the physical drives has failed, concurrently rebuilding RAID extents in a concurrent rebuild list, wherein each RAID extent in the concurrent rebuild list indicates a drive extent of the failed one of the physical drives, and wherein for each one of the RAID extents in the concurrent rebuild list rebuilding includes i) recovering host data previously stored in the drive extent of the failed one of the physical drives indicated by the RAID extent, and ii) writing the recovered host data to a spare drive extent allocated to the RAID extent;
- in response to detecting that rebuilding of one of the RAID extents in the concurrent rebuild list has completed, removing that one of the RAID extents from the concurrent rebuild list, and selecting a next RAID extent to replace the removed RAID extent in the concurrent rebuild list by i) forming a candidate set of RAID extents, wherein each RAID extent in the candidate set indicates a drive extent of the failed physical drive and has not been rebuilt, ii) calculating a relatedness score for each RAID extent in the candidate set with respect to the RAID extents remaining in the concurrent rebuild list, wherein the relatedness score indicates an amount of limitation with regard to concurrently rebuilding the RAID extent in combination with the RAID extents remaining in the concurrent rebuild list, and iii) selecting as the new RAID extent to replace the removed RAID extent in the concurrent rebuild list a RAID extent in the candidate set having a lowest relatedness score of the RAID extents in the candidate set.
Dependent claims: 2, 3, 4, 5, 6.
7. A data storage system that provides RAID (Redundant Array of Independent Disks) data protection for a storage object, comprising:
- at least one storage processor including processing circuitry and a memory;
- a set of physical drives communicably coupled to the storage processor; and
- wherein the storage processor is configured and arranged to:
  - generate a RAID mapping table, wherein the RAID mapping table contains a plurality of RAID extents, wherein each RAID extent contained in the RAID mapping table indicates a plurality of drive extents for storing host data written to the storage object and related parity information, and wherein each drive extent comprises a contiguous region of non-volatile data storage in one of the physical drives;
  - in response to detecting that one of the physical drives has failed, concurrently rebuild RAID extents in a concurrent rebuild list, wherein each RAID extent in the concurrent rebuild list indicates a drive extent of the failed one of the physical drives, and wherein each one of the RAID extents in the concurrent rebuild list is rebuilt at least in part by i) recovering host data previously stored in the drive extent of the failed one of the physical drives indicated by the RAID extent, and ii) writing the recovered host data to a spare drive extent allocated to the RAID extent;
  - in response to detecting that rebuilding of one of the RAID extents in the concurrent rebuild list has completed, remove that one of the RAID extents from the concurrent rebuild list, and select a next RAID extent to replace the removed RAID extent in the concurrent rebuild list by operating to i) form a candidate set of RAID extents, wherein each RAID extent in the candidate set indicates a drive extent of the failed physical drive and has not been rebuilt, ii) calculate a relatedness score for each RAID extent in the candidate set with respect to the RAID extents remaining in the concurrent rebuild list, wherein the relatedness score indicates an amount of limitation with regard to concurrently rebuilding the RAID extent in combination with the RAID extents remaining in the concurrent rebuild list, and iii) select as the new RAID extent to replace the removed RAID extent in the concurrent rebuild list a RAID extent in the candidate set having a lowest relatedness score of the RAID extents in the candidate set.
Dependent claims: 8, 9, 10, 11, 12.
13. A computer program product, comprising:
a non-transitory computer readable medium storing program code for providing RAID (Redundant Array of Independent Disks) data protection for a storage object in a data storage system, wherein the data storage system includes a storage processor and a set of non-volatile data storage devices communicably coupled to the storage processor, the set of instructions, when carried out by at least one processor in the storage processor, causing the storage processor to perform a method of:
- generating a RAID mapping table, wherein the RAID mapping table contains a plurality of RAID extents, wherein each RAID extent contained in the RAID mapping table indicates a plurality of drive extents for storing host data written to the storage object and related parity information, and wherein each drive extent comprises a contiguous region of non-volatile data storage in one of the physical drives;
- in response to detecting that one of the physical drives has failed, concurrently rebuilding RAID extents in a concurrent rebuild list, wherein each RAID extent in the concurrent rebuild list indicates a drive extent of the failed one of the physical drives, and wherein for each one of the RAID extents in the concurrent rebuild list rebuilding includes i) recovering host data previously stored in the drive extent of the failed one of the physical drives indicated by the RAID extent, and ii) writing the recovered host data to a spare drive extent allocated to the RAID extent;
- in response to detecting that rebuilding of one of the RAID extents in the concurrent rebuild list has completed, removing that one of the RAID extents from the concurrent rebuild list, and selecting a next RAID extent to replace the removed RAID extent in the concurrent rebuild list by i) forming a candidate set of RAID extents, wherein each RAID extent in the candidate set indicates a drive extent of the failed physical drive and has not been rebuilt, ii) calculating a relatedness score for each RAID extent in the candidate set with respect to the RAID extents remaining in the concurrent rebuild list, wherein the relatedness score indicates an amount of limitation with regard to concurrently rebuilding the RAID extent in combination with the RAID extents remaining in the concurrent rebuild list, and iii) selecting as the new RAID extent to replace the removed RAID extent in the concurrent rebuild list a RAID extent in the candidate set having a lowest relatedness score of the RAID extents in the candidate set.
Dependent claims: 14, 15, 16, 17, 18.
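The claims recite recovering host data for a drive extent of the failed drive and writing it to a spare drive extent, without fixing a RAID level. Assuming a single-parity, XOR-based layout (RAID-5 style), where the parity block of a stripe is the XOR of its data blocks, recovery of one stripe can be sketched as follows. The helper names are hypothetical, and layouts with more than one parity block (for example RAID-6) would need a different recovery code.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_drive_extent(surviving: List[bytes], spare: bytearray) -> None:
    """Recover the lost block from surviving data + parity and write it to the spare.

    Assumes single XOR parity (RAID-5 style): the lost block equals the XOR of
    all surviving blocks of the same stripe.
    """
    recovered = reduce(xor_blocks, surviving)
    if len(spare) < len(recovered):
        raise ValueError("spare drive extent is too small")
    spare[:len(recovered)] = recovered


d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = reduce(xor_blocks, [d0, d1, d2])
spare = bytearray(4)
rebuild_drive_extent([d0, d2, parity], spare)  # the drive holding d1 has failed
print(bytes(spare) == d1)  # True
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block cancels every term except the missing one, which is why the order of the surviving blocks does not matter.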
Specification