Server disk error recovery system
First Claim
1. A method of detecting and correcting errors in a mass storage system including a processor, a cluster of data drives and a parity drive, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices logically distributed across said cluster of data drives, and wherein a plurality of parity slices, each parity slice corresponding to each said data strip, are stored in said parity drive, the method including the steps of:
retrieving one said data strip from said cluster of data drives;
detecting a data drive failure affecting an erroneous one of said data slices of said one data strip;
retrieving one of said parity slices corresponding to said one data strip from said parity drive;
reconstructing a corrected data slice from said one data strip and said one parity slice, said corrected data slice for replacing said one erroneous data slice; and
wherein said data slices of said one data strip have been distributed among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said data drives.
Abstract
The present invention provides a mass storage system suitable for incorporation in a video-on-demand server that is capable of detecting and correcting errors without a substantial increase in processor capacity or memory buffer size, and without any increase in disk input/output (I/O) bandwidth. The mass storage system includes a server controller, a cluster of data disk drives and a parity drive associated with the cluster of data disk drives. The controller provides video data streams to a number of viewers. Data is stored as contiguous data strips in the cluster of data drives. Each data strip includes a plurality of contiguous data slices logically distributed across the cluster of data drives. A plurality of parity slices, each parity slice corresponding to each data strip, is stored in the parity drive. When the failure of one of the data drives is detected, the parity drive is read in place of the failed drive. Hence, all functional data drives are read along with the parity drive before the erroneous slice is needed. A replacement data slice is reconstructed from the parity slice and "good" data slices. Alternatively, the data drives of the mass storage system are partitioned into multiple sub-clusters of data drives to minimize the impact of a failed drive. Accordingly, the mass storage system includes multiple parity drives, each parity drive associated with a sub-cluster of data drives. Such an arrangement is useful because data reconstruction is limited to the data slices and parity slices of the affected sub-cluster.
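
The reconstruction step described in the abstract can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, which the text above does not specify, and the function names (`xor_slices`, `make_parity`, `reconstruct_slice`) and sample slices are hypothetical.

```python
def xor_slices(slices):
    """Byte-wise XOR of equal-length slices (assumed parity scheme)."""
    length = len(slices[0])
    if any(len(s) != length for s in slices):
        raise ValueError("all slices must have the same length")
    out = bytearray(length)
    for s in slices:
        for i, byte in enumerate(s):
            out[i] ^= byte
    return bytes(out)


def make_parity(data_slices):
    """Parity slice for one data strip: XOR of all its data slices."""
    return xor_slices(data_slices)


def reconstruct_slice(strip, parity, failed_index):
    """Rebuild the slice at failed_index from the good slices plus parity.

    The entry at failed_index is never read, mirroring the idea that the
    parity drive is read in place of the failed drive.
    """
    good = [s for i, s in enumerate(strip) if i != failed_index]
    return xor_slices(good + [parity])


# Hypothetical strip of three data slices, one per data drive.
strip = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(strip)

# Simulate losing the drive that holds slice 1.
damaged = [strip[0], None, strip[2]]
recovered = reconstruct_slice(damaged, parity, failed_index=1)
```

Under this assumption the rebuild needs one read from each surviving data drive plus one from the parity drive, the same number of drive reads as a normal strip retrieval.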
262 Citations
22 Claims
1. A method of detecting and correcting errors in a mass storage system including a processor, a cluster of data drives and a parity drive, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices logically distributed across said cluster of data drives, and wherein a plurality of parity slices, each parity slice corresponding to each said data strip, are stored in said parity drive, the method including the steps of:
retrieving one said data strip from said cluster of data drives;
detecting a data drive failure affecting an erroneous one of said data slices of said one data strip;
retrieving one of said parity slices corresponding to said one data strip from said parity drive;
reconstructing a corrected data slice from said one data strip and said one parity slice, said corrected data slice for replacing said one erroneous data slice; and
wherein said data slices of said one data strip have been distributed among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said data drives.
Dependent claims: 2, 3, 4, 5, 6
7. A method of detecting and correcting errors in a mass storage system including a processor, at least two clusters of data drives and a corresponding number of parity drives, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices logically distributed across one said cluster of data drives, and wherein a plurality of parity slices, each parity slice corresponding to each said data strip, are stored in one said parity drive, the method including the steps of:
retrieving one said data strip from said one cluster of data drives;
detecting a data drive failure affecting an erroneous one of said data slices of said one data strip;
retrieving one of said parity slices corresponding to said one data strip from said parity drive;
reconstructing a corrected data slice from said one data strip and said one parity slice, said corrected data slice for replacing said one erroneous data slice; and
wherein said data slices of said one data strip have been distributed among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said data drives.
Dependent claims: 8, 9, 10, 11
12. A mass storage system useful in association with a video-on-demand server having a controller coupled to a plurality of viewers, said mass storage system comprising:
a cluster of data drives for storing a plurality of data strips, each said data strip including a plurality of contiguous data slices logically distributed across said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
a parity drive for storing a plurality of parity slices, each parity slice corresponding to each said data strip, and wherein said data strips and parity slices are useful for reconstructing replacement data slices.
Dependent claims: 13, 14, 15, 16
17. A mass storage system useful in association with a video server having a controller coupled to a plurality of viewers, said mass storage system comprising:
a plurality of sub-clusters of data drives for storing a plurality of data strips, each said data strip including a plurality of contiguous data slices logically distributed across one said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
a corresponding plurality of parity drives for storing a plurality of parity slices, each parity slice corresponding to each said data strip, and wherein said data strips and parity slices are useful for reconstructing replacement data slices.
Dependent claims: 18, 19, 20
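
The sub-cluster arrangement of claim 17 can be pictured with a small sketch: when a data drive fails, only the surviving data drives of its own sub-cluster and that sub-cluster's parity drive need to be read. The layout structure, drive names and the `drives_to_read` helper below are hypothetical illustrations, not taken from the claims.

```python
def drives_to_read(failed_drive, sub_clusters):
    """Return the drives needed to rebuild slices of failed_drive.

    sub_clusters is a list of dicts, each with a "data" list of drive
    names and one "parity" drive name (hypothetical layout).
    """
    for sub_cluster in sub_clusters:
        if failed_drive in sub_cluster["data"]:
            survivors = [d for d in sub_cluster["data"] if d != failed_drive]
            return survivors + [sub_cluster["parity"]]
    raise ValueError(f"{failed_drive!r} is not a data drive in any sub-cluster")


# Two sub-clusters of three data drives, each with its own parity drive.
layout = [
    {"data": ["d0", "d1", "d2"], "parity": "p0"},
    {"data": ["d3", "d4", "d5"], "parity": "p1"},
]

needed = drives_to_read("d4", layout)  # drives of the first sub-cluster are untouched
```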
21. A method of detecting and correcting errors in a mass storage system including a processor, a cluster of data drives and a parity drive, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices, the method including the steps of:
storing one said data strip in said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
storing a parity slice corresponding to said one data strip in said parity drive.
Dependent claims: 22
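
The zone distribution recited in the storing step can be illustrated numerically. The sketch below assumes a simple rotation, zone = (strip index + drive index) mod number of zones, which is only one possible placement and is not taken from the claims. The three zone rates are made-up figures, and the number of drives is assumed to equal the number of zones.

```python
def zone_for_slice(strip_index, drive_index, num_zones):
    """Zone used on a given drive for a given strip (assumed rotation)."""
    return (strip_index + drive_index) % num_zones


def average_strip_rate(strip_index, num_drives, zone_rates):
    """Mean transfer rate over the slices of one strip."""
    zones = [
        zone_for_slice(strip_index, drive, len(zone_rates))
        for drive in range(num_drives)
    ]
    return sum(zone_rates[z] for z in zones) / num_drives


# Hypothetical rates in MB/s for outer, intermediate and inner zones.
zone_rates = [6.0, 5.0, 4.0]

# With three drives, every strip touches each zone exactly once.
averages = [average_strip_rate(k, 3, zone_rates) for k in range(6)]
```

With these figures every strip averages the intermediate-zone rate, because each strip has one slice in each zone. With rates that are not evenly spaced, the average would only approximate the intermediate rate.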
Specification