Server disk error recovery system

US 5,649,093 A
Filed: 05/22/1995
Issued: 07/15/1997
Est. Priority Date: 05/22/1995
Status: Expired due to Term

Abstract

The present invention provides a mass storage system suitable for incorporation in a video-on-demand server that is capable of detecting and correcting errors without a substantial increase in processor capacity or memory buffer size, and without any increase in disk input/output (I/O) bandwidth. The mass storage system includes a server controller, a cluster of data disk drives and a parity drive associated with the cluster of data disk drives. The controller provides video data streams to a number of viewers. Data is stored as contiguous data strips in the cluster of data drives. Each data strip includes a plurality of contiguous data slices logically distributed across the cluster of data drives. A plurality of parity slices, each parity slice corresponding to each data strip, is stored in the parity drive. When the failure of one of the data drives is detected, the parity drive is read in place of the failed drive. Hence, all functional data drives are read along with the parity drive before the erroneous slice is needed. A replacement data slice is reconstructed from the parity slice and "good" data slices. Alternatively, the data drives of the mass storage system are partitioned into multiple sub-clusters of data drives to minimize the impact of a failed drive. Accordingly, the mass storage system includes multiple parity drives, each parity drive associated with a sub-cluster of data drives. Such an arrangement is useful because data reconstruction is limited to the data slices and parity slices of the affected sub-cluster.
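
The reconstruction step described in the abstract can be illustrated with a minimal sketch. It assumes the parity slice is the byte-wise XOR of the data slices in a strip, which is the conventional single-parity scheme; the text above does not state the parity function, and the names `xor_slices`, `make_parity` and `read_strip` are hypothetical.

```python
from functools import reduce


def xor_slices(slices):
    """Byte-wise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*slices))


def make_parity(data_slices):
    """Parity slice for one data strip: XOR of all of its data slices."""
    return xor_slices(data_slices)


def read_strip(data_slices, parity_slice, failed_drive=None):
    """Return the slices of one strip, rebuilding the slice of a failed drive.

    The parity slice is read in place of the failed drive's slice, and the
    missing slice is the XOR of the parity slice with the "good" slices.
    """
    if failed_drive is None:
        return list(data_slices)
    good = [s for i, s in enumerate(data_slices) if i != failed_drive]
    rebuilt = xor_slices(good + [parity_slice])
    result = list(data_slices)
    result[failed_drive] = rebuilt
    return result


# One strip spread over four hypothetical data drives.
strip = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(strip)

# Drive 2 fails, so its slice is unavailable.
damaged = list(strip)
damaged[2] = None
recovered = read_strip(damaged, parity, failed_drive=2)
```

In this sketch the number of slices read during a failure is unchanged: the parity slice simply replaces the unreadable one, which matches the abstract's statement that no extra disk I/O bandwidth is needed.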

22 Claims

1. A method of detecting and correcting errors in a mass storage system including a processor, a cluster of data drives and a parity drive, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices logically distributed across said cluster of data drives, and wherein a plurality of parity slices, each parity slice corresponding to each said data strip, are stored in said parity drive, the method including the steps of:
- retrieving one said data strip from said cluster of data drives;
  
  detecting a data drive failure affecting an erroneous one of said data slices of said one data strip;
  
  retrieving one of said parity slices corresponding to said one data strip from said parity drive;
  
  reconstructing a corrected data slice from said one data strip and said one parity slice, said corrected data slice for replacing said one erroneous data slice; and
  
  wherein said data slices of said one data strip have been distributed among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said data drives.
- Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1 wherein said mass storage system is a video-on-demand server and said data is an encoded video stream.
  - 3. The method of claim 1 wherein said data and parity drives are magnetic or optical drives.
  - 4. The method of claim 1 wherein each of said data drives has an innermost zone and an outermost zone, said data slices are evenly distributed over said respective innermost zones and said respective outermost zones of said data drives, and the location of said parity slice on said parity drive is independent of the location of said data slices on said data drives.
  - 5. The method of claim 1 wherein the step of retrieving said data strip is in accordance with a "just-in-time" scheduling protocol.
  - 6. The method of claim 1 further comprising the step of partitioning said cluster of data drives into multiple sub-clusters of data drives, each said sub-cluster associated with a parity drive.
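
The zone-distribution limitation of claims 1 and 4 can be made concrete with a small worked example. This is only a sketch under stated assumptions: the per-zone transfer rates in `ZONE_RATES_MBPS` are invented, they fall linearly from the outermost to the innermost zone, and the alternating placement rule in `zones_for_strip` is one hypothetical way to pair fast and slow zones, not necessarily the placement used in the patent.

```python
# Hypothetical transfer rates per zone, outermost (index 0) to innermost.
ZONE_RATES_MBPS = [8.0, 7.0, 6.0, 5.0, 4.0]


def zones_for_strip(strip_index, num_drives, num_zones=len(ZONE_RATES_MBPS)):
    """Pick a zone on each drive for the slices of one strip.

    Even-numbered drives use zone `base`; odd-numbered drives use the
    mirrored zone, so a fast outer zone is always paired with a slow inner one.
    """
    base = strip_index % num_zones
    mirror = num_zones - 1 - base
    return [base if drive % 2 == 0 else mirror for drive in range(num_drives)]


def average_rate(zones):
    """Mean transfer rate over the zones holding one strip's slices."""
    return sum(ZONE_RATES_MBPS[z] for z in zones) / len(zones)


for strip_index in range(5):
    zones = zones_for_strip(strip_index, num_drives=4)
    print(strip_index, zones, average_rate(zones))
```

With these assumed rates and an even number of drives, every strip averages 6.0 MB/s, the rate of the middle zone, regardless of which zones hold its individual slices.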

7. A method of detecting and correcting errors in a mass storage system including a processor, at least two clusters of data drives and a corresponding number of parity drives, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices logically distributed across one said cluster of data drives, and wherein a plurality of parity slices, each parity slice corresponding to each said data strip, are stored in one said parity drive, the method including the steps of:
- retrieving one said data strip from said one cluster of data drives;
  
  detecting a data drive failure affecting an erroneous one of said data slices of said one data strip;
  
  retrieving one of said parity slices corresponding to said one data strip from said parity drive;
  
  reconstructing a corrected data slice from said one data strip and said one parity slice, said corrected data slice for replacing said one erroneous data slice; and
  
  wherein said data slices of said one data strip have been distributed among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said data drives.
- Dependent Claims (8, 9, 10, 11)
- - 8. The method of claim 7 wherein said mass storage system is a video-on-demand server and said data is an encoded video stream.
  - 9. The method of claim 7 wherein said data and parity drives are magnetic or optical drives.
  - 10. The method of claim 7 wherein each of said data drives has an innermost zone and an outermost zone, said data slices are evenly distributed over said respective innermost zones and said respective outermost zones of said data drives, and the location of said parity slice on said parity drive is independent of the location of said data slices on said data drives.
  - 11. The method of claim 7 wherein the step of retrieving said data strip is in accordance with a "just-in-time" scheduling protocol.
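
The multi-cluster arrangement of claim 7 confines recovery to the cluster that contains the failed drive. The sketch below shows that bookkeeping only; the drive numbering, the fixed cluster size, and the function names `partition` and `drives_to_read_for_rebuild` are assumptions made for illustration.

```python
def partition(drive_ids, sub_cluster_size):
    """Split a list of data drive ids into consecutive sub-clusters."""
    return [
        drive_ids[i:i + sub_cluster_size]
        for i in range(0, len(drive_ids), sub_cluster_size)
    ]


def drives_to_read_for_rebuild(sub_clusters, failed_drive):
    """Return (sub-cluster index, surviving data drives) for a failed drive.

    Only the survivors of that sub-cluster, plus the sub-cluster's own
    parity drive, take part in reconstruction; other sub-clusters are
    read normally.
    """
    for index, members in enumerate(sub_clusters):
        if failed_drive in members:
            survivors = [d for d in members if d != failed_drive]
            return index, survivors
    raise ValueError(f"drive {failed_drive} is not in any sub-cluster")


# Eight hypothetical data drives in two sub-clusters of four,
# each sub-cluster paired with its own parity drive.
sub_clusters = partition(list(range(8)), 4)
index, survivors = drives_to_read_for_rebuild(sub_clusters, failed_drive=5)
print(index, survivors)  # prints: 1 [4, 6, 7]
```

Here a failure of drive 5 involves only drives 4, 6 and 7 and the parity drive of sub-cluster 1, so the reconstruction work does not grow with the total number of drives.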

12. A mass storage system useful in association with a video-on-demand server having a controller coupled to a plurality of viewers, said mass storage system comprising:
- a cluster of data drives for storing a plurality of data strips, each said data strip including a plurality of contiguous data slices logically distributed across said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
  
  a parity drive for storing a plurality of parity slices, each parity slice corresponding to each said data strip, and wherein said data strips and parity slices are useful for reconstructing replacement data slices.
- Dependent Claims (13, 14, 15, 16)
- - 13. The storage system of claim 12 wherein said data strips include an encoded video stream.
  - 14. The storage system of claim 12 wherein said data and parity drives are magnetic or optical drives.
  - 15. The storage system of claim 12 wherein each of said data drives has an innermost zone and an outermost zone, said data slices are distributed over said respective innermost zones and said respective outermost zones of said data drives, and the location of said parity slice on said parity drive is independent of the location of said data slices on said data drives.
  - 16. The storage system of claim 12 wherein said cluster of data drives is partitioned into multiple sub-clusters of data drives, said storage system further comprising a plurality of parity drives, each parity drive associated with each said sub-cluster.

17. A mass storage system useful in association with a video server having a controller coupled to a plurality of viewers, said mass storage system comprising:
- a plurality of sub-clusters of data drives for storing a plurality of data strips, each said data strip including a plurality of contiguous data slices logically distributed across one said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
  
  a corresponding plurality of parity drives for storing a plurality of parity slices, each parity slice corresponding to each said data strip, and wherein said data strips and parity slices are useful for reconstructing replacement data slices.
- Dependent Claims (18, 19, 20)
- - 18. The storage system of claim 17 wherein said data strips include an encoded video stream.
  - 19. The storage system of claim 17 wherein said data and parity drives are magnetic or optical drives.
  - 20. The storage system of claim 17 wherein each of said data drives has an innermost zone and an outermost zone, said data slices are evenly distributed over said respective innermost zones and said respective outermost zones of said data drives, and the location of said parity slice on said parity drive is independent of the location of said data slices on said data drives.

21. A method of detecting and correcting errors in a mass storage system including a processor, a cluster of data drives and a parity drive, wherein data is stored as a plurality of data strips in said cluster of data drives, each said data strip including a plurality of contiguous data slices, the method including the steps of:
- storing one said data strip in said cluster of data drives by distributing said data slices of said one data strip among different zones of said data drives so that the average retrieval rate of said data slices approximates the access rate associated with an intermediate zone of said cluster of data drives; and
  
  storing a parity slice corresponding to said one data strip in said parity drive.
- Dependent Claims (22)
- - 22. The method of claim 21 wherein each of said data drives has an innermost zone and an outermost zone, said data slices are evenly distributed over said respective innermost zones and said respective outermost zones of said data drives, and the location of said parity slice on said parity drive is independent of the location of said data slices on said data drives.

Current Assignee: Pure Flash Incorporated
Original Assignee: Sun Microsystems Incorporated (Oracle Corporation)
Inventors: Hanko, James G., Wall, Gerard A.
Primary Examiner(s): Beausoliel, Jr., Robert W.
Assistant Examiner(s): Wright, Norman M.
Application Number: US08/445,820
Time in Patent Office: 785 Days
Field of Search: 395/182.04, 395/182.05, 395/182.03, 395/182.01, 395/185.01, 395/184.01, 395/185.07, 348/7
US Class Current: 714/6.24
CPC Class Codes:
G06F 11/1076   Parity data used in redunda...
G11B 20/1833   by adding special lists or ...
H04N 7/17336   Handling of requests in hea...
