Method and apparatus for management of faulty data in a RAID system
Abstract
A RAID (3, 4, or 5) disk array memory system incorporates a method and apparatus for the management of faulty data that eliminates the unintentional creation of spurious data when a double fault occurs. For example, when a disk drive failure occurs in one channel and the failed disk is replaced, reconstruction of the data can be achieved because a parity drive channel is provided for correcting errors. If a read error occurs during reconstruction of the failed disk data, the block corresponding to the error block does not allow the reconstruction of the corresponding failed disk block. To prevent the misuse of the two data blocks, a bad data table (BDT) is constructed that lists the addresses of the block just read and the block to be reconstructed. Also a standard filler block is written into the two bad blocks and a new parity block is created. The addresses of all access requests to the memory array are compared with the BDT and, if not listed, the access proceeds. If an address is listed, an error signal is returned. For a listed write request, the bad block address is deleted from the BDT, new data written into the block and a new parity block computed and stored.
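The access check described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented circuit: the class name `BadDataTable`, the `(channel, block)` address tuple, and the exception name are hypothetical and do not appear in the source. Recomputing parity after a permitted write is left out for brevity.

```python
class UnrecoverableDataError(Exception):
    """Raised when a read targets a block listed in the bad data table."""


class BadDataTable:
    """Set of block addresses whose contents are filler, not valid data."""

    def __init__(self):
        # Each entry is a (channel, block) tuple; this address shape is assumed.
        self._bad = set()

    def add(self, channel, block):
        self._bad.add((channel, block))

    def is_listed(self, channel, block):
        return (channel, block) in self._bad

    def check_access(self, channel, block, is_write):
        """Gate one access request against the table.

        Unlisted address: the access proceeds.
        Listed address, read: signal a non-recoverable data error.
        Listed address, write: delete the entry and let the write proceed.
        """
        addr = (channel, block)
        if addr not in self._bad:
            return
        if not is_write:
            raise UnrecoverableDataError(f"block {addr} holds filler data")
        self._bad.discard(addr)
```

In this sketch a write to a listed address clears the entry, so a later read of the same block succeeds because the filler has been overwritten with valid data.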
16 Claims
1. An apparatus for managing faulty data in a multichannel memory system having a memory array controller for controlling access to the memory system, for memory failure detection, for memory error detection, and for reconstruction of a failed memory channel, the memory system having at least three memory channels where each channel includes one or more modules, each module having a failure and read error detection means, and each memory module being separately replaceable upon failure, the apparatus, accessible to the memory controller, comprising:
a) a bad data table (BDT) for storing addresses of non-recoverable data blocks in a replacement memory module, where the replacement memory module stores data reconstructed from a failed first memory module in the memory system, and addresses of other non-recoverable data blocks associated with other memory modules in the memory system, the non-recoverable data blocks arising upon an occurrence of a fault in a second memory module in the memory system prior to completion of reconstruction and storage of data in the replacement memory module;
b) a write circuit operable after the initiation of a reconstruction of a failed first memory module for writing filler data to a bad block location in the second memory module when a fault is detected in the second memory module during reconstruction of data stored in the first memory module, for writing filler data to an associated location in the replacement memory module and for writing to the BDT addresses of bad data blocks representing non-recoverable data; and
c) detection circuitry for detecting memory access requests to addresses stored in the BDT, for returning a non-recoverable data error signal to a host system if the access request is a read request, and, if the access request is a write request, permitting the write to the BDT listed address and deleting the listed address from the BDT.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for managing faulty data in a memory system having memory error detection and single memory module correction means, the memory system having at least three memory modules capable of detecting failure and read errors, each memory module being separately replaceable upon failure, the method comprising the steps of:
a) detecting a first read error of a memory module;
b) determining if the first read error is due to a failure of the memory module and, if so, replacing the failed memory module and using the single memory module correction means for reconstructing data and storing the reconstructed data in the replacement memory module;
c) monitoring all other memory modules during the reconstruction process for a second read error that would prevent reconstructing data for a first block in the replaced memory module where the second read error is associated with a second block in a second memory module, if such a second read error is detected in a second memory module in the memory system at a second block during the reconstruction of a first block in the replaced memory module and, if neither the first nor second block is a parity block,
i) entering addresses of the first and second blocks into a bad data table (BDT),
ii) writing a filler block to both the first and the second blocks, and
iii) computing and replacing a corresponding parity block in an associated stripe of the memory system using the first and the second blocks together with all corresponding data blocks in the associated stripe of the memory system,
otherwise,
i) writing a filler block to whichever of the first block and the second block is a data block, and
ii) computing and writing a corresponding parity block to the other of the first and second block which is designated to contain parity information using the filler block with all corresponding data blocks,
and, then continuing reconstructing data in the replaced memory module until finished; and
d) monitoring all memory access requests to the memory system by comparing a requested address to the BDT listed addresses, and, if the address is not listed in the BDT, allowing the access request to proceed, otherwise checking if the request is a write request, and, if not, returning a non-recoverable data error signal, otherwise allowing the write request to proceed so that an addressed filler block is replaced by a valid data block, and then computing and entering a new corresponding parity data block, whereby data stored in blocks in a same group as the first and second block may be recovered if a single subsequent failure occurs in any module in the memory system using the computed parity information, filler data and remaining group data.
Dependent claims: 9, 10, 11
12. A memory array system with a faulty data management system handling multiple faults, comprising:
a) a memory array including at least three memory modules, each memory module being separately replaceable upon failure, and each module having a failure and read error detector;
b) a memory array controller for coupling the memory array to a memory bus, for controlling access to the memory array, for memory array failure detection and single memory module correction and for communicating memory status over the memory bus, the memory array controller further having,
i) a bad data table (BDT) for storing addresses of non-recoverable data blocks detected during a reconstruction of a first memory module, the reconstruction due to the occurrence of a fault in a first memory module requiring the replacement thereof and the bad data table for storing addresses of non-recoverable data blocks arising after a subsequent fault is detected in a second memory module prior to completion of the reconstruction of data stored in the first memory module,
ii) write circuitry for writing filler data blocks to blocks corresponding to non-recoverable data blocks in the replaced first memory module and the second memory module and for writing non-recoverable data block addresses to the BDT, and
iii) a comparator for detecting memory access requests to bad data blocks by comparing a memory access request address with the BDT stored addresses, for returning a non-recoverable error signal to a requesting agent if the access request is a read request and the address is stored in the BDT, and for permitting the access request to proceed if the request is for a write access.
Dependent claims: 13, 14, 15, 16
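The double-fault handling recited in step c) of claim 8 can be sketched for a single stripe with XOR parity. This is an illustrative sketch, not the claimed implementation: the function names, the all-zero filler pattern, the 4-byte block size, and the `(channel, stripe_no)` address form are assumptions. The claim's "otherwise" branch does not say whether the lost data block is entered in the BDT; the sketch lists it, which is also an assumption.

```python
from functools import reduce

BLOCK_SIZE = 4              # assumed block size, kept tiny for illustration
FILLER = bytes(BLOCK_SIZE)  # assumed filler pattern (all zeros)


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe, stripe_no, failed, parity, unreadable, bdt):
    """Rebuild one stripe in place after channel `failed` was replaced.

    stripe     -- list of blocks, one per channel; stripe[failed] is ignored
    parity     -- index of the channel holding this stripe's parity block
    unreadable -- index of a surviving channel with a read error, or None
    bdt        -- set of (channel, stripe_no) addresses of filler blocks
    """
    if unreadable is None:
        # Single fault: XOR of all surviving blocks restores the lost block.
        survivors = [b for i, b in enumerate(stripe) if i != failed]
        stripe[failed] = xor_blocks(survivors)
        return
    # Double fault: each lost data block becomes filler and is listed.
    for channel in (failed, unreadable):
        if channel != parity:
            stripe[channel] = FILLER
            bdt.add((channel, stripe_no))
    # Recompute parity over the data blocks, filler included, so that the
    # stripe is self-consistent and can survive one later failure.
    data = [b for i, b in enumerate(stripe) if i != parity]
    stripe[parity] = xor_blocks(data)
```

After the double-fault branch runs, the stripe's parity is consistent with the filler, so any one block in the stripe can again be rebuilt from the others; only the listed blocks carry no valid data.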
Specification