Fault-tolerant memory system with graceful degradation
Abstract
A fault-tolerating memory system has a data memory with a large number (M+N) of data storage words each having a length greater than the length of user data to be stored in that word; the extra word length is used for at least an error-detecting-and-correcting (EDAC) code. The user data is stored in a smaller number (N) of the words, with the remaining number (M) of words being used to store a map of which portions, if any, of each word are not usable. The N words of user data storage can include S normal storage words and (N-S) spare words, each for use if one of the normal storage words has too many unusable portions. A portion of each word length can contain at least one spare word portion, to which a block of data can be moved if any bit of a like-sized portion of the normal storage word is unusable. The reliability of storage is greatly improved by extension of each word to add EDAC encoding and spare-bit portions, as well as by extension of depth to allow spare words to be present, along with high-reliability storage of word maps.
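The word-level sparing described in the abstract can be illustrated with a minimal sketch. It assumes the map records, for each physical word, a count of unusable portions, and that a word is acceptable when that count does not exceed the number of spare portions in the word. The function name `assign_words` and its parameters are hypothetical, and raising an error when the spares run out is only one possible policy.

```python
from typing import Dict, List


def assign_words(bad_portions: List[int], spares_per_word: int,
                 normal_count: int) -> Dict[int, int]:
    """Map each logical word index to a physical word index.

    bad_portions[i] is the number of unusable portions in physical word i
    (assumed to be what the map words record). Physical words
    0..normal_count-1 play the role of the S normal words; the remaining
    entries play the role of the (N-S) spare words.
    """
    def usable(i: int) -> bool:
        # A word is acceptable if its spare portions can absorb every bad one.
        return bad_portions[i] <= spares_per_word

    free_spares = [i for i in range(normal_count, len(bad_portions)) if usable(i)]
    assignment: Dict[int, int] = {}
    for logical in range(normal_count):
        if usable(logical):
            assignment[logical] = logical
        elif free_spares:
            # Too many bad portions: substitute the next good spare word.
            assignment[logical] = free_spares.pop(0)
        else:
            raise RuntimeError(f"no spare word left for logical word {logical}")
    return assignment


# Example: S = 4 normal words, 2 spare words, 2 spare portions per word.
bad = [0, 3, 1, 2, 0, 4]
mapping = assign_words(bad, spares_per_word=2, normal_count=4)
print(mapping)
```

In this example physical word 1 has three unusable portions, one more than its spare portions can absorb, so logical word 1 is redirected to spare word 4; spare word 5 is itself too damaged to be used.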
27 Citations

14 Claims
1. A method for tolerating faults during the storage of data words in a gigabit memory system having a multiplicity of individual memory integrated circuits (ICs) and a totality of storage words, each having a multiplicity of bytes to be stored in a plurality of different ICs, comprising the steps of:
(a) dividing the totality of storage words into a first multiplicity of storage words and a second multiplicity of storage words;
(b) storing in the second multiplicity of storage words at least one updatable map of known-good space in each of the storage words in the memory;
(c1) first forming each data word with a first multi-byte portion of received user data, a second portion of at least one byte of EDAC-encoding data for detecting and correcting errors in the multiplicity of bytes of user data of the first portion of the same data word, and a third portion as a spare storage space with a sufficient length to allow a plurality of multiple-bit bursts of unusable storage bits in the user data and EDAC-encoding data portions of that same data word to be tolerated;
(c2) determining if at least one burst of unusable data bits exists in a next available storage word in the system memory;
(c3) then transferring the data bits in each burst of user data and EDAC-encoding data corresponding to an unusable memory burst portion in that word to the third portion of that same data word, prior to storage;
(d) after step (c3), storing each sequentially-received EDAC-encoded data word in that next-available one of the first multiplicity of storage words having sufficient known-good space for storage of a data word, as determined by reference to the at least one map in the second multiplicity of storage words;
(e) retrieving stored data from a sequence of the first multiplicity of storage words determined by reference to the at least one map in the second multiplicity of storage words; and
(f) then removing unusable burst and other errors in that data word, by (1) first transferring back to the proper burst locations within the same data word, as determined by reference to the associated map, the bit bursts from the third portion of that data word, (2) removing the third word portion to obtain a burst-transferred retrieved word, and then (3) utilizing the EDAC coding data of each retrieved burst-transferred word to correct at least one burst of user data error.
Dependent claims: 2, 3, 4.
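The burst transfer of steps (c3) and (f)(1)-(2) of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a word is modeled as a list of bits, a burst is a fixed 4-bit group (`BURST` is an assumed size), the map is reduced to a list of bad burst indices for one word, and the spare portion itself is assumed to be fault-free. The names `format_word` and `unformat_word` are hypothetical.

```python
from typing import List, Sequence

BURST = 4  # bits per burst; an assumed size for illustration


def format_word(payload: Sequence[int], bad_bursts: Sequence[int],
                spare_bursts: int) -> List[int]:
    """Append a spare portion and copy each burst that would land on a
    bad memory location into the next free spare slot (before storage)."""
    if len(payload) % BURST:
        raise ValueError("payload length must be a multiple of BURST")
    if len(bad_bursts) > spare_bursts:
        raise ValueError("more bad bursts than spare slots")
    word = list(payload) + [0] * (spare_bursts * BURST)
    for slot, b in enumerate(sorted(bad_bursts)):
        dst = len(payload) + slot * BURST
        word[dst:dst + BURST] = word[b * BURST:(b + 1) * BURST]
    return word


def unformat_word(stored: Sequence[int], bad_bursts: Sequence[int],
                  spare_bursts: int) -> List[int]:
    """Copy the transferred bursts back to their original positions,
    then drop the spare portion."""
    payload_len = len(stored) - spare_bursts * BURST
    word = list(stored)
    for slot, b in enumerate(sorted(bad_bursts)):
        src = payload_len + slot * BURST
        word[b * BURST:(b + 1) * BURST] = word[src:src + BURST]
    return word[:payload_len]


# Example: a 12-bit payload (user data plus EDAC bits), burst 1 mapped as bad.
payload = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
bad = [1]
stored = format_word(payload, bad, spare_bursts=2)

# Simulate the faulty memory: the bad burst reads back as all ones.
damaged = list(stored)
damaged[4:8] = [1, 1, 1, 1]

restored = unformat_word(damaged, bad, spare_bursts=2)
print(restored == payload)
```

Because the same map drives both directions, the retrieved word is reassembled without relying on the EDAC code for the known-bad locations, which leaves the code's correction capacity for errors that are not yet mapped.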
5. A method for storage of gigabytes of data, comprising the steps of:
(A) recording data in a memory by the steps of
  (1) receiving user data;
  (2) adding EDAC-encoding data, based upon the received user data, to form a storage word;
  (3) formatting the storage word by
    (a) adding a plurality of spare portions;
    (b) recognizing any existing burst of unusable data bits in a data word about to be stored;
    (c) transferring, prior to storage, the bits occurring in any plurality of unusable bit bursts of any of user data and EDAC-encoding data for a present word, to the spare portion of that word;
  (4) determining, by reference to an updatable map, acceptable storage word spaces having sufficient good bits available for storage of any formatted storage word; and
  (5) storing the formatted storage word in a next available acceptable space; and
(B) playing data back from the memory by the steps of
  (1) retrieving the formatted storage word from its assigned memory storage space;
  (2) unformatting the storage word by transferring back to the proper burst locations within the present word the bits previously transferred into the spare portion and then removing all spare portions;
  (3) utilizing the EDAC-encoding data to detect if any error has been introduced into the user data, and to correct at least a portion of the detected error in that storage word; and
  (4) outputting the corrected user data.
Dependent claims: 6, 7, 8, 9, 10, 11.
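The EDAC steps of claim 5, (A)(2) and (B)(3)-(4), can be illustrated with a small round trip. The claim does not name a particular code, so this sketch substitutes a Hamming(7,4) code purely as a stand-in: it corrects one bit per 7-bit codeword, not a multi-bit burst, and the spare-portion formatting of step (A)(3) is omitted. The function names `record` and `play_back` are hypothetical.

```python
from typing import List, Tuple

Bits = List[int]


def hamming74_encode(nibble: Bits) -> Bits:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: Bits) -> Tuple[Bits, bool]:
    """Return (data bits, whether a single-bit error was corrected)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos != 0


def record(user_nibbles: List[Bits]) -> Bits:
    """Step (A)(2): add EDAC-encoding data to form the storage word."""
    word: Bits = []
    for nib in user_nibbles:
        word.extend(hamming74_encode(nib))
    return word


def play_back(stored: Bits) -> Tuple[List[Bits], int]:
    """Steps (B)(3)-(4): detect and correct errors, then output user data."""
    nibbles, corrected = [], 0
    for i in range(0, len(stored), 7):
        nib, fixed = hamming74_decode(stored[i:i + 7])
        nibbles.append(nib)
        corrected += fixed
    return nibbles, corrected


user = [[1, 0, 1, 1], [0, 1, 0, 0]]
stored = record(user)
stored[4] ^= 1  # simulate one bit error introduced while in storage
recovered, fixes = play_back(stored)
print(recovered == user, fixes)
```

A system that must correct whole bursts would need a stronger code than this stand-in; the sketch only shows where encoding and correction sit in the record and playback sequence.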
12. Data storage apparatus, comprising:
memory means for storing a gigabit multiplicity of multi-byte storage words, each having a user data portion of an integer number of nybbles in length and having a spare word portion;
dynamic means for periodically mapping at least those of the storage words then having at least one burst of a plurality of bad storage bits therein, and for storing the bad-burst map until a next mapping;
means for receiving and outputting user data;
means for EDAC-encoding an assembled word of the received user data prior to storage and for adding a plurality of spare data portions to each data word for receiving up to a like plurality of bursts of data bits identified as having bit positions identical with those positions mapped as bad, and for operating upon a data word retrieved from said memory means first to restore transferred bit bursts to their original positions and then to detect error in the restored retrieved word and correct at least a portion of the detected error before transmittal to the outputting means; and
means for determining, in cooperation with said mapping means, a location within said memory means into which, in a storage operation, to store a data word, including locations in the memory means spare word portion in the event that a storage word location initially selected for a data word can not acceptably contain the data word to be stored, and also for determining from which location, in a retrieval operation, to take a data word.
Dependent claims: 13, 14.
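The periodic mapping element of claim 12 can be sketched with a simulated memory. The approach shown, writing all-zeros and all-ones test patterns and comparing the read-back, is an assumption chosen for illustration; the claim does not say how bad bursts are discovered. `SimulatedWord`, `map_bad_bursts`, and `build_map` are hypothetical names, the 4-bit burst size is assumed, and a burst is treated as bad if any bit in it misbehaves. The test is destructive, so a real system would have to preserve word contents around it.

```python
from typing import Dict, List, Optional


class SimulatedWord:
    """One storage word whose faulty bits are stuck at a fixed value."""

    def __init__(self, width: int, stuck: Optional[Dict[int, int]] = None):
        self.bits = [0] * width
        self.stuck = dict(stuck or {})  # bit index -> stuck value

    def write(self, bits: List[int]) -> None:
        self.bits = list(bits)

    def read(self) -> List[int]:
        # Stuck bits ignore what was written.
        return [self.stuck.get(i, b) for i, b in enumerate(self.bits)]


def map_bad_bursts(word: SimulatedWord, width: int, burst: int = 4) -> List[int]:
    """Return the indices of bursts containing at least one faulty bit."""
    bad = set()
    for pattern in (0, 1):  # both patterns are needed to expose stuck-at-0 and stuck-at-1
        word.write([pattern] * width)
        for i, bit in enumerate(word.read()):
            if bit != pattern:
                bad.add(i // burst)
    return sorted(bad)


def build_map(memory: List[SimulatedWord], width: int,
              burst: int = 4) -> Dict[int, List[int]]:
    """Map only those word addresses that currently have a bad burst."""
    result: Dict[int, List[int]] = {}
    for addr, word in enumerate(memory):
        bad = map_bad_bursts(word, width, burst)
        if bad:
            result[addr] = bad
    return result


memory = [
    SimulatedWord(16),
    SimulatedWord(16, {5: 1, 6: 1}),
    SimulatedWord(16, {0: 0, 13: 1}),
]
bad_burst_map = build_map(memory, width=16)
print(bad_burst_map)
```

Rerunning `build_map` at intervals and keeping the result until the next run mirrors the "until a next mapping" wording: faults that appear between runs are picked up by the following pass.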
Specification