Apparatus, system, and method for detecting and replacing failed data storage
First Claim
1. An apparatus to detect and replace failed data storage, the apparatus comprising:
a read module that reads an error correcting code (ECC) chunk stored across a plurality of memory devices of an array of memory devices, the ECC chunk comprising both data and an error correcting code derived from the data and wherein at least one of the memory devices of the array stores parity data derived from data stored on each memory device of the plurality of memory devices storing the ECC chunk;
an ECC module that determines that the ECC chunk has more errors than are correctable using the error correcting code for the ECC chunk; and
an isolation module that tests individual memory devices of the plurality of memory devices to locate a failed memory device by:
substituting, within the data of the ECC chunk, replacement data for data of a first one of the memory devices to form a first substitute ECC chunk, the replacement data derived from the parity data;
substituting, within the data of the ECC chunk, replacement data for data of a second one of the memory devices to form a second substitute ECC chunk in response to the ECC module determining that the first substitute ECC chunk has more errors than are correctable using the error correcting code for the ECC chunk; and
determining that the second one of the memory devices is the failed memory device in response to the ECC module determining that errors in the second substitute ECC chunk are correctable using the error correcting code for the ECC chunk.
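The substitution steps above rely on replacement data derived from the parity data. The following is a minimal sketch of one way that derivation could work, assuming simple XOR parity across equal-length stripes. The claim does not name a parity scheme, and the helpers `xor_bytes`, `make_parity`, and `replacement_for` are hypothetical names used only for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(stripes: list[bytes]) -> bytes:
    """Parity stripe: XOR of the stripes on every data device (assumed scheme)."""
    return reduce(xor_bytes, stripes)


def replacement_for(index: int, stripes: list[bytes], parity: bytes) -> bytes:
    """Rebuild the stripe of device `index` from parity and the other devices.

    The suspect device's own stripe is never used, so the result does not
    depend on whatever that device returned.
    """
    others = [s for i, s in enumerate(stripes) if i != index]
    return reduce(xor_bytes, others, parity)


# Three data devices, two bytes each (illustrative values only).
good = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = make_parity(good)

# Device 1 returns garbage; its stripe can still be rebuilt from the rest.
read_back = [good[0], b"\xff\xff", good[2]]
rebuilt = replacement_for(1, read_back, parity)
```

Because XOR is its own inverse, combining the parity stripe with every stripe except the suspect one yields the suspect device's original contents, provided the remaining devices and the parity device returned good data.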
Abstract
An apparatus, system, and method are disclosed for detecting and replacing failed data storage. A read module reads data from an array of memory devices. The array includes two or more memory devices and one or more extra memory devices storing parity information from the memory devices. An ECC module determines, using an error correcting code (“ECC”), whether one or more errors exist in tested data and whether the errors are correctable using the ECC. The tested data includes data read by the read module. An isolation module selects a memory device in response to the ECC module determining that errors exist in the data read by the read module and that the errors are uncorrectable using the ECC. The isolation module also replaces data read from the selected memory device with replacement data and available data, wherein the tested data includes the available data combined with the replacement data.
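The isolation procedure summarized in the abstract can be sketched as a loop that substitutes parity-derived data for one device at a time and re-runs the ECC check. This is only an illustration under stated assumptions: XOR parity is assumed, the function names `locate_failed_device` and `crc_ok` are hypothetical, and a CRC-32 from Python's `zlib` stands in for the ECC. The CRC is detect-only, so any error counts as "uncorrectable", whereas a real ECC would correct a bounded number of bit errors.

```python
import zlib
from functools import reduce
from typing import Callable, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def locate_failed_device(
    stripes: list[bytes],
    parity: bytes,
    is_correctable: Callable[[bytes], bool],
) -> Optional[int]:
    """Try each device in turn; return the index whose substitution passes.

    Intended to run only after the unmodified chunk was found uncorrectable.
    """
    for i in range(len(stripes)):
        others = stripes[:i] + stripes[i + 1:]
        # Replacement data for device i, derived from parity (XOR assumed).
        replacement = reduce(xor_bytes, others, parity)
        candidate = stripes[:i] + [replacement] + stripes[i + 1:]
        if is_correctable(b"".join(candidate)):
            return i
    return None  # no single-device substitution produced a usable chunk


def crc_ok(chunk: bytes) -> bool:
    """Stand-in ECC check (assumption): payload followed by a 4-byte CRC-32."""
    payload, code = chunk[:-4], chunk[-4:]
    return zlib.crc32(payload) == int.from_bytes(code, "big")


# Build a 24-byte chunk (20 data bytes + 4 code bytes) over four devices.
payload = b"failed storage demo!"
chunk = payload + zlib.crc32(payload).to_bytes(4, "big")
stripes = [chunk[i:i + 6] for i in range(0, 24, 6)]
parity = reduce(xor_bytes, stripes)

# Simulate device 2 returning a stripe with one flipped bit.
read_back = list(stripes)
read_back[2] = bytes([stripes[2][0] ^ 0x01]) + stripes[2][1:]

failed = locate_failed_device(read_back, parity, crc_ok)  # -> 2
```

Substituting for devices 0 and 1 fails the check because the bad data from device 2 leaks into their replacement data through the XOR. Only the substitution for device 2 leaves every stripe intact.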
196 Citations
22 Claims
21. A system to detect and replace failed data storage, the system comprising:
an array of memory devices, the array comprising two or more memory devices and one or more extra memory devices, the extra memory devices storing parity information from data stored on each memory device of the two or more memory devices; and
a storage controller that:
reads an error correcting code (ECC) chunk stored across the array of memory devices, the ECC chunk comprising both data and an error correcting code derived from the data;
determines that the ECC chunk has more errors than are correctable using the error correcting code for the ECC chunk;
tests individual memory devices of the plurality of memory devices to locate a failed memory device by iteratively substituting, within the data of the ECC chunk, replacement data derived from the parity information for individual memory devices of the array to form substitute ECC chunks until one of the substitute ECC chunks is correctable using the error correcting code for the ECC chunk; and
determines that the memory device associated with the one substitute ECC chunk is the failed memory device.
22. A data storage method comprising:
reading an error correcting code (ECC) chunk stored across a plurality of memory devices of an array of memory devices, the ECC chunk comprising both data and an error correcting code derived from the data and wherein at least one of the memory devices of the array stores parity data derived from data stored on each memory device of the plurality of memory devices storing the ECC chunk;
determining that the ECC chunk has more errors than are correctable using the error correcting code for the ECC chunk;
testing individual memory devices of the plurality of memory devices to locate a failed memory device by iteratively substituting, within the data of the ECC chunk, replacement data derived from the parity data for individual memory devices of the plurality to form substitute ECC chunks until one of the substitute ECC chunks is correctable using the error correcting code for the ECC chunk; and
determining that the memory device associated with the one substitute ECC chunk is the failed memory device.
Specification