Mechanism for correcting errors beyond the fault tolerance level of a RAID array in a storage system
Abstract
Embodiments of the present invention provide a novel, reliable, and efficient technique for tracking, tolerating, and correcting unrecoverable errors (i.e., errors that cannot be recovered by the existing RAID protection schemes) in a RAID array. The technique reduces the need to perform drastic recovery actions, such as a file system consistency check, which typically disrupts client access to the storage system. Advantageously, the ability to tolerate and correct errors in the RAID array beyond the fault tolerance level of the underlying RAID technique increases the resiliency and availability of the storage system.
31 Claims

1. A method for correcting unrecoverable errors on storage devices connected to a storage system, the method comprising:
identifying, by the storage system during a data access request to the storage devices, a data block having an unrecoverable error, the unrecoverable error cannot be corrected by an underlying RAID protection technique at the storage system;
providing an indicator within a close proximity of the data block having the unrecoverable error that the data block is invalid, the indicator stored at a location so that when an I/O is issued to the data block having the unrecoverable error, the indicator is read as part of the same I/O;
protecting the indicator by the underlying RAID protection technique; and
recovering the data block having the unrecoverable error without initiating a consistency check operation in the storage system by obtaining a good copy of the data block having the unrecoverable error asynchronously to the data access request to the storage devices during which the data block having the unrecoverable error was identified.
Dependent claims: 2–21.
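The mechanism in claim 1 can be illustrated with a toy single-parity stripe. This is a minimal sketch, not the patented implementation: the names `Block`, `Stripe`, `mark_pseudo_bad`, `reconstruct`, and `recover_pending`, the 8-byte block size, and storing the indicator as one extra byte appended to each block are all hypothetical. It shows two properties the claim relies on: because the indicator is serialized with the block, a single read returns both; and because parity is computed over data plus indicator, the indicator itself can be rebuilt from parity.

```python
from collections import deque
from dataclasses import dataclass

BLOCK_SIZE = 8  # toy block size; hypothetical


@dataclass
class Block:
    data: bytes
    pseudo_bad: bool = False  # invalid-block indicator kept with the data

    def raw(self) -> bytes:
        # Data and indicator are serialized together, so one read returns
        # both, and parity computed over raw() also covers the indicator.
        return self.data + bytes([1 if self.pseudo_bad else 0])


def xor_parity(blocks) -> bytes:
    out = bytearray(BLOCK_SIZE + 1)
    for blk in blocks:
        for i, byte in enumerate(blk.raw()):
            out[i] ^= byte
    return bytes(out)


class Stripe:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.parity = xor_parity(self.blocks)
        self.recovery_queue = deque()

    def mark_pseudo_bad(self, idx: int) -> None:
        # Set the indicator, refresh parity so the indicator is protected,
        # and defer recovery instead of running a consistency check.
        self.blocks[idx].pseudo_bad = True
        self.parity = xor_parity(self.blocks)
        self.recovery_queue.append(idx)

    def read(self, idx: int) -> bytes:
        blk = self.blocks[idx]
        if blk.pseudo_bad:  # indicator arrived with the block itself
            raise IOError(f"block {idx} is marked invalid")
        return blk.data

    def reconstruct(self, idx: int) -> Block:
        # Rebuild the data and the indicator of one block from parity
        # XORed with every other block in the stripe.
        out = bytearray(self.parity)
        for j, blk in enumerate(self.blocks):
            if j != idx:
                for i, byte in enumerate(blk.raw()):
                    out[i] ^= byte
        return Block(bytes(out[:BLOCK_SIZE]), bool(out[BLOCK_SIZE]))

    def recover_pending(self, fetch_good_copy) -> None:
        # Runs later, outside the original request (e.g. a background task).
        while self.recovery_queue:
            idx = self.recovery_queue.popleft()
            self.blocks[idx] = Block(fetch_good_copy(idx))
            self.parity = xor_parity(self.blocks)
```

`mark_pseudo_bad` only queues the block; `recover_pending` stands in for the asynchronous step and accepts any callable that returns a good copy, such as a lookup against a mirror.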

22. A method for correcting unrecoverable errors in a storage array connected to a storage system for storing data, the method comprising:
performing a data access request to the storage array;
identifying, by a storage module, a data block having an error;
initiating, by the storage module, a recovery operation to recover the data block that has the error;
determining, by the storage module, that the storage array has an unrecoverable error, wherein unrecoverable errors are errors that cannot be corrected by an underlying RAID protection technique of the storage module;
obtaining a good copy of the data block having the unrecoverable error asynchronously to the data access request to the storage array during which the data block having the unrecoverable error was identified, the obtaining comprising:
  comparing a file system write signature stored within a copy of the data block and a received file system write signature from a file system upon reading the copy of the data block;
  if a match is detected, then using the copy of the data block as the good copy to correct the unrecoverable error; and
  if a match is not detected, then not using the copy of the data block as the good copy to correct the unrecoverable error; and
writing the good copy of the data block to correct the error in the storage array.
Dependent claims: 23–24.
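The signature check in claim 22 decides whether a candidate copy is current enough to trust. A minimal sketch follows, assuming a signature made of a volume ID, block number, and write generation; those fields, and the names `WriteSignature`, `BlockCopy`, `select_good_copy`, and `repair_block`, are hypothetical, since the claim says only "file system write signature". Trying several candidate copies in turn is an illustrative extension; the claim describes evaluating a copy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WriteSignature:
    # Hypothetical fields; the claim does not define the signature layout.
    volume_id: int
    block_number: int
    generation: int


@dataclass
class BlockCopy:
    data: bytes
    signature: WriteSignature  # signature stored within the copy


def select_good_copy(copy: BlockCopy, expected: WriteSignature) -> Optional[bytes]:
    """Return the copy's data only if its stored signature matches."""
    if copy.signature == expected:
        return copy.data
    return None  # stale or foreign copy: do not use it


def repair_block(storage: dict, bno: int,
                 copies: Iterable[BlockCopy], expected: WriteSignature) -> bool:
    for copy in copies:
        good = select_good_copy(copy, expected)
        if good is not None:
            storage[bno] = good  # write the good copy back to the array
            return True
    return False  # no copy matched; the block stays in error
```

Comparing the whole signature, rather than just the data checksum, lets the sketch reject a copy that is internally consistent but belongs to an older write.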

25. A method for correcting unrecoverable errors in a storage array connected to a storage system for storing data, the method comprising:
identifying, by a RAID system during a data access request, a data block having an unrecoverable error that cannot be corrected by an underlying RAID technique at the storage system;
providing an indicator within close proximity of the data block having the unrecoverable error that the data block has an error;
protecting the indicator by the underlying RAID technique, thereby reliably maintaining information about data blocks having unrecoverable errors;
obtaining a good copy of the data block having the unrecoverable error; and
generating a new value for the data block, comprising:
  if the error is a media error or a missing block error, then writing an identifiable pattern to the data block and updating data integrity information to indicate that the data integrity information cannot be used to verify data in the data block;
  if the error is a checksum error, then filling the data block with original data read from a storage device where the data block resides and updating data integrity information to indicate that the data integrity information cannot be used to verify data in the data block; and
  if the error is a lost write error, then filling the data block with original data.
Dependent claims: 26–27.
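The per-error-type branches of claim 25 map naturally onto a small dispatch function. This is a minimal sketch under stated assumptions: the `ErrorKind` and `NewValue` names, the 16-byte block size, and the `BAD_PATTERN` bytes are hypothetical, because the claim requires only "an identifiable pattern". For lost writes the claim says nothing about integrity information, so keeping it usable in that branch is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

BLOCK_SIZE = 16  # toy block size; hypothetical
# Hypothetical "identifiable pattern"; the claim does not specify one.
BAD_PATTERN = (b"PSEUDOBAD" * BLOCK_SIZE)[:BLOCK_SIZE]


class ErrorKind(Enum):
    MEDIA = auto()
    MISSING_BLOCK = auto()
    CHECKSUM = auto()
    LOST_WRITE = auto()


@dataclass
class NewValue:
    data: bytes
    integrity_usable: bool  # False: integrity info cannot verify this data


def generate_new_value(kind: ErrorKind, original: Optional[bytes]) -> NewValue:
    if kind in (ErrorKind.MEDIA, ErrorKind.MISSING_BLOCK):
        # Nothing usable was read back: write a recognizable pattern.
        return NewValue(BAD_PATTERN, integrity_usable=False)
    if original is None:
        raise ValueError(f"{kind.name} handling needs the original data")
    if kind is ErrorKind.CHECKSUM:
        # Bytes were readable but failed verification: keep them as-is.
        return NewValue(original, integrity_usable=False)
    if kind is ErrorKind.LOST_WRITE:
        # Claim says only "fill with original data"; leaving the integrity
        # information usable here is an assumption.
        return NewValue(original, integrity_usable=True)
    raise ValueError(f"unhandled error kind: {kind}")
```

Flagging the integrity information as unusable, rather than recomputing a checksum over suspect bytes, keeps later reads from mistaking a placeholder value for verified data.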

28. A storage system for correcting unrecoverable errors in a storage array, the system comprising:
a storage module that maintains a data protection mechanism, the storage module configured to identify during a data access request, a data block having an unrecoverable error condition that cannot be corrected by the data protection mechanism;
a pseudo-bad block management module configured to set an indicator within a close proximity to the data block having the unrecoverable error condition and to protect the indicator by the data protection mechanism;
a module for opportunistic error recovery configured to perform error recovery to identify a good copy of the data block having the unrecoverable error asynchronously to the data access request during which the unrecoverable error was encountered, and to recover the data block having the unrecoverable error;
a first mirroring module executed at the storage system, the first mirroring module configured to send copies of data blocks to a destination storage system connected to the storage system via a network; and
a second mirroring module executed at the destination storage system configured to store copies of the data maintained at the storage system, the second mirroring module configured to provide a copy of the data blocks upon request by the first mirroring module for opportunistic error recovery, at least some of at least one of the storage module, the pseudo-bad block management module, the module for opportunistic error recovery, the first mirroring module, and the second mirroring module implemented at least in part via a processor of the storage system.
Dependent claims: 29–31.
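The division of labor among the modules in claim 28 can be sketched with two in-process mirror objects and a one-thread executor standing in for asynchronous recovery. This is a minimal illustration, not the claimed system: the class names `DestinationMirror`, `SourceMirror`, and `OpportunisticRecovery`, the dict used as the storage array, and the omission of any real network transport between source and destination are all hypothetical simplifications.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


class DestinationMirror:
    """Second mirroring module: holds replicated copies (network elided)."""

    def __init__(self):
        self._copies = {}
        self._lock = threading.Lock()

    def store(self, bno: int, data: bytes) -> None:
        with self._lock:
            self._copies[bno] = data

    def fetch(self, bno: int) -> Optional[bytes]:
        with self._lock:
            return self._copies.get(bno)


class SourceMirror:
    """First mirroring module: ships blocks out and requests copies back."""

    def __init__(self, dest: DestinationMirror):
        self.dest = dest

    def replicate(self, bno: int, data: bytes) -> None:
        self.dest.store(bno, data)

    def request_copy(self, bno: int) -> Optional[bytes]:
        return self.dest.fetch(bno)


class OpportunisticRecovery:
    def __init__(self, array: dict, mirror: SourceMirror):
        self.array = array          # block number -> bytes
        self.mirror = mirror
        self.pseudo_bad = set()     # blocks currently flagged invalid
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def on_unrecoverable(self, bno: int) -> Future:
        # Called from the I/O path: flag the block, schedule recovery,
        # and return without waiting for the mirror.
        with self._lock:
            self.pseudo_bad.add(bno)
        return self._pool.submit(self._recover, bno)

    def _recover(self, bno: int) -> bool:
        good = self.mirror.request_copy(bno)
        if good is None:
            return False  # no copy yet; block stays flagged for a retry
        with self._lock:
            self.array[bno] = good
            self.pseudo_bad.discard(bno)
        return True
```

`on_unrecoverable` returns a `Future` immediately, so the request that hit the error is not held up while the copy is fetched; if the destination has no copy, the block simply remains in `pseudo_bad` for a later attempt.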
Specification