Layered redundancy encoding schemes for data storage
1 Assignment
0 Petitions
Abstract
Techniques for optimizing data storage are disclosed herein. In particular, methods and systems for implementing redundancy encoding schemes with data storage systems are described. The redundancy encoding schemes may be scheduled according to system and data characteristics. The schemes may span multiple tiers or layers of a storage system. The schemes may be generated, for example, in accordance with a transaction rate requirement, a data durability requirement, or the age of the stored data. The schemes may be designed to rectify entropy-related effects on data storage. The schemes may include one or more erasure codes or erasure coding schemes. Additionally, methods and systems for improving and/or accounting for failure correlation of various components of the storage system, including storage devices such as hard disk drives, are described.
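The abstract says schemes may be generated according to a transaction rate requirement, a durability requirement, or the age of the stored data. The following is a minimal Python sketch of such a policy, assuming a Reed-Solomon-style code described by k data shards and m parity shards (any k of the k + m shards recover the data). The class `ErasureScheme`, the function `choose_scheme`, and every threshold in it are hypothetical illustrations, not values from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureScheme:
    """A (k, m) erasure code: k data shards plus m parity shards."""
    data_shards: int    # k
    parity_shards: int  # m: number of lost shards the stripe can survive


def choose_scheme(age_days: float, min_parity: int,
                  high_transaction_rate: bool) -> ErasureScheme:
    """Hypothetical policy mapping data characteristics to a (k, m) scheme.

    Assumptions (illustrative only): hot, frequently accessed data uses
    narrow stripes so reads and rebuilds touch fewer devices; young data
    uses medium stripes; old, cold data uses wide stripes to cut storage
    overhead. The durability requirement is expressed directly as the
    minimum number of shard losses the stripe must survive.
    """
    if min_parity < 1:
        raise ValueError("durability requirement must allow at least one lost shard")
    if high_transaction_rate:
        k = 4
    elif age_days < 30:
        k = 6
    else:
        k = 10
    return ErasureScheme(data_shards=k, parity_shards=min_parity)


# Example: a year-old archive that must survive three shard losses.
print(choose_scheme(age_days=400, min_parity=3, high_transaction_rate=False))
# → ErasureScheme(data_shards=10, parity_shards=3)
```

Wider stripes lower the (k + m) / k overhead but make each reconstruction read more shards, which is why this sketch reserves them for cold data.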
85 Citations
20 Claims
1. A computer-implemented method for optimizing data storage comprising:
applying, by one or more computer systems, a primary erasure coding scheme to original data stored on a storage system comprising a plurality of hardware tiers, at least one of which is a physical storage tier that comprises a plurality of hardware storage devices upon which at least a subset of the original data is stored, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
analyzing one or more modes of operation of at least the physical storage tier, the one or more modes corresponding to one or more types of partial failure of one or more hardware storage devices of the plurality of hardware storage devices in the physical storage tier;
determining, based at least in part on the analyzing of the one or more modes, a secondary erasure coding scheme for the physical storage tier; and
applying the secondary erasure coding scheme to the subset of the original data stored upon the one or more hardware storage devices of the physical storage tier, wherein the secondary redundancy encoding includes one or more erasure codes that alter a ratio between a size of the original data stored and the size of the primary encoded data needed to restore the original data.
Dependent claims: 2, 3, 4, 5
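Claim 1 ties the secondary scheme to an analysis of partial-failure modes of the devices in the physical tier. One way to make that concrete, assumed here purely for illustration, is to summarize a partial-failure mode such as latent sector errors as an independent per-sector failure probability p, then pick the smallest number m of parity sectors per on-device stripe of k data sectors whose loss probability meets a target. The independence assumption and the names `stripe_loss_probability` and `choose_secondary_parity` are hypothetical; real sector errors are often correlated, which would make this estimate optimistic.

```python
from math import comb


def stripe_loss_probability(n: int, m: int, p: float) -> float:
    """P(more than m of n sectors fail), assuming independent failures with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


def choose_secondary_parity(k: int, p: float, target: float, max_parity: int = 16) -> int:
    """Smallest m such that a (k + m)-sector stripe is lost with probability <= target."""
    for m in range(max_parity + 1):
        if stripe_loss_probability(k + m, m, p) <= target:
            return m
    raise ValueError("no parity count up to max_parity meets the target")


# Example: 8 data sectors, per-sector error probability 1e-3, target 1e-9.
print(choose_secondary_parity(k=8, p=1e-3, target=1e-9))  # → 3
```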
6. A computer-implemented method for optimizing data storage comprising:
applying, by one or more computer systems, a primary redundancy encoding scheme to data stored across a plurality of system tiers of a storage system, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
determining, based at least in part on analyzing one or more storage characteristics of at least a subset of the system tiers, at least one secondary redundancy encoding scheme; and
applying the secondary redundancy encoding scheme to data stored on the subset of system tiers for which the storage characteristics are analyzed, wherein the secondary redundancy encoding scheme includes one or more erasure codes that alter a ratio between a size of the data stored and a size of the primary encoded data.
Dependent claims: 7, 8, 9, 10
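Each erasure-coding layer multiplies the stored size by its stretch factor (k + m) / k, so layering a secondary code on top of a primary one changes the ratio between original and stored bytes. A small sketch with exact fractions, assuming each layer is a (k, m) code applied to the output of the layer before it; the (6, 3) and (14, 1) parameters and the function names are illustrative, not taken from the patent.

```python
from fractions import Fraction


def stretch(k: int, m: int) -> Fraction:
    """Stored bytes per input byte for one (k, m) erasure-coding layer."""
    return Fraction(k + m, k)


def layered_stretch(layers: list[tuple[int, int]]) -> Fraction:
    """Overall stored/original ratio when each layer encodes the previous layer's output."""
    total = Fraction(1)
    for k, m in layers:
        total *= stretch(k, m)
    return total


primary_only = layered_stretch([(6, 3)])             # 9/6 = 3/2
with_secondary = layered_stretch([(6, 3), (14, 1)])  # 3/2 * 15/14 = 45/28
print(primary_only, with_secondary)  # → 3/2 45/28
```

In this example the secondary layer raises storage overhead from 50% to about 61% in exchange for protection inside each device.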
11. A data storage system, comprising:
a plurality of storage tiers;
one or more processors; and
memory, including instructions executable by the one or more processors to cause the computer system to at least:
apply a primary redundancy encoding to original data stored across at least a subset of the plurality of storage tiers, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
determine a secondary redundancy encoding from an analysis of reliability information for at least the subset of storage tiers upon which the primary encoded data is stored; and
apply the secondary redundancy encoding to the primary encoded data, wherein the secondary redundancy encoding includes one or more erasure codes that, when applied to the stored data, alter a ratio between a size of the original data and a size of the primary encoded data.
Dependent claims: 12, 13, 14
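Claim 11 applies the secondary encoding to the primary encoded data itself. A minimal sketch of that layering, using single-parity XOR as a stand-in for both erasure codes (the patent does not specify XOR): the primary layer splits data into k shards plus one parity shard, one shard per device, and the secondary layer splits each shard into fixed-size blocks plus one parity block, so a single bad block inside one device can be repaired locally without reading other devices. The function names and block sizes are hypothetical.

```python
from functools import reduce


def xor_bytes(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode_with_parity(data: bytes, piece_len: int) -> list[bytes]:
    """Zero-pad data to a multiple of piece_len, split it, and append one XOR parity piece."""
    count = -(-len(data) // piece_len)  # ceiling division
    padded = data.ljust(count * piece_len, b"\0")
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(count)]
    return pieces + [xor_bytes(pieces)]


def recover(pieces_with_parity: list[bytes], lost: int) -> bytes:
    """Rebuild one missing piece by XOR-ing all surviving pieces, parity included."""
    return xor_bytes([p for i, p in enumerate(pieces_with_parity) if i != lost])


data = b"layered redundancy!"
# Primary layer: 3 data shards (one per device) plus 1 parity shard.
primary = encode_with_parity(data, piece_len=-(-len(data) // 3))
# Secondary layer: applied to every primary shard, parity shard included.
secondary = [encode_with_parity(shard, piece_len=4) for shard in primary]

# Partial device failure: block 1 of shard 0 is unreadable; repair it on-device.
print(recover(secondary[0], 1))  # → b'red\x00'
# Whole device failure: shard 1 is gone; repair it from the other shards.
print(recover(primary, 1))  # → b' redund'
```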
15. One or more non-transitory computer-readable storage media having collectively stored thereon executable instructions that, when executed by one or more processors of a computing resource provider's computer system, cause the computer system to at least:
cause a primary encoding, using a primary error correction coding scheme, of data stored upon a storage subsystem of the computer system, the storage subsystem including a plurality of storage tiers, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
analyze one or more failure modes of at least a subset of the plurality of storage tiers;
determine, based at least in part on the analysis of the one or more failure modes, at least one secondary error correction coding scheme to be applied to at least a subset of the plurality of storage tiers; and
cause the determined secondary error correction coding scheme to be applied to the associated subset of storage tiers, wherein the secondary error correction coding scheme includes one or more erasure codes that alter a ratio between a size of the data and a size of the primary encoded data.
Dependent claims: 16, 17, 18, 19, 20
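Claim 15 bases the secondary scheme on an analysis of failure modes across storage tiers, and the abstract notes that failure correlation between components matters. As a hypothetical illustration of one such analysis: if several shards of a stripe share a failure domain (a rack, a power feed, or a batch of drives from the same lot), one domain failure removes all of them at once, so the parity count must cover the largest shared group. The sketch assumes worst-case placement for any additional independent device failures; the rack labels and function names are made up for illustration.

```python
from collections import Counter


def required_parity(placement: list[str], extra_device_failures: int = 0) -> int:
    """Parity shards needed so a stripe survives losing its most-populated
    failure domain plus `extra_device_failures` further shards elsewhere.

    placement[i] names the failure domain holding shard i (data or parity).
    """
    largest_domain = max(Counter(placement).values())
    return largest_domain + extra_device_failures


def survives(placement: list[str], parity_shards: int,
             extra_device_failures: int = 0) -> bool:
    """True if the stripe's parity covers the worst-case correlated loss."""
    return parity_shards >= required_parity(placement, extra_device_failures)


stripe = ["rack-a", "rack-a", "rack-b", "rack-c", "rack-c", "rack-c"]
print(required_parity(stripe))     # → 3
print(required_parity(stripe, 1))  # → 4
```

Spreading the same six shards over six separate racks would reduce the single-domain requirement to one parity shard, which is the kind of trade-off such an analysis exposes.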
Specification