Layered redundancy encoding schemes for data storage
Abstract
Techniques for optimizing data storage are disclosed herein. In particular, methods and systems for implementing redundancy encoding schemes with data storage systems are described. The redundancy encoding schemes may be scheduled according to system and data characteristics. The schemes may span multiple tiers or layers of a storage system. The schemes may be generated, for example, in accordance with a transaction rate requirement, a data durability requirement or in the context of the age of the stored data. The schemes may be designed to rectify entropy-related effects upon data storage. The schemes may include one or more erasure codes or erasure coding schemes. Additionally, methods and systems for improving and/or accounting for failure correlation of various components of the storage system, including that of storage devices such as hard disk drives, are described.
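As a minimal illustration of what an erasure code does, the sketch below implements single-parity XOR coding: k equal-length data shards plus one parity shard, from which any one lost shard can be rebuilt. This is not taken from the patent, which does not name a specific code; the function names `encode` and `recover` are hypothetical and chosen only for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(shards: List[bytes]) -> List[bytes]:
    """Append one XOR parity shard to k equal-length data shards."""
    parity = reduce(xor_bytes, shards)
    return list(shards) + [parity]


def recover(encoded: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None); return the data shards."""
    missing = [i for i, s in enumerate(encoded) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair at most one lost shard")
    repaired = list(encoded)
    if missing:
        present = [s for s in encoded if s is not None]
        # XOR of all surviving shards equals the missing shard.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity shard
```

Here three data shards become four stored shards, so the stored size is 4/3 of the original. Production systems typically use codes that tolerate more than one loss, such as Reed-Solomon codes.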
22 Claims
1. A computer-implemented method for optimizing data storage comprising:
- applying, by one or more computer systems, a primary erasure coding scheme to data stored on a storage system comprising a plurality of hardware layers, at least one of which is a physical storage layer that comprises a plurality of hardware storage devices upon which at least a subset of the data is stored, and each of the remaining hardware layers of the plurality of hardware layers comprising a plurality of devices of one type, the one type selected from the group consisting of datacenters, storage servers, hardware storage devices, and storage device zones, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
- determining, based at least in part on analyzing the correlated failure modes of at least the physical storage layer, a secondary erasure coding scheme for the physical storage layer; and
- applying the secondary erasure coding scheme to the subset of the data stored upon the one or more hardware storage devices of the physical storage layer, wherein the secondary redundancy encoding includes one or more erasure codes that alter a stretch factor of the primary encoded data.
(Dependent claims: 2, 3, 4, 5, 6)
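The claim does not define how the secondary scheme alters the stretch factor. The sketch below assumes the common definition (stretch factor = stored shards n divided by data shards k) and assumes the secondary code is applied on top of the already encoded data, so the two factors multiply. The `ErasureCode` type, `layered_stretch` function, and the (k, n) values are hypothetical illustrations, not parameters from the patent.

```python
from fractions import Fraction
from typing import NamedTuple


class ErasureCode(NamedTuple):
    k: int  # number of data shards
    n: int  # total shards stored (data + parity)

    @property
    def stretch(self) -> Fraction:
        """Storage expansion: bytes stored per byte of original data."""
        return Fraction(self.n, self.k)


def layered_stretch(primary: ErasureCode, secondary: ErasureCode) -> Fraction:
    """Overall stretch when the secondary code wraps primary-encoded data.

    Assumption: the codes are concatenated, so expansions multiply.
    """
    return primary.stretch * secondary.stretch


# Hypothetical example: a 6-of-9 code across layers, then a 10-of-12 code
# within the physical storage layer.
primary = ErasureCode(k=6, n=9)
secondary = ErasureCode(k=10, n=12)
overall = layered_stretch(primary, secondary)
```

With these example values the primary stretch is 3/2, the secondary stretch is 6/5, and the combined stretch is 9/5.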
7. A computer-implemented method for optimizing data storage comprising:
- applying, by one or more computer systems, a primary redundancy encoding scheme to data stored across a plurality of system layers of a storage system, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
- determining, based at least in part on analyzing one or more storage characteristics of at least a subset of the system layers, at least one secondary redundancy encoding scheme; and
- applying the secondary redundancy encoding scheme to data stored on the subset of system layers for which the storage characteristics are analyzed, wherein the secondary redundancy encoding scheme includes one or more erasure codes that alter a stretch factor of the primary encoded data.
(Dependent claims: 8, 9, 10, 11, 12)
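One way to picture the "determining" step is as a search for the least redundancy that meets a durability target, given a measured storage characteristic. The sketch below is an assumption-laden illustration rather than the claimed method: it takes a single per-shard loss probability `p`, treats shard losses as independent (a simplification, since the claims elsewhere concern correlated failure modes), and picks the smallest parity count whose loss probability is within `target`. The names `loss_probability` and `pick_parity` are hypothetical.

```python
from math import comb


def loss_probability(n: int, m: int, p: float) -> float:
    """P(more than m of n shards are lost), assuming independent losses."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(m + 1, n + 1)
    )


def pick_parity(k: int, p: float, target: float, max_parity: int = 16) -> int:
    """Smallest parity count m such that a k+m code meets the loss target."""
    for m in range(max_parity + 1):
        if loss_probability(k + m, m, p) <= target:
            return m
    raise ValueError("no parity count within max_parity meets the target")
```

For example, `pick_parity(10, 0.01, 1e-6)` searches parity counts for ten data shards with a 1% per-shard loss probability. A model that accounts for correlated failures would generally call for more redundancy than this independent model suggests.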
13. A data storage system, comprising:
- a plurality of storage layers;
- one or more processors; and
- memory, including instructions executable by the one or more processors to cause the computer system to at least:
- apply a primary redundancy encoding to data stored across at least a subset of the plurality of storage layers, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
- determine a secondary redundancy encoding from an analysis of reliability information for at least the subset of storage layers upon which the primary encoded data is stored; and
- apply the secondary redundancy encoding to the primary encoded data, wherein the secondary redundancy encoding includes one or more erasure codes that, when applied to the stored data, alter a stretch factor of the primary encoded data.
(Dependent claims: 14, 15, 16)
17. One or more non-transitory computer-readable storage media having collectively stored thereon executable instructions that, when executed by one or more processors of a computing resource provider's computer system, cause the computer system to at least:
- cause a primary encoding, using a primary error correction coding scheme, of data stored upon a storage subsystem of the computer system, the storage subsystem including a plurality of storage layers, thereby generating primary encoded data, the primary redundancy encoding including one or more erasure codes;
- analyze one or more failure modes of at least a subset of the plurality of storage layers to determine at least one secondary error correction coding scheme to be applied to at least a subset of the plurality of storage layers; and
- cause the determined secondary error correction coding scheme to be applied to the associated subset of storage layers, wherein the secondary error correction coding scheme includes one or more erasure codes that, when applied to the associated subset of storage layers, alter a stretch factor of the primary encoded data.
(Dependent claims: 18, 19, 20, 21, 22)
Specification