Failure-decoupled volume-level redundancy coding techniques
First Claim
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions,determining at least two failure-decorrelated subsets of a set of volumes such that;
each failure-decorrelated subset is capable of storing redundancy coded archives; and
failure of a first portion of one or more of the failure-decorrelated subsets, the first portion having at least one common member with a second portion of the one or more of the failure-decorrelated subsets that includes at least one member that does not exist in the first portion, does not affect data integrity of redundancy coded archives stored on at least the second portion of the one or more failure-decorrelated subsets;
processing archives to be stored on the set of volumes so as to determine, based on one or more characteristics of the archives, which of the failure-decorrelated subsets to commit the archives;
storing the processed archives on a first subset of volumes of the determined failure-decorrelated subset, the first subset of volumes having a number of members corresponding to a quorum quantity of a redundancy code to be applied to the processed archives;
applying the redundancy code to the processed archives to generate encoded shards; and
storing the encoded shards on a second subset of volumes of the determined failure-decorrelated subset, the second subset being outside the first subset.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for storing, indexing, and retrieving original data of data archives on data storage systems using redundancy coding techniques. For example, redundancy codes, such as erasure codes, may be applied to archives (such as those received from a customer of a computing resource service provider) so as allow the storage of original data of the individual archives available on a minimum of volumes, such as those of a data storage system, while retaining availability, durability, and other guarantees imparted by the application of the redundancy code. Sparse indexing techniques may be implemented so as to reduce the footprint of indexes used to locate the original data, once stored. The volumes may be apportioned into failure-decorrelated subsets, and archives stored thereto may be apportioned to such subsets.
-
Citations
20 Claims
-
1. A computer-implemented method, comprising:
under the control of one or more computer systems configured with executable instructions, determining at least two failure-decorrelated subsets of a set of volumes such that; each failure-decorrelated subset is capable of storing redundancy coded archives; and failure of a first portion of one or more of the failure-decorrelated subsets, the first portion having at least one common member with a second portion of the one or more of the failure-decorrelated subsets that includes at least one member that does not exist in the first portion, does not affect data integrity of redundancy coded archives stored on at least the second portion of the one or more failure-decorrelated subsets; processing archives to be stored on the set of volumes so as to determine, based on one or more characteristics of the archives, which of the failure-decorrelated subsets to commit the archives; storing the processed archives on a first subset of volumes of the determined failure-decorrelated subset, the first subset of volumes having a number of members corresponding to a quorum quantity of a redundancy code to be applied to the processed archives; applying the redundancy code to the processed archives to generate encoded shards; and storing the encoded shards on a second subset of volumes of the determined failure-decorrelated subset, the second subset being outside the first subset. - View Dependent Claims (2, 3, 4)
-
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to; determine, from a set of volumes, at least two failure-decorrelated subsets of the set of volumes, each of the failure-decorrelated subsets capable of storing data such that a failure of other failure-decorrelated subsets affects only data stored on a portion thereof; process incoming archives so as to determine which of the failure-decorrelated subsets to commit the archives; apply a redundancy code to the archives; and store the redundancy coded archives to the determined failure-decorrelated subset of volumes. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
-
generate a plurality of failure-decorrelated subsets of a set of volumes, each failure-decorrelated subset of the plurality; including at least one volume of the set of volumes that differs from any of the members of at least one of the other failure-decorrelated subsets; and having a number of members that is less than a number of members of the set of volumes; in response to receiving archives to be stored on the set of volumes, selecting, based on one or more attributes of a subset of the archives, at least one of the failure-decorrelated subsets on which to store the archives; and storing the archives on the selected failure-decorrelated subsets. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
-
Specification