Data retrieval optimization for redundancy coded data storage systems with static redundancy ratios
First Claim
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions, generating, from a plurality of received archives using a redundancy code, a set of shards representing the plurality of archives, a minimum quorum quantity of the shards in the set being usable, by the redundancy code, to generate original data of the archives, the set of shards including at least:
identity shards that contain the original data of the plurality of archives, and encoded shards representing an encoded form of the original data; and
storing the set of shards on a set of storage devices, the set of storage devices having a quantity of storage devices that is associated with the minimum quorum quantity, such that a quantity of shards of the set of shards stored on the storage devices is an integer multiple of the quantity of storage devices, the integer multiple being two or greater;
in response to receiving a request for at least some of the stored plurality of archives, at least:
determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored;
determining performance characteristics for the determined storage device; and
if the determined performance characteristics are insufficient to complete retrieval of the respective identity shard within a timeframe determined for the retrieval, retrieve the requested archives by at least:
retrieving the requested archives from the determined storage devices having the corresponding identity shard; and
augmenting the retrieval of the requested archives by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored on at least a portion of a remainder of the storage devices of the plurality of storage devices.
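The relationship between identity shards, encoded shards, and the minimum quorum can be illustrated with a deliberately simple code. The sketch below is not the redundancy code of the claims, which is not specified here; it assumes a single XOR parity shard, so that any k of the k + 1 shards suffice to regenerate a missing identity shard. The names `make_shards` and `regenerate` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shards(archives: list[bytes]) -> dict:
    """Build identity shards (original data, zero-padded) plus one XOR parity shard.

    Assumption: a single-parity code, so the minimum quorum is len(archives).
    """
    size = max(len(a) for a in archives)
    identity = [a.ljust(size, b"\0") for a in archives]
    parity = reduce(xor_bytes, identity)
    return {
        "identity": identity,
        "encoded": [parity],
        "lengths": [len(a) for a in archives],
    }


def regenerate(shards: dict, missing: int) -> bytes:
    """Regenerate one identity shard from the remaining identity shards and the parity."""
    others = [s for i, s in enumerate(shards["identity"]) if i != missing]
    data = reduce(xor_bytes, others + shards["encoded"])
    return data[: shards["lengths"][missing]]  # strip the zero padding
```

For example, with three archives, `regenerate(make_shards([b"alpha", b"be", b"gamma!"]), 1)` rebuilds the second archive without reading its identity shard. A production system would more plausibly use a code that tolerates several missing shards.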
Abstract
Techniques described and suggested herein include systems and methods for improving data performance characteristics for data archives stored on data storage systems using redundancy coding techniques, without necessitating expansion of the implementing data storage system. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet performance requirements for retrieval requests associated with the archives stored and/or encoded therein. Multiple shards may be assigned to devices in an existing infrastructure to improve performance characteristics without changing redundancy code parameters. Implementing systems may monitor random access rates, capabilities, and burdens, so as to adaptively account for changes to some or all of the monitored parameters.
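The idea of assigning multiple shards to each existing device can be sketched as a round-robin placement. This is only an illustration under stated assumptions: the function name `place_shards`, the string identifiers, and the round-robin policy are hypothetical, and the check mirrors the claim language that the shard count be an integer multiple (two or greater) of the device count.

```python
def place_shards(shard_ids: list[str], device_ids: list[str]) -> dict[str, list[str]]:
    """Assign shards to devices round-robin, several shards per device.

    Assumption: the number of shards must be an integer multiple (>= 2)
    of the number of devices; otherwise a ValueError is raised.
    """
    n_devices = len(device_ids)
    if n_devices == 0:
        raise ValueError("at least one storage device is required")
    multiple, remainder = divmod(len(shard_ids), n_devices)
    if remainder != 0 or multiple < 2:
        raise ValueError("shard count must be an integer multiple (>= 2) of device count")
    placement: dict[str, list[str]] = {d: [] for d in device_ids}
    for i, shard in enumerate(shard_ids):
        placement[device_ids[i % n_devices]].append(shard)
    return placement
```

If the shard list is two sets of three shards laid end to end, three devices each receive one shard from each set, so no device holds two shards of the same set.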
20 Claims
1. A computer-implemented method, comprising:
under the control of one or more computer systems configured with executable instructions,
generating, from a plurality of received archives using a redundancy code, a set of shards representing the plurality of archives, a minimum quorum quantity of the shards in the set being usable, by the redundancy code, to generate original data of the archives, the set of shards including at least:
identity shards that contain the original data of the plurality of archives, and
encoded shards representing an encoded form of the original data; and
storing the set of shards on a set of storage devices, the set of storage devices having a quantity of storage devices that is associated with the minimum quorum quantity, such that a quantity of shards of the set of shards stored on the storage devices is an integer multiple of the quantity of storage devices, the integer multiple being two or greater;
in response to receiving a request for at least some of the stored plurality of archives, at least:
determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored;
determining performance characteristics for the determined storage device; and
if the determined performance characteristics are insufficient to complete retrieval of the respective identity shard within a timeframe determined for the retrieval, retrieve the requested archives by at least:
retrieving the requested archives from the determined storage devices having the corresponding identity shard; and
augmenting the retrieval of the requested archives by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored on at least a portion of a remainder of the storage devices of the plurality of storage devices.
- View Dependent Claims (2, 3, 4)
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to:
process received archives to generate two or more sets of shards representing the plurality of archives, a minimum quorum quantity of the shards across the two or more sets of shards being usable to generate unavailable shards in any set of the two or more sets of shards, the two or more sets of shards including at least:
identity shards that contain the original data of the plurality of archives, and
encoded shards representing an encoded form of the original data; and
store each set of the two or more sets of shards across a set of storage devices, such that shards of at least two sets of the two or more sets are stored on each storage device of the set of storage devices;
in response to receiving a request for retrieval of the archives, at least:
retrieve the archives from a corresponding identity shard among the identity shards; and
augment the retrieval of the archives by generating the requested archives from at least a subset of the encoded shards and at least a subset of a remainder of the identity shards.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
cause generation, by a redundancy code, of two or more sets of shards representing a plurality of archives to be stored by the computer system, a minimum quorum quantity of the shards across the two or more sets of shards being usable to generate unavailable shards in any set of the two or more sets of shards, the two or more sets of shards including at least:
identity shards that contain the original data of the plurality of archives, and
encoded shards representing an encoded form of the original data; and
cause storage of each set of the two or more sets of shards across a set of storage devices associated with the computer system, such that shards of at least two sets of the two or more sets are stored on each storage device of the set of storage devices;
service requests for retrieving the archives, by at least:
determining whether the storage device associated with an identity shard corresponding to the requested archives is capable of accommodating the requests within a timeframe determined for the requests;
if the storage device is capable of accommodating the requests within the timeframe, then causing retrieval of the corresponding identity shard; and
if the storage device is not capable of accommodating the requests within the timeframe, then at least:
causing retrieval of at least a subset of a remainder of the shards; and
causing generation of the requested archives from the subset of the remainder of the shards.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
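The branch in claim 13 (read the identity shard if its device can meet the timeframe, otherwise regenerate from the remaining shards) can be sketched as a small planning function. Everything here is an assumption made for illustration: the `DeviceStats` fields, the linear time estimate (queued bytes plus shard bytes, divided by throughput), the one-shard-per-device simplification, and the names `estimated_seconds` and `plan_retrieval`. The claims do not specify how capability is measured.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Hypothetical monitored characteristics of one storage device."""
    throughput_bytes_per_s: float
    queued_bytes: int


def estimated_seconds(stats: DeviceStats, shard_bytes: int) -> float:
    """Assumed estimate: drain the queue, then read the shard, at constant throughput."""
    return (stats.queued_bytes + shard_bytes) / stats.throughput_bytes_per_s


def plan_retrieval(
    identity_device: str,
    stats: dict[str, DeviceStats],
    shard_bytes: int,
    deadline_s: float,
    quorum: int,
) -> tuple[str, list[str]]:
    """Return ("identity", [device]) or ("regenerate", [quorum fastest other devices])."""
    if estimated_seconds(stats[identity_device], shard_bytes) <= deadline_s:
        return ("identity", [identity_device])
    others = sorted(
        (d for d in stats if d != identity_device),
        key=lambda d: estimated_seconds(stats[d], shard_bytes),
    )
    if len(others) < quorum:
        raise RuntimeError("not enough remaining devices to reach the minimum quorum")
    return ("regenerate", others[:quorum])
```

Claim 1 differs in that the identity shard is still read and the regeneration augments that read; a variant of this sketch could return both device lists in that case.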
Specification