Throughput optimization for redundancy coded data storage systems
First Claim
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions,storing a plurality of archives on a plurality of volumes by at least;
generating a set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a redundancy code, to generate original data of the archives, the set of shards including at least;
identity shards that contain the original data of the plurality of archives; and
encoded shards representing an encoded form of the original data; and
storing each shard of the set of shards on a respective storage device of a plurality of storage devices associated with the one or more computer systems; and
in response to receiving a request for at least some of the stored plurality of archives, at least;
determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored;
determining whether the determined respective storage device has sufficient throughput to service the request within a predetermined timeframe; and
if the determined storage device does not have sufficient throughput, retrieve the requested archives by at least;
retrieving the requested archives from the determined storage devices having the corresponding identity shard; and
retrieving the requested archives from at least a portion of a remainder of the storage devices of the plurality of storage devices by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored thereon.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for optimizing throughput characteristics for data archives stored on data storage systems using redundancy coding techniques. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet throughput requirements or time-to-retrieval limitations for retrieval requests associated with the archives stored and/or encoded therein. Implementing systems may monitor throughput rates, capabilities, and burdens, so as to adaptively account for changes to some or all of the monitored parameters.
86 Citations
20 Claims
-
1. A computer-implemented method, comprising:
-
under the control of one or more computer systems configured with executable instructions, storing a plurality of archives on a plurality of volumes by at least; generating a set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a redundancy code, to generate original data of the archives, the set of shards including at least; identity shards that contain the original data of the plurality of archives; and encoded shards representing an encoded form of the original data; and storing each shard of the set of shards on a respective storage device of a plurality of storage devices associated with the one or more computer systems; and in response to receiving a request for at least some of the stored plurality of archives, at least; determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored; determining whether the determined respective storage device has sufficient throughput to service the request within a predetermined timeframe; and if the determined storage device does not have sufficient throughput, retrieve the requested archives by at least; retrieving the requested archives from the determined storage devices having the corresponding identity shard; and retrieving the requested archives from at least a portion of a remainder of the storage devices of the plurality of storage devices by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored thereon. - View Dependent Claims (2, 3, 4)
-
-
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to; retrieve original data of an archive stored as a set of redundancy coded shards that include at least an identity shard having at least a portion of the original data and an encoded shard representing an encoded form of the original data, by at least; retrieving the original data from the identity shard; and augmenting the retrieval of the requested original data by generating the requested original data from the encoded shard, such that a total retrieval throughput of the original data is greater than that of the retrieval throughput from the identity shard alone. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
service requests for retrieving archives stored as a set of redundancy coded shards that include at least an identity shard having original data associated with the requests and an encoded shard output from processing the original data with a redundancy code, by at least; retrieving the identity shard from a first storage device so as to retrieve the original data associated with the requests; and if throughput from the first storage device is insufficient to service the requests within a predetermined timeframe, retrieving the encoded shard from a second storage device so as to augment the throughput by generating the requested original data from the encoded shard. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification