Random access optimization for redundancy coded data storage systems
First Claim
Patent Images
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions,storing a plurality of archives on a plurality of volumes by at least;
generating a set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a redundancy code, to generate original data of the archives, the set of shards including at least;
identity shards that contain the original data of the plurality of archives; and
encoded shards representing an encoded form of the original data; and
storing each shard of the set of shards on a respective storage device of a plurality of storage devices associated with the one or more computer systems, each respective storage device having a predetermined random access capability for data stored thereon;
in response to receiving a request for at least some of the stored plurality of archives, at least;
determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored;
determining a current random access rate for the determined storage device; and
if a random access burden imposed by the request plus the current random access rate is above the predetermined random access capability, retrieving the requested archives by at least;
retrieving the requested archives from the determined storage devices having the corresponding identity shard, such that the predetermined random access capability is not exceeded; and
retrieving the requested archives from at least a portion of a remainder of the storage devices of the plurality of storage devices by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored thereon.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for optimizing random access characteristics for data archives stored on data storage systems using redundancy coding techniques. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet random access requirements for retrieval requests associated with the archives stored and/or encoded therein. Implementing systems may monitor random access rates, capabilities, and burdens, so as to adaptively account for changes to some or all of the monitored parameters.
87 Citations
20 Claims
-
1. A computer-implemented method, comprising:
under the control of one or more computer systems configured with executable instructions, storing a plurality of archives on a plurality of volumes by at least; generating a set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a redundancy code, to generate original data of the archives, the set of shards including at least; identity shards that contain the original data of the plurality of archives; and encoded shards representing an encoded form of the original data; and storing each shard of the set of shards on a respective storage device of a plurality of storage devices associated with the one or more computer systems, each respective storage device having a predetermined random access capability for data stored thereon; in response to receiving a request for at least some of the stored plurality of archives, at least; determining at least one of the respective storage devices on which a respective identity shard corresponding to the requested archives is stored; determining a current random access rate for the determined storage device; and if a random access burden imposed by the request plus the current random access rate is above the predetermined random access capability, retrieving the requested archives by at least; retrieving the requested archives from the determined storage devices having the corresponding identity shard, such that the predetermined random access capability is not exceeded; and retrieving the requested archives from at least a portion of a remainder of the storage devices of the plurality of storage devices by generating, using the redundancy code, original data corresponding to the requested archives from the shards stored thereon. - View Dependent Claims (2, 3, 4)
-
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to; in response to receiving a request for original data of an archive stored as a set of redundancy coded shards that include at least an identity shard having at least a portion of the original data and an encoded shard representing an encoded form of the original data, at least; determine whether the identity shard is capable of accommodating a random access burden associated with the request; if the identity shard is capable of accommodating the random access burden, then service the request by retrieving the requested original data from the identity shard; and if the identity shard is not capable of accommodating the random access burden, then service the request by at least; retrieving the requested original data from the identity shard; and augmenting the retrieval of the requested original data by generating the requested original data from the encoded shard. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
service requests for retrieving archives stored as a set of redundancy coded shards that include at least an identity shard having original data associated with the requests and an encoded shard output from processing the original data with a redundancy code, by at least; assessing a random access burden associated with the requests; determining whether a first storage device associated with the identity shard is capable of accommodating the random access burden; if the first storage device is capable of accommodating the random access burden, then service the request by retrieving the requested original data from the identity shard; and if the first storage device is not capable of accommodating the random access burden, then retrieving the requested original data by at least; retrieving the encoded shard from a second storage device capable of accommodating at least a portion of the random access burden; and generating the requested original data from the encoded shard. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification