Inter-facility network traffic optimization for redundancy coded data storage systems
First Claim
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions, processing a plurality of archives by at least:
generating a set of shards representing a plurality of volumes associated with the one or more computer systems, a minimum quorum quantity of the shards in the set being usable, by a redundancy code, to generate original data of the archives, the set of shards including at least:
identity shards that contain the original data of the plurality of archives, and
encoded shards representing an encoded form of the original data; and
storing each shard of the set of shards on a respective storage device of the plurality of storage devices, such that the original data of each archive of the plurality of archives is stored, in one or more of the identity shards, in no more than one data facility of a plurality of data facilities;
in response to receiving a request for an archive of the plurality of archives, at least:
determining a respective data storage facility of the plurality of data storage facilities on which the identity shard corresponding to the requested archive is stored;
determining whether the respective data storage facility has sufficient performance characteristics to service the request within a predetermined timeframe; and
if the determined data storage facility has sufficient performance characteristics, retrieving the requested archive from only the determined respective data storage facility so as to avoid data transfer between the determined data storage facility and a remainder of the plurality of data storage facilities.
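The request-handling steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the `Facility` record, the `plan_retrieval` function, the throughput metric, and the size-over-throughput time estimate are all hypothetical choices made for illustration. The fallback branch is likewise an assumption, since the claim only recites what happens when the facility is sufficient.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    throughput_mb_s: float  # assumed metric: currently available read throughput


def plan_retrieval(archive_id, identity_location, facilities, size_mb, deadline_s):
    """Return the names of the facilities to read from for one archive.

    identity_location maps archive id -> name of the single facility that
    holds the identity shard(s) for that archive.
    facilities maps facility name -> Facility.
    """
    # Step 1: find the facility holding the archive's identity shard.
    home = facilities[identity_location[archive_id]]

    # Step 2: estimate whether that facility alone can meet the deadline
    # (hypothetical estimate: archive size divided by available throughput).
    if home.throughput_mb_s > 0:
        estimated_s = size_mb / home.throughput_mb_s
    else:
        estimated_s = float("inf")

    # Step 3: if sufficient, read only from that facility, so no data
    # crosses between facilities.
    if estimated_s <= deadline_s:
        return [home.name]

    # Assumed fallback (not recited in claim 1): involve the other facilities.
    return [home.name] + sorted(n for n in facilities if n != home.name)
```

For example, with an "east" facility at 100 MB/s holding the identity shard, a 500 MB archive with a 10 second deadline is planned against "east" alone, while a 5000 MB archive would pull in the remaining facilities.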
Abstract
Techniques described and suggested herein include systems and methods for minimizing inter-facility data transfer during retrieval of data archives stored on data storage systems using redundancy coding techniques. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet performance requirements or time-to-retrieval limitations for retrieval requests associated with the archives stored and/or encoded therein. Under some circumstances, implementing systems may monitor throughput rates, capabilities, and burdens, so as to preferentially retrieve data such that the identity shards are favored and fewer hosting data storage facilities are used for a given retrieval.
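One way to picture "a variable number of the shards" being leveraged is a greedy selection that prefers identity shards and adds further sources only until the deadline can be met. The sketch below is an assumption-laden illustration, not the described system: `choose_sources` is a hypothetical name, throughput is assumed to add linearly across sources, and decode cost and quorum constraints for encoded shards are ignored.

```python
def choose_sources(sources, size_mb, deadline_s):
    """Pick the fewest shard sources whose combined throughput meets the deadline.

    sources: list of (name, is_identity, throughput_mb_s) tuples.
    Returns a list of source names, or None if the deadline cannot be met.
    Simplifying assumptions: throughputs add linearly, decoding is free.
    """
    required_rate = size_mb / deadline_s  # MB/s needed to finish in time

    # Identity shards first, then faster sources first within each group.
    ordered = sorted(sources, key=lambda s: (not s[1], -s[2]))

    chosen, total = [], 0.0
    for name, _is_identity, rate in ordered:
        if total >= required_rate:
            break  # already fast enough; do not touch more sources
        chosen.append(name)
        total += rate

    return chosen if total >= required_rate else None
```

Under these assumptions, a request that the identity shard can serve in time touches only that shard, and encoded shards are added only as the required rate grows.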
20 Claims
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to:
in response to receiving a request to retrieve original data of an archive, the original data having been stored as a set of redundancy coded shards that include at least an identity shard having at least a portion of the original data and an encoded shard representing a redundancy coded form of the original data, retrieve the requested original data from the identity shard without accessing the encoded shard;
determine if a performance characteristic associated with the retrieval is sufficient to complete the retrieval, from the identity shard and without accessing the encoded shard, within a timeframe determined for fulfilling the request; and
if the performance characteristic is not sufficient, then augment the retrieval by generating the requested original data from at least the encoded shard.
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
retrieve stored archives, the archives having been stored as a set of redundancy coded shards that include at least an identity shard having original data and an encoded shard output from processing the original data with a redundancy code, by at least:
determining if performance characteristics associated with the identity shard are sufficient to complete retrieval of the original data within a timeframe determined for the retrieval;
retrieving the identity shard from a first storage device so as to retrieve the original data; and
if the performance characteristics are insufficient, retrieving at least the encoded shard from a second storage device so as to augment the retrieval of the requested original data by generating the requested original data from the encoded shard.
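To make "generating the requested original data from the encoded shard" concrete, the following sketch uses single XOR parity as a stand-in redundancy code. The claims do not name a particular code, so XOR parity, the two-identity-shard layout, and the function names `make_shards` and `recover_identity` are all assumptions chosen for brevity.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shards(data: bytes):
    """Split data into two identity shards plus one XOR parity (encoded) shard.

    Any two of the three shards suffice to regenerate the original data,
    so the minimum quorum in this toy layout is two.
    """
    if len(data) % 2:
        data += b"\x00"  # pad to even length; a real system would record the true length
    half = len(data) // 2
    id0, id1 = data[:half], data[half:]
    parity = xor_bytes(id0, id1)
    return id0, id1, parity


def recover_identity(other_identity: bytes, parity: bytes) -> bytes:
    """Regenerate one identity shard from the other identity shard and the parity shard."""
    return xor_bytes(other_identity, parity)
```

In this toy layout, if the device holding the first identity shard cannot meet the retrieval timeframe, its contents can instead be regenerated from the parity shard and the second identity shard held on other devices.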
Specification