Locality-sensitive data retrieval for redundancy coded data storage systems
First Claim
1. A computer-implemented method, comprising:
- storing a plurality of archives on a plurality of volumes by at least;
generating a first set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a first redundancy code, to generate original data of the archives, the set of shards including at least;
identity shards that contain the original data of the plurality of archives; and
encoded shards representing a redundancy coded form of the original data;
generating a second set of shards for each set of the first set of shards, such that the second set of shards includes a respective shard of the first set of shards, and a second minimum quorum quantity of the second set of shards is usable, by a second redundancy code, to generate the respective shard of the first set of shards or any shard of the second set of shards;
storing each shard of the first set of shards in a respective data storage facility of a plurality of data storage facilities associated with the one or more computer systems, such that the original data of each archive of the plurality of archives is stored, in one or more of the identity shards, in no more than one data storage facility of a plurality of data facilities; and
storing each shard of the second set of shards, except for a respective shard also in the first set of shards, in one or more data storage devices associated with a respective data storage facility in which the respective shard of the first set of shards is stored; and
in response to receiving a request for an archive of the plurality of archives, at least;
determining the respective data storage facility of the plurality of data storage facilities on which the identity shard corresponding to the requested archive is stored;
determining at least a first geographic location associated with the request;
determining, based on at least a proximity between the first geographic location and a second geographic location associated with the respective data storage facility on which the identity shard is stored, whether the respective data storage facility has sufficient performance characteristics to service the request within a predetermined timeframe; and
if the determined data storage facility has sufficient performance characteristics, retrieving the requested archive from the determined respective data storage facility using the identity shard.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for optimizing retrieval, based on localities associated with a requestor and that of various components of a data storage system, of data archives stored on data storage systems using redundancy coding techniques. For example, redundancy coded shards, which may include identity shards that contain unencoded original data of archives, may be configured such that a variable number of the shards can be leveraged to meet performance requirements or time-to-retrieval limitations for retrieval requests associated with the archives stored and/or encoded therein. Under some circumstances, implementing systems may monitor relative geographic locations, among other performance-related metrics, so as to retrieve data such that fewer hosting data storage facilities are used for a given retrieval.
-
Citations
20 Claims
-
1. A computer-implemented method, comprising:
-
storing a plurality of archives on a plurality of volumes by at least; generating a first set of shards representing the plurality of volumes, a minimum quorum quantity of the shards in the set being usable, by a first redundancy code, to generate original data of the archives, the set of shards including at least; identity shards that contain the original data of the plurality of archives; and encoded shards representing a redundancy coded form of the original data; generating a second set of shards for each set of the first set of shards, such that the second set of shards includes a respective shard of the first set of shards, and a second minimum quorum quantity of the second set of shards is usable, by a second redundancy code, to generate the respective shard of the first set of shards or any shard of the second set of shards; storing each shard of the first set of shards in a respective data storage facility of a plurality of data storage facilities associated with the one or more computer systems, such that the original data of each archive of the plurality of archives is stored, in one or more of the identity shards, in no more than one data storage facility of a plurality of data facilities; and storing each shard of the second set of shards, except for a respective shard also in the first set of shards, in one or more data storage devices associated with a respective data storage facility in which the respective shard of the first set of shards is stored; and in response to receiving a request for an archive of the plurality of archives, at least; determining the respective data storage facility of the plurality of data storage facilities on which the identity shard corresponding to the requested archive is stored; determining at least a first geographic location associated with the request; determining, based on at least a proximity between the first geographic location and a second geographic location associated with the respective data storage facility on which the identity shard is stored, whether the respective data storage facility has sufficient performance characteristics to service the request within a predetermined timeframe; and if the determined data storage facility has sufficient performance characteristics, retrieving the requested archive from the determined respective data storage facility using the identity shard. - View Dependent Claims (2, 3, 4)
-
-
5. A system, comprising:
at least one computing device that implements one or more services, wherein the one or more services at least; in response to receiving a request for original data of an archive stored as a set of redundancy coded shards that include at least an identity shard having at least a portion of the original data and an encoded shard representing a redundancy coded form of the original data, retrieve the requested original data by at least; determining, based at least on relative geographic locations of each of the redundancy coded shards, a random access memory burden on one or more devices storing the set of redundancy coded shards necessary to complete the retrieval within a predetermined timeframe; retrieving the requested original data from the identity shard; and if the determined random access memory burden exceeds a threshold value, increasing random access memory rate of the one or more devices associated with the retrieval by generating the requested original data from at least the encoded shard. - View Dependent Claims (6, 7, 8, 9, 10, 11, 20)
-
12. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of execution by one or more processors of a computer system, cause the computer system to at least:
service requests for retrieving archives stored as a set of redundancy coded shards that include at least an identity shard having original data associated with the requests and an encoded shard output from processing the original data with a redundancy code, by at least; determining, based at least on a geographic location of the requestor associated with the requests and a geographic location of the identity shard, throughput burden on one or more devices storing the set of redundancy coded shards necessary to complete retrieval of the original data within a predetermined timeframe; retrieving the identity shard from a first storage device so as to retrieve the original data associated with the requests; and if the determined the throughput burden exceeds a threshold value, retrieving at least the encoded shard from a second storage device so as to increase throughput rate of the one or more devices associated with the retrieval of the requested original data by generating the requested original data from the encoded shard. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19)
Specification