VOLUME-LEVEL REDUNDANCY CODING TECHNIQUES FOR SEQUENTIAL TRANSFER OPTIMIZED STORAGE DEVICES
First Claim
1. A computer-implemented method, comprising:
- under the control of one or more computer systems configured with executable instructions,processing a plurality of archives to be stored on a plurality of volumes so as to;
sort the plurality of archives according to at least one criterion shared by the plurality of archives; and
determine which archives of the sorted plurality of archives will be stored on each volume of the plurality of volumes;
generating indexes for the plurality of volumes, each index of the indexes reflecting the a subset of the sorted plurality of archives to be stored on a respective volume of the plurality of volumes;
store the sorted plurality of archives and the generated indexes on a subset of the plurality of volumes, thereby generating a plurality of shards;
applying a redundancy code to the sorted plurality of archives and the generated indexes to generate encoded shards; and
storing the encoded shards on corresponding volumes outside the subset of the plurality of volumes.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for storing, indexing, and retrieving original data of data archives on data storage systems using redundancy coding techniques. For example, redundancy codes, such as erasure codes, may be applied to archives (such as those received from a customer of a computing resource service provider) so as allow the storage of original data of the individual archives available on a minimum of volumes, such as those of a data storage system, while retaining availability, durability, and other guarantees imparted by the application of the redundancy code. Sparse indexing techniques may be implemented so as to reduce the footprint of indexes used to locate the original data, once stored.
73 Citations
20 Claims
-
1. A computer-implemented method, comprising:
under the control of one or more computer systems configured with executable instructions, processing a plurality of archives to be stored on a plurality of volumes so as to; sort the plurality of archives according to at least one criterion shared by the plurality of archives; and determine which archives of the sorted plurality of archives will be stored on each volume of the plurality of volumes; generating indexes for the plurality of volumes, each index of the indexes reflecting the a subset of the sorted plurality of archives to be stored on a respective volume of the plurality of volumes; store the sorted plurality of archives and the generated indexes on a subset of the plurality of volumes, thereby generating a plurality of shards; applying a redundancy code to the sorted plurality of archives and the generated indexes to generate encoded shards; and storing the encoded shards on corresponding volumes outside the subset of the plurality of volumes. - View Dependent Claims (2, 3, 4)
-
5. A system, comprising:
-
at least one computing device configured to implement one or more services, wherein the one or more services are configured to; sort a plurality of archives in a predetermined order for storage, in the predetermined order, on a plurality of volumes; process the plurality of archives with a redundancy code so as to generate a plurality of shards, a subset of which includes original data of the plurality of archives; and store the shards on the plurality of volumes such that a subset of the plurality of volumes includes the original data. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
-
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
-
determine an order for a plurality of archives in which the plurality of archives is to be stored on a plurality of volumes; generate a plurality of shards, each shard of the plurality of shards corresponding to a volume of the plurality of volumes, by applying a redundancy code to at least the plurality of archives such that; a subset of the plurality of shards contains original data of the plurality of archives, and the subset of the plurality of shards is stored on the plurality of volumes such that the plurality of archives is represented in the determined order within the subset of the plurality of shards; and storing the plurality of shards on corresponding volumes of the plurality of volumes. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
-
Specification