SPARSE INDEX-BASED STORAGE, RETRIEVAL, AND MANAGEMENT OF STORED DATA
First Claim
1. A computer-implemented method, comprising:
- processing a plurality of archives to be stored on a plurality of volumes so as to sort the plurality of archives in a predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of the plurality of volumes;
generating indexes for the plurality of volumes, each index of the generated index including references to subindexes that identify a predetermined subset of the plurality of archives to be stored on an associated volume of the plurality of volumes, the predetermined subset being predetermined from a specified interval in the predetermined order;
storing the plurality of archives and the indexes on the plurality of volumes in the predetermined order;
processing the sorted plurality of archives and the generated indexes using a redundancy code so as to generate shards;
storing the plurality of archives and the indexes as shards on the plurality of volumes; and
at a time after receiving a request for a subset of the shards, retrieving the subset by at least;
locating, based on the predetermined order, at least one respective volume on which the requested subset is stored;
locating, based on an associated index for the respective volume, a subindex that identifies an archive of the predetermined subset of the plurality of archives that is prior to the requested subset; and
sequentially reading the respective volume starting from a location corresponding with the archive identified by the subindex until the requested subset is returned.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques described and suggested herein include systems and methods for storing, indexing, and retrieving original data of data archives on data storage systems using redundancy coding techniques. For example, redundancy codes, such as erasure codes, may be applied to archives (such as those received from a customer of a computing resource service provider) so as allow the storage of original data of the individual archives available on a minimum of volumes, such as those of a data storage system, while retaining availability, durability, and other guarantees imparted by the application of the redundancy code. Sparse indexing techniques may be implemented so as to reduce the footprint of indexes used to locate the original data, once stored.
7 Citations
20 Claims
-
1. A computer-implemented method, comprising:
-
processing a plurality of archives to be stored on a plurality of volumes so as to sort the plurality of archives in a predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of the plurality of volumes; generating indexes for the plurality of volumes, each index of the generated index including references to subindexes that identify a predetermined subset of the plurality of archives to be stored on an associated volume of the plurality of volumes, the predetermined subset being predetermined from a specified interval in the predetermined order; storing the plurality of archives and the indexes on the plurality of volumes in the predetermined order; processing the sorted plurality of archives and the generated indexes using a redundancy code so as to generate shards; storing the plurality of archives and the indexes as shards on the plurality of volumes; and at a time after receiving a request for a subset of the shards, retrieving the subset by at least; locating, based on the predetermined order, at least one respective volume on which the requested subset is stored; locating, based on an associated index for the respective volume, a subindex that identifies an archive of the predetermined subset of the plurality of archives that is prior to the requested subset; and sequentially reading the respective volume starting from a location corresponding with the archive identified by the subindex until the requested subset is returned. - View Dependent Claims (3, 4)
-
-
2. (canceled)
-
5. A system, comprising:
at least one computing device configured to implement one or more services, wherein the one or more services are configured to; sort a plurality of archives in a predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of a plurality of volumes; generate an index for the plurality of volumes that refers to subindexes, the subindexes corresponding to a subset of the plurality of archives at a specified interval; store the plurality of archives in the predetermined order; store the index; process the sorted plurality of archives and the generated indexes using a redundancy code so as to generate shards; store the plurality of archives and the indexes as shards on the plurality of volumes; and in response to a request for an archive, locating, from the stored index, an appropriate subindex and retrieving the archive by sequentially retrieving data starting from a location corresponding with the appropriate subindex. - View Dependent Claims (7, 8, 9, 10, 11, 12)
-
6. (canceled)
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to at least:
-
sorting a plurality of archives in at least one predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of a plurality of volumes; using the predetermined order to generate one or more indices for the plurality of volumes that point, at a specified interval, to subindexes corresponding to a subset of the plurality of archives; storing the plurality of archives and the indices on the plurality of volumes in the predetermined order; processing the sorted plurality of archives and the generated indices using a redundancy code so as to generate shards; storing the plurality of archives and the indices as shards on the plurality of volumes; and in response to a request for an indexed archive, locating, in the generated indices and by using the predetermined order, an appropriate subindex and retrieving the archive by sequentially retrieving data starting from a location corresponding with the appropriate subindex. - View Dependent Claims (16, 17, 18, 19, 20)
-
-
14. (canceled)
-
15. (canceled)
Specification