SPARSE INDEX-BASED STORAGE, RETRIEVAL, AND MANAGEMENT OF STORED DATA

US 20180225293A1
Filed: 12/19/2014
Published: 08/09/2018
Est. Priority Date: 12/19/2014
Status: Active Grant

Abstract

Techniques described and suggested herein include systems and methods for storing, indexing, and retrieving original data of data archives on data storage systems using redundancy coding techniques. For example, redundancy codes, such as erasure codes, may be applied to archives (such as those received from a customer of a computing resource service provider) so as to allow the original data of the individual archives to be stored on a minimum of volumes, such as those of a data storage system, while retaining the availability, durability, and other guarantees imparted by the application of the redundancy code. Sparse indexing techniques may be implemented so as to reduce the footprint of the indexes used to locate the original data once it is stored.
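
As an illustration of how a redundancy code can leave original data directly readable, the following is a minimal Python sketch of a systematic code with a single XOR parity shard. The abstract does not name a specific code; XOR parity is chosen here only because it is the simplest case, and the function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def _xor_columns(shards):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))


def encode(data_shards):
    """Return the data shards unchanged, followed by one XOR parity shard.

    Assumption: all shards have the same length (pad beforehand if needed).
    """
    if len({len(s) for s in data_shards}) != 1:
        raise ValueError("data shards must be non-empty and of equal length")
    return list(data_shards) + [_xor_columns(data_shards)]


def recover(shards, missing):
    """Rebuild the shard at position `missing` from all remaining shards."""
    others = [s for i, s in enumerate(shards) if i != missing]
    return _xor_columns(others)


shards = encode([b"abcd", b"efgh", b"ijkl"])
# The first three shards are the original bytes, so an archive can be read
# from a single volume without decoding; the fourth shard adds durability.
rebuilt = recover(shards, 1)
```

Because the data shards are stored verbatim, a read touches only the volume that holds the requested bytes, and the parity shard is needed only when a volume is lost.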

20 Claims

1. A computer-implemented method, comprising:
- processing a plurality of archives to be stored on a plurality of volumes so as to sort the plurality of archives in a predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of the plurality of volumes;
- generating indexes for the plurality of volumes, each index of the generated indexes including references to subindexes that identify a predetermined subset of the plurality of archives to be stored on an associated volume of the plurality of volumes, the predetermined subset being predetermined from a specified interval in the predetermined order;
- storing the plurality of archives and the indexes on the plurality of volumes in the predetermined order;
- processing the sorted plurality of archives and the generated indexes using a redundancy code so as to generate shards;
- storing the plurality of archives and the indexes as shards on the plurality of volumes; and
- at a time after receiving a request for a subset of the shards, retrieving the subset by at least:
  - locating, based on the predetermined order, at least one respective volume on which the requested subset is stored;
  - locating, based on an associated index for the respective volume, a subindex that identifies an archive of the predetermined subset of the plurality of archives that is prior to the requested subset; and
  - sequentially reading the respective volume starting from a location corresponding with the archive identified by the subindex until the requested subset is returned.
- Dependent claims (3, 4):
  - 3. The computer-implemented method of claim 1, wherein the redundancy code is an erasure code that, when applied to the plurality of archives, generates a subset of the plurality of shards that corresponds with an identity matrix containing the original data.
  - 4. The computer-implemented method of claim 1, wherein the specified interval is independent of boundaries of the plurality of archives to be stored.
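
The retrieval steps of claim 1 can be sketched as a sparse index lookup followed by a sequential scan. The Python below is an illustrative sketch only, not the patented implementation: it assumes a volume can be modeled as a list of `(key, payload)` records already sorted by key, and it measures the interval in records rather than in bytes or blocks. The names `build_sparse_index` and `lookup` are hypothetical.

```python
from bisect import bisect_right


def build_sparse_index(records, interval):
    """Index only every `interval`-th record as (key, position)."""
    return [(records[i][0], i) for i in range(0, len(records), interval)]


def lookup(records, index, key):
    """Return (payload, records_read) for `key`, or (None, records_read).

    Finds the last index entry whose key is <= the requested key, then
    reads the volume sequentially from that position.
    """
    index_keys = [k for k, _ in index]
    slot = bisect_right(index_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before everything on this volume
    reads = 0
    for k, payload in records[index[slot][1]:]:
        reads += 1
        if k == key:
            return payload, reads
        if k > key:
            break  # passed the place where the key would be
    return None, reads


records = [(f"a{i:03d}", f"data{i}") for i in range(10)]
index = build_sparse_index(records, 4)  # 3 entries cover 10 records
hit = lookup(records, index, "a006")
```

The index holds one entry per interval instead of one per archive, which is the footprint reduction the abstract describes; the cost is a bounded sequential read of at most `interval` records per lookup.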

2. (canceled)

5. A system, comprising:
- at least one computing device configured to implement one or more services, wherein the one or more services are configured to:
  - sort a plurality of archives in a predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of a plurality of volumes;
  - generate an index for the plurality of volumes that refers to subindexes, the subindexes corresponding to a subset of the plurality of archives at a specified interval;
  - store the plurality of archives in the predetermined order;
  - store the index;
  - process the sorted plurality of archives and the generated indexes using a redundancy code so as to generate shards;
  - store the plurality of archives and the indexes as shards on the plurality of volumes; and
  - in response to a request for an archive, locate, from the stored index, an appropriate subindex and retrieve the archive by sequentially retrieving data starting from a location corresponding with the appropriate subindex.
- Dependent claims (7, 8, 9, 10, 11, 12):
  - 7. The system of claim 5, wherein the redundancy code is an erasure code that utilizes an identity matrix containing original data of the plurality of archives, and wherein the one or more services are further configured to store the plurality of archives and the index, at least in part, as original data corresponding to the identity matrix.
  - 8. The system of claim 5, wherein the one or more services are further configured to:
    - store the plurality of archives in a first entity; and
    - store the index in a second entity that is separate from the first entity.
  - 9. The system of claim 5, wherein the one or more services are further configured to store the plurality of archives in a plurality of volumes.
  - 10. The system of claim 9, wherein the one or more services are further configured to store the index as one or more volume indices on a respective volume of the plurality of volumes.
  - 11. The system of claim 10, wherein the one or more services are further configured to locate the appropriate subindex by first determining an appropriate volume index based on a location of the requested archive in the predetermined order.
  - 12. The system of claim 5, wherein the one or more services are further configured to at least receive the request using an application programming interface call.
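
Claims 9 through 11 describe a two-level lookup: first pick the volume from the requested archive's place in the predetermined order, then consult that volume's own index. The following Python sketch illustrates one way this could work, assuming each volume holds a contiguous slice of the sorted order and that the first key on each volume is known. The functions `partition`, `build_volume_indexes`, and `find` are hypothetical names introduced for illustration.

```python
from bisect import bisect_right


def partition(sorted_records, per_volume):
    """Split the sorted records into contiguous, fixed-size volumes."""
    return [sorted_records[i:i + per_volume]
            for i in range(0, len(sorted_records), per_volume)]


def build_volume_indexes(volumes, interval):
    """One sparse index per volume: every `interval`-th (key, position)."""
    return [[(vol[i][0], i) for i in range(0, len(vol), interval)]
            for vol in volumes]


def find(volumes, volume_indexes, key):
    """Return (volume_number, payload) for `key`, or None if absent."""
    # Level 1: the sort order alone tells us which volume to open.
    first_keys = [vol[0][0] for vol in volumes]
    v = bisect_right(first_keys, key) - 1
    if v < 0:
        return None
    # Level 2: the volume's sparse index gives the scan start position.
    index = volume_indexes[v]
    slot = bisect_right([k for k, _ in index], key) - 1
    for k, payload in volumes[v][index[slot][1]:]:
        if k == key:
            return v, payload
        if k > key:
            break
    return None


records = [(i * 10, f"archive-{i}") for i in range(12)]  # keys 0..110
volumes = partition(records, 5)
volume_indexes = build_volume_indexes(volumes, 2)
found = find(volumes, volume_indexes, 70)
```

Keeping each index on the volume it describes means a lookup needs only the sort order to choose a volume; no global index has to be loaded.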

6. (canceled)

13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to at least:
- sort a plurality of archives in at least one predetermined order, the predetermined order including at least an identification of groups of the plurality of archives to be correlated with subsets of a plurality of volumes;
- use the predetermined order to generate one or more indices for the plurality of volumes that point, at a specified interval, to subindexes corresponding to a subset of the plurality of archives;
- store the plurality of archives and the indices on the plurality of volumes in the predetermined order;
- process the sorted plurality of archives and the generated indices using a redundancy code so as to generate shards;
- store the plurality of archives and the indices as shards on the plurality of volumes; and
- in response to a request for an indexed archive, locate, in the generated indices and by using the predetermined order, an appropriate subindex and retrieve the archive by sequentially retrieving data starting from a location corresponding with the appropriate subindex.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to sort the plurality of archives in at least one predetermined order that includes grouping subsets of the plurality of archives by identities of customers, of the computer system, associated with the subsets.
  - 17. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to sort the plurality of archives in at least one predetermined order that includes grouping subsets of the plurality of archives by times at which the plurality of archives were uploaded.
  - 18. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to store the subindexes with the indices.
  - 19. The non-transitory computer-readable storage medium of claim 13, wherein the specified interval is a quantity of blocks on a data storage system on which the indices are stored.
  - 20. The non-transitory computer-readable storage medium of claim 13, wherein the specified interval is a quantity of the indices as stored on a data storage system.
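
Claims 16, 17, and 19 can be illustrated together: sort archives by customer and upload time, lay them out contiguously, and emit an index entry at an interval measured in storage blocks rather than in archives. The Python below is a sketch under stated assumptions, not the patent's method: the `Archive` fields, the function names, and the rule "index the first archive that starts at or past each interval boundary" are all choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    archive_id: str
    customer_id: str
    uploaded_at: int  # e.g. a Unix timestamp (assumed representation)
    size: int         # bytes


def sort_archives(archives):
    """Group by customer, then order by upload time within each customer."""
    return sorted(archives, key=lambda a: (a.customer_id, a.uploaded_at))


def index_by_blocks(sorted_archives, block_size, blocks_per_entry):
    """Emit (archive_id, byte_offset) roughly once per block interval."""
    interval = block_size * blocks_per_entry
    entries, offset, next_mark = [], 0, 0
    for archive in sorted_archives:
        if offset >= next_mark:
            entries.append((archive.archive_id, offset))
            # Next entry is due at the first interval boundary past here.
            next_mark = (offset // interval + 1) * interval
        offset += archive.size
    return entries


archives = [
    Archive("a3", "bob", 5, 300),
    Archive("a1", "alice", 20, 700),
    Archive("a2", "alice", 10, 400),
    Archive("a4", "bob", 1, 900),
]
ordered = sort_archives(archives)
entries = index_by_blocks(ordered, block_size=512, blocks_per_entry=2)
```

Because the interval is defined in blocks, the index density tracks the amount of stored data rather than the number of archives, so many small archives do not inflate the index.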

14. (canceled)

15. (canceled)

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Donlan, Bryan James; Franklin, Paul David
Granted Patent: US 10,042,848 B1

CPC Class Codes

G06F 16/113 Details of archiving lifecy...

G06F 16/13 File access structures, e.g...
