Archival data storage system
Abstract
A cost-effective, durable and scalable archival data storage system is provided herein that allows customers to store, retrieve and delete archival data objects, among other operations. For data storage, in an embodiment, the system stores data in a transient data store and provides a data object identifier that may be used by subsequent requests. For data retrieval, in an embodiment, the system creates a job corresponding to the data retrieval and provides a job identifier associated with the created job. Once the job is executed, the retrieved data is provided in a transient data store to enable customer download. In various embodiments, jobs associated with storage, retrieval and deletion are scheduled and executed using various optimization techniques such as load balancing, batch processing and partitioning. Data is redundantly encoded and stored in self-describing storage entities, increasing reliability while reducing storage costs. Data integrity is ensured by integrity checks along data paths.
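The request flow in the abstract (stage an upload in a transient store and return an identifier, later create a retrieval job and return a job identifier, then stage the result for download) can be sketched as below. This is a minimal in-memory sketch, not the patented implementation: the class `ArchiveFlowSketch`, its method names, the `obj-`/`job-` identifier formats and the use of SHA-256 digests as the "integrity checks along data paths" are all hypothetical choices for illustration.

```python
import hashlib
import itertools
from typing import Dict, Tuple


class ArchiveFlowSketch:
    """Hypothetical in-memory model of the store/retrieve flow in the abstract."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.upload_staging: Dict[str, Tuple[bytes, str]] = {}  # transient store for uploads
        self.archive: Dict[str, Tuple[bytes, str]] = {}         # stand-in for archival storage
        self.jobs: Dict[str, dict] = {}                         # job id -> job record
        self.download_staging: Dict[str, bytes] = {}            # transient store for downloads

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def store(self, data: bytes) -> str:
        """Stage the upload and hand back an identifier before it is archived."""
        object_id = f"obj-{next(self._ids)}"
        self.upload_staging[object_id] = (data, self._digest(data))
        return object_id

    def archive_staged(self) -> None:
        """Move staged uploads into archival storage, re-checking each digest."""
        for object_id, (data, digest) in list(self.upload_staging.items()):
            if self._digest(data) != digest:
                raise ValueError(f"integrity check failed for {object_id}")
            self.archive[object_id] = (data, digest)
            del self.upload_staging[object_id]

    def request_retrieval(self, object_id: str) -> str:
        """Create a retrieval job and return its identifier immediately."""
        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = {"object_id": object_id, "status": "pending"}
        return job_id

    def run_pending_jobs(self) -> None:
        """Execute pending jobs, verify digests, and stage results for download."""
        for job_id, job in self.jobs.items():
            if job["status"] != "pending":
                continue
            data, digest = self.archive[job["object_id"]]
            # Guards against corruption between the archive and the staging area.
            if self._digest(data) != digest:
                raise ValueError(f"integrity check failed for job {job_id}")
            self.download_staging[job_id] = data
            job["status"] = "completed"

    def status(self, job_id: str) -> str:
        return self.jobs[job_id]["status"]

    def download(self, job_id: str) -> bytes:
        return self.download_staging[job_id]
```

The identifier is returned as soon as the upload is staged, so the client never waits for the slower archival write; the same decoupling applies to retrieval, where the client polls the job status before downloading.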
23 Claims
1. A computer-implemented method comprising:
under the control of one or more computer systems of an archival data storage system that are configured with executable instructions, receiving, over a network from a requestor system, a storage request to store a data object into the archival data storage system;
causing storage of the data object in the archival data storage system by at least:
encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and
causing storage of the plurality of encoded data components in at least one archival data storage device associated with the archival data storage system;
providing a data object identifier associated with the data object, the data object identifier including storage location information that at least describes the at least one archival data storage device storing the plurality of encoded data components;
receiving, in connection with a retrieval request to retrieve the data object, the data object identifier;
creating a retrieval job corresponding to the retrieval request;
adding the retrieval job to a collection of pending jobs, at least one pending job of the collection of pending jobs being associated with a different data object from the data object;
processing, in one or more batches, the collection of pending jobs; and
providing the retrieved data object.
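Claim 1 requires only that the encoding schemes include "at least redundancy coding"; it does not name a code. The simplest redundancy code is a single XOR parity component: split the object into k data components and add one parity component, so any one lost component can be rebuilt. The sketch below uses that scheme purely for illustration; production archives typically use erasure codes such as Reed-Solomon that tolerate several losses, and nothing here should be read as the patent's own scheme. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size components plus one XOR parity component."""
    if k < 1:
        raise ValueError("k must be at least 1")
    shard_len = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(_xor, shards)
    return shards + [parity]


def decode(shards: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the object when at most one of the k+1 components is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost component")
    if missing:
        # XOR of all surviving components equals the missing one.
        present = [s for s in shards if s is not None]
        shards = list(shards)
        shards[missing[0]] = reduce(_xor, present)
    return b"".join(shards[:-1])[:length]  # drop parity, then strip padding
```

For example, `encode(b"archival data object", 4)` yields four 5-byte data components and one parity component; deleting any single one of the five and calling `decode` with the original length (20) returns the original bytes.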
7. A computer-implemented method comprising:
under the control of one or more computer systems configured with executable instructions, receiving a data retrieval request to retrieve a data object, the data retrieval request specifying a data object identifier, the data object at least partially represented by a plurality of encoded data components generated from the data object using one or more encoding schemes, the one or more encoding schemes including at least redundancy coding, the data object identifier including storage location information that at least describes at least one location associated with the plurality of encoded data components;
creating a data retrieval job corresponding to the data retrieval request;
adding the data retrieval job to a batch including at least one other data retrieval job corresponding to a different data object than the data object;
providing a job identifier associated with the data retrieval job that is usable for obtaining information about the data retrieval job; and
after providing the job identifier, processing the batch so as to execute the data retrieval job using at least in part the data object identifier to provide access to the data object.
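The ordering in claim 7 (hand out a job identifier first, then process the whole batch, which includes jobs for other data objects) can be modeled as a small job tracker. This is a minimal sketch under assumed names: `RetrievalJobs`, `submit`, `describe`, `process_batch`, the status strings and the use of random UUIDs as job identifiers are all hypothetical, and `fetch` stands in for whatever actually reads the encoded components.

```python
import uuid
from typing import Callable, Dict, List


class RetrievalJobs:
    """Hypothetical job tracker: job ids are returned before any data is read."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}
        self.batch: List[str] = []  # job ids waiting for the next batch run

    def submit(self, object_id: str) -> str:
        """Create a retrieval job, add it to the open batch, return its id."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"object_id": object_id, "status": "queued", "result": None}
        self.batch.append(job_id)
        return job_id  # caller can poll describe() with this id

    def describe(self, job_id: str) -> dict:
        """Information about a job, obtainable via its identifier."""
        job = self.jobs[job_id]
        return {"object_id": job["object_id"], "status": job["status"]}

    def process_batch(self, fetch: Callable[[str], bytes]) -> int:
        """Run every queued job in one pass; return how many jobs ran."""
        batch, self.batch = self.batch, []
        for job_id in batch:
            job = self.jobs[job_id]
            job["result"] = fetch(job["object_id"])
            job["status"] = "succeeded"
        return len(batch)
```

Swapping the open batch for an empty list before processing means jobs submitted while a batch is running land in the next batch rather than the current one.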
13. A system for providing archival data storage services, comprising:
one or more archival data storage devices;
a transient data store;
one or more processors; and
memory, including executable instructions that, when executed by the one or more processors, cause the one or more processors to collectively at least:
receive a data storage request to store a data object;
cause storage of the data object in the transient store by at least:
obtaining the data object from the transient store;
encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and
causing storage of the plurality of encoded data components in at least some of the one or more archival data storage devices;
adding the data storage request to a batch including at least one other data storage request corresponding to a different data object than the data object;
provide a data object identifier associated with the data, the data object identifier encoding at least storage location information sufficient to locate the plurality of encoded data components associated with the data object; and
after providing the data object identifier, cause processing of the batch so as to cause storage of the plurality of encoded data components in accordance with the storage location information.
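Claim 13 has the data object identifier itself encode storage location information sufficient to locate the encoded components, so retrieval needs no separate index lookup. The sketch below shows one way such a self-describing identifier could be built. The byte layout (a 64-bit volume id, a 16-bit device count, one 32-bit id per device, then a CRC32 over everything before it) and the names `Location`, `make_identifier` and `parse_identifier` are assumptions for illustration, not the layout used by the patented system.

```python
import base64
import struct
import zlib
from typing import List, NamedTuple


class Location(NamedTuple):
    volume_id: int         # hypothetical: logical volume holding the components
    device_ids: List[int]  # hypothetical: device storing each encoded component


def make_identifier(loc: Location) -> str:
    """Pack location info plus a CRC32 into a URL-safe identifier string."""
    # Assumed layout: big-endian u64 volume id, u16 count, u32 per device, u32 CRC.
    body = struct.pack(">QH", loc.volume_id, len(loc.device_ids))
    body += b"".join(struct.pack(">I", d) for d in loc.device_ids)
    body += struct.pack(">I", zlib.crc32(body))
    return base64.urlsafe_b64encode(body).decode("ascii")


def parse_identifier(identifier: str) -> Location:
    """Decode an identifier, rejecting it if the embedded CRC32 does not match."""
    raw = base64.urlsafe_b64decode(identifier.encode("ascii"))
    body, (crc,) = raw[:-4], struct.unpack(">I", raw[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("corrupted data object identifier")
    volume_id, count = struct.unpack(">QH", body[:10])
    device_ids = list(struct.unpack(f">{count}I", body[10:10 + 4 * count]))
    return Location(volume_id, device_ids)
```

The embedded checksum lets the service reject a mistyped or truncated identifier before it issues any device reads.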
18. One or more non-transitory computer-readable storage media having collectively stored thereon executable instructions that, when executed by one or more processors of an archival data storage system, cause the system to at least:
receive a plurality of data retrieval requests, each of the plurality of data retrieval requests specifying a data object identifier for a data object to be retrieved, the data object at least partially represented by a plurality of encoded data components generated from the data object using one or more encoding schemes, the one or more encoding schemes including at least redundancy coding, the data object identifier at least including information sufficient to locate the plurality of encoded data components;
cause data retrieval jobs to be created, each corresponding to a received data retrieval request;
cause job identifiers to be provided, each job identifier corresponding to a data retrieval job and being usable for obtaining information about the data retrieval job;
cause aggregation of at least a subset of the data retrieval jobs to form a job batch, the subset of the data retrieval jobs corresponding to a plurality of data objects; and
cause processing of the job batch corresponding to the subset of data retrieval jobs after causing the job identifiers to be provided.
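Claim 18 only requires aggregating a subset of retrieval jobs, covering several data objects, into a job batch; it does not say how the subset is chosen. One plausible policy, assumed here for illustration, is to group pending jobs by the storage device that holds their components (as decoded from each data object identifier) and cap the batch size, so each batch touches a single device. The `Job` tuple shape, the `form_batches` name and the `max_batch` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Job = Tuple[str, str, int]  # hypothetical shape: (job_id, object_id, device_id)


def form_batches(jobs: Iterable[Job], max_batch: int) -> List[List[Job]]:
    """Group pending retrieval jobs by device, then cap each batch at max_batch jobs."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    by_device: Dict[int, List[Job]] = defaultdict(list)
    for job in jobs:
        by_device[job[2]].append(job)  # keeps arrival order within a device
    batches: List[List[Job]] = []
    for device_id in sorted(by_device):
        queue = by_device[device_id]
        for start in range(0, len(queue), max_batch):
            batches.append(queue[start:start + max_batch])
    return batches
```

With jobs `j1`, `j3` and `j4` on device 2, `j2` on device 1, and `max_batch=2`, this yields three batches: `[j2]`, then `[j1, j3]`, then `[j4]`. Grouping by device is one way to amortize the cost of bringing a device online across many jobs; other criteria (deadlines, data center, customer) would fit the claim equally well.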
Specification