Archival data organization and management
First Claim
1. A system for providing archival data storage, comprising:
a plurality of data storage devices, each of the plurality of data storage devices storing a plurality of volume components, each of the plurality of volume components comprising a plurality of data components and corresponding to a volume, each volume corresponding to a volume identifier from a plurality of volume identifiers;
a plurality of data storage nodes, each of the plurality of data storage nodes being operably connected to one or more of the plurality of data storage devices and configured to provide information regarding the connected plurality of data storage devices and the plurality of data storage nodes;
a storage node registrar configured to receive information provided by the plurality of data storage nodes and to maintain a mapping that associates the plurality of volume identifiers with corresponding volume information, the mapping being based at least in part on the information received from the plurality of data storage nodes; and
a request processing sub-system configured to:
receive a request for a data object, the request specifying a volume identifier and an object identifier associated with the data object;
select, based at least in part on the volume identifier and the mapping maintained by the storage node registrar, one or more data storage nodes that store one or more volume components associated with the volume identifier;
for each of the selected one or more data storage nodes:
identify, based at least in part on the volume identifier, a volume component associated with the volume identifier that is stored by a data storage device operably connected to the data storage node;
retrieve, based at least in part on the object identifier, a data component stored in the identified volume component; and
provide the requested data object based at least in part on the retrieved data component from the selected one or more data storage nodes.
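The retrieval path recited in this claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `DataStorageNode`, `read_component`, and `get_object` are hypothetical, the registrar's mapping is reduced to a plain dictionary from volume identifier to an ordered list of node identifiers, and simple concatenation stands in for whatever decoding a real system would apply to the retrieved data components.

```python
from dataclasses import dataclass, field


@dataclass
class DataStorageNode:
    """Hypothetical storage node holding volume components on its devices."""
    node_id: str
    # volume id -> {object id -> data component}; each inner dict stands in
    # for one volume component on a device connected to this node.
    volume_components: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def read_component(self, volume_id: str, object_id: str) -> bytes:
        # Identify the volume component from the volume identifier.
        volume_component = self.volume_components[volume_id]
        # Retrieve the data component from the object identifier.
        return volume_component[object_id]


def get_object(
    volume_id: str,
    object_id: str,
    mapping: dict[str, list[str]],
    nodes: dict[str, DataStorageNode],
) -> bytes:
    """Serve a request that names a volume identifier and an object identifier."""
    # Select the nodes that store components of this volume, using the mapping.
    selected = [nodes[node_id] for node_id in mapping[volume_id]]
    # Ask each selected node for its data component.
    components = [node.read_component(volume_id, object_id) for node in selected]
    # Assumption: components are ordered plain slices of the object, so joining
    # them reconstructs it. A real system would decode here instead.
    return b"".join(components)
```

Note that the only shared state is the volume-level mapping. Object-level lookup happens inside each node, which is consistent with the abstract's point that no large global index is required.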
Abstract
Methods and systems are provided herein that facilitate cost-effective, scalable and reliable archival data organization and management. In an embodiment, data are redundantly encoded and stored to provide data reliability. Further, encoded data may be stored in self-describing storage entities that provide information describing data stored therein. Information provided by self-describing storage entities may be used to construct a limited map that is usable to facilitate data placement and data location services during data storage and retrieval. Data reliability and durability are provided because information about data stored in the system is mostly contained in the storage entities themselves. Cost efficiency is provided because only a limited map is provided for efficiency purposes instead of a potentially large global index data structure.
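One way to picture the "limited map" built from self-describing storage entities is sketched below. It is an illustration under stated assumptions rather than the disclosed design: `StorageNode`, `report`, `StorageNodeRegistrar`, and `rebuild_mapping` are hypothetical names, and each node's self-description is reduced to a listing of which volume identifiers have components on which of its devices.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical self-describing node: it knows what its devices hold."""
    node_id: str
    # device id -> volume ids that have a volume component on that device
    devices: dict[str, list[str]] = field(default_factory=dict)

    def report(self) -> dict:
        """Describe this node and its connected devices."""
        return {"node_id": self.node_id, "devices": self.devices}


class StorageNodeRegistrar:
    """Maintains only volume id -> node ids, with no per-object index."""

    def __init__(self) -> None:
        self._volume_to_nodes: dict[str, set[str]] = defaultdict(set)

    def register(self, report: dict) -> None:
        # Fold one node's self-description into the mapping.
        for volume_ids in report["devices"].values():
            for volume_id in volume_ids:
                self._volume_to_nodes[volume_id].add(report["node_id"])

    def nodes_for(self, volume_id: str) -> set[str]:
        # Return a copy so callers cannot mutate the registrar's state.
        return set(self._volume_to_nodes.get(volume_id, set()))


def rebuild_mapping(nodes: list[StorageNode]) -> StorageNodeRegistrar:
    """The map is derived data: it can be rebuilt from node reports alone."""
    registrar = StorageNodeRegistrar()
    for node in nodes:
        registrar.register(node.report())
    return registrar
```

Because the map holds one entry per volume rather than one per object, its size grows with the number of volumes, and losing it costs only a rebuild from the nodes' own reports.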
27 Claims
9. A computer-implemented method comprising:
under the control of one or more computer systems comprising a storage node and configured with executable instructions, receiving information, from a plurality of data storage nodes, each of the plurality of data storage nodes being operably connected to one or more of a plurality of data storage devices, regarding the connected plurality of data storage devices and the plurality of data storage nodes, wherein each of the plurality of data storage devices is configured to store a plurality of volume components, each of the plurality of volume components comprising a plurality of data components and corresponding to a volume, each volume corresponding to a volume identifier from a plurality of volume identifiers;
providing information, from the plurality of data storage nodes, to a storage node registrar, the information regarding the connected plurality of data storage devices, the plurality of data storage nodes, at least a network location of the storage node, and storage information of one or more data storage devices operably connected to the storage node;
providing a mapping that associates the plurality of volume identifiers with corresponding volume information, the mapping being based at least in part on the information from the plurality of data storage nodes;
maintaining a request processing sub-system configured for:
receiving a request for a data object, the request specifying a volume identifier and an object identifier associated with the data component;
causing the request to be fulfilled using one of the one or more data storage devices operably connected to the storage node;
selecting, based at least in part on the volume identifier and the mapping, one or more data storage nodes that store one or more volume components associated with the volume identifier;
identifying, for each of the selected one or more data storage nodes, based at least in part on the volume identifier, a volume component associated with the volume identifier that is stored by a data storage device operably connected to the data storage node;
retrieving, based at least in part on the object identifier, a data component stored in the identified volume component, the data component being retrieved based at least in part on at least a portion of the provided information; and
providing the requested data object based at least in part on the retrieved data component from the selected one or more data storage nodes.
19. A computer-implemented method comprising:
under the control of one or more computer systems configured with executable instructions, receiving information, from a plurality of data storage nodes, each of the plurality of data storage nodes being operably connected to one or more of a plurality of data storage devices, regarding the connected plurality of data storage devices and the plurality of data storage nodes, wherein each of the plurality of data storage devices is configured to store a plurality of volume components, each of the plurality of volume components comprising a plurality of data components and corresponding to a volume, each volume corresponding to a volume identifier from a plurality of volume identifiers;
providing information, from the plurality of data storage nodes, to a storage node registrar, the information regarding the connected plurality of data storage devices, the plurality of data storage nodes, maintaining a mapping that associates the plurality of volume identifiers with corresponding volume information, the mapping being based at least in part on the information received from the plurality of data storage nodes;
maintaining a request processing sub-system configured for:
receiving a request to store a data object, the request specifying a volume identifier and an object identifier associated with the data object;
obtaining, in response to the received request to store the data object, one or more data components by causing an application of one or more data encoding schemes on the data object;
selecting, based at least in part on the volume identifier and the mapping maintained by the storage node registrar, one or more data storage nodes to store the one or more data components and one or more volume components associated with the volume identifier based at least in part on information that is dynamically provided by the one or more data storage nodes;
identify, for each of the selected one or more data storage nodes, based at least in part on the volume identifier, a volume component associated with the volume identifier that is stored by a data storage device operably connected to the data storage node;
retrieving, based at least in part on the object identifier, a data component stored in the identified volume component;
requesting the one or more data storage nodes to store the one or more data components; and
providing the requested data object based at least in part on the retrieved data component from the selected one or more data storage nodes.
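The storage steps of this claim (encode the object, select nodes using the mapping and dynamically provided information, then request storage) can be sketched as follows. Everything here is an assumption made for illustration: the claim does not name an encoding scheme, so the sketch substitutes a simple split into `k` shards plus one XOR parity shard, treats the dynamically provided information as a free-space figure per node, and uses the hypothetical names `encode`, `select_nodes`, and `store_object`.

```python
from functools import reduce


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length shards (zero-padded) plus one XOR parity shard.

    Assumption: a stand-in for the claim's unspecified encoding schemes.
    """
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    # Parity byte j is the XOR of byte j of every shard.
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))
    return shards + [parity]


def select_nodes(
    volume_id: str,
    mapping: dict[str, list[str]],
    free_bytes: dict[str, int],
    count: int,
) -> list[str]:
    """Pick `count` nodes for the volume, preferring those reporting more free space."""
    candidates = sorted(mapping[volume_id], key=lambda n: free_bytes[n], reverse=True)
    if len(candidates) < count:
        raise ValueError("not enough storage nodes for this volume")
    return candidates[:count]


def store_object(
    volume_id: str,
    object_id: str,
    data: bytes,
    mapping: dict[str, list[str]],
    free_bytes: dict[str, int],
    node_stores: dict[str, dict[str, dict[str, bytes]]],
    k: int = 2,
) -> list[str]:
    """Encode the object and place one data component on each selected node."""
    components = encode(data, k)
    targets = select_nodes(volume_id, mapping, free_bytes, len(components))
    for node_id, component in zip(targets, components):
        # node_stores[node_id][volume_id] stands in for that node's volume component.
        node_stores[node_id].setdefault(volume_id, {})[object_id] = component
    return targets
```

With single parity, any one missing shard can be recomputed by XOR-ing the remaining shards with the parity shard. Production systems typically use stronger erasure codes, which this sketch does not attempt to model.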
25. One or more non-transitory computer-readable storage media having stored thereon instructions for causing at least one computer system to provide archival data storage, the instructions comprising instructions that cause said at least one computer to:
as a result of a received request to store a data object:
encode the data object into a first plurality of data components;
determine a plurality of data storage devices, each of the plurality of data storage devices storing a plurality of volume components, each of the plurality of volume components comprising a second plurality of data components and corresponding to a volume, each volume corresponding to a volume identifier from a plurality of volume identifiers;
store, on the determined plurality of data storage devices, the first plurality of data components based at least in part on aggregate information about a hierarchical data structure, portions of the aggregated information being received from the plurality of data storage devices, wherein the aggregated information comprises information about volume components stored in the plurality of data storage devices;
cause the first plurality of data components to be stored in the hierarchical data structure;
provide information, from a plurality of data storage nodes, to a storage node registrar, wherein each of the plurality of data storage nodes being operably connected to one or more of the plurality of data storage devices and configured to provide information regarding the connected plurality of data storage devices and the plurality of data storage nodes;
provide a mapping, to the storage node registrar, that associates the plurality of volume identifiers with corresponding volume information, the mapping being based at least in part on the information received from the plurality of data storage nodes;
maintain a request processing sub-system configured to:
receive a request for a data object, the request specifying a volume identifier and an object identifier associated with the data object;
select, based at least in part on the volume identifier and the mapping, one or more data storage nodes that store one or more volume components associated with the volume identifier;
identify, for each of the selected one or more data storage nodes, based at least in part on the volume identifier, a volume component associated with the volume identifier that is stored by a data storage device operably connected to the data storage node;
retrieve, based at least in part on the object identifier, a data component stored in the identified volume component; and
provide the requested data object based at least in part on the retrieved data component from the selected one or more data storage nodes.
Specification