Data block migration
Abstract
Techniques and mechanisms are provided for migrating data blocks around a cluster during node addition and node deletion. Migration requires no downtime, as a newly added node is immediately operational while the data blocks are being moved. Blockmap files and deduplication dictionaries need not be updated.
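The abstract's point that blockmap files need not be updated can be illustrated with a minimal sketch. It assumes the mapping function is a consistent-hash ring in which each node owns a set of generated keys; the text here does not say how the mapping function is actually implemented. The names `MappingFunction`, `rebalance`, and `keys_per_node`, and the node and suitcase identifiers, are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (hashlib, so results do not vary between runs)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class MappingFunction:
    """Hypothetical key-based mapping from suitcase ID to node (a hash ring)."""

    def __init__(self, keys_per_node: int = 64):
        self.keys_per_node = keys_per_node
        self._keys = []   # sorted key positions on the ring
        self._owner = {}  # key position -> node name

    def add_node(self, node: str) -> None:
        # Generate new keys for the node; keys of existing nodes are untouched.
        for i in range(self.keys_per_node):
            key = _hash(f"{node}#{i}")
            if key not in self._owner:
                bisect.insort(self._keys, key)
                self._owner[key] = node

    def node_for(self, suitcase_id: str) -> str:
        # The first key at or after the suitcase's hash owns it (wrapping around).
        idx = bisect.bisect_left(self._keys, _hash(suitcase_id)) % len(self._keys)
        return self._owner[self._keys[idx]]


def rebalance(mapping: MappingFunction, placement: dict, new_node: str) -> list:
    """Add new_node and return the IDs of suitcases that must be copied to it."""
    mapping.add_node(new_node)
    moved = []
    for suitcase_id, old_node in list(placement.items()):
        target = mapping.node_for(suitcase_id)
        if target != old_node:
            placement[suitcase_id] = target
            moved.append(suitcase_id)
    return moved


if __name__ == "__main__":
    mapping = MappingFunction()
    for name in ("node-a", "node-b", "node-c"):
        mapping.add_node(name)
    placement = {f"suitcase-{i}": mapping.node_for(f"suitcase-{i}") for i in range(1000)}
    # A blockmap refers to suitcases and positions inside them, never to nodes.
    blockmap = {"report.doc": [("suitcase-7", 0), ("suitcase-7", 3)]}
    moved = rebalance(mapping, placement, "node-d")
    print(len(moved), "suitcases to copy; blockmap left as-is:", blockmap)
```

In this sketch only the suitcases whose owning key changed are copied, and every one of them goes to the new node. The blockmap is never touched because it refers to suitcase identifiers rather than node locations.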
20 Claims
1. A method, comprising:
receiving a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein a plurality of blockmap files include information for locating which suitcases in the plurality of suitcases contain particular deduplicated data segments, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, and last file reference information, wherein optimizing a file includes compressing the file;
generating a plurality of new keys associated with a mapping function separate from the plurality of blockmap files, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster, wherein the plurality of blockmap files, being separate from the mapping function, do not contain references to the new keys;
copying data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster, wherein performing data access after data migration includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
(Dependent claims: 2, 3, 4, 5, 6, 7, 9, 10, 11)
8. The wherein the new node is a storage device.
12. A system, comprising:
an interface configured to receive a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein a plurality of blockmap files include information for locating which suitcases in the plurality of suitcases contain particular deduplicated data segments, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, and last file reference information, wherein optimizing a file includes compressing the file;
a processor configured to generate a plurality of new keys associated with a mapping function separate from the plurality of blockmap files, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster, wherein the plurality of blockmap files, being separate from the mapping function, do not contain references to the new keys, the processor further configured to copy data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster, wherein the processor is further configured to perform data access after data migration, wherein performing data access includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
(Dependent claims: 13, 14, 15, 16, 17)
18. A non-transitory computer readable medium comprising computer code for:
receiving a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein a plurality of blockmap files include information for locating which suitcases in the plurality of suitcases contain particular deduplicated data segments, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, last file reference information, wherein optimizing a file includes compressing the file;
generating a plurality of new keys associated with a mapping function separate from the plurality of blockmap files, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster, wherein the plurality of blockmap files, being separate from the mapping function, do not contain references to the new keys;
copying data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster, wherein performing data access after data migration includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
(Dependent claims: 19, 20)
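The datastore suitcase recited in the independent claims (deduplicated data segments, index information, offset information, data reference counts, and last file reference) and the stub-file read path can be pictured with the following minimal sketch. All class, field, and function names are hypothetical. The `length` field is an added assumption so that a segment can be sliced out of the data area, and the compression step of optimization is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SuitcaseEntry:
    index: int      # index information
    offset: int     # offset of the segment within the suitcase's data area
    length: int     # assumption: stored so the segment can be sliced out
    ref_count: int  # data reference count
    last_file: str  # last file that referenced this segment


@dataclass
class DatastoreSuitcase:
    suitcase_id: str
    data: bytearray = field(default_factory=bytearray)
    entries: dict = field(default_factory=dict)     # index -> SuitcaseEntry
    by_content: dict = field(default_factory=dict)  # segment bytes -> index

    def add_segment(self, segment: bytes, file_name: str) -> int:
        """Store a segment once; repeats only bump the reference count."""
        index = self.by_content.get(segment)
        if index is None:
            index = len(self.entries)
            self.entries[index] = SuitcaseEntry(
                index, len(self.data), len(segment), 0, file_name
            )
            self.data += segment
            self.by_content[segment] = index
        entry = self.entries[index]
        entry.ref_count += 1
        entry.last_file = file_name
        return index

    def read_segment(self, index: int) -> bytes:
        entry = self.entries[index]
        return bytes(self.data[entry.offset:entry.offset + entry.length])


@dataclass
class StubFile:
    name: str
    suitcase_id: str  # identifies the suitcase and, through it, a node


def read_file(stub: StubFile, blockmaps: dict, suitcases: dict) -> bytes:
    """Resolve the stub to its suitcase, then follow the file's blockmap."""
    suitcase = suitcases[stub.suitcase_id]
    return b"".join(suitcase.read_segment(i) for i in blockmaps[stub.name])
```

Here the `suitcases` dictionary stands in for whatever lookup resolves a suitcase identifier to the node that holds it. Because each blockmap stores only segment indices within a suitcase, moving the suitcase to another node leaves the blockmap valid.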
Specification