Data block migration
Abstract
Techniques and mechanisms are provided for migrating data blocks around a cluster during node addition and node deletion. Migration requires no downtime, as a newly added node is immediately operational while the data blocks are being moved. Blockmap files and deduplication dictionaries need not be updated.
67 Citations
20 Claims
1. A method, comprising:
receiving a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, and last file reference information, wherein optimizing a file includes compressing the file;
generating a plurality of new keys associated with a mapping function, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster,
copying data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster,
wherein performing data access after data migration includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
Dependent claims: 2, 3, 4, 5, 6, 7
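The steps of claim 1 can be illustrated with a small in-memory sketch. It is not the patented implementation: the modulo-based mapping function, the fixed key count, and the names `DatastoreSuitcase`, `build_keys`, `add_node`, and `migrate` are all assumptions made for illustration. The suitcase class only mirrors the fields the claim lists.

```python
from dataclasses import dataclass, field


@dataclass
class DatastoreSuitcase:
    """Hypothetical container mirroring the fields named in the claim."""
    suitcase_id: int
    segments: dict[int, bytes] = field(default_factory=dict)   # index -> deduplicated segment
    offsets: dict[int, int] = field(default_factory=dict)      # index -> offset in the suitcase
    ref_counts: dict[int, int] = field(default_factory=dict)   # index -> data reference count
    last_file: dict[int, str] = field(default_factory=dict)    # index -> last file reference


def build_keys(nodes: list[str], num_keys: int) -> dict[int, str]:
    """Assumed mapping function: key k is owned by nodes[k % len(nodes)]."""
    return {k: nodes[k % len(nodes)] for k in range(num_keys)}


def add_node(keys: dict[int, str], nodes: list[str], new_node: str) -> dict[int, str]:
    """Generate new keys: hand roughly 1/(n+1) of the keys to the new node.

    All other keys keep their owner, so data only moves from existing
    nodes to the new node.
    """
    total = len(nodes) + 1
    return {k: (new_node if k % total == total - 1 else owner)
            for k, owner in keys.items()}


def migrate(store: dict[str, dict[int, DatastoreSuitcase]],
            new_keys: dict[int, str]) -> list[int]:
    """Move every suitcase whose key now maps to a different node."""
    num_keys = len(new_keys)
    moved = []
    for node in list(store):
        for sid in list(store[node]):
            target = new_keys[sid % num_keys]
            if target != node:
                # Copy to the new owner, then drop the source copy.
                store.setdefault(target, {})[sid] = store[node].pop(sid)
                moved.append(sid)
    return moved


nodes = ["a", "b", "c"]
keys = build_keys(nodes, num_keys=12)

# Place one empty suitcase per key on the node that currently owns it.
store: dict[str, dict[int, DatastoreSuitcase]] = {n: {} for n in nodes}
for sid in range(12):
    store[keys[sid % 12]][sid] = DatastoreSuitcase(sid)

new_keys = add_node(keys, nodes, "d")
moved = migrate(store, new_keys)
print(moved, {n: sorted(s) for n, s in store.items()})
```

In this sketch each existing node gives up one suitcase and the cluster ends with three suitcases per node. A production system would more likely use a scheme such as consistent hashing, but the claim does not name one.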
8. A system, comprising:
a processor; and
memory comprising instructions to execute a method, the method comprising;
receiving a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, and last file reference information, wherein optimizing a file includes compressing the file;
generating a plurality of new keys associated with a mapping function, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster,
copying data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster,
wherein performing data access after data migration includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable medium comprising computer code for:
receiving a request to add a new node from a data storage cluster, the data storage cluster maintaining a plurality of deduplicated data segments in a plurality of suitcases at particular nodes in the data storage cluster, wherein the plurality of suitcases include datastore suitcases created after optimizing a file, each datastore suitcase comprising a data structure including deduplicated data segments, index information, offset information, data reference count information, and last file reference information, wherein optimizing a file includes compressing the file;
generating a plurality of new keys associated with a mapping function, the mapping function using a particular key to identify a particular node containing a particular suitcase, wherein the plurality of new keys are used to identify particular suitcases stored in particular nodes, including the new node, of the data storage cluster;
copying data including suitcases and their corresponding deduplicated data segments from the plurality of existing nodes to the new node, in accordance with the mapping function and new keys, to rebalance data across the data storage cluster,
wherein performing data access after data migration includes accessing a stub file corresponding to a virtual image of the optimized file, the stub file providing a suitcase identifier that specifies a node.
Dependent claims: 16, 17, 18, 19, 20
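The final "wherein" element of each independent claim, together with the abstract's statement that blockmap files need not be updated, suggests that a stub file can stay unchanged across a migration while only the keys change. The following is a minimal sketch of that idea under stated assumptions: `StubFile`, `locate`, and `read` are hypothetical names, the modulo key lookup is an illustrative stand-in for the mapping function, and compression of segments is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubFile:
    """Hypothetical stub: stands in for the optimized file and is never rewritten."""
    path: str
    suitcase_id: int
    segment_indexes: tuple[int, ...]  # segments of the file, in order


def locate(stub: StubFile, keys: dict[int, str]) -> str:
    """Resolve the node from the stub's suitcase identifier (assumed modulo mapping)."""
    return keys[stub.suitcase_id % len(keys)]


def read(stub: StubFile, keys: dict[int, str],
         cluster: dict[str, dict[int, dict[int, bytes]]]) -> bytes:
    """Reassemble the file from segments in the suitcase on the resolved node."""
    suitcase = cluster[locate(stub, keys)][stub.suitcase_id]
    return b"".join(suitcase[i] for i in stub.segment_indexes)


# Before migration: key 3 belongs to node "b", which holds suitcase 7.
old_keys = {0: "a", 1: "b", 2: "a", 3: "b"}
cluster = {"a": {}, "b": {7: {0: b"he", 1: b"llo"}}}
stub = StubFile("/vol/report.txt", suitcase_id=7, segment_indexes=(0, 1))
before = read(stub, old_keys, cluster)

# Node "c" is added: key 3 is reassigned and suitcase 7 is moved with it.
new_keys = {0: "a", 1: "b", 2: "a", 3: "c"}
cluster["c"] = {7: cluster["b"].pop(7)}
after = read(stub, new_keys, cluster)

print(locate(stub, old_keys), locate(stub, new_keys), before == after)
```

The stub object is the same before and after the move; only the key table passed to `locate` differs, which is why per-file metadata does not need to be touched in this sketch.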
Specification