Redundant data assignment in a data storage system
First Claim
1. A method of assigning data to storage device nodes in a data storage system, wherein the data storage system stores a number M of replicas of the data, wherein M is greater than or equal to 2, the method comprising:
- dividing the data into a plurality of groups of segments and for each group of segments,identifying storage device nodes that have sufficient resources available to accommodate a requirement of the data, the requirement including at least one of a reliability requirement, a capacity requirement and a performance requirement, andwhen a number of the storage device nodes identified by said identifying is greater than M, assigning the data to M randomly selected storage device nodes from among those identified, and when the number of the identified storage device nodes is equal to M, assigning the data to the M identified storage device nodes, and when the number of the identified storage device nodes is less than M, dividing the group of data segments thereby forming a group of data segments having a reduced requirement and identifying storage device nodes that have sufficient resources available to accommodate the reduced requirement, andthe method further comprising adding a new storage device node to the data storage system includingidentifying an existing storage device node that is heavily loaded in comparison to other ones of existing storage device nodes;
moving data stored at the identified existing storage device node to the new storage device node; and
determining whether the new storage device node is sufficiently loaded in comparison to the existing storage device nodes and when the new storage device node is not sufficiently loaded, repeating said steps of identifying the existing storage device node and moving the data until the new storage device node is sufficiently loaded.
2 Assignments
0 Petitions
Accused Products
Abstract
The present invention provides techniques for assignment and layout of redundant data in data storage system. In one aspect, the data storage system stores a number M of replicas of the data. Nodes that have sufficient resources available to accommodate a requirement of data to be assigned to the system are identified. When the number of nodes is greater than M, the data is assigned to M randomly selected nodes from among those identified. The data to be assigned may include a group of data segments and when the number of nodes is less than M, the group is divided to form a group of data segments having a reduced requirement. Nodes are then identified that have sufficient resources available to accommodate the reduced requirement. In other aspects, techniques are providing for adding a new storage device node to a data storage system having a plurality of existing storage device nodes and for removing data from a storage device node in such a data storage system.
41 Citations
22 Claims
-
1. A method of assigning data to storage device nodes in a data storage system, wherein the data storage system stores a number M of replicas of the data, wherein M is greater than or equal to 2, the method comprising:
-
dividing the data into a plurality of groups of segments and for each group of segments, identifying storage device nodes that have sufficient resources available to accommodate a requirement of the data, the requirement including at least one of a reliability requirement, a capacity requirement and a performance requirement, and when a number of the storage device nodes identified by said identifying is greater than M, assigning the data to M randomly selected storage device nodes from among those identified, and when the number of the identified storage device nodes is equal to M, assigning the data to the M identified storage device nodes, and when the number of the identified storage device nodes is less than M, dividing the group of data segments thereby forming a group of data segments having a reduced requirement and identifying storage device nodes that have sufficient resources available to accommodate the reduced requirement, and the method further comprising adding a new storage device node to the data storage system including identifying an existing storage device node that is heavily loaded in comparison to other ones of existing storage device nodes; moving data stored at the identified existing storage device node to the new storage device node; and determining whether the new storage device node is sufficiently loaded in comparison to the existing storage device nodes and when the new storage device node is not sufficiently loaded, repeating said steps of identifying the existing storage device node and moving the data until the new storage device node is sufficiently loaded. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
-
-
20. A data storage system comprising:
-
storage device nodes to store M replicas of data, wherein M is greater than or equal to 2; at least one central processing unit (CPU) configured to; divide the data into a plurality of groups of segments and for each group of segments, identify storage device nodes that have sufficient resources available to accommodate a requirement of the data, the requirement including at least one of a reliability requirement, a capacity requirement and a performance requirement, and when a number of the storage device nodes identified by said identifying is greater than M, assign the data to M randomly selected storage device nodes from among those identified, and when the number of the identified storage device nodes is equal to M, assign the data to the M identified storage device nodes, and when the number of the identified storage device nodes is less than M, divide the group of data segments thereby forming a group of data segments having a reduced requirement and identifying storage device nodes that have sufficient resources available to accommodate the reduced requirement, and in response to addition of a new storage device node, the at least one CPU is configured to further; identify an existing storage device node that is heavily loaded in comparison to other ones of existing storage device nodes; move data stored at the identified existing storage device node to the new storage device node; and determine whether the new storage device node is sufficiently loaded in comparison to the existing storage device nodes and when the new storage device node is not sufficiently loaded, repeating identifying the existing storage device node and moving the data until the new storage device node is sufficiently loaded. - View Dependent Claims (21, 22)
-
Specification