Data consistency management in large computing clusters
First Claim
1. A method comprising:
determining a rebuild time parameter that characterizes a time for copying stored data from a first storage device to a second storage device of a computing system, the computing system comprising a plurality of computing nodes;
determining a data loss parameter corresponding to a storage device of a computing node of the plurality of computing nodes;
determining a storage device group having a maximum number of storage devices selected from storage devices of the plurality of computing nodes by:
identifying a maximum data loss probability value determined based at least in part on the rebuild parameter and a data loss parameter corresponding to the storage device,
incrementally adding the storage device to the storage device group, and
comparing an estimated data loss probability value of the storage device group having the storage device added against the maximum data loss probability value, wherein the maximum number of storage devices for the storage device group is determined when the estimated data loss probability value exceeds the maximum data loss probability value; and
assigning a dataset to the storage device group, wherein the dataset and a replica of the dataset are stored in the storage device group.
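The incremental sizing step recited above can be illustrated with a minimal sketch. It is not the patented implementation: the probability model (exponential failures, data loss when two or more devices fail within one rebuild window), the use of an annualized failure rate as the data loss parameter, the threshold passed in as a plain number, and all function names are assumptions made for illustration only.

```python
import math
from typing import List, Sequence

HOURS_PER_YEAR = 8760.0


def failure_prob_during_rebuild(afr: float, rebuild_hours: float) -> float:
    """Chance that one device fails within a rebuild window.

    Assumes an exponential failure model driven by an annualized
    failure rate (afr); the claim does not prescribe any model.
    """
    return 1.0 - math.exp(-afr * rebuild_hours / HOURS_PER_YEAR)


def estimated_group_loss(afrs: Sequence[float], rebuild_hours: float) -> float:
    """Assumed estimate: chance that two or more devices in the group
    fail within the same rebuild window."""
    qs = [failure_prob_during_rebuild(a, rebuild_hours) for a in afrs]
    p_none = math.prod(1.0 - q for q in qs)
    p_exactly_one = sum(
        q * math.prod(1.0 - r for j, r in enumerate(qs) if j != i)
        for i, q in enumerate(qs)
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return max(0.0, 1.0 - p_none - p_exactly_one)


def build_group(
    candidate_afrs: Sequence[float], rebuild_hours: float, max_loss: float
) -> List[int]:
    """Add candidate devices one at a time and stop at the first device
    whose addition pushes the estimate above max_loss.

    Returns the indices of the devices kept in the group.
    """
    group: List[int] = []
    for idx in range(len(candidate_afrs)):
        trial = group + [idx]
        loss = estimated_group_loss(
            [candidate_afrs[i] for i in trial], rebuild_hours
        )
        if loss > max_loss:
            break  # the previous size is the maximum group size
        group = trial
    return group


if __name__ == "__main__":
    # Hypothetical inputs: 20 devices, 2% annual failure rate, 24 h rebuild.
    devices = [0.02] * 20
    print(len(build_group(devices, rebuild_hours=24.0, max_loss=1e-7)))
```

Under this assumed model, a longer rebuild time or a higher per-device failure rate raises the estimate faster, so the loop stops at a smaller group.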
Abstract
Storage device groups are formed with respect to data consistency policies and/or quantified probabilities. A method embodiment commences upon identifying a computing system having a plurality of storage devices that are accessed by a plurality of computing nodes. A user interface collects policies, data loss parameters, and data rebuild parameters. Based on the policies and/or the values of the data loss parameters, together with the values of the data rebuild parameters, sets of storage device groups are formed to achieve particular data loss and rebuild time properties. Data storage containers that hold persistent datasets, such as files or virtual disks, are assigned to the storage device groups appropriate to the nature of each dataset. Both objectives, an acceptable likelihood of data loss and an acceptable rebuild time, are achieved by assigning particular storage devices to a group.
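The assignment of datasets to groups described in the abstract might look roughly like the sketch below. The record types, field names, and the tie-breaking rule (prefer the least strict group that still satisfies the policy) are hypothetical; the abstract only states that containers are assigned to groups appropriate to the nature of the dataset.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class DeviceGroup:
    """Hypothetical summary of an already-formed storage device group."""
    name: str
    est_loss_probability: float
    rebuild_hours: float


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical per-dataset policy collected from the user."""
    dataset: str
    max_loss_probability: float
    max_rebuild_hours: float


def pick_group(
    policy: DatasetPolicy, groups: Sequence[DeviceGroup]
) -> Optional[DeviceGroup]:
    """Return a group that meets both policy limits, or None if none does."""
    eligible = [
        g
        for g in groups
        if g.est_loss_probability <= policy.max_loss_probability
        and g.rebuild_hours <= policy.max_rebuild_hours
    ]
    if not eligible:
        return None
    # Assumed tie-break: take the least strict qualifying group so the
    # safest groups stay available for datasets that need them.
    return max(eligible, key=lambda g: g.est_loss_probability)


def assign(
    policies: Sequence[DatasetPolicy], groups: Sequence[DeviceGroup]
) -> Dict[str, Optional[str]]:
    """Map each dataset name to the chosen group name (or None)."""
    result: Dict[str, Optional[str]] = {}
    for policy in policies:
        chosen = pick_group(policy, groups)
        result[policy.dataset] = chosen.name if chosen else None
    return result


if __name__ == "__main__":
    groups = [
        DeviceGroup("gold", 1e-9, 2.0),
        DeviceGroup("silver", 1e-7, 8.0),
        DeviceGroup("bronze", 1e-5, 24.0),
    ]
    policies = [
        DatasetPolicy("database-vdisk", 1e-8, 4.0),
        DatasetPolicy("log-archive", 1e-4, 48.0),
    ]
    print(assign(policies, groups))
```

A dataset whose policy no group can satisfy maps to None here; a real system would presumably form a new group or report the conflict.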
20 Claims
1. A method comprising:
determining a rebuild time parameter that characterizes a time for copying stored data from a first storage device to a second storage device of a computing system, the computing system comprising a plurality of computing nodes;
determining a data loss parameter corresponding to a storage device of a computing node of the plurality of computing nodes;
determining a storage device group having a maximum number of storage devices selected from storage devices of the plurality of computing nodes by:
identifying a maximum data loss probability value determined based at least in part on the rebuild parameter and a data loss parameter corresponding to the storage device,
incrementally adding the storage device to the storage device group, and
comparing an estimated data loss probability value of the storage device group having the storage device added against the maximum data loss probability value, wherein the maximum number of storage devices for the storage device group is determined when the estimated data loss probability value exceeds the maximum data loss probability value; and
assigning a dataset to the storage device group, wherein the dataset and a replica of the dataset are stored in the storage device group.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computer readable medium, embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by one or more processors causes the one or more processors to perform a set of acts, the acts comprising:
determining a rebuild time parameter that characterizes a time for copying stored data from a first storage device to a second storage device of a computing system, the computing system comprising a plurality of computing nodes;
determining a data loss parameter corresponding to a storage device of a computing node of the plurality of computing nodes;
determining a storage device group having a maximum number of storage devices selected from storage devices of the plurality of computing nodes by:
identifying a maximum data loss probability value determined based at least in part on the rebuild parameter and a data loss parameter corresponding to the storage device,
incrementally adding the storage device to the storage device group, and
comparing an estimated data loss probability value of the storage device group having the storage device added against the maximum data loss probability value, wherein the maximum number of storage devices for the storage device group is determined when the estimated data loss probability value exceeds the maximum data loss probability value; and
assigning a dataset to the storage device group, wherein the dataset and a replica of the dataset are stored in the storage device group.
Dependent claims: 18
19. A system comprising:
a storage medium having stored thereon a sequence of instructions; and
one or more processors that execute the instructions to cause the one or more processors to perform a set of acts, the acts comprising:
determining a rebuild time parameter that characterizes a time for copying stored data from a first storage device to a second storage device of a computing system, the computing system comprising a plurality of computing nodes;
determining a data loss parameter corresponding to a storage device of a computing node of the plurality of computing nodes;
determining a storage device group having a maximum number of storage devices selected from storage devices of the plurality of computing nodes by:
identifying a maximum data loss probability value determined based at least in part on the rebuild parameter and a data loss parameter corresponding to the storage device,
incrementally adding the storage device to the storage device group, and
comparing an estimated data loss probability value of the storage device group having the storage device added against the maximum data loss probability value, wherein the maximum number of storage devices for the storage device group is determined when the estimated data loss probability value exceeds the maximum data loss probability value; and
assigning a dataset to the storage device group, wherein the dataset and a replica of the dataset are stored in the storage device group.
Dependent claims: 20
Specification