Cluster storage collection based data management
Abstract
Cluster storage collection-based data management is described. In one aspect, and in a distributed system for storing data across a network to multiple data storage nodes, a bounded bandwidth available for data repair in the distributed system is determined. A specific number of stripes are then created on each data storage node of the multiple data storage nodes. The stripes are for placement and replication of data objects across respective ones of the data storage nodes. The specific number of stripes created on each data storage node is a function of the determined bounded data repair bandwidth.
20 Claims
1. In a distributed system for storing data across a network to multiple data storage nodes, a method comprising:
determining a bounded bandwidth available for data repair in the distributed system;
creating a specific number of stripes on each data storage node of the multiple data storage nodes, the stripes for placement and replication of data objects across respective ones of the data storage nodes, the specific number of stripes on each data storage node being a function of the bounded bandwidth; and
wherein the creating further comprises:
(a) calculating a target chunk size based on a replication degree such that data reliability of storing data across the multiple data storage nodes using a random data placement scheme is optimized, the calculating comprising analyzing data reliability of the distributed system in view of the bounded bandwidth by estimating a mean time to data loss (MTTDL) for an object (MTTDL_obj) of multiple objects as a function of a harmonic sum of MTTDL in each state i of multiple states i, the distributed system comprising the multiple objects, each state i representing a state of the distributed system when i data storage nodes fail and lost replicas on the i data storage nodes have not been repaired; and
(b) allocating disk storage space on each node as a function of the target chunk size.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. In a distributed system for storing data across a network to multiple data storage nodes, a computing device comprising:
a processor coupled to a memory, the memory comprising computer-program instructions executable by the processor for performing operations including:
determining a bounded bandwidth available for data repair in the distributed system;
creating a specific number of stripes on each data storage node of the multiple data storage nodes, the stripes for placement and replication of data objects across respective ones of the data storage nodes, the specific number of stripes on each data storage node being a function of the bounded bandwidth; and
wherein the creating further comprises:
(a) calculating a target chunk size based on a replication degree such that data reliability of storing data across the multiple data storage nodes using a random data placement scheme is optimized, the calculating comprising analyzing data reliability of the distributed system in view of the bounded bandwidth by estimating a mean time to data loss (MTTDL) for an object (MTTDL_obj) of multiple objects as a function of a harmonic sum of MTTDL in each state i of multiple states i, the distributed system comprising the multiple objects, each state i representing a state of the distributed system when i data storage nodes fail and lost replicas on the i data storage nodes have not been repaired; and
(b) allocating disk storage space on each node as a function of the target chunk size.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
17. In a distributed system for storing data across a network to multiple data storage nodes, one or more computer-readable media having encoded thereon computer-program instructions executable by a processor for performing operations comprising:
determining a bounded bandwidth available for data repair in the distributed system;
creating a specific number of stripes on each data storage node of the multiple data storage nodes, the stripes for placement and replication of data objects across respective ones of the data storage nodes, the specific number of stripes on each data storage node being a function of the bounded bandwidth; and
wherein the creating further comprises:
(a) calculating a target chunk size based on a replication degree such that data reliability of storing data across the multiple data storage nodes using a random data placement scheme is optimized, the calculating comprising analyzing data reliability of the distributed system in view of the bounded bandwidth by estimating a mean time to data loss (MTTDL) for an object (MTTDL_obj) of multiple objects as a function of a harmonic sum of MTTDL in each state i of multiple states i, the distributed system comprising the multiple objects, each state i representing a state of the distributed system when i data storage nodes fail and lost replicas on the i data storage nodes have not been repaired; and
(b) allocating disk storage space on each node as a function of the target chunk size.
Dependent claims: 18, 19, 20.
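The independent claims recite estimating MTTDL_obj "as a function of a harmonic sum of MTTDL in each state i" without stating the function itself. As an illustration only, the sketch below assumes the simplest such function: the reciprocal of the sum of reciprocals, which corresponds to adding the per-state loss rates 1/MTTDL_i. The function names, the combination rule, and the sample per-state values are all hypothetical and are not taken from the claims.

```python
from typing import Sequence


def harmonic_sum(values: Sequence[float]) -> float:
    """Return the sum of reciprocals 1/v over all values."""
    if any(v <= 0 for v in values):
        raise ValueError("every per-state MTTDL must be positive")
    return sum(1.0 / v for v in values)


def mttdl_obj(per_state_mttdl: Sequence[float]) -> float:
    """Combine per-state MTTDL values into a single object-level estimate.

    ASSUMPTION: the loss rates (1 / MTTDL_i) of the states i simply add, so
    the combined MTTDL is the reciprocal of the harmonic sum. The claims only
    say "a function of a harmonic sum"; this particular rule is illustrative.
    """
    if not per_state_mttdl:
        raise ValueError("need at least one state")
    return 1.0 / harmonic_sum(per_state_mttdl)


if __name__ == "__main__":
    # Hypothetical MTTDL values (in hours) for states i = 0, 1, 2, where
    # state i means i nodes have failed and their replicas are unrepaired.
    example_states = [4.0e6, 2.0e6, 4.0e6]
    print(f"MTTDL_obj is about {mttdl_obj(example_states):.3e} hours")
```

Under this assumed rule the state with the smallest MTTDL dominates the result, so the combined estimate can never exceed the weakest state's value.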
Specification