Distributed data storage with reduced storage overhead using reduced-dependency erasure codes
First Claim
1. A device for redundantly storing computer data, comprising:
- a non-transitory memory that stores machine instructions; and
a processor coupled to the non-transitory memory that executes the machine instructions to;
generate a first set of representations of a plurality of storage segments;
generate a second set of representations of a plurality of regeneration constraints, wherein each of the regeneration constraints is configured such that when data is lost each of the regeneration constraints regenerates a portion of the plurality of storage segments so that the lost data is recovered, and each of the regeneration constraints regenerates and recovers a lost portion of the plurality of storage segments;
group the first set of representations into a plurality of discrete groups;
create a plurality of associations correlating each of the second set of representations with one of the first set of representations in each discrete group of a subset of the plurality of discrete groups;
generate a parity check matrix based on the first set of representations, the second set of representations, and the plurality of associations; and
construct a generator matrix based on the parity check matrix, each of the plurality of discrete groups corresponding to a respective storage node of a plurality of storage nodes, and the plurality of associations randomly distributed among the plurality of discrete groups.
7 Assignments
0 Petitions
Accused Products
Abstract
A system that implements a near-optimal, reduced-dependency erasure code construction to redundantly distribute computer data across multiple storage nodes includes a memory that stores machine instructions and a processor that executes the machine instructions to group storage segments into discrete groups, each of which corresponds to an individual storage node. The processor further executes the machine instructions to represent regeneration constraints and associate the constraints with storage segments in multiple storage nodes. The processor also executes the machine instructions to generate a parity check matrix based on the regeneration constraints, the associations and the storage segments. The processor additionally executes the machine instructions to construct a generator matrix based on the parity check matrix.
20 Citations
20 Claims
-
1. A device for redundantly storing computer data, comprising:
-
a non-transitory memory that stores machine instructions; and a processor coupled to the non-transitory memory that executes the machine instructions to; generate a first set of representations of a plurality of storage segments; generate a second set of representations of a plurality of regeneration constraints, wherein each of the regeneration constraints is configured such that when data is lost each of the regeneration constraints regenerates a portion of the plurality of storage segments so that the lost data is recovered, and each of the regeneration constraints regenerates and recovers a lost portion of the plurality of storage segments; group the first set of representations into a plurality of discrete groups; create a plurality of associations correlating each of the second set of representations with one of the first set of representations in each discrete group of a subset of the plurality of discrete groups; generate a parity check matrix based on the first set of representations, the second set of representations, and the plurality of associations; and construct a generator matrix based on the parity check matrix, each of the plurality of discrete groups corresponding to a respective storage node of a plurality of storage nodes, and the plurality of associations randomly distributed among the plurality of discrete groups. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for redundantly storing computer data, comprising:
-
generating a first set of representations, each of the first set of representations corresponding to a respective storage segment of a plurality of storage segments; grouping the first set of representations into a plurality of discrete groups, each of the plurality of discrete groups corresponding to a respective storage node of a plurality of storage nodes; generating a second set of representations, each of the second set of representations corresponding to a respective regeneration constraint of a plurality of regeneration constraints, wherein each of the regeneration constraints is configured such that when data is lost each of the regeneration constraints regenerates a portion of the plurality of storage segments so that the lost data is recovered, and each of the regeneration constraints regenerates and recovers a lost portion of the plurality of storage segments; creating a plurality of associations that correlate each respective regeneration constraint with one respective storage segment corresponding to each discrete group of a subset of the plurality of discrete groups, the plurality of associations substantially equally randomly distributed among the plurality of discrete groups; generating a parity check matrix based on the first set of representations, the second set of representations, and the plurality of associations; and constructing a generator matrix at least in part based on the parity check matrix. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A computer program product for redundantly storing computer data, comprising:
-
a non-transitory, computer-readable storage medium encoded with instructions, which when executed by a processor, cause the processor to perform; dividing an amount of storage data into a first number of data segments; apportioning a second number of redundant segments, a plurality of storage segments including the first number of data segments and the second number of redundant segments; generating a first set of representations corresponding to the plurality of storage segments; grouping the first set of representations into a plurality of discrete groups corresponding to a plurality of storage nodes; generating a second set of representations corresponding to a plurality of regeneration constraints, wherein each of the regeneration constraints is configured such that when data is lost each of the regeneration constraints regenerates a portion of the plurality of storage segments so that the lost data is recovered, and each of the regeneration constraints regenerates and recovers a lost portion of the plurality of storage segments; creating a plurality of associations that correlate each of the plurality of regeneration constraints with one of the plurality of storage segments corresponding to each discrete group of a subset of the plurality of discrete groups, the plurality of associations substantially equally randomly distributed among the plurality of discrete groups; generating a parity check matrix based on the first set of representations, the second set of representations, and the plurality of associations; and constructing a generator matrix at least in part based on the parity check matrix. - View Dependent Claims (20)
-
Specification