Local erasure codes for data storage
First Claim
Patent Images
1. A method comprising:
- arranging a set of data symbols into a plurality of data groups so that each data group in the plurality of data groups includes a subset of the set of data symbols;
for each data group in the plurality of data groups;
generating a group parity symbol based on the subset of the set of data symbols included in the data group; and
including the group parity symbol in the data group;
storing the plurality of data groups across a first set of machines;
generating at least two global parity symbols, wherein each of the at least two global parity symbols is generated based on all the data symbols in the set of data symbols;
including the at least two global parity symbols in a global parity group;
storing the global parity group across a second set of machines that is different than the first set of machines; and
recovering from failures, wherein the failures comprise;
up to a first failure associated with any one symbol included in each data group of the plurality of data groups;
up to a number of additional failures associated with additional symbols of the plurality of data groups and the at least two global parity symbols in the global parity group, wherein the number of additional failures is up to a number of the at least two global parity symbols; and
wherein generating the at least two global parity symbols comprises solving b equations with P=0, b−
1, wherein the b equations comprise;
setting a first sum plus a second sum equal to zero, wherein;
the first sum comprises a sum from j=1 to g of;
a sum from i=1 to r of;
a product of ω
i and λ
j, raised to a power of 2p, and multiplied by Xi,j;
andthe second sum comprises a sum from i=1 to b of;
a product of ω
i and λ
g+1, raised to a power of 2p, and multiplied by Yi,wherein Xi,j is a data symbol, Yi is a global parity symbol, g is a number of the plurality of data groups, r is a number of data symbols per data group, b is the number of global parity symbols, {λ
1, . . . , Δ
g+1} is a collection of elements such that any b of them are linearly independent in a particular field, and co is a primitive element of the particular field.
2 Assignments
0 Petitions
Accused Products
Abstract
In some examples, an erasure code can be implemented to provide for fault-tolerant storage of data. Maximally recoverable cloud codes, resilient cloud codes, and robust product codes are examples of different erasure codes that can be implemented to encode and store data. Implementing different erasure codes and different parameters within each erasure code can involve trade-offs between reliability, redundancy, and locality. In some examples, an erasure code can specify placement of the encoded data on machines that are organized into racks.
50 Citations
20 Claims
-
1. A method comprising:
-
arranging a set of data symbols into a plurality of data groups so that each data group in the plurality of data groups includes a subset of the set of data symbols; for each data group in the plurality of data groups; generating a group parity symbol based on the subset of the set of data symbols included in the data group; and including the group parity symbol in the data group; storing the plurality of data groups across a first set of machines; generating at least two global parity symbols, wherein each of the at least two global parity symbols is generated based on all the data symbols in the set of data symbols; including the at least two global parity symbols in a global parity group; storing the global parity group across a second set of machines that is different than the first set of machines; and recovering from failures, wherein the failures comprise; up to a first failure associated with any one symbol included in each data group of the plurality of data groups; up to a number of additional failures associated with additional symbols of the plurality of data groups and the at least two global parity symbols in the global parity group, wherein the number of additional failures is up to a number of the at least two global parity symbols; and wherein generating the at least two global parity symbols comprises solving b equations with P=0, b−
1, wherein the b equations comprise;setting a first sum plus a second sum equal to zero, wherein; the first sum comprises a sum from j=1 to g of; a sum from i=1 to r of; a product of ω
i and λ
j, raised to a power of 2p, and multiplied by Xi,j;and the second sum comprises a sum from i=1 to b of; a product of ω
i and λ
g+1, raised to a power of 2p, and multiplied by Yi,wherein Xi,j is a data symbol, Yi is a global parity symbol, g is a number of the plurality of data groups, r is a number of data symbols per data group, b is the number of global parity symbols, {λ
1, . . . , Δ
g+1} is a collection of elements such that any b of them are linearly independent in a particular field, and co is a primitive element of the particular field. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A system, comprising:
-
one or more processors; a memory that includes a plurality of computer-executable components, the plurality of computer-executable components comprising an encoder module to; arrange a first plurality of machines into a grid of rows and columns; arrange a set of data symbols into a plurality of data groups so that each data group in the plurality of data groups includes a subset of the set of data symbols; for each data group in the plurality of data groups; generate a group parity symbol based on the subset of the set of data symbols included in the data group; and include the group parity symbol in the data group; store each of the plurality of data groups in a column of the first plurality of machines; generate, as part of a global parity group, a number of first global parity symbols that is equal to a number of the columns, wherein each of the first global parity symbols is generated based on all the data symbols in the set of data symbols; generate a second global parity symbol, wherein the second global parity symbol is a parity of the first global parity symbols; include the second global parity symbol in the global parity group; store the global parity group across a second plurality of machines that is different than the first plurality of machines; and wherein generating the first global parity symbols comprises solving r equations with P=0, . . . ,r−
1, wherein the r equations comprise;setting a first sum plus a second sum equal to zero, wherein; the first sum comprises a sum from j=1 to g of; a sum from i=1 to r of; a product of ω
i and λ
j, raised to a power of 2p, and multiplied by Xi,j;and the second sum comprises a sum from i=1 to r of; a product of ω
i and λ
g+1, raised to a power of 2p, and multiplied by Yi,wherein Xi,j is a data symbol, Yi is a global parity symbol, g is a number of the rows, r is a number of the columns, {λ
1, . . . , λ
g+1} is a collection of elements such that any g of them are linearly independent in a particular field, and ω
is a primitive element of the particular field. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14)
-
-
15. A system comprising:
-
one or more processors; and memory storing computer-executable instructions that, when executed, cause the one or more processors to perform acts comprising; arranging a set of data symbols into a plurality of data groups so that each data group in the plurality of data groups includes a subset of the set of data symbols; for each data group in the plurality of data groups; generating a group parity symbol based on the subset of the set of data symbols included in the data group; and including the group parity symbol in the data group; storing the plurality of data groups across a first set of machines; generating at least two global parity symbols, wherein each of the at least two global parity symbols is generated based on all the data symbols in the set of data symbols; including the at least two global parity symbols in a global parity group; storing the global parity group across a second set of machines that is different than the first set of machines; and recovering from failures, wherein the failures comprise; up to a first failure associated with any one symbol included in each data group of the plurality of data groups; up to a number of additional failures associated with additional symbols of the plurality of data groups and the at least two global parity symbols, wherein the number of additional failures is up to a number of the at least two global parity symbols and; wherein generating the at least two global parity symbols comprises solving b equations with P=0, b−
1, wherein the b equations comprise;setting a first sum plus a second sum equal to zero, wherein; the first sum comprises a sum from j=1 to g of; a sum from i=1 to r of; a product of ω
i and λ
j, raised to a power of 2p, and multiplied by Xi,j;and the second sum comprises a sum from i=1 to b of; a product of ω
i and λ
g+1, raised to a power of 2p, and multiplied by Yi,wherein Xi,j is a data symbol, Yi is a global parity symbol, g is a number of the plurality of data groups, r is a number of data symbols per data group, b is the number of global parity symbols, {λ
1, . . . , λ
g+1} is a collection of elements such that any b of them are linearly independent in a particular field, and ω
is a primitive element of the particular field. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification