Distributed storage system data management and security
First Claim
1. A method for erasure coding, comprising:
- executing, by one or more processors configured to execute code stored in non-transitory processor readable media, encoding of data partitioned into K information chunks with an error correcting code C to produce N codeword chunks, wherein N is a composite natural number, K is a natural number lower than N and the error correcting code C is based on component codes;
h outer codes, wherein the i-th code generates bin chunks as a function of ki information chunks, where n is a natural number being a divisor of N, parameters h, bi and ki are natural numbers such that the code C has the highest possible minimum distance, and index i takes h different values; and
h inner codes, wherein the codes are nested and each code generates N/n chunks as a function of chunks produced by the outer codes;
distributing, by the one or more processors, the N codeword chunks over a set of storage nodes, wherein mapping of the N codeword chunks to the storage nodes is optimized to balance network load and to reduce predicted retrieval latency by computing;
i) availability coefficients for the storage nodes using statistical data, wherein each availability coefficient characterizes predicted average download speed for a respective storage node; and
ii) relevance coefficients for the codeword chunks;
reconstructing, by the one or more processors, the information chunks from the codeword chunks requested from the storage nodes;
repairing, by the one or more processors, failed storage nodes by reconstructing erased codeword chunks as a function of available codeword chunks belonging to the same set of the N codeword chunks; and
updating, by one or more processors, one or several information chunks, wherein the number of corresponding codeword chunks to be updated is minimized.
8 Assignments
0 Petitions
Accused Products
Abstract
A system and method for distributing data over a plurality of remote storage nodes. Data are split into segments and each segment is encoded into a number of codeword chunks. None of the codeword chunks contains any of the segments. Each codeword chunk is packaged with at least one encoding parameter and identifier, and metadata are generated for at least one file and for related segments of the at least one file. The metadata contains information to reconstruct from the segments, and information for reconstructing from corresponding packages. Further, metadata are encoded into package(s), and correspond to a respective security level and a protection against storage node failure. A plurality of packages are assigned to remote storage nodes to optimize workload distribution. Each package is transmitted to at least one respective storage node as a function iteratively accessing and retrieving the packages of metadata and file data.
-
Citations
14 Claims
-
1. A method for erasure coding, comprising:
-
executing, by one or more processors configured to execute code stored in non-transitory processor readable media, encoding of data partitioned into K information chunks with an error correcting code C to produce N codeword chunks, wherein N is a composite natural number, K is a natural number lower than N and the error correcting code C is based on component codes; h outer codes, wherein the i-th code generates bin chunks as a function of ki information chunks, where n is a natural number being a divisor of N, parameters h, bi and ki are natural numbers such that the code C has the highest possible minimum distance, and index i takes h different values; and h inner codes, wherein the codes are nested and each code generates N/n chunks as a function of chunks produced by the outer codes; distributing, by the one or more processors, the N codeword chunks over a set of storage nodes, wherein mapping of the N codeword chunks to the storage nodes is optimized to balance network load and to reduce predicted retrieval latency by computing;
i) availability coefficients for the storage nodes using statistical data, wherein each availability coefficient characterizes predicted average download speed for a respective storage node; and
ii) relevance coefficients for the codeword chunks;reconstructing, by the one or more processors, the information chunks from the codeword chunks requested from the storage nodes; repairing, by the one or more processors, failed storage nodes by reconstructing erased codeword chunks as a function of available codeword chunks belonging to the same set of the N codeword chunks; and updating, by one or more processors, one or several information chunks, wherein the number of corresponding codeword chunks to be updated is minimized. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system for erasure coding, comprising:
-
one or more processors in communication with non-transitory processor readable media, wherein the non-transitory processor readable media store instructions that, when executed by the one or more processors, causes the one or more processors to; execute encoding of data partitioned into K information chunks with an error-correcting code C to produce N codeword chunks, wherein N is a composite natural number, K is a natural number lower than N and the error correcting code C is based on component codes; h outer codes, wherein the i-th code generates bin chunks as a function of ki information chunks, where n is a natural number being a divisor of N, parameters h, bi and ki are natural numbers such that the code C has the highest possible minimum distance, and index i takes h different values; and h inner codes, wherein the codes are nested and each code generates N/n chunks as a function of chunks produced by the outer codes; distribute the N codeword chunks over a set of storage nodes, wherein mapping of the N codeword chunks to the storage nodes is optimized to balance network load and to reduce predicted retrieval latency by computing;
i) availability coefficients for the storage nodes using statistical data, wherein each availability coefficient characterizes predicted average download speed for a respective storage node; and
ii) relevance coefficients for the codeword chunks;reconstruct the information chunks from the codeword chunks requested from the storage nodes; repair failed storage nodes by reconstructing erased codeword chunks as a function of available codeword chunks belonging to the same set of the N codeword chunks; and update one or several information chunks, wherein the number of corresponding codeword chunks to be updated is minimized. - View Dependent Claims (14)
-
Specification