MDS ERASURE CODE CAPABLE OF REPAIRING MULTIPLE NODE FAILURES

0Associated
Cases 
0Associated
Defendants 
0Accused
Products 
2Forward
Citations 
0
Petitions 
1
Assignment
First Claim
1. A maximum distance separable (MDS) erasure code capable of repairing multiple node failures, the erasure code being a C(k, r, p) code which stores original information data blocks and parity data blocks by constructing a (p−
 l)*(k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5;
whereinboth an addition operation and a subtraction operation of the C(k, r, p) code are substituted by an XOR operation;
an original data block is split into k columns of the original information data blocks with each column containing p−
l bits;
r columns of the parity data blocks that are linearly independent from one another are generated from the k columns of the original information data blocks; and
after being split, the original information data blocks and the parity data blocks are linearly independent.
1 Assignment
0 Petitions
Accused Products
Abstract
An MDS erasure code capable of repairing multiple node failures, being a C(k, r, p) code which stores original information data blocks and parity data blocks by constructing a (p−l)*(k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5. Both an addition operation and a subtraction operation of the C(k, r, p) code are substituted by an XOR operation. An original data block is split into k columns of the original information data blocks with each column containing p−l bits. r columns of the parity data blocks that are linearly independent from one another are generated from the k columns of the original information data blocks. After being changed, the original information data blocks and the parity data blocks are linearly independent.
15 Citations
View as Search Results
Storage controller, data processing chip, and data processing method  
Patent #
US 10,210,044 B2
Filed 05/11/2018

Current Assignee
Huawei Technologies Co. Ltd.

Sponsoring Entity
Huawei Technologies Co. Ltd.

Distributed storage method and system  
Patent #
US 10,447,763 B2
Filed 12/08/2016

Current Assignee
Nanning Fugui Precision Industrial Co. Ltd.

Sponsoring Entity
Nanning Fugui Precision Industrial Co. Ltd.

OPTIMIZING XORBASED CODES  
Patent #
US 20090164762A1
Filed 12/20/2007

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Corporation

System and method for symmetric triple parity for failing storage devices  
Patent #
US 7,613,984 B2
Filed 12/29/2006

Current Assignee
NetApp Inc.

Sponsoring Entity
NetApp Inc.

System and method for enabling efficient recovery of data in a storage array  
Patent #
US 20060074954A1
Filed 09/30/2004

Current Assignee
Google LLC

Sponsoring Entity
International Business Machines Corporation

System and method for tolerating multiple storage device failures in a storage system using horizontal and vertical parity layouts  
Patent #
US 20060129873A1
Filed 11/24/2004

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Lossy data compression exploiting distortion side information  
Patent #
US 20060170571A1
Filed 12/07/2005

Current Assignee
Massachusetts Institute of Technology

Sponsoring Entity
Massachusetts Institute of Technology

Nested Multiple Erasure Correcting Codes for Storage Arrays  
Patent #
US 20120221926A1
Filed 02/28/2011

Current Assignee
SK Hynix Inc.

Sponsoring Entity
International Business Machines Corporation

Nway parity technique for enabling recovery from up to N storage device failures  
Patent #
US 8,402,346 B2
Filed 09/25/2009

Current Assignee
NetApp Inc.

Sponsoring Entity
NetApp Inc.

PARTIALMAXIMUM DISTANCE SEPARABLE (PMDS) ERASURE CORRECTING CODES FOR STORAGE ARRAYS  
Patent #
US 20130205181A1
Filed 02/02/2012

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

System and method for efficient horizontal maximum distance separable raid  
Patent #
US 8,522,125 B1
Filed 04/11/2011

Current Assignee
IP3 2018 Series 300 of Allied Security Trust I

Sponsoring Entity
The Research Foundation for The State University of New York

Extended row diagonal parity with optimal decoding procedure  
Patent #
US 8,595,606 B1
Filed 07/14/2011

Current Assignee
IP3 2018 Series 300 of Allied Security Trust I

Sponsoring Entity
The Research Foundation for The State University of New York

RAID ERASURE CODE APPLIED TO PARTITIONED STRIPE  
Patent #
US 20140208022A1
Filed 01/21/2013

Current Assignee
Kaminario Limited

Sponsoring Entity
Kaminario Limited

METHOD FOR DATA RECOVERY  
Patent #
US 20150095747A1
Filed 09/29/2014

Current Assignee
Alexander Barg, Itzhak Tamo

Sponsoring Entity
Alexander Barg, Itzhak Tamo

TECHNIQUES TO EFFICIENTLY COMPUTE ERASURE CODES HAVING POSITIVE AND NEGATIVE COEFFICIENT EXPONENTS TO PERMIT DATA RECOVERY FROM MORE THAN TWO FAILED STORAGE UNITS  
Patent #
US 20150347231A1
Filed 06/02/2014

Current Assignee
Intel Corporation

Sponsoring Entity
Vinodh Gopal, Ozturk Erdinc

7 Claims
 1. A maximum distance separable (MDS) erasure code capable of repairing multiple node failures, the erasure code being a C(k, r, p) code which stores original information data blocks and parity data blocks by constructing a (p−
 l)*(k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5;
wherein both an addition operation and a subtraction operation of the C(k, r, p) code are substituted by an XOR operation; an original data block is split into k columns of the original information data blocks with each column containing p−
l bits;r columns of the parity data blocks that are linearly independent from one another are generated from the k columns of the original information data blocks; and after being split, the original information data blocks and the parity data blocks are linearly independent.  View Dependent Claims (2, 3, 4, 5, 6, 7)
 l)*(k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5;
1 Specification
This application is a continuationinpart of International Patent Application No. PCT/CN2015/071114 with an international filing date of Jan. 20, 2015, designating the United States, now pending, the contents of which, including any intervening amendments thereto, are incorporated herein by reference. Inquiries from the public to applicants or assignees concerning this document or the related applications should be directed to: Matthias Scholl P.C., Attn.: Dr. Matthias Scholl Esq., 245 First Street, 18th Floor, Cambridge, Mass. 02142.
1. Field of the Invention
The invention relates to the field of the distributed storage system, and more particularly to a maximum distance separable (MDS) erasure code capable of repairing multiple node failures.
2. Description of the Related Art
A typical method for overcoming storage node failure in the distributed storage system is introducing a redundancy by (n, k) MDS erasure code, which splits a file into k original information blocks and generates nk parity blocks from the k original information blocks so as to reconstruct the original file by gathering any k blocks from the n encoding blocks. However, the common MDS code has high encoding complexity and high updating complexity. In addition, the faulttolerance thereof is low and at the most, two failure nodes can be recovered.
In view of the abovedescribed problems, it is one objective of the invention to provide an MDS erasure code capable of repairing multiple node failures. The MDS erasure code of the invention has high faulttolerance.
To achieve the above objective, in accordance with one embodiment of the invention, there is provided an MDS erasure code capable of repairing multiple node failures. The MDS erasure code is a C(k, r, p) code which stores original information data blocks and parity data blocks by constructing a (p−l)* (k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5. Both an addition operation and a subtraction operation of the C(k, r, p) code are substituted by an XOR operation. An original data block is split into k columns of the original information data blocks with each column containing p−l bits. r columns of the parity data blocks that are linearly independent from one another are generated from the k columns of the original information data blocks. The original information data blocks and the parity data blocks after being changed are linearly independent.
In a class of this embodiment, the MDS erasure code comprises a construction process comprising:
A) splitting original data B into k original information data blocks with each data block containing L=p−l bits;
B) constructing the parity data blocks; and
C) distributing a total n blocks of the original information data blocks and the parity data blocks to n nodes for storage.
In a class of this embodiment, in A), the original information data blocks are represented by SS=(SS_{0},SS_{1}, . . . SS_{k−1}), where SS _{j }is denoted as s_{0,j}s_{1,j }. . . s_{p−2,j}, s_{p−1,j}=s_{0,j}+s_{1,j}+ . . . s_{p−2,j }is calculated to obtain S=(S_{0}, S_{1}, . . . S_{k−1}), where S_{j }is denoted as s_{0,j}s_{1,j }. . . s_{p−1,j }and in which j=0,1, . . . k−1.
In a class of this embodiment, in B), the parity data blocks are represented by CC=(CC_{0}, CC_{1}, . . . CC_{r−1}), C_{j}=S_{0}+x^{j}S_{1}+x^{j}S_{1}+x^{j=2}S_{2}+ . . . x^{j=(k−1)}S_{k−1}, c_{p−1,j}=c_{0,j}+c_{1,j}+ . . . c_{p−2,j}, in which j=0,1, . . . r−1, multiplication by x^{j=(k−1) }represents cyclically shifting to the left, and + represents the XOR operation.
In a class of this embodiment, in C), each node stores data, and the data stored in the nodes are represented by (SS_{0}, SS_{1}, . . . SS_{k−1}, CC_{0}, CC_{1}, . . . CC_{r−1}).
In a class of this embodiment, the MDS erasure code further comprises a decoding process comprising: collecting l parity data blocks and k−l available original information data blocks when l originial information data blocks S_{j }fail; substracting the k−l available original information data blocks from each of the l parity data blocks to obtain l linear equations; and calculating an inverse matrix of an encoding matrix corresponding to the l linear equations, and putting known data into the inverse matrix to finish decoding.
In a class of this embodiment, the decoding process is capable of recovering five node failures.
Advantages of the MDS erasure code according to embodiments of the invention are summarized as follows: the MDS erasure code of the invention largely improves the faulttolerance capacity of the system, possesses low computational complexity and small computational overhead, and greatly reduces the computational delay of the system, thus, saving time and resource, decreasing the cost, and being suitable for the actual storage system.
The invention is described hereinbelow with reference to accompanying drawings, in which the sole figure is a flow diagram of a construction process of an MDS code capable of repairing multiple node failures in accordance with one embodiment of the invention.
For further illustrating the invention, experiments detailing an MDS erasure code capable of repairing multiple node failures are described below. It should be noted that the following examples are intended to describe and not to limit the invention.
Related terms are defined as follows:
MDS: Maximum Distance Separable
RDP: RowDiagonal Parity
An MDS erasure code capable of repairing multiple node failures is provided. The MDS erasure code is a C(k, r, p) code which stores original information data blocks and parity data blocks by constructing a (p−l)*(k+r) matrix, in which, p is a prime larger than both k and r, k is an arbitrary integer between 2 and p, and r is smaller than or equal to 5. Both an addition operation and a subtraction operation of the C(k, r, p) code are substituted by an XOR operation. An original data block is split into k columns of the original information data blocks with each column containing p−l bits. r columns of the parity data blocks that are linearly independent from one another are generated from the k columns of the original information data blocks. The original information data blocks and the parity data blocks after being changed are linearly independent.
The MDS erasure code of the invention comprises a construction process comprising: A) splitting original data B into k original information data blocks with each data block containing L=p−l bits; B) constructing the parity data blocks; and C) distributing a total n blocks of the original information data blocks and the parity data blocks to n nodes for storage.
In A), the original information data blocks are represented by SS=(SS_{0}, SS_{1}, . . . SS_{k−1}), s_{p−1,j}=s_{0,j}+s_{1,j}+ , . . . s_{p−2,j }is calculated to obtain S=(S_{0}, S_{1}, . . . S_{k−1}), in which j=0,1, . . . k−1.
In B), the parity data blocks are represented by CC=(CC_{0}, CC_{1}, . . . CC_{r−1}), C_{j}=S_{0}+x^{j}S_{1}+x^{j=2}S_{2}+ . . . x^{j=(k−1)}S_{k−1}, c_{p−1,j}=c_{0,j}+c_{1,j}+ . . . c_{p−2,j}, in which j=0,1, . . . r−1, multiplication by x^{j=(k−1) }represents cyclically shifting to the left, and + represents the XOR operation.
In C), each node stores data, and the data stored in the nodes are represented by (SS_{0}, SS_{1}, . . . SS_{k−1}, CC_{0}, CC_{1}, . . . CC_{r−1}).
The MDS erasure code of the invention further comprises a decoding process comprising: collecting l parity data blocks and k−l available original information data blocks when l originial information data blocks S_{j }fail; substracting the k−l available original information data blocks from each of the l parity data blocks to obtain l linear equations; and calculating an inverse matrix of an encoding matrix corresponding to the l linear equations, and putting known data into the inverse matrix to finish decoding.
The decoding process is capable of recovering five node failures.
In one embodiment, the MDS code is the C(k, r, p) code, all addition and subtraction operations in the context can be substituted by the XOR operation. The C(k, r, p) code is used to store original information data blocks and parity data blocks by constructing the (p−1)×(k+r) matrix, in which, p is a primer larger than k and r, k is an arbitrary integer between 2 and p, and r is smaller or equal to 5.
The original data block is split into k columns of the original information data blocks with each column containing p−l bits. Let s_{i,j }denote an ith bit in a jth column of original information data block, in which, i=0,1, . . . p−2. To facilitate the calculation of the parity data blocks, let s_{p−1,j}=s_{0,j}+s_{1,j}+ . . . s_{p−2,j}, SS_{j }is denoted as s_{0,j}s_{1,j }. . . s_{p−2,j}, S_{j }is denoted as s_{0,j}s_{1,j }. . . s_{p−1,j}, in which, j=0,1, . . . k−1.
r columns of linearly independent parity data blocks are generated according to k columns of the original information data blocks. Let c_{i,j }denote the ith bit in the jth column of parity data block, i=0,1, . . . p−2, let c_{p−1,j}=c_{0,j}+c_{1,j}+ . . . c_{p−2,j}, CC_{j }is denoted as c_{0,j}c_{1,j }. . . c_{p−2,j}, C_{j }is denoted as c_{0,j}C_{1,j }. . . c_{p−1,j}, j=0,1, . . . r−1. To enable the original information data blocks to be linearly independent from the parity data blocks after data change, the jth column of parity data block can be derived from the following equation:
C_{j}=S_{0}+x^{j}S_{1}+x^{j=2}S_{2}+ . . . x^{j=(k−1)}S_{k−1}, in which multiplication by x^{j=(k−1) }denotes cyclically shifting by (k−l)j bits, and herein the cyclically shifting is defined as cyclically shifting to the left. After C_{j }is obtained, let c_{p=1,j}=c_{0,j}+c_{1,j}+ . . . c_{p−2,j}. Actually, a primary method to calculate the parity data blocks is to multiply the original information data block by a Vandermonde matrix, which is specifically as follows:
The parity data blocks constructed by this method satisfy the linearly independence from one another, and only the XOR operation and the cyclically shifting are adopted.
Construction process of the C(k, r, p) code:
The C(k, r, p) code is applied in a system containing n nodes and each node stores one original information data block or parity data block. A file is split into k original information data blocks of equal size and stored in k nodes. The k nodes are called systematic nodes. In addition, the encoded r parity data blocks are stored in the remaining r nodes, and these nodes are called parity nodes. And n=k+r.
Constructing process of the C(k, r, p) code is illustrated as
1) The original data B is split into k data blocks with each data block containing L=p−1 bits of data. The original information data are denoted as SS=(SS_{0}, SS_{1}, . . . SS_{k−1}), s_{p−1,j}=s_{0,j}+s_{1,j}+ . . . s_{p=2,j }is calculated to obtain S=(S_{0}, S_{1}, . . . S_{k−1}), in which, j=0,1, . . . k−1.
2) Construction of the parity data blocks:
CC =(CC_{0}, CC_{1}, . . . CC_{r−1}), C_{j}=S_{0}+x^{j}S_{1}+x^{j=2}S_{2}+ . . . x^{j=(k−1)}S_{k−1}, C_{p−1,j}=c_{0,j}+c_{1,j}+ . . . c_{p−2,j}, in which j=0,1, . . . r−1, multiplication by x^{j=(k−1) }denotes the cyclically shifting to the left, and +represents the XOR operation.
3) data are distributed to each node for storage, and the data stored at the nodes are (SS_{0}, SS_{1}, . . . SS_{k−1}, CC_{0}, CC_{1}, . . . CC_{r−1}).
That is, s_{p−1,j }and c_{p−1,j }appeared in the above context are not stored, and the appearances thereof are only for computation convenience.
For example, given that k=4, r=3, and p=5, and a C(4,3,5) code is constructed. The original information data blocks are SS_{0}, SS_{1}, SS_{2}, and SS_{3}, respectively, and the parity data blocks are CC_{0}, CC_{1}, and CC_{2}, respectively, and this code is able to recover at most three node failures.
The computational process of the parity data blocks are as follows:
First, s_{p−1,j }is calculated based on SS_{j}.
A first parity data block is constructed according to C_{0}=S_{0}+S_{1}+S_{2}+ . . . S_{k−1}.
A second parity data block is constructed according to C_{1}=S_{0}+xS_{1}+x S_{2}+ . . . x^{k−1}S_{k−1}.
A third parity data block is constructed according to C_{2}=S_{0}+x^{2}S_{1}+x^{4}S_{2}+ . . . x^{6}S_{k−1}.
Finally, c_{p−1,j }is calculated based on CC_{j}.
For another example, SS_{0}=1111, SS_{1}=0111, SS_{2}=1001, and SS_{3}=0101.
First, s_{p−1,j }is calculated based on SS_{j}.
A first parity data block is constructed according to C_{0}=S_{0}+S_{1}+S_{2}+ . . . S_{−1}.
A second parity data block is constructed according to C_{1}=S_{0}+xS_{1}+x^{2}S_{2}+ . . . x^{k−1}S_{k−1}.
A third parity data block is constructed according to C_{2}=S_{0}+x^{2}S_{1}x^{4}S_{2}+ . . . x^{6}S_{k−1}.
Finally, c_{p−1,j }is calculated based on CC_{j}.
Reconstruction process of the C(k, r, p) code is as follows:
The C(k, r, p) code only adopts the simple XOR operation, and it only requires gathering any k data blocks during data reconstruction. When the original information data blocks are damaged, the parity data blocks are utilized to perform the decoding calculation.
The basic idea of the decoding process of the C(k, r, p) code is introduced herein. Because each parity data block C_{j }is a result of a linear combination of cyclically shifting of all S_{j}. Given that l original information data blocks S_{j }fail, l parity data blocks and k−l available original information data blocks are gathered, and all the k−l available original information data blocks are subtracted from each of the l parity data blocks to obtain l linear equations. The inverse matrix of the encoding matrix corresponding to the l linear equations is computed and then known data are put into the inverse matrix to accomplish the decoding.
The decoding process of the C(4, 3, 5) code is as follows:
Given that S_{0}, S_{3}, C_{0}, C_{1}, and C_{2 }are available while S_{1 }and S_{2 }fail, then S_{0}, S_{3}, C_{0}, and C_{1 }are adopted to repair the failure nodes.
Let f_{0}=C_{0}−S_{0}−S_{3}=S_{1}+S_{2 }and f_{1}=C_{0}−S_{0}−x^{3}S_{3}=xS_{1}+x^{2}S_{2}. Because f_{0}=C_{0}−S_{0}−S_{3 }and f_{1}=C_{0}−S_{0}−x^{3}S_{3}, f_{0 }and f_{1 }are known.
That is, S_{1 }and S_{2 }can be denoted as follows:
Since f_{0 }and f_{1 }are known, it only requires to calculate an inverse of
and
is calculated as follows:
Thus, S_{1}=(x^{3}+x+1) f_{0}+(x^{2}+1) f_{1 }and S_{2}=(x^{3}+X) f_{0}+(x^{2}+1) f_{1}.
The decoding results are S_{1}=01111 and S_{2}=10010, thus the decoding is correct.
In the above, the circumstance of repairing two node failures are described, and this codec method can also be applied to at most five node failures.
Performance evaluation of the C(k, r, p) code
Encoding complexity:
Because different codes have different requirements on the number of the original information data blocks and the bit number of each data block, to make the comparison convenient, the average encoding complexities at each bit are compared among different coding modes. The EVENODD code has two parity data blocks, and each parity bit in the two parity columns is the XOR operation result of information passing through straights lines with a slope of 0 or 1. The average encoding complexity of each bit of the EVENODD node is
The RDP code has two parity data blocks, the first parity data block is obtained by the XOR operation of k original data blocks, as each data block has a length of L bits, (k−l)L XOR operations are performed. While the second parity data block is obtained by the XOR operation of k data blocks in pandiagonal, and similarly (k−l)L XOR operations are performed. BBV code is a code capable of repairing multiple node failures, and the average encoding complexity of each bit thereof is
For C(k, r, p) code, the system has (nk) parity data blocks and each parity data block is obtained by the XOR operation of k original data blocks. Thus, the encoding of each parity data block requires (k−l)L XOR operations, and the average encoding complexity of each bit of the C(k, r, p) code is
Decoding Complexity:
Because different codes have different requirements on the number of the original data blocks and the bit number of each data block, to make the comparison convenient, the average encoding complexities at each bit are compared among different coding modes. Since the common MDS codes can only repair two node failures, herein the recovery of two node failures is discussed.
The RDP code is decoded by iteration and not related to the calculation of finite field itself. The average decoding complexity at each bit of the RDP code is
The average decoding complexity at each bit of the EVENODD code is larger than
The average decoding complexity at each bit of the C(k, r, p) code is
Thus, the general encoding complexity of the C(k, r, p) code is equivalent to those of the EVENODD code and the RDP code and approaches 1, while the general encoding complexity of the BBV code that is capable of recovering at most two node failures approaches 2. Thus, the encoding complexity of the C(k, r, p) code is relatively optimal.
For the decoding, the general decoding complexity of the C(k, r, p) code is equivalent to that of the RDP code, that is, the C(k, r, p) code is relatively optimal.
Comparison of encoding and decoding complexities among different codes
Compared with the common MDS codes, the C(k, r, p) code features its capability of recovering at most five node failures. The simple and operable XOR operation is adopted, so that both the encoding complexity and the decoding complexity are relatively low. Furthermore, the number of the original information data blocks are not fixed and can be arbitrary integer between 2 and p. Compared with the EVENODD code and the RDP code that are only able to recover two failure nodes, the C(k, r, p) code improves the faulttolerance of the system and is able to repair at most five node failures with hardly changing the encoding complexity and the decoding complexity. Compared with the BBV code that is able to recover more than two failure nodes, the C(k, r, p) code has much lower encoding complexity and decoding complexity under the same condition of recovering the multiple failure nodes.
The C(k, r, p) code possesses optimized encoding and decoding complexities, the faulttolerance of the system is greatly improved. Besides, the number of the original information data blocks is not fixed and can be arbitrary integer between 2 and p, thus the C(k, r, p) code is much flexible and realizes optimized compromise between the storage overhead and the system reliability.
Unless otherwise indicated, the numerical ranges involved in the invention include the end values. While particular embodiments of the invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications may be made without departing from the invention in its broader aspects, and therefore, the aim in the appended claims is to cover all such changes and modifications as fall within the true spirit and scope of the invention.