Efficient method for the reconstruction of digital information

US 7,472,334 B1
Filed: 10/15/2004
Issued: 12/30/2008
Est. Priority Date: 10/15/2003
Status: Expired due to Fees

Abstract

Improved method of encoding and repairing data for reliable storage and transmission using erasure codes, which is efficient enough for implementation in software as well as hardware. A systematic linear coding matrix over GF(2^q) is used which combines parity for fast correction of single erasures with the capability of correcting k erasures. Finite field operations involving the coding and repair matrices are redefined to consist of bitwise XOR operations on words of arbitrary length. The elements of the matrix are selected to reduce the number of XOR operations needed and buffers are aligned for optimal processor cache efficiency. Decode latency is reduced by pre-calculating repair matrices, storing them in a hashed table and looking them up using a bit mask identifying the erasures to be repaired.

36 Claims

1. A computerized method for encoding digital information for protection from data loss in storage or memory or in transmission on communication paths using a linear transformation defined by an (m+k)×m coding matrix A over a Galois Field GF(2^q), said encoding method comprising the steps of:
  - assembling an m×1 vector x comprised of components x_j from m data chunks representing the digital information, each chunk comprising q hyperwords each of an identical but arbitrary number of bits; and
  - multiplying said vector x by said matrix A, comprised of elements A_ij, using the operations provided by a MultiplyAndAdd(y_i, A_ij, x_j) subroutine to produce an (m+k)×1 vector y of m+k chunks y_i that are resilient to the erasure of any k chunks, said operations including jumping to or otherwise executing a predetermined sequence of instructions that are unique to the binary value of A_ij, each of said predetermined sequence of instructions consisting of a bitwise XOR of a hyperword of chunk x_j with and stored in a hyperword of chunk y_i.

2. The method of claim 1, wherein the number of bits occupied by the q hyperwords is an integer multiple of the cache line size of a processor implementing said method.

3. The method of claim 1, further comprising the step of generating subroutine MultiplyAndAdd(d, f, s) by performing the steps of:
  - generating source code for an empty subroutine body;
  - adding source code to the subroutine body to dispatch to a sequence of instructions that is unique to each of the values of f=0 through 2^q−1;
  - adding said sequence of instructions to the subroutine body for each of the values of f=0 through 2^q−1, by repeating for each specific value of f=x in a Galois Field GF(2^q) the steps of:
    - constructing a q×q matrix τ representing the operation of multiplication by x in the Galois Field GF(2);
    - testing the value of each element τ_ij of said matrix τ; and
    - appending instructions to store the XOR of the i-th hyperword of chunk d with the j-th hyperword of chunk s into the i-th hyperword of chunk d for each nonzero value of said element τ_ij.
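
The subroutine generation recited in claims 1 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentees' implementation: the field size q=4, the primitive polynomial x^4+x+1, the use of Python integers as hyperwords, and the if/elif chain used as the dispatch are all choices made here for illustration, and the helper names `gf_mul`, `tau_matrix`, and `generate_multiply_and_add` are hypothetical.

```python
Q = 4            # assumed field exponent: GF(2^4)
POLY = 0b10011   # assumed primitive polynomial x^4 + x + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^Q) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> Q:          # degree reached Q: reduce by POLY
            a ^= POLY
    return result


def tau_matrix(f):
    """Q x Q matrix over GF(2) representing multiplication by f.

    Column j holds the bits of f * 2^j, so tau[i][j] is bit i of that product.
    """
    columns = [gf_mul(f, 1 << j) for j in range(Q)]
    return [[(columns[j] >> i) & 1 for j in range(Q)] for i in range(Q)]


def generate_multiply_and_add():
    """Return Python source for multiply_and_add(d, f, s)."""
    lines = ["def multiply_and_add(d, f, s):"]        # empty subroutine body
    for f in range(1 << Q):
        keyword = "if" if f == 0 else "elif"           # dispatch on the value of f
        lines.append(f"    {keyword} f == {f}:")
        tau = tau_matrix(f)
        xors = [f"        d[{i}] ^= s[{j}]"
                for i in range(Q) for j in range(Q) if tau[i][j]]
        lines.extend(xors or ["        pass"])         # f == 0 contributes nothing
    return "\n".join(lines) + "\n"


namespace = {}
exec(generate_multiply_and_add(), namespace)           # compile the generated source
multiply_and_add = namespace["multiply_and_add"]

# Usage: a chunk is a list of Q hyperwords (Python ints of any width).
d = [0, 0, 0, 0]
s = [0b1010, 0b0110, 0b0001, 0b1111]
multiply_and_add(d, 1, s)   # multiplying by 1 XORs s into d unchanged
```

In this layout, bit position b taken across the q hyperwords of a chunk forms one field element, so a single hyperword XOR processes as many field elements in parallel as the hyperword has bits. A compiled implementation would more likely emit a switch statement or jump table than an if/elif chain.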

4. A computerized method for constructing a (m+k)×m coding matrix A for use in linear erasure codes which is optimized for single erasure recovery, comprising the steps of:
  - constructing from elements of a Galois Field GF(2^q) an augmented coding matrix comprised of:
    - an m×m identity sub-matrix I;
    - a 1×m row sub-matrix P, the elements of said row matrix P having the value 1 in the Galois Field; and
    - a (k−1)×m sub-matrix C, the elements c_ij of said matrix C chosen so that all sub-matrices of A formed by deleting k rows are non-singular; and
  - encoding information data using the augmented coding matrix and transmitting on a communications channel.

5. The method of claim 4, further comprising the step of choosing the elements c_ij so as to minimize the sum over all i,j of a function W(c_ij), said function W(c_ij) defined as the count of the non-zero elements in the q×q matrix τ representing the operation of multiplication by c_ij in the Galois Field GF(2).

6. The method of claim 5, wherein said step of choosing the elements c_ij so as to minimize said sum over all i,j of the function W(c_ij) further comprises the steps of:
  - enumerating all partitionings of 2^q values in GF(2^q) into a set S_x comprised of k−1 values and a set S_y comprised of m values to obtain an enumeration of partitionings;
  - computing a sum over all i,j of the function W(c_ij) where c_ij=1/(x_i+y_j) for x_i ∈ S_x and y_j ∈ S_y;
  - choosing from the enumeration of partitionings an optimal partitioning for which said sum over all i,j of the function W(c_ij) is minimal; and
  - constructing the values c_ij from said optimal partitioning.

7. The method of claim 6, wherein:
  - k=2; and
  - said step of enumerating all partitionings of the 2^q values in GF(2^q) is limited to enumerations that satisfy S_x={0}.
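
For the k=2 case of claim 7, the matrix construction of claims 4 to 6 can be sketched in Python as below. This is a hedged illustration, not the patented procedure itself: it assumes q=4 with the primitive polynomial x^4+x+1, and it replaces the explicit enumeration of partitionings with a sort, which yields the same minimum when S_x={0} because each term W(1/y_j) then depends on a single y_j. The names `weight`, `build_coding_matrix`, and `is_nonsingular` are hypothetical.

```python
from itertools import combinations

Q = 4            # assumed field exponent: GF(2^4)
POLY = 0b10011   # assumed primitive polynomial x^4 + x + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^Q)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> Q:
            a ^= POLY
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (adequate for small fields)."""
    return next(b for b in range(1, 1 << Q) if gf_mul(a, b) == 1)


def weight(c):
    """W(c): nonzero entries in the Q x Q GF(2) matrix for multiplication by c."""
    return sum(bin(gf_mul(c, 1 << j)).count("1") for j in range(Q))


def build_coding_matrix(m):
    """(m+2) x m matrix [I; P; C] for k=2 with S_x={0}, so c_j = 1/(0 + y_j)."""
    if m > (1 << Q) - 1:
        raise ValueError("m must not exceed the number of nonzero field elements")
    # Rank the nonzero candidates y by the XOR cost of their inverses.
    ranked = sorted(range(1, 1 << Q), key=lambda y: (weight(gf_inv(y)), y))
    s_y = ranked[:m]
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    parity = [1] * m
    c_row = [gf_inv(y) for y in s_y]
    return identity + [parity, c_row]


def is_nonsingular(rows):
    """Gaussian elimination over GF(2^Q); True if the square matrix has full rank."""
    a = [r[:] for r in rows]
    n = len(a)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return False
        a[col], a[pivot] = a[pivot], a[col]
        inv = gf_inv(a[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(a[r][col], inv)
            if factor:
                a[r] = [x ^ gf_mul(factor, y) for x, y in zip(a[r], a[col])]
    return True


A = build_coding_matrix(4)
# Check the claim 4 condition: every sub-matrix left after deleting k=2 rows is invertible.
all_ok = all(
    is_nonsingular([row for i, row in enumerate(A) if i not in deleted])
    for deleted in combinations(range(len(A)), 2)
)
total_weight = sum(weight(c) for c in A[-1])
```

The all-ones row P is what makes single-erasure recovery cheap: a lost data chunk is the plain XOR of the parity chunk with the surviving data chunks, with no field multiplications.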

8. A computerized method for recovering stored or transmitted digital information from storage or memory or from a transmitter on a communication path that has been encoded with a linear erasure correcting code defined by a (m+k)×m coding matrix A over a Galois Field GF(2^q), said encoded digital information represented by m+k data chunks, each of which is associated with a row index, said method for recovering from 1 to k chunks comprising the steps of:
  - constructing a set F containing the row indices associated with the data chunks to be recovered and additional row indices so as to increase the number of elements in the set F to k;
  - constructing an (m+k)×1 vector x from said chunks such that the i-th component of vector x is the data chunk associated with row index i;
  - constructing a row deleted m×1 vector from said vector x by deleting each row of the vector x with a row index in set F;
  - calculating an (m+k)×m repair matrix R_F from matrix A, said calculation of the matrix R_F further comprising the steps of:
    - constructing an m×m matrix by deleting each row of the matrix A with a row index in set F,
    - calculating the inverse of said row deleted m×m matrix, and
    - multiplying the matrix A by said inverse of the row deleted m×m matrix to produce the (m+k)×m repair matrix R_F; and
  - using the (m+k)×m repair matrix R_F as a coding matrix in a linear transformation of the m data chunks of said row deleted m×1 vector to produce an (m+k)×m vector, where row i is the recovered data chunk associated with row index i.

9. The method of claim 8, further comprising the steps of:
  - storing pre-calculated rows of the repair matrix R_F in a repair table indexed by a hash table for fast lookup; and
  - looking up rows of the repair matrix for recovering the chunks identified by a set F of row indices through the steps of:
    - creating a bit mask by setting to 1 only those bits at bit positions that equal the row indices in set F,
    - computing the value of a hash function h(x), where x is the bit mask,
    - indexing into said hash table using said value and retrieving a repair table index, and
    - indexing into said repair table using said repair table index and retrieving said rows of the repair matrix for recovering the chunks with row indices in set F.

10. The method of claim 9, wherein the step of storing pre-calculated rows of the repair matrices into said repair table indexed by said hash table further comprises the steps of:
  - allocating space for a hash table and a repair table and initializing the contents of said hash table and repair table to indicate an initial empty state; and
  - iteratively adding content to the repair table and the hash table to facilitate the recovery of any e chunks, where e in each iteration takes the values k, k−1, . . . , 0.

11. The method of claim 10, wherein said step of iterative adding further comprises the steps of enumerating all valid bitmask values with e bits set and for each enumerated bitmask value, performing the steps comprising:
  - computing or otherwise obtaining a repair matrix R_F suitable for recovering the chunks that are described by the bitmask value;
  - adding rows of said repair matrix to the repair table at the next available location;
  - computing the value of the hash function h(x) from the bitmask value; and
  - storing the location of said rows of said repair matrix in the hash table at the hash table index equal to said value of the hash function.

12. The method of claim 11, wherein the number of chunks n=m+k and the hash function h(x) is calculated from the steps comprising:
  - initializing a variable Sum to zero and a variable BitPos to one;
  - returning the value of Sum as the value of the hash function if the value of x is zero or after repetitive operations on x have diminished its value to zero, said repetitive operations comprising:
    - setting the value of Sum to Sum×n+BitPos if the rightmost bit position of x is set,
    - logically shifting x right by 1 bit position, and
    - incrementing BitPos.
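
The hash function of claim 12 and the table construction of claims 9 to 11 can be sketched in Python as follows. This is a minimal sketch under assumptions made here: row indices are taken to be 0-based, the hash table is sized from a simple upper bound on h(x), and `repair_rows_for` is a hypothetical callback standing in for the computation of the repair-matrix rows, which is not reproduced in this example.

```python
from itertools import combinations


def hash_mask(x, n):
    """h(x) as recited in claim 12, with n = m + k chunks."""
    total, bit_pos = 0, 1
    while x:
        if x & 1:                      # rightmost bit set
            total = total * n + bit_pos
        x >>= 1                        # logical shift right by one position
        bit_pos += 1
    return total


def build_tables(n, k, repair_rows_for):
    """Fill the hash table and repair table for every erasure pattern of size <= k."""
    # With at most k set bits and BitPos <= n, h(x) <= n + n^2 + ... + n^k.
    size = sum(n ** i for i in range(1, k + 1)) + 1
    hash_table = [None] * size         # None marks the initial empty state
    repair_table = []
    for e in range(k, -1, -1):         # e = k, k-1, ..., 0
        for rows in combinations(range(n), e):
            mask = sum(1 << i for i in rows)
            hash_table[hash_mask(mask, n)] = len(repair_table)  # next free location
            repair_table.append(repair_rows_for(rows))
    return hash_table, repair_table


def lookup(hash_table, repair_table, erased_rows, n):
    """Bit mask -> hash value -> repair table index -> stored rows."""
    mask = sum(1 << i for i in erased_rows)
    return repair_table[hash_table[hash_mask(mask, n)]]


# Usage with a stand-in callback that simply records the erased row indices.
hash_table, repair_table = build_tables(6, 2, lambda rows: rows)
entry = lookup(hash_table, repair_table, [1, 4], 6)
```

Because each set bit contributes a digit between 1 and n, the computed value is a bijective base-n numeral, so distinct masks map to distinct table slots and the lookup needs no collision handling in this sketch.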

13. A computerized system for encoding digital information for protection from data loss in storage or memory or in transmission on communication paths using a linear transformation defined by an (m+k)×m coding matrix A over a Galois Field GF(2^q), comprising a processor programmed to accomplish the steps of:
  - assembling an m×1 vector x comprised of components x_j from m data chunks representing the digital information, each chunk comprising q hyperwords each of an identical but arbitrary number of bits;
  - multiplying said vector x by said matrix A, comprised of elements A_ij, using the operations provided by a MultiplyAndAdd(y_i, A_ij, x_j) subroutine to produce an (m+k)×1 vector y of m+k chunks y_i that are resilient to the erasure of any k chunks, said operations including jumping to or otherwise executing a predetermined sequence of instructions that are unique to the binary value of A_ij, each of said predetermined sequence of instructions consisting of a bitwise XOR of a hyperword of chunk x_j with and stored in a hyperword of chunk y_i.

14. The system of claim 13, wherein the number of bits occupied by the q hyperwords is an integer multiple of the cache line size of the processor.

15. The system of claim 13, wherein the steps accomplished by the processor further comprise generating subroutine MultiplyAndAdd(d, f, s) by performing the steps of:
  - generating source code for an empty subroutine body;
  - adding source code to the subroutine body to dispatch to a sequence of instructions that is unique to each of the values of f=0 through 2^q−1;
  - adding said sequence of instructions to the subroutine body for each of the values of f=0 through 2^q−1, by repeating for each specific value of f=x in a Galois Field GF(2^q) the steps of:
    - constructing a q×q matrix τ representing the operation of multiplication by x in the Galois Field GF(2);
    - testing the value of each element τ_ij of said matrix τ; and
    - appending instructions to store the XOR of the i-th hyperword of chunk d with the j-th hyperword of chunk s into the i-th hyperword of chunk d for each nonzero value of said element τ_ij.
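
The sizing constraint of claim 14 (the q hyperwords of a chunk together occupy a whole number of cache lines) can be illustrated with a short Python calculation. The 64-byte cache line, the 8-byte machine word, and the function name `hyperword_bytes` are assumptions chosen for this sketch; the claims do not fix any of them.

```python
from math import gcd


def hyperword_bytes(q, cache_line=64, word=8):
    """Smallest hyperword size in bytes such that:

    * the size is a multiple of the machine word, and
    * q hyperwords together fill a whole number of cache lines.
    """
    # q * h is a multiple of cache_line exactly when h is a multiple of
    # cache_line / gcd(q, cache_line).
    step = cache_line // gcd(q, cache_line)
    return step * word // gcd(step, word)   # lcm(step, word)


# Usage: chunk footprint for a few field sizes.
for q in (3, 4, 8):
    h = hyperword_bytes(q)
    print(q, h, q * h)   # q, hyperword bytes, chunk bytes
```

Python cannot control the alignment of its buffers, so this only shows the size arithmetic; a C implementation would additionally request cache-line-aligned allocations.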

16. A computerized system for constructing a (m+k)×m coding matrix A for use in linear erasure codes which is optimized for single erasure recovery, comprising a processor programmed to construct from elements of a Galois Field GF(2^q) an augmented coding matrix comprised of:
  - an m×m identity sub-matrix I;
  - a 1×m row sub-matrix P, the elements of said row matrix P having the value 1 in the Galois Field; and
  - a (k−1)×m sub-matrix C, the elements c_ij of said matrix C chosen so that all sub-matrices of A formed by deleting k rows are non-singular.

17. The system of claim 16, wherein the processor is further programmed to accomplish the step of choosing the elements c_ij so as to minimize the sum over all i,j of a function W(c_ij), said function W(c_ij) defined as the count of the non-zero elements in the q×q matrix τ representing the operation of multiplication by c_ij in the Galois Field GF(2).

18. The system of claim 17, wherein said step of choosing the elements c_ij so as to minimize said sum over all i,j of the function W(c_ij) further comprises the steps of:
  - enumerating all partitionings of 2^q values in GF(2^q) into a set S_x comprised of k−1 values and a set S_y comprised of m values to obtain an enumeration of partitionings;
  - computing a sum over all i,j of the function W(c_ij) where c_ij=1/(x_i+y_j) for x_i ∈ S_x and y_j ∈ S_y;
  - choosing from the enumeration of partitionings an optimal partitioning for which said sum over all i,j of the function W(c_ij) is minimal; and
  - constructing the values c_ij from said optimal partitioning.

19. The system of claim 18, wherein:
  - k=2; and
  - said step accomplished by the processor of enumerating all partitionings of the 2^q values in GF(2^q) is limited to enumerations that satisfy S_x={0}.

20. A computerized system for recovering stored or transmitted digital information that has been encoded with a linear erasure correcting code defined by a (m+k)×m coding matrix A over a Galois Field GF(2^q), said encoded digital information represented by m+k data chunks, each of which is associated with a row index, comprising a processor programmed to recover from 1 to k chunks by accomplishing the steps of:
  - constructing a set F containing the row indices associated with the data chunks to be recovered and additional row indices so as to increase the number of elements in the set F to k;
  - constructing an (m+k)×1 vector x from said chunks such that the i-th component of vector x is the data chunk associated with row index i;
  - constructing a row deleted m×1 vector from said vector x by deleting each row of the vector x with a row index in set F;
  - calculating an (m+k)×m repair matrix R_F from matrix A, said calculation of the matrix R_F further comprising the steps of:
    - constructing an m×m matrix by deleting each row of the matrix A with a row index in set F,
    - calculating the inverse of said row deleted m×m matrix, and
    - multiplying the matrix A by said inverse of the row deleted m×m matrix to produce the (m+k)×m repair matrix R_F; and
  - using the (m+k)×m repair matrix R_F as a coding matrix in a linear transformation of the m data chunks of said row deleted m×1 vector to produce an (m+k)×m vector, where row i is the recovered data chunk associated with row index i.

21. The system of claim 20, wherein the processor is further programmed to accomplish the steps of:
  - storing pre-calculated rows of the repair matrix R_F in a repair table indexed by a hash table for fast lookup; and
  - looking up rows of the repair matrix for recovering the chunks identified by a set F of row indices through the steps of:
    - creating a bit mask by setting to 1 only those bits at bit positions that equal the row indices in set F,
    - computing the value of a hash function h(x), where x is the bit mask,
    - indexing into said hash table using said value and retrieving a repair table index, and
    - indexing into said repair table using said repair table index and retrieving said rows of the repair matrix for recovering the chunks with row indices in set F.

22. The system of claim 21, wherein the step accomplished by the processor of storing pre-calculated rows of the repair matrices into said repair table indexed by said hash table further comprises the steps of:
  - allocating space for a hash table and a repair table and initializing the contents of said hash table and repair table to indicate an initial empty state; and
  - iteratively adding content to the repair table and the hash table to facilitate the recovery of any e chunks, where e in each iteration takes the values k, k−1, . . . , 0.

23. The system of claim 22, wherein said step accomplished by the processor of iterative adding further comprises the steps of enumerating all valid bitmask values with e bits set and for each enumerated bitmask value, performing the steps comprising:
  - computing or otherwise obtaining a repair matrix R_F suitable for recovering the chunks that are described by the bitmask value;
  - adding rows of said repair matrix to the repair table at the next available location;
  - computing the value of the hash function h(x) from the bitmask value; and
  - storing the location of said rows of said repair matrix in the hash table at the hash table index equal to said value of the hash function.

24. The system of claim 23, wherein the number of chunks n=m+k and the hash function h(x) is calculated by the processor by accomplishing the steps comprising:
  - initializing a variable Sum to zero and a variable BitPos to one;
  - returning the value of Sum as the value of the hash function if the value of x is zero or after repetitive operations on x have diminished its value to zero, said repetitive operations comprising:
    - setting the value of Sum to Sum×n+BitPos if the rightmost bit position of x is set,
    - logically shifting x right by 1 bit position, and
    - incrementing BitPos.
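
The repair-matrix calculation recited in claim 20 (delete the rows in F from A, invert the remaining m×m matrix, and multiply A by that inverse) can be sketched in Python as follows. This is an illustrative sketch only: each chunk is reduced to a single GF(2^4) symbol, the field polynomial x^4+x+1 and the example matrix A (m=3, k=2) are assumptions made here, and the rule used to pad F up to k indices is one arbitrary choice among many.

```python
Q = 4            # assumed field exponent: GF(2^4)
POLY = 0b10011   # assumed primitive polynomial x^4 + x + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^Q)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> Q:
            a ^= POLY
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (adequate for small fields)."""
    return next(b for b in range(1, 1 << Q) if gf_mul(a, b) == 1)


def dot(u, v):
    acc = 0
    for a, b in zip(u, v):
        acc ^= gf_mul(a, b)        # addition in GF(2^Q) is XOR
    return acc


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def mat_inv(mat):
    """Gauss-Jordan inversion over GF(2^Q); raises StopIteration if singular."""
    n = len(mat)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pad_erasures(erased, n, k):
    """Extend the erased set to exactly k row indices (padding rule is an assumption)."""
    f = set(erased)
    for i in range(n - 1, -1, -1):
        if len(f) == k:
            break
        f.add(i)
    return f


def repair_matrix(A, f):
    """R_F = A * inverse(A with the rows in F deleted)."""
    kept = [row for i, row in enumerate(A) if i not in f]
    return mat_mul(A, mat_inv(kept))


def recover(A, chunks, erased):
    """Rebuild all m+k chunks from the chunks whose row index is not in F."""
    n, m = len(A), len(A[0])
    f = pad_erasures(erased, n, n - m)
    survivors = [chunks[i] for i in range(n) if i not in f]
    return [dot(row, survivors) for row in repair_matrix(A, f)]


# Usage: illustrative matrix [I; P; C] with m=3, k=2 (not taken from the patent).
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [1, 2, 3]]
encoded = [dot(row, [7, 11, 2]) for row in A]
damaged = [None, encoded[1], encoded[2], None, encoded[4]]   # rows 0 and 3 lost
restored = recover(A, damaged, {0, 3})
```

Because R_F depends only on A and F, it can be computed once per erasure pattern and reused for every stripe of data that shares that pattern.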

25. A computer storage medium having computer-executable instructions for performing a method for encoding digital information for protection from data loss in storage or memory or in transmission on communication paths using a linear transformation defined by an (m+k)×m coding matrix A over a Galois Field GF(2^q), said encoding method comprising the steps of:
  - assembling an m×1 vector x comprised of components x_j from m data chunks representing the digital information, each chunk comprising q hyperwords each of an identical but arbitrary number of bits;
  - multiplying said vector x by said matrix A, comprised of elements A_ij, using the operations provided by a MultiplyAndAdd(y_i, A_ij, x_j) subroutine to produce an (m+k)×1 vector y of m+k chunks y_i that are resilient to the erasure of any k chunks, said operations including jumping to or otherwise executing a predetermined sequence of instructions that are unique to the binary value of A_ij, each of said predetermined sequence of instructions consisting of a bitwise XOR of a hyperword of chunk x_j with and stored in a hyperword of chunk y_i.

26. The computer storage medium of claim 25, wherein the number of bits occupied by the q hyperwords is an integer multiple of the cache line size of a processor implementing said method.

27. The computer storage medium of claim 25, further comprising the step of generating subroutine MultiplyAndAdd(d, f, s) by performing the steps of:
  - generating source code for an empty subroutine body;
  - adding source code to the subroutine body to dispatch to a sequence of instructions that is unique to each of the values of f=0 through 2^q−1;
  - adding said sequence of instructions to the subroutine body for each of the values of f=0 through 2^q−1, by repeating for each specific value of f=x in a Galois Field GF(2^q) the steps of:
    - constructing a q×q matrix τ representing the operation of multiplication by x in the Galois Field GF(2);
    - testing the value of each element τ_ij of said matrix τ; and
    - appending instructions to store the XOR of the i-th hyperword of chunk d with the j-th hyperword of chunk s into the i-th hyperword of chunk d for each nonzero value of said element τ_ij.

28. A computer storage medium having computer-executable instructions for performing a method for constructing a (m+k)×m coding matrix A for use in linear erasure codes which is optimized for single erasure recovery, comprising the steps of constructing from elements of a Galois Field GF(2^q) an augmented coding matrix comprised of:
  - an m×m identity sub-matrix I;
  - a 1×m row sub-matrix P, the elements of said row matrix P having the value 1 in the Galois Field; and
  - a (k−1)×m sub-matrix C, the elements c_ij of said matrix C chosen so that all sub-matrices of A formed by deleting k rows are non-singular.

29. The computer storage medium of claim 28, further comprising the step of choosing the elements c_ij so as to minimize the sum over all i,j of a function W(c_ij), said function W(c_ij) defined as the count of the non-zero elements in the q×q matrix τ representing the operation of multiplication by c_ij in the Galois Field GF(2).

30. The computer storage medium of claim 29, wherein said step of choosing the elements c_ij so as to minimize said sum over all i,j of the function W(c_ij) further comprises the steps of:
  - enumerating all partitionings of 2^q values in GF(2^q) into a set S_x comprised of k−1 values and a set S_y comprised of m values to obtain an enumeration of partitionings;
  - computing a sum over all i,j of the function W(c_ij) where c_ij=1/(x_i+y_j) for x_i ∈ S_x and y_j ∈ S_y;
  - choosing from the enumeration of partitionings an optimal partitioning for which said sum over all i,j of the function W(c_ij) is minimal; and
  - constructing the values c_ij from said optimal partitioning.

31. The computer storage medium of claim 30, wherein:
  - k=2; and
  - said step of enumerating all partitionings of the 2^q values in GF(2^q) is limited to enumerations that satisfy S_x={0}.

32. The computer storage medium having computer-executable instructions for performing a method for recovering stored or transmitted digital information that has been encoded with a linear erasure correcting code defined by a (m+k)×m coding matrix A over a Galois Field GF(2^q), said encoded digital information represented by m+k data chunks, each of which is associated with a row index, said method for recovering from 1 to k chunks comprising the steps of:
  - constructing a set F containing the row indices associated with the data chunks to be recovered and additional row indices so as to increase the number of elements in the set F to k;
  - constructing an (m+k)×1 vector x from said chunks such that the i-th component of vector x is the data chunk associated with row index i;
  - constructing a row deleted m×1 vector from said vector x by deleting each row of the vector x with a row index in set F;
  - calculating an (m+k)×m repair matrix R_F from matrix A, said calculation of the matrix R_F further comprising the steps of:
    - constructing an m×m matrix by deleting each row of the matrix A with a row index in set F,
    - calculating the inverse of said row deleted m×m matrix, and
    - multiplying the matrix A by said inverse of the row deleted m×m matrix to produce the (m+k)×m repair matrix R_F; and
  - using the (m+k)×m repair matrix R_F as a coding matrix in a linear transformation of the m data chunks of said row deleted m×1 vector to produce an (m+k)×m vector, where row i is the recovered data chunk associated with row index i.

33. The computer storage medium of claim 32, further comprising the steps of:
  - storing pre-calculated rows of the repair matrix R_F in a repair table indexed by a hash table for fast lookup; and
  - looking up rows of the repair matrix for recovering the chunks identified by a set F of row indices through the steps of:
    - creating a bit mask by setting to 1 only those bits at bit positions that equal the row indices in set F,
    - computing the value of a hash function h(x), where x is the bit mask,
    - indexing into said hash table using said value and retrieving a repair table index, and
    - indexing into said repair table using said repair table index and retrieving said rows of the repair matrix for recovering the chunks with row indices in set F.

34. The computer storage medium of claim 33, wherein the step of storing pre-calculated rows of the repair matrices into said repair table indexed by said hash table further comprises the steps of:
  - allocating space for a hash table and a repair table and initializing the contents of said hash table and repair table to indicate an initial empty state; and
  - iteratively adding content to the repair table and the hash table to facilitate the recovery of any e chunks, where e in each iteration takes the values k, k−1, . . . , 0.

35. The computer storage medium of claim 34, wherein said step of iterative adding further comprises the steps of enumerating all valid bitmask values with e bits set and for each enumerated bitmask value, performing the steps comprising:
  - computing or otherwise obtaining a repair matrix R_F suitable for recovering the chunks that are described by the bitmask value;
  - adding rows of said repair matrix to the repair table at the next available location;
  - computing the value of the hash function h(x) from the bitmask value; and
  - storing the location of said rows of said repair matrix in the hash table at the hash table index equal to said value of the hash function.

36. The computer storage medium of claim 35, wherein the number of chunks n=m+k and the hash function h(x) is calculated from the steps comprising:
  - initializing a variable Sum to zero and a variable BitPos to one;
  - returning the value of Sum as the value of the hash function if the value of x is zero or after repetitive operations on x have diminished its value to zero, said repetitive operations comprising:
    - setting the value of Sum to Sum×n+BitPos if the rightmost bit position of x is set,
    - logically shifting x right by 1 bit position, and
    - incrementing BitPos.

Current Assignee: Myron Zimmerman, Thomas P. Scott
Original Assignee: Myron Zimmerman, Thomas P. Scott
Inventors: Zimmerman, Myron; Scott, Thomas P.
Primary Examiner(s): Chaudry; M. Mujtaba K
Assistant Examiner(s): Rizk; Sam
Application Number: US10/966,984
Time in Patent Office: 1,537 Days
Field of Search: 714/781, 714/785
US Class Current: 714/785
CPC Class Codes: H03M 13/151 using error location or err...
