×

Joint de-duplication-erasure coded distributed storage

  • US 10,318,389 B2
  • Filed: 07/15/2016
  • Issued: 06/11/2019
  • Est. Priority Date: 07/15/2016
  • Status: Active Grant
First Claim
Patent Images

1. A non-transitory computer-readable storage device storing computer executable instructions that when executed by a computer control the computer to perform a method for deduplicating and erasure coding a message, the method comprising:

  • accessing the message, generating a set of message chunks by chunking the message using a first chunking approach;

    generating a set of outer-precoded parity symbols and a set of outer-precoded data symbols from the set of message chunks using an outer precode, where the outer precode is a low density parity check (LDPC) precode that uses an LDPC parity check matrix, where the LDPC parity check matrix has a column weight and a row weight;

    selectively adapting the LDPC parity check matrix by;

    computing a set of chunk statistics associated with a first set of erasure codes;

    generating a worst-case (WC) symbol characterization for a member of the first set of erasure codes based, at least in part, on a WC average number of symbol erasures or a WC average number of symbol errors;

    computing an average number of symbol errors for the first set of erasure codes based, at least in part, on a set of deduplication parameters, the set of chunk statistics, and the worst-case symbol characterization;

    choosing a column weight or a row weight such that a failure probability Pf is less than a decoding threshold;

    upon determining that there has been a change in a chunk pool;

    generating a new LDPC parity check matrix based, at least in part, on the column weight and the row weight; and

    replacing the LDPC parity check matrix with the new LDPC parity check matrix;

    storing the set of outer-precoded parity symbols in a data storage system;

    generating a set of unique data symbols by deduplicating the set of outer-precoded data symbols based, at least in part, on a chunk identification (ID) table, where the chunk ID table stores a unique chunk ID associated with a unique data symbol stored in the data storage system, a chunk size associated with the unique data symbol, or a chunk reference count associated with the unique chunk ID, where the chunk ID table is stored in a data storage device with a faster access time than the data storage system;

    storing a copy of the set of unique data symbols in the data storage system;

    generating a set of inner-precoded data symbols from the set of unique data symbols using an inner-precode;

    generating the first set of erasure codes from the set of inner-precoded data symbols using an unequal error protection (UEP) rateless Luby transform (LT) code based, at least in part, on the chunk ID table; and

    storing the first set of erasure codes in the data storage system.

View all claims
  • 8 Assignments
Timeline View
Assignment View
    ×
    ×