Joint de-duplication-erasure coded distributed storage
First Claim
1. A non-transitory computer-readable storage device storing computer executable instructions that when executed by a computer control the computer to perform a method for deduplicating and erasure coding a message, the method comprising:
- accessing the message,generating a set of message chunks by chunking the message using a first chunking approach;
generating a set of outer-precoded parity symbols and a set of outer-precoded data symbols from the set of message chunks using an outer precode, where the outer precode is a low density parity check (LDPC) precode that uses an LDPC parity check matrix, where the LDPC parity check matrix has a column weight and a row weight;
selectively adapting the LDPC parity check matrix by;
computing a set of chunk statistics associated with a first set of erasure codes;
generating a worst-case (WC) symbol characterization for a member of the first set of erasure codes based, at least in part, on a WC average number of symbol erasures or a WC average number of symbol errors;
computing an average number of symbol errors for the first set of erasure codes based, at least in part, on a set of deduplication parameters, the set of chunk statistics, and the worst-case symbol characterization;
choosing a column weight or a row weight such that a failure probability Pf is less than a decoding threshold;
upon determining that there has been a change in a chunk pool;
generating a new LDPC parity check matrix based, at least in part, on the column weight and the row weight; and
replacing the LDPC parity check matrix with the new LDPC parity check matrix;
storing the set of outer-precoded parity symbols in a data storage system;
generating a set of unique data symbols by deduplicating the set of outer-precoded data symbols based, at least in part, on a chunk identification (ID) table;
storing a copy of the set of unique data symbols in the data storage system;
generating a set of inner-precoded data symbols from the set of unique data symbols using an inner-precode;
generating the first set of erasure codes from the set of inner-precoded data symbols; and
storing the first set of erasure codes in the data storage system.
5 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus deduplicate and erasure code a message in a data storage system. One example apparatus includes a first chunking circuit that generates a set of data chunks from a message, an outer precoding circuit that generates a set of precoded data chunks and a set of parity symbols from the set of data chunks, a second chunking circuit that generates a set of chunked parity symbols from the set of parity symbols, a deduplication circuit that generates a set of deduplicated data chunks by deduplicating the set of precoded chunks or the set of chunked parity symbols, an unequal error protection (UEP) circuit that generates an encoded message from the set of deduplicated data chunks, and a storage circuit that controls the data storage system to store the set of deduplicated data chunks, the set of parity symbols, or the encoded message.
40 Citations
27 Claims
-
1. A non-transitory computer-readable storage device storing computer executable instructions that when executed by a computer control the computer to perform a method for deduplicating and erasure coding a message, the method comprising:
-
accessing the message, generating a set of message chunks by chunking the message using a first chunking approach; generating a set of outer-precoded parity symbols and a set of outer-precoded data symbols from the set of message chunks using an outer precode, where the outer precode is a low density parity check (LDPC) precode that uses an LDPC parity check matrix, where the LDPC parity check matrix has a column weight and a row weight; selectively adapting the LDPC parity check matrix by; computing a set of chunk statistics associated with a first set of erasure codes; generating a worst-case (WC) symbol characterization for a member of the first set of erasure codes based, at least in part, on a WC average number of symbol erasures or a WC average number of symbol errors; computing an average number of symbol errors for the first set of erasure codes based, at least in part, on a set of deduplication parameters, the set of chunk statistics, and the worst-case symbol characterization; choosing a column weight or a row weight such that a failure probability Pf is less than a decoding threshold; upon determining that there has been a change in a chunk pool; generating a new LDPC parity check matrix based, at least in part, on the column weight and the row weight; and replacing the LDPC parity check matrix with the new LDPC parity check matrix; storing the set of outer-precoded parity symbols in a data storage system; generating a set of unique data symbols by deduplicating the set of outer-precoded data symbols based, at least in part, on a chunk identification (ID) table; storing a copy of the set of unique data symbols in the data storage system; generating a set of inner-precoded data symbols from the set of unique data symbols using an inner-precode; generating the first set of erasure codes from the set of inner-precoded data symbols; and storing the first set of erasure codes in the data storage system. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. An apparatus for deduplicating and erasure coding a message, the apparatus comprising:
-
a processor; a data storage system; a memory that stores a chunk index (ID) table; a set of circuits; and an interface that connects the processor, the memory, the data storage system, and the set of circuits, the set of circuits comprising; a first chunking circuit that generates a set of data chunks from the message using a variable length chunking approach; an outer precoding circuit that generates a set of precoded data chunks and a set of parity symbols from the set of data chunks using a low density parity check (LDPC) code that uses an LDPC parity check matrix, where the LDPC parity check matrix has a column weight and a row weight, where the outer precoding circuit selectively adapts the LDPC parity check matrix by; computing a set of chunk statistics associated with a first set of erasure codes; generating a worst-case (WC) symbol characterization for a member of the first set of erasure codes based, at least in part, on a WC average number of symbol erasures or a WC average number of symbol errors; computing an average number of symbol errors for the first set of erasure codes based, at least in part, on a set of deduplication parameters, the set of chunk statistics, and the worst-case symbol characterization; choosing a column weight or a row weight such that a failure probability Pf is less than a decoding threshold; upon determining that there has been a change in a chunk pool, the outer precoding circuit is further configured to; generate a new LDPC parity check matrix based, at least in part, on the column weight and the row weight; and replace the LDPC parity check matrix with the new LDPC parity check matrix; a second chunking circuit that generates a set of chunked parity symbols from the set of parity symbols using the variable length chunking approach; a deduplication circuit that generates a set of unique deduplicated data chunks by deduplicating the set of precoded chunks or the set of chunked parity symbols based, at least in part, on the chunk ID table; an unequal erasure protection (UEP) circuit that generates an encoded message from the set of deduplicated data chunks; and a storage circuit that controls the data storage system to store the set of deduplicated data chunks, the set of parity symbols, or the encoded message in the data storage system. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26)
-
-
27. A method comprising:
-
accessing a data set; generating a chunked data set by chunking the data set; generating an outer precoded chunked data set and a set of parity symbols by precoding the chunked data set using an outer precode, where the outer precode is a low density parity check (LDPC) precode that uses an LDPC parity check matrix, where the LDPC parity check matrix has a column weight and a row weight; selectively adapting the LDPC parity check matrix by; computing a set of chunk statistics associated with an erasure coded unique data set; generating a worst-case (WC) symbol characterization for a member of the erasure coded unique data set, at least in part, on a WC average number of symbol erasures or a WC average number of symbol errors; computing an average number of symbol errors for the erasure coded unique data set based, at least in part, on a set of deduplication parameters, the set of chunk statistics, and the worst-case symbol characterization; choosing a column weight or a row weight such that a failure probability Pf is less than a decoding threshold; upon determining that there has been a change in a chunk pool; generating a new LDPC parity check matrix based, at least in part, on the column weight and the row weight; and replacing the LDPC parity check matrix with the new LDPC parity check matrix; storing the set of parity symbols in a cloud-based data storage system; generating a set of unique data chunks by deduplicating the outer precoded chunked data set; storing the set of unique data chunks in the cloud-based data storage system; generating an inner precoded unique data set by precoding the set of unique data chunks with an inner precode; generating the erasure coded unique data set by erasure coding the inner precoded unique data set; and distributing the erasure coded unique data set across the cloud-based data storage system.
-
Specification