Adaptive erasure codes
First Claim
1. A non-transitory computer-readable storage medium storing computer executable instructions that when executed by a computer control the computer to perform a method for encoding a message to be stored in a distributed data storage system, the method comprising:
- accessing a plurality of messages, where a member of the plurality of messages has a message size;
upon determining that the message size of the member of the plurality of messages is greater than or equal to a threshold size;
selecting a Fountain coding approach as a selected coding approach;
upon determining that the message size of the member of the plurality of messages is less than the threshold size;
computing an interleaving overhead;
computing a coding overhead;
upon determining that the interleaving overhead is greater than one and the coding overhead is less than a threshold coding overhead amount;
selecting a Fountain coding approach as the selected coding approach;
upon detecting that the interleaving overhead is not greater than one and the coding overhead is not less than the threshold coding overhead amount;
selecting a Reed Solomon coding approach as the selected coding approach;
upon detecting that a number of parities is less than a threshold number of parities, where the threshold number of parities is four when a STAR coding approach is employed, and where the threshold number of parities is three when a non-STAR coding approach is employed;
using an exclusive or (XOR) based block-maximum distance separable (MDS) coding approach as the selected coding approach;
upon detecting that the number of parities is greater than or equal to the threshold number of parities;
using a classical construction coding approach as the selected coding approach;
upon determining that the data storage system is within a threshold reliability level;
automatically and dynamically generating an adapted coding approach by adjusting a set of coding parameters, where the set of coding parameters is based, at least in part, on the selected coding approach;
upon determining that the data storage system is not within the threshold reliability level, encoding the message using the adapted coding approach, andstoring the encoded message in the distributed data storage system.
8 Assignments
0 Petitions
Accused Products
Abstract
Methods, apparatus, and other embodiments associated with adaptive use of erasure codes for distributed data storage systems are described. One example method includes accessing a message, where the message has a message size, selecting an encoding strategy as a function of the message size, data storage device failure statistics, data storage device wear periods, data storage space constraints, or overhead constraints, and where the encoding strategy includes an erasure code approach, generating an encoded message using the encoding strategy, generating an encoded block, where the encoded block includes the encoded message and metadata associated with the message, and storing the encoded block in the data storage system. Example methods and apparatus may employ Reed Solomon erasure codes or Fountain erasure codes. Example methods and apparatus may display to a user the storage capacity and durability of the data storage system.
15 Citations
20 Claims
-
1. A non-transitory computer-readable storage medium storing computer executable instructions that when executed by a computer control the computer to perform a method for encoding a message to be stored in a distributed data storage system, the method comprising:
-
accessing a plurality of messages, where a member of the plurality of messages has a message size; upon determining that the message size of the member of the plurality of messages is greater than or equal to a threshold size; selecting a Fountain coding approach as a selected coding approach; upon determining that the message size of the member of the plurality of messages is less than the threshold size; computing an interleaving overhead; computing a coding overhead; upon determining that the interleaving overhead is greater than one and the coding overhead is less than a threshold coding overhead amount; selecting a Fountain coding approach as the selected coding approach; upon detecting that the interleaving overhead is not greater than one and the coding overhead is not less than the threshold coding overhead amount; selecting a Reed Solomon coding approach as the selected coding approach; upon detecting that a number of parities is less than a threshold number of parities, where the threshold number of parities is four when a STAR coding approach is employed, and where the threshold number of parities is three when a non-STAR coding approach is employed; using an exclusive or (XOR) based block-maximum distance separable (MDS) coding approach as the selected coding approach; upon detecting that the number of parities is greater than or equal to the threshold number of parities; using a classical construction coding approach as the selected coding approach; upon determining that the data storage system is within a threshold reliability level; automatically and dynamically generating an adapted coding approach by adjusting a set of coding parameters, where the set of coding parameters is based, at least in part, on the selected coding approach; upon determining that the data storage system is not within the threshold reliability level, encoding the message using the adapted coding approach, and storing the encoded message in the distributed data storage system. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A non-transitory computer-readable storage device storing computer executable instructions that when executed by a computer control the computer to perform a method for encoding a message to be stored in a data storage system, the method comprising:
-
accessing a plurality of messages, where a member of the plurality of messages has a message size; upon determining that the message size of the member of the plurality of messages is greater than or equal to a threshold size; selecting a first coding approach as a selected coding approach; upon determining that the message size of the member of the plurality of messages is less than the threshold size; computing an interleaving overhead; computing a coding overhead; and selecting the first coding approach or a second, different coding approach as the selected coding approach based, at least in part, on the interleaving overhead or the coding overhead; upon determining that the data storage system is within a threshold reliability level; automatically and dynamically generating an adapted coding approach by adjusting a set of coding parameters, where the set of coding parameters is based, at least in part, on the selected coding approach; upon determining that the data storage system is not within the threshold reliability level; generating an encoded message by encoding the message using the adapted coding approach, and storing the encoded message in the data storage system. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
-
Specification