Data deduplication with adaptive erasure code redundancy
Abstract
Example apparatus and methods combine erasure coding with data deduplication to reduce the overall redundancy in data while increasing the redundancy of unique data. In one embodiment, deduplication produces an efficient representation of a data set that reduces duplicate data in the data set. Redundancy is then added back into the data set using erasure coding, which adds protection to the unique data associated with the efficient representation. The amount and type of redundancy added back may be controlled based on an attribute (e.g., value, reference count, symbol size, number of symbols) of the unique data. Decisions about how much and what type of redundancy to add back may be adapted over time, for example based on observations of the efficiency of the overall system.
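The abstract's two stages (deduplicate first, then add redundancy back in proportion to an attribute of each unique chunk) can be illustrated with a minimal sketch. It assumes fixed-size chunking, SHA-256 chunk fingerprints, and reference count as the controlling attribute; the chunk size and the hypothetical `parity_symbols_for` policy (one more parity symbol per doubling of the reference count, capped) are illustrative choices, not the patent's method.

```python
import hashlib
from collections import Counter

CHUNK_SIZE = 4  # hypothetical tiny chunk size so the example is easy to follow


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks; keep each unique chunk once plus a recipe of digests."""
    store = {}        # digest -> unique chunk bytes (stored once)
    refs = Counter()  # digest -> number of times the chunk occurs in the data set
    recipe = []       # ordered digests that rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        refs[digest] += 1
        recipe.append(digest)
    return store, refs, recipe


def parity_symbols_for(ref_count: int, base: int = 1, cap: int = 4) -> int:
    """Hypothetical policy: more heavily referenced chunks get more parity, up to cap."""
    return min(cap, base + ref_count.bit_length() - 1)


if __name__ == "__main__":
    data = b"AAAABBBBAAAACCCCAAAA"
    store, refs, recipe = deduplicate(data)
    assert b"".join(store[d] for d in recipe) == data  # deduplication is lossless
    for digest, chunk in store.items():
        print(chunk, refs[digest], parity_symbols_for(refs[digest]))
    # b'AAAA' 3 2
    # b'BBBB' 1 1
    # b'CCCC' 1 1
```

Losing the single stored copy of `AAAA` would damage three places in the original data, which is why the sketch gives it more parity than the chunks referenced only once.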
16 Citations
32 Claims
1. A non-transitory computer-readable storage medium storing computer-executable instructions that when executed by a computer cause the computer to perform a method, the method comprising:
accessing a message produced by a data deduplication system;
identifying a property of the message, and
generating W erasure code symbols for the message, where the erasure code symbols are generated according to an X/Y erasure code policy, W, X and Y being integers, W being greater than or equal to X, W being less than or equal to Y, and where W, X or Y depend, at least in part, on a property of the message.
Dependent claims: 2-21.
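Claim 1 can be illustrated with a minimal sketch, assuming that an X/Y policy means a code with X data symbols and at most Y total symbols, so that any X of the W generated symbols rebuild the message. The code is a Reed-Solomon-style systematic code over the prime field GF(257), built from Lagrange interpolation; it is swapped in for illustration and is not necessarily the code the patent uses. The hypothetical `choose_w` rule uses a reference count as the message property and clamps W so that X <= W <= Y.

```python
P = 257  # prime modulus: byte values 0..255 are field elements, points 1..256 are distinct


def _lagrange_eval(points, at):
    """Evaluate at `at` the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (at - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def choose_w(x: int, y: int, ref_count: int) -> int:
    """Hypothetical policy: one extra symbol per reference, clamped so that X <= W <= Y."""
    return max(x, min(y, x + ref_count))


def encode(data, w):
    """Systematic encoding: data[i] sits at point i+1; parity symbols are points X+1..W."""
    x = len(data)
    if not x <= w < P:
        raise ValueError("need X <= W <= 256")
    points = [(i + 1, b) for i, b in enumerate(data)]
    return points + [(pt, _lagrange_eval(points, pt)) for pt in range(x + 1, w + 1)]


def decode(symbols, x):
    """Recover the X data values from any X surviving (point, value) symbols."""
    if len(symbols) < x:
        raise ValueError("fewer than X symbols survived")
    basis = symbols[:x]
    return [_lagrange_eval(basis, i + 1) for i in range(x)]


if __name__ == "__main__":
    X, Y = 4, 8
    w = choose_w(X, Y, ref_count=3)  # 7 symbols: 4 data + 3 parity
    symbols = encode(list(b"dedu"), w)
    survivors = [symbols[1], symbols[4], symbols[5], symbols[6]]  # 3 of 7 lost
    print(bytes(decode(survivors, X)))  # b'dedu'
```

Parity values may equal 256, so in this sketch they are field elements rather than bytes; a production code would typically work in GF(2^8) instead.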
22. An apparatus, comprising:
a processor; a memory; a set of logics; and an interface that connects the processor, the memory, and the set of logics; the set of logics comprising:
a first logic that produces a set of n erasure code symbols for a message received from a data deduplication system, where the message has k symbols, where n is a function of a first attribute of the message, n and k being numbers, n > k;
a second logic that selectively stores members of the n erasure code symbols on z different data storage devices, where z is a function of a second attribute of the message, z being a number.
Dependent claims: 23-32.
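The two logics of claim 22 can be sketched as follows, assuming a reference count as the first attribute (driving n) and a hypothetical priority label as the second attribute (driving z). The `choose_n`, `choose_z`, `place`, and `survives_device_loss` helpers, the round-robin placement, and the priority-to-device table are all illustrative assumptions, not the claimed apparatus.

```python
def choose_n(k: int, ref_count: int, max_n: int) -> int:
    """Hypothetical first-attribute rule: one parity symbol per doubling of ref_count (n > k)."""
    if max_n <= k:
        raise ValueError("max_n must exceed k so that n > k")
    return min(k + max(1, ref_count).bit_length(), max_n)


def choose_z(priority: str, available: int) -> int:
    """Hypothetical second-attribute rule: higher-priority data is spread over more devices."""
    wanted = {"low": 2, "normal": 3, "high": 5}[priority]
    return min(wanted, available)


def place(n: int, z: int):
    """Assign symbol indices 0..n-1 to devices 0..z-1 round-robin."""
    layout = {device: [] for device in range(z)}
    for symbol in range(n):
        layout[symbol % z].append(symbol)
    return layout


def survives_device_loss(n: int, k: int, layout) -> bool:
    """True if losing any single device still leaves at least k symbols to decode from."""
    worst = max(len(symbols) for symbols in layout.values())
    return n - worst >= k


if __name__ == "__main__":
    k = 4
    n = choose_n(k, ref_count=5, max_n=16)  # 7 symbols
    for priority in ("high", "low"):
        z = choose_z(priority, available=6)
        print(priority, z, survives_device_loss(n, k, place(n, z)))
    # high 5 True
    # low 2 False
```

The output shows why z matters alongside n: the same seven symbols tolerate a device failure when spread over five devices, but not when packed onto two.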
Specification