Enhanced reliability in deduplication technology over storage clouds
Abstract
Methods and systems for enhancing reliability in deduplication over storage clouds are provided. A method includes: determining a weight for each of a plurality of duplicate files based on parameters associated with a respective storage device of each of the plurality of duplicate files; and designating one of the plurality of duplicate files as a master copy based on the determined weight.
24 Claims
1. A method implemented in a computer infrastructure comprising a combination of hardware and software, the method comprising:
  performing, by a computer processor, a file deduplication process comprising:
    determining, by the computer processor, a weight for each of a plurality of duplicate files based on parameters associated with a respective storage device of each of the plurality of duplicate files; and
    designating, by the computer processor, one of the plurality of duplicate files as a master copy based on the determined weight,
  wherein the weight is additionally based on a respective weighting factor associated with each one of the parameters,
  further comprising obtaining numerical values for the each one of the parameters and the respective weighting factors,
  the parameters comprise static parameters having predefined values, and
  the parameters comprise dynamic parameters having values that are obtained from the storage devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
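As an illustration only, the weighting step of claim 1 could be sketched as a weighted sum. The claim does not say how parameter values and weighting factors are combined, so the sum is an assumption, as are the names DuplicateFile, file_weight, designate_master, and the example parameter names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DuplicateFile:
    """One copy of a duplicated file and the parameters of its storage device."""
    path: str
    # Static parameters: predefined values (hypothetical example: "raid_score").
    static_params: Dict[str, float]
    # Dynamic parameters: values obtained from the storage device itself
    # (hypothetical example: "free_space_ratio").
    dynamic_params: Dict[str, float]


def file_weight(f: DuplicateFile, factors: Dict[str, float]) -> float:
    """Assumed combination rule: sum of (weighting factor * parameter value).

    Parameters without a weighting factor contribute nothing.
    """
    values = {**f.static_params, **f.dynamic_params}
    return sum(factors.get(name, 0.0) * value for name, value in values.items())


def designate_master(files: List[DuplicateFile],
                     factors: Dict[str, float]) -> DuplicateFile:
    """Designate the copy with the highest weight as the master copy."""
    return max(files, key=lambda f: file_weight(f, factors))
```

With factors of 2.0 for raid_score and 4.0 for free_space_ratio, a copy with raid_score 3.0 and free_space_ratio 0.5 gets weight 8.0 and would be preferred over a copy with raid_score 1.0 and free_space_ratio 0.25 (weight 3.0).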
10. A system comprising:
  one or more computer processors;
  one or more computer readable hardware storage device;
  program instructions stored on the one or more computer readable hardware storage device for execution by at least one of the one or more processors, the program instructions comprising:
    program instructions to identify duplicate files stored at different storage devices;
    program instructions to determine a weight for each one of the duplicate files based on parameters associated with the storage devices;
    program instructions to designate one of the duplicate files as a master copy based on the determined weights; and
    program instructions to determine that two of the duplicate files have an equal highest weight,
  wherein the designating one of the duplicate files as the master copy comprises selecting one of the two based on a tie-breaker parameter.
Dependent claims: 11, 12, 13, 14
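The tie-breaking step of claim 10 can be sketched as follows. This is a minimal sketch under stated assumptions: the claim does not name the tie-breaker parameter or say which direction wins, so the rule "larger tie-breaker value wins" and the function name pick_master_with_tie_breaker are hypothetical.

```python
from typing import List, Tuple

# (path, weight, tie_breaker_value); the tuple layout is an assumption.
Candidate = Tuple[str, float, float]


def pick_master_with_tie_breaker(candidates: List[Candidate]) -> str:
    """Return the path of the master copy.

    The highest weight wins. If several copies share the highest weight,
    the one with the larger tie-breaker value is selected (assumed rule).
    """
    top_weight = max(weight for _, weight, _ in candidates)
    tied = [c for c in candidates if c[1] == top_weight]
    if len(tied) == 1:
        return tied[0][0]
    return max(tied, key=lambda c: c[2])[0]
```

Comparing weights with == is adequate here only because tied weights are assumed to come from the same computation on the same inputs; a real system might compare within a tolerance.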
15. A computer program product comprising one or more computer readable hardware storage device and program instructions stored on the one or more computer readable hardware storage device, the program instructions comprising:
  program instructions to determine a hash value for each of a plurality of files;
  program instructions to determine a set of duplicate files based on the hash values; and
  program instructions to deduplicate the set of duplicate files, wherein the deduplicating comprises:
    determining a weight for each one of the duplicate files, wherein the weight is based on parameters associated with storage devices; and
    designating a master copy of the set based on the weight of each one of the duplicate files;
    nominating remaining files in the set, other than the master copy, for deletion,
  wherein the parameters comprise static parameters having predefined values, and
  the parameters comprise dynamic parameters having values that are obtained from the storage devices.
Dependent claims: 16, 17, 18
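The hash-then-deduplicate flow of claim 15 might look like the sketch below. The claim does not name a hash function, so SHA-256 is an assumption; file contents are passed in memory and weights are supplied precomputed to keep the example self-contained. The names find_duplicate_sets and plan_deduplication are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_sets(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file paths by content hash; keep only groups with 2+ members."""
    by_hash = defaultdict(list)
    for path, content in files.items():
        by_hash[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def plan_deduplication(files: Dict[str, bytes],
                       weights: Dict[str, float]) -> List[Tuple[str, List[str]]]:
    """For each duplicate set, return (master_path, paths_nominated_for_deletion).

    Nothing is deleted here; the remaining copies are only nominated.
    """
    plan = []
    for dup_set in find_duplicate_sets(files):
        master = max(dup_set, key=lambda p: weights[p])
        nominated = [p for p in dup_set if p != master]
        plan.append((master, nominated))
    return plan
```

Files whose hash is unique never enter a duplicate set, so their weights are irrelevant to the plan.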
19. A method of file deduplication of a plurality of files, comprising:
  providing a computer infrastructure operable to:
    determine a hash value for each of the plurality of files;
    define sets of the plurality files based on the hash values; and
    for each respective one of the sets, perform a file deduplication process comprising:
      determine a highest weight file in the respective set, wherein the weight is based on parameters associated with storage devices;
      designate the highest weight file as a master copy for the respective set; and
      nominate remaining files in the respective set, other than the master copy, for deletion,
  wherein the parameters comprise static parameters having predefined values, and
  the parameters comprise dynamic parameters having values that are obtained from the storage devices.
Dependent claims: 20, 21
22. A computer system for file deduplication, the system comprising:
  a CPU, a computer readable memory and a computer readable storage media;
  program instructions to identify a set of duplicate files;
  program instructions to determine a weight for each one of the duplicate files; and
  program instructions to designate a master copy of the set based on the weight of each one of the duplicate files; and
  program instructions to delete remaining files of the set, other than the master copy, and replace the remaining files with respective pointers pointing to the master copy,
  wherein the program instructions are stored on the computer readable storage media for execution by the CPU via the computer readable memory;
  the weight is based on parameters associated with storage devices and weighting factors defined for the parameters; and
  the parameters are related to at least one of reliability, health, and user preference of the storage devices.
Dependent claims: 23, 24
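The final step of claim 22, deleting a remaining copy and replacing it with a pointer to the master copy, can be illustrated with a symbolic link. This is only one possible reading: the claim does not say what form the pointer takes, and the sketch assumes a filesystem that supports symlinks. The function name replace_with_pointer is hypothetical.

```python
import os


def replace_with_pointer(master_path: str, duplicate_path: str) -> None:
    """Delete a duplicate and leave a symlink to the master copy in its place.

    Assumption: a symbolic link stands in for the claim's "pointer".
    """
    if not os.path.isfile(master_path):
        # Never delete a duplicate if the master copy is missing.
        raise FileNotFoundError(master_path)
    os.remove(duplicate_path)
    os.symlink(os.path.abspath(master_path), duplicate_path)
```

Checking for the master copy before removing the duplicate avoids the case where deduplication itself destroys the last remaining copy.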
Specification