Packing deduplicated data into finite-sized containers
Abstract
Deduplicated data is packed into finite-sized containers. A similarity score is calculated between similarly compared files of the deduplicated data. The similarity score is used to group the similarly compared files into subsets, each of which is destaged from a deduplication system to one of the finite-sized containers.
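The abstract does not say how the similarity score is computed. The following is a minimal sketch under assumptions that do not come from the patent. Each file is split into fixed-size 4 KiB chunks fingerprinted with SHA-256. The similarity score is taken as the Jaccard index of two fingerprint sets. The pairwise deduplication ratio is the number of chunks referenced divided by the number of chunks stored. The helper names `chunk_fingerprints`, `similarity_score` and `pair_dedup_ratio` are hypothetical.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> set[str]:
    """Split data into fixed-size chunks and return their SHA-256 digests.

    Fixed-size chunking is an assumption made for this sketch only.
    """
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def similarity_score(a: set[str], b: set[str]) -> float:
    """Jaccard index: shared chunks divided by all distinct chunks."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_dedup_ratio(a: set[str], b: set[str]) -> float:
    """Chunks referenced by the two files divided by chunks actually stored.

    Ranges from 1.0 (nothing shared) to 2.0 (identical chunk sets). Because
    sets drop repeats, chunks duplicated inside a single file are ignored.
    """
    union = a | b
    return (len(a) + len(b)) / len(union) if union else 1.0


# Toy inputs: the claims concern files of at least 1 GB; these are 8 KiB.
f1 = chunk_fingerprints(b"A" * 4096 + b"B" * 4096)
f2 = chunk_fingerprints(b"A" * 4096 + b"C" * 4096)
print(similarity_score(f1, f2))  # 0.3333333333333333
print(pair_dedup_ratio(f1, f2))  # 1.3333333333333333
```

The two toy files share one of three distinct chunks, so four chunk references are served by three stored chunks.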
9 Claims
1. A method for rehydrating deduplicated data, by packing the deduplicated data into a plurality of finite-sized containers using a processor device, comprising:
calculating a similarity score between a plurality of similarly compared files of the deduplicated data, the similarity score indicating an overall deduplication ratio between the similarly compared files of the deduplicated data;
wherein the similarly compared files are at least 1 Gigabyte (GB) in size, wherein calculating the similarity score further includes calculating an nth percentage threshold of common data intersections shared between the plurality of similarly compared files of the deduplicated data, and wherein a transitive closure between the plurality of similarly compared files of the deduplicated data is determined;
using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to one of the plurality of finite-sized containers;
wherein a sum of a data space of all of the plurality of finite-sized containers is substantially equal to the overall deduplication ratio;
receiving an indication from a user of which of the plurality of similarly compared files are to be grouped into the subsets for destaging each of the subsets from the deduplication system to one of the plurality of finite-sized containers;
using the transitive closure for assisting with using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into the subsets; and
calculating a storage metric value by traversing each of the subsets for determining a required storage space in one of the plurality of finite-sized containers.
(Dependent claims: 2, 3)
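Claim 1 determines a transitive closure between similarly compared files and uses it, together with the similarity score and a user's indication, to form the subsets. The following is a minimal sketch of one way to do this. It assumes each file is represented by a set of chunk fingerprints, and that two files are linked when their common chunks reach n percent of the smaller file. Linking similar pairs in a union-find structure and reading off its connected components yields the transitive closure of the pairwise relation. The function `group_files`, its `n_percent` parameter and the `user_groups` argument are hypothetical.

```python
from itertools import combinations
from typing import Optional


def group_files(
    fingerprints: dict[str, set[str]],
    n_percent: float,
    user_groups: Optional[list[set[str]]] = None,
) -> list[set[str]]:
    """Group files into subsets via the transitive closure of pairwise similarity."""
    parent = {name: name for name in fingerprints}

    def find(x: str) -> str:
        # Walk up to the root, halving the path to keep later lookups short.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Link each pair whose common chunks reach n percent of the smaller file.
    for a, b in combinations(fingerprints, 2):
        fa, fb = fingerprints[a], fingerprints[b]
        smaller = min(len(fa), len(fb))
        if smaller and 100 * len(fa & fb) / smaller >= n_percent:
            union(a, b)

    # Files the user asked to keep together are linked unconditionally.
    # Every name in user_groups must also be a key of fingerprints.
    for group in user_groups or []:
        members = list(group)
        for other in members[1:]:
            union(members[0], other)

    # Each root identifies one connected component, i.e. one subset.
    subsets: dict[str, set[str]] = {}
    for name in fingerprints:
        subsets.setdefault(find(name), set()).add(name)
    return list(subsets.values())


files = {
    "a": {"h1", "h2", "h3", "h4"},
    "b": {"h3", "h4", "h5", "h6"},
    "c": {"h6", "h7", "h8", "h9"},
    "d": {"h10", "h11"},
}
print(sorted(sorted(s) for s in group_files(files, 25)))
# [['a', 'b', 'c'], ['d']]
```

With `n_percent=25`, files a and b share half their chunks and b and c share a quarter, so a, b and c form one subset even though a and c share nothing. That grouping is the effect of taking the transitive closure.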
4. A system for rehydrating deduplicated data, by packing the deduplicated data into a plurality of finite-sized containers in a computing environment, comprising:
a processor device, operable in the computing environment, wherein the processor device is adapted for:
calculating a similarity score between a plurality of similarly compared files of the deduplicated data, the similarity score indicating an overall deduplication ratio between the similarly compared files of the deduplicated data;
wherein the similarly compared files are at least 1 Gigabyte (GB) in size, wherein calculating the similarity score further includes calculating an nth percentage threshold of common data intersections shared between the plurality of similarly compared files of the deduplicated data, and wherein a transitive closure between the plurality of similarly compared files of the deduplicated data is determined;
using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to one of the plurality of finite-sized containers;
wherein a sum of a data space of all of the plurality of finite-sized containers is substantially equal to the overall deduplication ratio;
receiving an indication from a user of which of the plurality of similarly compared files are to be grouped into the subsets for destaging each of the subsets from the deduplication system to one of the plurality of finite-sized containers;
using the transitive closure for assisting with using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into the subsets; and
calculating a storage metric value by traversing each of the subsets for determining a required storage space in one of the plurality of finite-sized containers.
(Dependent claims: 5, 6)
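The storage metric in claim 4 is obtained by traversing each subset to determine the space it needs in a container. The sketch below assumes that a container holds its subsets in deduplicated form, so a subset needs the total size of its distinct chunks. It then places each whole subset into a fixed-capacity container with first-fit decreasing. First-fit decreasing is a standard bin-packing heuristic substituted here because the claims do not name a placement strategy. The names `required_space`, `assign_to_containers`, `file_chunks`, `chunk_sizes` and `capacity` are all hypothetical.

```python
def required_space(
    subset: set[str],
    file_chunks: dict[str, list[str]],
    chunk_sizes: dict[str, int],
) -> int:
    """Storage metric: traverse every file in the subset, counting each distinct chunk once."""
    seen: set[str] = set()
    total = 0
    for name in subset:
        for chunk_id in file_chunks[name]:
            if chunk_id not in seen:
                seen.add(chunk_id)
                total += chunk_sizes[chunk_id]
    return total


def assign_to_containers(
    subsets: list[set[str]],
    file_chunks: dict[str, list[str]],
    chunk_sizes: dict[str, int],
    capacity: int,
) -> list[list[int]]:
    """Place each whole subset into one container using first-fit decreasing.

    Subsets sharing a container are charged separately (no cross-subset
    deduplication), which overestimates rather than underestimates usage.
    """
    sizes = [required_space(s, file_chunks, chunk_sizes) for s in subsets]
    containers: list[list[int]] = []  # subset indices held by each container
    loads: list[int] = []             # bytes used in each container
    for idx in sorted(range(len(subsets)), key=lambda i: sizes[i], reverse=True):
        if sizes[idx] > capacity:
            raise ValueError(f"subset {idx} needs {sizes[idx]} bytes; capacity is {capacity}")
        for c in range(len(loads)):
            if loads[c] + sizes[idx] <= capacity:
                containers[c].append(idx)
                loads[c] += sizes[idx]
                break
        else:
            # No existing container has room, so open a new one.
            containers.append([idx])
            loads.append(sizes[idx])
    return containers


file_chunks = {"a": ["h1", "h2"], "b": ["h2", "h3"], "c": ["h4"]}
chunk_sizes = {"h1": 100, "h2": 200, "h3": 50, "h4": 300}
subsets = [{"a", "b"}, {"c"}]
print(required_space(subsets[0], file_chunks, chunk_sizes))  # 350
print(assign_to_containers(subsets, file_chunks, chunk_sizes, 400))  # [[0], [1]]
```

Chunk h2 is shared by files a and b, so the subset {a, b} needs 350 bytes rather than the 550 its chunk references add up to.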
7. A computer program product for rehydrating deduplicated data, by packing the deduplicated data into a plurality of finite-sized containers by a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for calculating a similarity score between a plurality of similarly compared files of the deduplicated data, the similarity score indicating an overall deduplication ratio between the similarly compared files of the deduplicated data;
wherein the similarly compared files are at least 1 Gigabyte (GB) in size, wherein calculating the similarity score further includes calculating an nth percentage threshold of common data intersections shared between the plurality of similarly compared files of the deduplicated data, and wherein a transitive closure between the plurality of similarly compared files of the deduplicated data is determined;
a second executable portion for using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to one of the plurality of finite-sized containers;
wherein a sum of a data space of all of the plurality of finite-sized containers is substantially equal to the overall deduplication ratio;
a third executable portion for receiving an indication from a user of which of the plurality of similarly compared files are to be grouped into the subsets for destaging each of the subsets from the deduplication system to one of the plurality of finite-sized containers;
a fourth executable portion for using the transitive closure for assisting with using the similarity score for grouping the plurality of similarly compared files of the deduplicated data into the subsets; and
a fifth executable portion for calculating a storage metric value by traversing each of the subsets for determining a required storage space in one of the plurality of finite-sized containers.
(Dependent claims: 8, 9)
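The claims also require the combined data space of all containers to be substantially equal to the overall deduplication ratio. One reading, assumed here rather than stated in the claims, is that grouping should lose little deduplication. Chunks shared across different subsets are stored once per container, so the space used by all containers together can only grow relative to the deduplicated whole. The hypothetical helpers below compute the overall deduplication ratio and a packing overhead, where 1.0 means no deduplication was lost.

```python
def physical_bytes(files: set[str], file_chunks: dict[str, list[str]], chunk_sizes: dict[str, int]) -> int:
    """Space needed to store the files deduplicated: each distinct chunk once."""
    distinct: set[str] = set()
    for name in files:
        distinct.update(file_chunks[name])
    return sum(chunk_sizes[c] for c in distinct)


def logical_bytes(files: set[str], file_chunks: dict[str, list[str]], chunk_sizes: dict[str, int]) -> int:
    """Space the files would take fully rehydrated: every chunk reference counted."""
    return sum(chunk_sizes[c] for name in files for c in file_chunks[name])


def dedup_ratio(files: set[str], file_chunks: dict[str, list[str]], chunk_sizes: dict[str, int]) -> float:
    """Overall deduplication ratio: logical bytes divided by physical bytes."""
    physical = physical_bytes(files, file_chunks, chunk_sizes)
    return logical_bytes(files, file_chunks, chunk_sizes) / physical if physical else 1.0


def packing_overhead(subsets: list[set[str]], file_chunks: dict[str, list[str]], chunk_sizes: dict[str, int]) -> float:
    """Total space of all subsets stored separately, relative to the deduplicated whole.

    1.0 means the grouping lost no deduplication; larger values mean chunks
    shared across subsets are stored more than once.
    """
    whole = physical_bytes(set().union(*subsets), file_chunks, chunk_sizes)
    packed = sum(physical_bytes(s, file_chunks, chunk_sizes) for s in subsets)
    return packed / whole if whole else 1.0


file_chunks = {"a": ["h1", "h2"], "b": ["h2", "h3"], "c": ["h2", "h4"]}
chunk_sizes = {"h1": 100, "h2": 200, "h3": 50, "h4": 300}
print(dedup_ratio({"a", "b", "c"}, file_chunks, chunk_sizes))
print(packing_overhead([{"a", "b"}, {"c"}], file_chunks, chunk_sizes))
print(packing_overhead([{"a", "b", "c"}], file_chunks, chunk_sizes))
```

In this toy data the three files reference 1050 bytes but need only 650 bytes deduplicated. Splitting them into {a, b} and {c} stores chunk h2 twice, giving an overhead of 850/650. Keeping all three files in one subset gives exactly 1.0.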
Specification