Method and apparatus for determining optimal chunk sizes of a deduplicated storage system
Abstract
Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of first data chunks. Metadata of the first data chunks is merged according to the second chunk size to represent second data chunks to which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.
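The evaluation described above can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows the general idea of merging existing chunk metadata to estimate deduplication at larger chunk sizes without re-reading the data. Several details are assumptions made for illustration: chunk metadata is modeled as (fingerprint, size) pairs in file order, the first chunks are fixed-size, candidate sizes are integer multiples of the first chunk size supplied by the caller, a merged chunk is identified by a hash of its constituent fingerprints, and the deduplication rate is taken as logical bytes divided by stored bytes plus a hypothetical per-chunk metadata overhead. All function and parameter names are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

# Assumed metadata layout: (fingerprint, size_in_bytes), listed in file order.
ChunkMeta = Tuple[str, int]


def merge_metadata(chunks: Sequence[ChunkMeta], factor: int) -> List[ChunkMeta]:
    """Merge every `factor` consecutive chunk records into one record.

    The merged fingerprint is a hash of the constituent fingerprints, so two
    merged chunks compare equal exactly when their constituent sequences match.
    """
    merged: List[ChunkMeta] = []
    for i in range(0, len(chunks), factor):
        group = chunks[i:i + factor]
        joined = "\n".join(fp for fp, _ in group)
        digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
        merged.append((digest, sum(size for _, size in group)))
    return merged


def dedup_rate(chunks: Sequence[ChunkMeta], overhead_per_chunk: int = 0) -> float:
    """Logical bytes divided by stored bytes (unique data plus assumed overhead)."""
    logical = sum(size for _, size in chunks)
    unique: Dict[str, int] = {}
    for fp, size in chunks:
        unique.setdefault(fp, size)  # each distinct fingerprint is stored once
    stored = sum(unique.values()) + overhead_per_chunk * len(chunks)
    return logical / stored if stored else 0.0


def select_chunk_size(
    chunks: Sequence[ChunkMeta],
    first_chunk_size: int,
    factors: Sequence[int],
    overhead_per_chunk: int = 0,
) -> Tuple[int, Dict[int, float]]:
    """Estimate the rate for each candidate size and return the best one."""
    rates: Dict[int, float] = {}
    for k in factors:
        merged = merge_metadata(chunks, k)
        rates[first_chunk_size * k] = dedup_rate(merged, overhead_per_chunk)
    best = max(rates, key=rates.get)
    return best, rates


if __name__ == "__main__":
    # Toy metadata: eight 4 KiB chunks with a repeating A, B pattern.
    meta = [(fp, 4096) for fp in ["A", "B", "A", "B", "A", "B", "C", "D"]]
    best, rates = select_chunk_size(meta, 4096, [1, 2, 4], overhead_per_chunk=1024)
    print(best, rates)
```

With the toy metadata in the usage block, the 8192-byte candidate scores highest: the repeating A, B pattern still deduplicates after pairwise merging, while the assumed per-chunk overhead is halved. Without an overhead term, merging under this definition can never raise the rate above that of the first chunk size, which is why the sketch includes one.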
24 Claims
1. A computer-implemented method, comprising:
receiving a request to evaluate deduplication effectiveness of a deduplicated storage system;
examining, in response to the request, metadata of first data chunks associated with the deduplicated storage system, the first data chunks being partitioned from one or more data files according to a first chunk size;
calculating a plurality of new chunk sizes based on the examination of the metadata of first data chunks;
merging metadata of the first data chunks according to each of the plurality of new chunk sizes to represent new data chunks to which the first data chunks would have been merged;
determining a deduplication rate of each of the new data chunks based on the merged metadata; and
selecting a second chunk size from the plurality of new chunk sizes, wherein the second chunk size has the highest deduplication rate among all the plurality of new chunk sizes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform operations comprising:
receiving a request to evaluate deduplication effectiveness of a deduplicated storage system;
examining, in response to the request, metadata of first data chunks associated with the deduplicated storage system, the first data chunks being partitioned from one or more data files according to a first chunk size;
calculating a plurality of new chunk sizes based on the examination of the metadata of first data chunks;
merging metadata of the first data chunks according to each of the plurality of new chunk sizes to represent new data chunks to which the first data chunks would have been merged;
determining a deduplication rate of each of the new data chunks based on the merged metadata; and
selecting a second chunk size from the plurality of new chunk sizes, wherein the second chunk size has the highest deduplication rate among all the plurality of new chunk sizes.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A storage system, comprising:
a storage unit to store a plurality of deduplicated data chunks; and
a chunk evaluation unit coupled to the storage unit configured to
receive a request to evaluate deduplication effectiveness of a deduplicated storage system,
examine metadata of first data chunks partitioned from one or more data files according to a first chunk size,
calculate a plurality of new chunk sizes based on the examination of the metadata of first data chunks,
merge metadata of the first data chunks according to each of the plurality of new chunk sizes to represent new data chunks to which the first data chunks would have been merged,
determine a deduplication rate of each of the new data chunks based on the merged metadata,
select a second chunk size from the plurality of new chunk sizes, wherein the second chunk size has the highest deduplication rate among all the plurality of new chunk sizes.
Dependent claims: 20, 21, 22, 23, 24
Specification