Method and system for implementing high yield de-duplication for computing applications
First Claim
1. A computer-implemented method for selectively performing data de-duplication in a storage device, comprising:
generating, by a storage management system of the storage device, a non-deduplication reference count for a fingerprint;
generating a de-duplication reference count for the fingerprint;
computing scores for a plurality of extents from the non-deduplication reference count and the de-duplication reference count;
ordering the plurality of extents from the scores to generate an ordered list of extents;
selecting at least some of the plurality of extents from the ordered list of the extents for de-duplication; and
removing copies of data from the at least some of the plurality of extents selected for de-duplication by replacing the copies of data removed with a reference to one or more remaining copies stored on another extent, wherein removing the copies of data from the at least some of the plurality of extents removes a contiguous portion of data from the storage device.
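The claim does not say how the two reference counts are kept or how an extent's score is derived from them. The sketch below is a minimal illustration under stated assumptions: each extent is modeled as a list of per-block fingerprints; the non-deduplication count is the number of not-yet-deduplicated extents containing a fingerprint; the de-duplication count is the number of already-deduplicated extents referencing it; and the score is the fraction of an extent's blocks that also exist elsewhere. `Extent`, `reference_counts`, `extent_score`, and `ordered_extents` are hypothetical names, not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Extent:
    extent_id: str
    fingerprints: list  # one content fingerprint per block, in block order
    deduplicated: bool = False  # hypothetical flag: blocks already stored as references


def reference_counts(extents):
    """Per-fingerprint counts of non-deduplicated and deduplicated extents containing it."""
    non_dedup, dedup = Counter(), Counter()
    for ext in extents:
        # Count each fingerprint once per extent, so a count > 1 means another extent has it.
        (dedup if ext.deduplicated else non_dedup).update(set(ext.fingerprints))
    return non_dedup, dedup


def extent_score(ext, non_dedup, dedup):
    """Hypothetical score: fraction of the extent's blocks that also exist elsewhere."""
    if not ext.fingerprints:
        return 0.0
    redundant = sum(
        1
        for fp in ext.fingerprints
        # Another non-deduplicated extent holds it, or a shared copy already exists.
        if non_dedup[fp] > 1 or dedup[fp] > 0
    )
    return redundant / len(ext.fingerprints)


def ordered_extents(extents):
    """Score the non-deduplicated extents and return (score, extent_id) pairs, best first."""
    non_dedup, dedup = reference_counts(extents)
    candidates = [e for e in extents if not e.deduplicated]
    scored = [(extent_score(e, non_dedup, dedup), e.extent_id) for e in candidates]
    # Highest score first; ties broken by extent id so the order is deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored
```

For example, two extents holding identical blocks both score 1.0 and head the list, while an extent that shares only one of its four blocks with another extent scores 0.25.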
3 Assignments
0 Petitions
Abstract
Disclosed is an improved approach for implementing de-duplication by selecting data such that the de-duplication efficacy of the storage is increased without arbitrarily increasing metadata size.
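The goal of raising de-duplication efficacy without arbitrarily growing metadata can be illustrated as a greedy walk down a best-first list of extents under a metadata budget. This is a hypothetical sketch: the one-reference-per-block cost model, the `metadata_budget` parameter, the `select_within_budget` name, and the rule that an extent is taken only when every one of its blocks keeps a stored copy in another extent are assumptions, not details given in the source. The score is used only to fix the order of the walk.

```python
def select_within_budget(ordered, copies, metadata_budget):
    """Greedily pick extents from a best-first list without exceeding a metadata budget.

    ordered: list of (score, extent_id, fingerprints), sorted best first.
    copies: fingerprint -> number of extents that store a copy of it.
    metadata_budget: hypothetical cap on the number of new block references.
    Returns (selected extent ids, references used).
    """
    remaining = dict(copies)  # work on a copy so the caller's counts are untouched
    selected, used = [], 0
    for _score, extent_id, fingerprints in ordered:
        cost = len(fingerprints)  # assumption: one reference entry per removed block
        if used + cost > metadata_budget:
            continue  # too costly; a smaller extent further down may still fit
        unique = set(fingerprints)
        # Take the extent only if every block keeps at least one copy elsewhere,
        # so the whole extent (a contiguous region) can be removed.
        if all(remaining.get(fp, 0) >= 2 for fp in unique):
            for fp in unique:
                remaining[fp] -= 1
            selected.append(extent_id)
            used += cost
    return selected, used
```

Because `remaining` is decremented as extents are taken, two extents that are copies of each other are never both selected: once the first is chosen, the second holds the last copy of its blocks.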
30 Claims
1. A computer-implemented method for selectively performing data de-duplication in a storage device, comprising:
the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A computer program product embodied on a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to execute a method for selectively performing data de-duplication in a storage device, the method comprising:
generating, by a storage management system of the storage device, a non-deduplication reference count for a fingerprint; generating a de-duplication reference count for the fingerprint; computing scores for a plurality of extents from the non-deduplication reference count and the de-duplication reference count; ordering the plurality of extents from the scores to generate an ordered list of extents; selecting at least some of the plurality of extents from the ordered list of the extents for de-duplication; and removing copies of data from the at least some of the plurality of extents selected for de-duplication by replacing the copies of data removed with a reference to one or more remaining copies stored on another extent, wherein removing the copies of data from the at least some of the plurality of extents removes a contiguous portion of data from the storage device.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
21. A system for selectively performing data de-duplication in a storage device, comprising:
a processor to handle computing instructions to access the storage devices; and a computer readable medium comprising executable code that is executable by the processor for generating, by a storage management system of the storage device, a non-deduplication reference count for a fingerprint, generating a de-duplication reference count for the fingerprint, computing scores for a plurality of extents from the non-deduplication reference count and the de-duplication reference count; ordering the plurality of extents from the scores to generate an ordered list of extents; selecting at least some of the plurality of extents from the ordered list of the extents for de-duplication; and removing copies of data from the at least some of the plurality of extents selected for de-duplication by replacing the copies of data removed with a reference to one or more remaining copies stored on another extent, wherein removing the copies of data from the at least some of the plurality of extents removes a contiguous portion of data from the storage device.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30.
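The removal step, replacing an extent's data with references to remaining copies on another extent so that the whole extent is freed as one contiguous region, might look like the following in-memory model. The `store` layout (a fingerprint string for stored data, an `(extent_id, index)` tuple for a reference) and the `deduplicate_extent` function are hypothetical; a real storage management system would also persist the reference metadata and release the physical space.

```python
def deduplicate_extent(store, extent_id):
    """Replace each data block of extent_id with a reference to a copy on another extent.

    store: extent_id -> list of blocks, where a block is either a fingerprint
           string (data stored in this extent) or an (extent_id, index) tuple
           referencing a block stored elsewhere.
    Returns the number of data blocks freed.
    """
    # Refuse if other extents point into this one: their references would dangle.
    for other_id, blocks in store.items():
        if other_id != extent_id and any(
            isinstance(b, tuple) and b[0] == extent_id for b in blocks
        ):
            raise ValueError(f"extent {other_id} references extent {extent_id}")
    # Find one stored copy of each fingerprint outside this extent.
    location = {}
    for other_id, blocks in store.items():
        if other_id == extent_id:
            continue
        for index, block in enumerate(blocks):
            if isinstance(block, str):
                location.setdefault(block, (other_id, index))
    blocks = store[extent_id]
    # Every data block needs a remaining copy; otherwise the extent cannot be
    # released as a whole.
    missing = [b for b in blocks if isinstance(b, str) and b not in location]
    if missing:
        raise ValueError(f"extent {extent_id} holds the only copy of {missing}")
    freed = sum(1 for b in blocks if isinstance(b, str))
    store[extent_id] = [location[b] if isinstance(b, str) else b for b in blocks]
    return freed
```

After extent `B` is rewritten as references into `A`, the guard at the top rejects de-duplicating `A` next, because `A` now holds the copies that `B` depends on.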
Specification