Region-integrated data deduplication implementing a multi-lifetime duplicate finder
First Claim
1. A computer program product for performing deduplication in conjunction with random read and write operations across a namespace, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a computer to cause the computer to perform a method comprising:
receiving, at the computer, a write request comprising a data chunk;
computing, by the computer, a fingerprint of the data chunk;
determining, by the computer, whether a short term dictionary corresponding to the namespace comprises an entry corresponding to the fingerprint;
in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing, by the computer, the data chunk to a data store corresponding to the namespace in a deduplicating manner;
in response to determining the short term dictionary does not comprise the entry corresponding to the fingerprint, determining, by the computer, whether a long term dictionary corresponding to the namespace comprises the entry corresponding to the fingerprint;
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint, writing, by the computer the data chunk to the data store in the deduplicating manner;
in response to determining the long term dictionary does not comprise the entry corresponding to the fingerprint, writing, by the computer, the data chunk to the data store in a non-deduplicating manner; and
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint, repopulating the short term dictionary with the entry corresponding to the fingerprint,
wherein the short term dictionary comprises a first eviction policy,
wherein the long term dictionary comprises a second eviction policy,
wherein the first eviction policy is configured to evict one or more entries of the short term dictionary in response to a new entry being inserted into the short term dictionary, and
wherein the second eviction policy is configured to evict one or more entries of the long term dictionary in response to a new entry being inserted into the long term dictionary.
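The write path recited in the claim can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `BoundedDict` and `DedupWriter` classes, SHA-256 as the fingerprint function, the oldest-inserted eviction rule, and the choice to record a newly stored fingerprint in both dictionaries.

```python
import hashlib
from collections import OrderedDict


class BoundedDict:
    """Fingerprint dictionary that evicts its oldest-inserted entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # fingerprint -> physical location

    def get(self, fingerprint):
        return self.entries.get(fingerprint)

    def insert(self, fingerprint, location):
        self.entries[fingerprint] = location
        self.entries.move_to_end(fingerprint)
        # Eviction is triggered by the insertion itself.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


class DedupWriter:
    def __init__(self, short_capacity=2, long_capacity=8):
        self.short_term = BoundedDict(short_capacity)
        self.long_term = BoundedDict(long_capacity)
        self.data_store = []  # physical chunks, indexed by location
        self.namespace = {}   # logical address -> physical location

    def write(self, address, chunk):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        location = self.short_term.get(fingerprint)
        if location is not None:
            outcome = "dedup-short"
        else:
            location = self.long_term.get(fingerprint)
            if location is not None:
                outcome = "dedup-long"
                # Repopulate the short term dictionary on a long term hit.
                self.short_term.insert(fingerprint, location)
            else:
                outcome = "stored"
                self.data_store.append(chunk)
                location = len(self.data_store) - 1
                # Assumption: new fingerprints enter both dictionaries.
                self.short_term.insert(fingerprint, location)
                self.long_term.insert(fingerprint, location)
        self.namespace[address] = location
        return outcome
```

With a short term capacity of 2, writing chunks A, A, B, C and then A again gives a short term hit on the second write and a long term hit on the fifth, because inserting C pushed A out of the short term dictionary.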
Abstract
Computer program products, as well as corresponding systems and methods, are configured for performing deduplication in conjunction with random read and write operations, and include: receiving a write request comprising data; computing a fingerprint of the data; determining whether a short term dictionary comprises an entry corresponding to the fingerprint; in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data to a data store in a deduplicating manner; in response to determining the short term dictionary does not comprise the entry, determining whether a long term dictionary corresponding to the namespace comprises the entry; in response to determining the long term dictionary comprises the entry, writing the data to the data store in the deduplicating manner; and in response to determining the long term dictionary does not comprise the entry, writing the data to the data store in a non-deduplicating manner.
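The abstract distinguishes writing "in a deduplicating manner" from writing "in a non-deduplicating manner" without saying how either is done. One possible reading, sketched below, is that a deduplicating write only adds a reference to an existing physical chunk, while a non-deduplicating write stores a new chunk. The `ChunkStore` class, its method names, and the use of reference counts to handle random overwrites are all assumptions made for illustration.

```python
class ChunkStore:
    """Hypothetical data store: logical addresses map to shared physical chunks."""

    def __init__(self):
        self.chunks = {}     # location -> bytes
        self.refcounts = {}  # location -> number of addresses referencing it
        self.mapping = {}    # logical address -> location
        self.next_location = 0

    def _release(self, address):
        # Drop the address's current reference, freeing the chunk if unused.
        old = self.mapping.pop(address, None)
        if old is not None:
            self.refcounts[old] -= 1
            if self.refcounts[old] == 0:
                del self.chunks[old]
                del self.refcounts[old]

    def write_new(self, address, chunk):
        """Non-deduplicating write: store the chunk at a fresh location."""
        self._release(address)
        location = self.next_location
        self.next_location += 1
        self.chunks[location] = chunk
        self.refcounts[location] = 1
        self.mapping[address] = location
        return location

    def write_duplicate(self, address, location):
        """Deduplicating write: point the address at an existing chunk."""
        # Take the new reference first, in case address already maps to location.
        self.refcounts[location] += 1
        self._release(address)
        self.mapping[address] = location

    def read(self, address):
        return self.chunks[self.mapping[address]]
```

This sketch does not model invalidating dictionary entries when a chunk is freed; a real system would need some such mechanism.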
19 Claims
1. A computer program product for performing deduplication in conjunction with random read and write operations across a namespace, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a computer to cause the computer to perform a method comprising:
receiving, at the computer, a write request comprising a data chunk;
computing, by the computer, a fingerprint of the data chunk;
determining, by the computer, whether a short term dictionary corresponding to the namespace comprises an entry corresponding to the fingerprint;
in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing, by the computer, the data chunk to a data store corresponding to the namespace in a deduplicating manner;
in response to determining the short term dictionary does not comprise the entry corresponding to the fingerprint, determining, by the computer, whether a long term dictionary corresponding to the namespace comprises the entry corresponding to the fingerprint;
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint, writing, by the computer the data chunk to the data store in the deduplicating manner;
in response to determining the long term dictionary does not comprise the entry corresponding to the fingerprint, writing, by the computer, the data chunk to the data store in a non-deduplicating manner; and
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint, repopulating the short term dictionary with the entry corresponding to the fingerprint,
wherein the short term dictionary comprises a first eviction policy,
wherein the long term dictionary comprises a second eviction policy,
wherein the first eviction policy is configured to evict one or more entries of the short term dictionary in response to a new entry being inserted into the short term dictionary, and
wherein the second eviction policy is configured to evict one or more entries of the long term dictionary in response to a new entry being inserted into the long term dictionary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A computer-implemented method for performing deduplication in conjunction with random read and write operations across a namespace, the method comprising:
receiving a write request comprising a data chunk;
computing a fingerprint of the data chunk;
determining whether a short term dictionary corresponding to the namespace comprises an entry corresponding to the fingerprint;
in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data chunk to a data store corresponding to the namespace in a deduplicating manner;
in response to determining the short term dictionary does not comprise the entry corresponding to the fingerprint, determining whether a long term dictionary corresponding to the namespace comprises the entry corresponding to the fingerprint;
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint: writing the data chunk to the data store in the deduplicating manner; and repopulating the short term dictionary with the entry corresponding to the fingerprint; and
in response to determining the long term dictionary does not comprise the entry corresponding to the fingerprint, writing the data chunk to the data store in a non-deduplicating manner,
wherein the short term dictionary comprises a first eviction policy,
wherein the long term dictionary comprises a second eviction policy,
wherein the first eviction policy is configured to evict one or more entries of the short term dictionary in response to a new entry being inserted into the short term dictionary, and
wherein the second eviction policy is configured to evict one or more entries of the long term dictionary in response to a new entry being inserted into the long term dictionary.
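The claims recite a first and a second eviction policy, each triggered by an insertion, but do not name the policies. The sketch below shows one way a single dictionary type could accept either policy as a parameter. Least-recently-used eviction for the short term dictionary and fewest-hits eviction for the long term dictionary are assumptions chosen for illustration, as are the names `PolicyDict`, `evict_lru`, and `evict_least_hit`.

```python
from collections import OrderedDict


def evict_lru(entries):
    """Pick the least recently used fingerprint (front of the ordering)."""
    return next(iter(entries))


def evict_least_hit(entries):
    """Pick the fingerprint with the fewest recorded lookup hits."""
    return min(entries, key=lambda fp: entries[fp]["hits"])


class PolicyDict:
    """Bounded fingerprint dictionary with a pluggable eviction policy."""

    def __init__(self, capacity, choose_victim):
        self.capacity = capacity
        self.choose_victim = choose_victim
        self.entries = OrderedDict()  # fingerprint -> {"location", "hits"}

    def lookup(self, fp):
        entry = self.entries.get(fp)
        if entry is None:
            return None
        entry["hits"] += 1
        self.entries.move_to_end(fp)  # mark as most recently used
        return entry["location"]

    def insert(self, fp, location):
        """Insert an entry, evicting victims first if the dictionary is full."""
        evicted = []
        if fp not in self.entries:
            while len(self.entries) >= self.capacity:
                victim = self.choose_victim(self.entries)
                del self.entries[victim]
                evicted.append(victim)
        self.entries[fp] = {"location": location, "hits": 0}
        self.entries.move_to_end(fp)
        return evicted
```

For example, `PolicyDict(2, evict_lru)` and `PolicyDict(2, evict_least_hit)` could stand in for the short term and long term dictionaries respectively; both evict only when `insert` is called on a full dictionary.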
19. A deduplicating storage system configured to perform deduplication in conjunction with random read and write operations across a namespace, the system comprising:
a processor and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to cause the processor to perform a method comprising:
receiving a write request comprising a data chunk;
computing a fingerprint of the data chunk;
determining whether a short term dictionary corresponding to the namespace comprises an entry corresponding to the fingerprint;
in response to determining the short term dictionary comprises the entry corresponding to the fingerprint, writing the data chunk to a data store corresponding to the namespace in a deduplicating manner;
in response to determining the short term dictionary does not comprise the entry corresponding to the fingerprint, determining whether a long term dictionary corresponding to the namespace comprises the entry corresponding to the fingerprint;
in response to determining the long term dictionary comprises the entry corresponding to the fingerprint: writing the data chunk to the data store in the deduplicating manner; and repopulating the short term dictionary with the entry corresponding to the fingerprint; and
in response to determining the long term dictionary does not comprise the entry corresponding to the fingerprint, writing the data chunk to the data store in a non-deduplicating manner,
wherein the short term dictionary comprises a first eviction policy,
wherein the long term dictionary comprises a second eviction policy,
wherein the first eviction policy is configured to evict one or more entries of the short term dictionary in response to a new entry being inserted into the short term dictionary, and
wherein the second eviction policy is configured to evict one or more entries of the long term dictionary in response to a new entry being inserted into the long term dictionary.
Specification