Direct lookup for identifying duplicate data in a data deduplication system
Abstract
Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.
24 Claims
1. A method for identifying data in a data deduplication system, by a processor device, comprising:
- identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and
- deduplicating the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for identifying data in a data deduplication system, the system comprising:
- at least one processor device, wherein the processor device:
- identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and
- deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product for identifying data in a data deduplication system, by a processor device, the computer program product embodied on a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
- an executable portion that identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and
- an executable portion that deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
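The lookup order recited in independent claims 1, 9, and 17 (search the index within the metadata region being written first, and fall back to the central fingerprint index only on a miss) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `MetadataRegion`, `DedupStore`, and `write`, the use of SHA-256 as the fingerprint function, the `(region_id, offset)` location format, and the choice to copy a centrally found fingerprint into the writing region's index are all assumptions made for illustration. Swapping of regions in and out of memory is not modeled.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for the fingerprint function; the claims name none.
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class MetadataRegion:
    # Hypothetical model of one metadata region covering an area of user space.
    region_id: int
    # fingerprint -> (region_id, offset) of the stored copy of the chunk
    index: dict = field(default_factory=dict)


class DedupStore:
    # Hypothetical container holding the regions and the central fingerprint index.
    def __init__(self, region_count: int):
        self.regions = {i: MetadataRegion(i) for i in range(region_count)}
        # Central index: fingerprint -> id of the region where the fingerprint resides
        self.central_index = {}

    def write(self, region_id: int, offset: int, chunk: bytes):
        fp = fingerprint(chunk)
        region = self.regions[region_id]

        # Step 1: search the index within the region being written to.
        if fp in region.index:
            return ("local", region.index[fp])

        # Step 2: on a miss, search the central index, which only says
        # which region holds the fingerprint; then read that region's index.
        owner = self.central_index.get(fp)
        if owner is not None:
            location = self.regions[owner].index[fp]
            # Assumption: record the fingerprint locally so later writes hit in step 1.
            region.index[fp] = location
            return ("central", location)

        # No match anywhere: the chunk is new, so register it in both indexes.
        location = (region_id, offset)
        region.index[fp] = location
        self.central_index[fp] = region_id
        return ("new", location)
```

In this sketch a repeated write into the same region resolves without touching the central index, while a first-time duplicate in a different region costs one central lookup plus one read of the owning region's index.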
Specification