Systems and methods for classifying files as candidates for deduplication
First Claim
1. A computer-implemented method for classifying files as candidates for deduplication, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying at least a portion of a file;
detecting an event that is suggestive of a duplicate instance of the portion of the file already being stored within a storage device prior to determining whether the duplicate instance of the portion of the file is already stored within the storage device;
in response to detecting the event, classifying the file as a candidate for deduplication such that the file's candidate-for-deduplication classification indicates that the duplicate instance of the portion of the file is likely already stored within the storage device;
maintaining the file's candidate-for-deduplication classification for use in prompting a determination on whether the duplicate instance of the portion of the file is already stored within the storage device by maintaining an attribute associated with the file that indicates that the file is a candidate for deduplication;
reducing the amount of time or resources needed to determine whether a set of files that includes the file qualify for deduplication by, during deduplication or backup of data within a storage system:
identifying the attribute associated with the file;
determining, based on the attribute associated with the file, that the file is a candidate for deduplication;
in response to determining that the file is a candidate for deduplication, determining whether the portion of the file is already stored within the storage device.
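The steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the event names in `SUGGESTIVE_EVENTS`, the `dedup_candidate` attribute key, the in-memory `attributes` dictionary standing in for file metadata, and the use of a SHA-256 digest of the whole file as the "portion" are all hypothetical choices made for the example. The point it shows is the ordering: classification happens when the event is detected, and the storage check runs later, only for files whose attribute marks them as candidates.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical event types assumed to suggest that a duplicate already exists.
SUGGESTIVE_EVENTS = {"copy", "download_of_known_file"}


@dataclass
class FileRecord:
    path: str
    data: bytes
    # Hypothetical attribute store standing in for real file metadata.
    attributes: dict = field(default_factory=dict)


def on_event(f: FileRecord, event_type: str) -> None:
    """Classify the file as a dedup candidate when a suggestive event occurs.

    No storage lookup happens here; the attribute only records that a
    duplicate is *likely* already stored.
    """
    if event_type in SUGGESTIVE_EVENTS:
        f.attributes["dedup_candidate"] = True


def dedup_pass(files, stored_hashes: set) -> dict:
    """During deduplication or backup, check storage only for flagged files.

    Returns {path: already_stored} for candidate files; non-candidates are
    skipped without hashing or lookup.
    """
    results = {}
    for f in files:
        if not f.attributes.get("dedup_candidate", False):
            continue  # not a candidate: skip the costly storage check
        digest = hashlib.sha256(f.data).hexdigest()
        results[f.path] = digest in stored_hashes
    return results
```

For example, after `on_event(a, "copy")` and `on_event(b, "edit")`, a pass over `[a, b]` hashes and looks up only `a`, leaving `b` untouched.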
7 Assignments
0 Petitions
Abstract
A computer-implemented method may include identifying at least one file and detecting an event that is suggestive of at least a portion of the file being duplicated in at least one additional file. The computer-implemented method may also include classifying the file as a candidate for deduplication in response to detecting the event. The computer-implemented method may further include maintaining the file's candidate-for-deduplication classification for use in prompting a determination on whether the portion of the file is already stored within a storage device.
20 Claims
1. A computer-implemented method for classifying files as candidates for deduplication; the full claim text is identical to the First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A computer-implemented method for determining whether files are candidates for deduplication, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying at least a portion of a file;
identifying a classification assigned to the file that is suggestive of a duplicate instance of the portion of the file already being stored within a storage device by identifying, within the file, an attribute associated with the file that indicates that the file is a candidate for deduplication;
reducing the amount of time or resources needed to determine whether a set of files that includes the file qualify for deduplication by determining, based on the classification assigned to the file, that the file is a candidate for deduplication prior to determining whether the duplicate instance of the portion of the file is already stored within the storage device;
in response to determining that the file is a candidate for deduplication, determining whether the duplicate instance of the portion of the file is already stored within the storage device.
Dependent claims: 12, 13, 14, 15, 16.
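Claim 11 describes the consuming side: the classification is read first, and only files classified as candidates proceed to the duplicate check. The sketch below illustrates that at the level of file portions, under assumptions that are not in the claim: portions are hypothetical fixed-size chunks (`CHUNK_SIZE`, deliberately tiny), each identified by a SHA-256 digest, and the storage device is modeled as a set of digests. The lookup counter makes the claimed saving visible, since non-candidate files never reach the index.

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple

CHUNK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> List[str]:
    """Split data into fixed-size portions and hash each one."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def check_candidates(
    files: Iterable[Tuple[str, bytes, Dict[str, bool]]],
    index: Set[str],
) -> Tuple[Dict[str, List[bool]], int]:
    """Return per-portion 'already stored' flags for candidate files only,
    plus the number of index lookups performed."""
    lookups = 0
    found: Dict[str, List[bool]] = {}
    for name, data, attrs in files:
        # The classification is consulted first; non-candidates are skipped.
        if not attrs.get("dedup_candidate", False):
            continue
        flags = []
        for h in chunk_hashes(data):
            lookups += 1
            flags.append(h in index)
        found[name] = flags
    return found, lookups
```

With an index built from `b"abcdefgh"`, a candidate file `b"abcdXYZW"` yields `[True, False]` after two lookups, while an unflagged file costs no lookups at all.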
17. A system for classifying files as candidates for deduplication, the system comprising:
an identification module, stored in memory, that identifies at least a portion of a file;
a detection module, stored in memory, that detects an event that is suggestive of a duplicate instance of the portion of the file already being stored within a storage device prior to a determination of whether the duplicate instance of the portion of the file is already stored within the storage device;
a classification module, stored in memory, that:
classifies, in response to detecting the event, the file as a candidate for deduplication such that the file's candidate-for-deduplication classification indicates that the duplicate instance of the portion of the file is likely already stored within the storage device;
maintains the file's candidate-for-deduplication classification for use in prompting an application to determine whether the duplicate instance of the portion of the file is already stored within the storage device by maintaining an attribute associated with the file that indicates that the file is a candidate for deduplication;
a deduplication module, stored in memory, that reduces the amount of time or resources needed to determine whether a set of files that includes the file qualify for deduplication by, during deduplication or backup of data within a storage system:
identifying the attribute associated with the file;
determining, based on the attribute associated with the file, that the file is a candidate for deduplication;
determining, in response to determining that the file is a candidate for deduplication, whether the portion of the file is already stored within the storage device;
at least one processor that executes the identification module, the detection module, the classification module, and the deduplication module.
Dependent claims: 18, 19, 20.
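The module structure of claim 17 can be sketched as four small Python classes. Everything concrete here is a hypothetical stand-in chosen for illustration, not taken from the claim: the "portion" is the first 1 KiB of a file, the detection heuristic treats a (name, size) pair seen in an earlier backup as the suggestive event, attributes live in an in-memory dictionary, and the storage device is a set of SHA-256 digests. In this sketch the Python interpreter plays the role of the processor that executes the modules.

```python
import hashlib
from typing import Dict, List, Set, Tuple


class IdentificationModule:
    """Identifies the portion of a file to consider (hypothetically the first 1 KiB)."""

    def identify(self, data: bytes) -> bytes:
        return data[:1024]


class DetectionModule:
    """Detects a cheap signal suggestive of an already-stored duplicate.

    Hypothetical heuristic: a (name, size) pair seen in an earlier backup.
    """

    def __init__(self, seen: Set[Tuple[str, int]]):
        self.seen = seen

    def detect(self, name: str, data: bytes) -> bool:
        return (name, len(data)) in self.seen


class ClassificationModule:
    """Classifies files and maintains the candidate-for-deduplication attribute."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Dict[str, bool]] = {}

    def classify(self, name: str) -> None:
        self.attributes.setdefault(name, {})["dedup_candidate"] = True

    def is_candidate(self, name: str) -> bool:
        return self.attributes.get(name, {}).get("dedup_candidate", False)


class DeduplicationModule:
    """During backup, consults the attribute before querying storage."""

    def __init__(self, stored: Set[str]):
        self.stored = stored  # digests of portions already on the storage device

    def run(self, files: Dict[str, bytes], ident: IdentificationModule,
            classifier: ClassificationModule) -> List[str]:
        duplicates = []
        for name, data in files.items():
            if not classifier.is_candidate(name):
                continue  # attribute says "not a candidate": no storage query
            digest = hashlib.sha256(ident.identify(data)).hexdigest()
            if digest in self.stored:
                duplicates.append(name)
        return duplicates
```

Wiring them together, the detection and classification modules run as files are observed, and the deduplication module runs later during backup; splitting the work this way keeps the expensive hash-and-lookup step confined to files already flagged as likely duplicates.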
Specification