SCALABLE MECHANISM FOR DETECTION OF COMMONALITY IN A DEDUPLICATED DATA SET
Abstract
Mechanisms are provided for efficiently determining commonality in a deduplicated data set in a scalable manner regardless of the number of deduplicated files or the number of stored segments. Information is generated and maintained during deduplication to allow scalable and efficient determination of data segments shared in a particular file, other files sharing data segments included in a particular file, the number of files sharing a data segment, etc. Data need not be expanded or uncompressed. Deduplication processing can be validated and verified during commonality detection.
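The three queries named in the abstract (segments shared by a file, other files sharing those segments, and the number of files sharing a segment) can be illustrated with a minimal sketch. It assumes a plain in-memory index updated once per reference during deduplication; the class `CommonalityIndex` and its method names are hypothetical and are not taken from the patent, which may organize this information differently.

```python
from collections import defaultdict


class CommonalityIndex:
    """Hypothetical bookkeeping updated during deduplication (illustrative only)."""

    def __init__(self):
        # segment id -> names of files that reference the segment
        self.files_by_segment = defaultdict(set)
        # file name -> segment ids in the order the file references them
        self.segments_by_file = defaultdict(list)

    def record(self, file_name, segment_id):
        """Note that file_name references segment_id."""
        self.files_by_segment[segment_id].add(file_name)
        self.segments_by_file[file_name].append(segment_id)

    def shared_segments(self, file_name):
        """Segments of file_name that at least one other file also references."""
        own = dict.fromkeys(self.segments_by_file.get(file_name, []))
        return [s for s in own if len(self.files_by_segment[s]) > 1]

    def files_sharing_with(self, file_name):
        """Other files that reference any segment of file_name."""
        others = set()
        for s in self.segments_by_file.get(file_name, []):
            others |= self.files_by_segment[s]
        others.discard(file_name)
        return others

    def sharing_count(self, segment_id):
        """Number of distinct files that reference segment_id."""
        return len(self.files_by_segment.get(segment_id, ()))
```

Each query reads only the bookkeeping, never the segment contents, which is consistent with the statement that data need not be expanded or uncompressed.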
20 Claims
1. A method, comprising:
generating a filemap corresponding to a deduplicated file using a processor included in a deduplication system, the filemap including a plurality of filemap indices, a plurality of offsets to identify a plurality of data segments in the deduplicated file, and a plurality of lname entries identifying last files having placed a reference to corresponding data segments in the deduplicated file;
modifying a datastore suitcase, the datastore suitcase including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to the filemap indices, a plurality of deduplicated data segments, and a last file entry identifying last files having placed a reference to deduplicated data segments.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
9. A system, comprising:
a deduplication system processor configured to generate a filemap corresponding to a deduplicated file, the filemap including a plurality of filemap indices, a plurality of offsets to identify a plurality of data segments in the deduplicated file, and a plurality of lname entries identifying last files having placed a reference to corresponding data segments in the deduplicated file;
a storage device configured to hold a datastore suitcase modified by the deduplication system processor, the datastore suitcase including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to the filemap indices, a plurality of deduplicated data segments, and a last file entry identifying last files having placed a reference to deduplicated data segments.
(Dependent claims: 10, 11, 12, 13, 14, 15, 16)
17. A computer readable medium having computer code embodied therein, the computer readable medium, comprising:
computer code for generating a filemap corresponding to a deduplicated file using a processor included in a deduplication system, the filemap including a plurality of filemap indices, a plurality of offsets to identify a plurality of data segments in the deduplicated file, and a plurality of lname entries identifying last files having placed a reference to corresponding data segments in the deduplicated file;
computer code modifying a datastore suitcase, the datastore suitcase including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to the filemap indices, a plurality of deduplicated data segments, and a last file entry identifying last files having placed a reference to deduplicated data segments.
(Dependent claims: 18, 19, 20)
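The structures recited in the independent claims can be sketched as plain records. This is only an illustrative reading: the type names, the 1-based index numbering, the byte offsets, and the layout of the index portion are assumptions. In particular, the rule used here, that a filemap's lname records whichever file previously held the suitcase's last file entry for that segment, is one plausible interpretation and is not stated in the claims themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FilemapEntry:
    index: int            # filemap index, matching a datastore index
    offset: int           # byte offset of the segment within the file (assumed unit)
    lname: Optional[str]  # last file that had referenced this segment, if any


@dataclass
class Filemap:
    file_name: str
    entries: List[FilemapEntry] = field(default_factory=list)


@dataclass
class DataEntry:
    index: int                       # datastore index
    segment: bytes                   # deduplicated data segment
    last_file: Optional[str] = None  # last file to place a reference


@dataclass
class Suitcase:
    # Assumed layout: index portion maps a datastore index to a slot in the data portion.
    index_portion: Dict[int, int] = field(default_factory=dict)
    data_portion: List[DataEntry] = field(default_factory=list)

    def find(self, segment: bytes) -> Optional[DataEntry]:
        """Linear search for an already stored segment (a real system would use a hash)."""
        for entry in self.data_portion:
            if entry.segment == segment:
                return entry
        return None


def deduplicate(file_name: str, segments: List[bytes], suitcase: Suitcase) -> Filemap:
    """Build a filemap for file_name and update the suitcase in place."""
    filemap = Filemap(file_name)
    offset = 0
    for seg in segments:
        entry = suitcase.find(seg)
        if entry is None:
            # New segment: store it once and register it in the index portion.
            entry = DataEntry(index=len(suitcase.data_portion) + 1, segment=seg)
            suitcase.index_portion[entry.index] = len(suitcase.data_portion)
            suitcase.data_portion.append(entry)
        previous = entry.last_file    # becomes this filemap's lname (assumed rule)
        entry.last_file = file_name   # this file is now the last referrer
        filemap.entries.append(FilemapEntry(entry.index, offset, previous))
        offset += len(seg)
    return filemap
```

Under these assumptions, deduplicating a file X and then a file Y that share one segment leaves Y recorded as the last file in the suitcase and X recorded in Y's lname for that segment, so the files sharing a segment can be walked as a chain without reading the segment data.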
Specification