MAINTAINING DEDUPLICATION DATA IN NATIVE FILE FORMATS
First Claim
1. A method, comprising:
parsing a file to identify a first plurality of components including a first component and a second component of the file;
replacing the first component in the file with a first stub and the second component in the file with a second stub;
delineating the first component into a first plurality of chunks;
generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
determining whether the first chunk is already stored in a deduplication system using the first chunk identifier and the deduplication dictionary.
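The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the fixed chunk size, the use of SHA-256 as the chunk identifier, the in-memory `dictionary` and `store`, and every function name here are assumptions made for the example. Parsing is assumed to have already produced a mapping of component names to bytes.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size, kept tiny for illustration


def chunk_component(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Delineate a component into fixed-size chunks (assumed strategy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk: bytes) -> str:
    """Generate a chunk identifier; SHA-256 is an assumption."""
    return hashlib.sha256(chunk).hexdigest()


def dedupe_component(data: bytes, dictionary: dict, store: list) -> list:
    """Return the ordered chunk identifiers, storing only unseen chunks."""
    ids = []
    for chunk in chunk_component(data):
        cid = chunk_id(chunk)
        # Use the identifier to check the deduplication dictionary.
        if cid not in dictionary:
            store.append(chunk)
            dictionary[cid] = len(store) - 1  # location of the stored chunk
        ids.append(cid)
    return ids


def dedupe_file(components: dict, dictionary: dict, store: list):
    """Replace each parsed component with a stub; return file and stub table."""
    stubbed, stub_table = {}, {}
    for n, (name, data) in enumerate(components.items()):
        stub = f"stub-{n}"
        stub_table[stub] = dedupe_component(data, dictionary, store)
        stubbed[name] = stub
    return stubbed, stub_table


components = {"image": b"AAAABBBBAAAA", "attachment": b"BBBBCCCC"}
dictionary, store = {}, []
stubbed, stub_table = dedupe_file(components, dictionary, store)
```

In this example the five chunks across both components reduce to three stored chunks, because `AAAA` and `BBBB` each occur twice.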
Abstract
Mechanisms are provided to maintain deduplication data in native file formats. Files, including entities such as volumes and databases, are analyzed to identify components suitable for deduplication. These components suitable for deduplication are delineated into chunks and identifiers are generated for each of the chunks. The identifiers are used to reference the chunks in deduplication dictionaries that provide locations indicating where deduplicated chunks are stored. The components in the files are replaced with file handles or stubs that applications can use to access deduplicated data. Applications can continue to perform operations on the files as though no deduplication has occurred.
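The read path described in the abstract, where an application follows a stub to the deduplicated data, might look something like the sketch below. All names and the in-memory layout (`chunk_store`, `dedup_dictionary`, `stub_table`, `stubbed_file`, `read_component`) are hypothetical and chosen only for illustration; the source does not specify how stubs or dictionary locations are represented.

```python
# Hypothetical in-memory state after deduplication has already run.
chunk_store = [b"Hello, ", b"world"]               # where deduplicated chunks live
dedup_dictionary = {"id-a": 0, "id-b": 1}          # chunk identifier -> location
stub_table = {"stub-0": ["id-a", "id-b", "id-b"]}  # stub -> ordered identifiers

# A file in which one component was replaced by a stub and one was left as-is.
stubbed_file = {"title": b"Greeting", "body": "stub-0"}


def read_component(file: dict, name: str) -> bytes:
    """Return a component's bytes, resolving a stub if one is present."""
    value = file[name]
    if isinstance(value, bytes):
        return value  # component was never deduplicated
    # Look up each identifier in the dictionary to find the chunk location.
    locations = [dedup_dictionary[cid] for cid in stub_table[value]]
    return b"".join(chunk_store[loc] for loc in locations)


body = read_component(stubbed_file, "body")
title = read_component(stubbed_file, "title")
```

The caller uses the same `read_component` call for both components and does not need to know which one was deduplicated.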
20 Claims
11. A computer readable medium, comprising:
computer code for parsing a file to identify a first plurality of components including a first component and a second component of the file;
computer code for replacing the first component in the file with a first stub and the second component in the file with a second stub;
computer code for delineating the first component into a first plurality of chunks;
computer code for generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
computer code for determining whether the first chunk is already stored in a deduplication system using the first chunk identifier and the deduplication dictionary.
20. A method, comprising:
parsing a file to identify a first plurality of components including a first component and a second component of the file;
replacing the first component in the file with a first stub and the second component in the file with a second stub;
delineating the first component into a first plurality of chunks;
generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
determining whether the first chunk is already stored in a deduplication system using the first chunk identifier and the deduplication dictionary.
Specification