OBJECT DEDUPLICATION AND APPLICATION AWARE SNAPSHOTS
Abstract
Embodiments deploy delayering techniques that make the relationships between successive versions of a rich-media file apparent. As a result, modified rich-media files incur far less storage overhead than they do under traditional application-unaware snapshot and versioning implementations. Optimized file data is stored in suitcases. As a file is versioned, each new version is placed in the same suitcase as the previous version, which lets embodiments apply correlation techniques to increase optimization savings.
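The suitcase idea in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the `Suitcase` class, its method names, and the use of SHA-256 digests to spot identical objects are all hypothetical choices made here, and each file version is assumed to have already been split into primitive objects.

```python
import hashlib
from collections import defaultdict


class Suitcase:
    """Hypothetical container that holds every version of one file."""

    def __init__(self):
        self.objects = {}    # digest -> object bytes, each stored once
        self.versions = []   # one list of digests per stored version

    def add_version(self, primitive_objects):
        """Store a version; objects already in the suitcase are reused."""
        digests = []
        for obj in primitive_objects:
            digest = hashlib.sha256(obj).hexdigest()
            self.objects.setdefault(digest, obj)
            digests.append(digest)
        self.versions.append(digests)
        return len(self.versions) - 1

    def read_version(self, index):
        """Rebuild the list of primitive objects for one version."""
        return [self.objects[d] for d in self.versions[index]]

    def stored_bytes(self):
        return sum(len(obj) for obj in self.objects.values())


# One suitcase per file, so successive versions land side by side.
suitcases = defaultdict(Suitcase)

v1 = [b"slide-1 text", b"logo image bytes"]
v2 = [b"slide-1 text (edited)", b"logo image bytes"]

case = suitcases["deck.ppt"]
case.add_version(v1)
case.add_version(v2)
```

Because both versions sit in the same suitcase, the unchanged logo object is kept once and only the edited text adds new bytes.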
13 Claims
1. A method of reducing data duplication in a computer system storing files, the method comprising:
(a) parsing a file to determine native application boundaries of applications within the file;
(b) determining if the applications are compressed;
(c) decompressing the applications within the file with a technique appropriate for each application;
(d) recursively performing steps (a)-(d) until all primitive objects are uncovered and decompressed;
(e) correlating primitive objects within each file and across a plurality of files accessible by the computer system; and
(f) reducing redundancy between two or more primitive objects within related or versioned files.
Dependent claims: 2-12
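Steps (a) through (f) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the method as the patent implements it: ZIP containers and gzip streams stand in for "applications within the file", exact SHA-256 matching stands in for correlation, and the names `delayer`, `deduplicate`, and `make_zip` are hypothetical.

```python
import gzip
import hashlib
import io
import zipfile


def delayer(data):
    """Recursively unwrap data until only primitive objects remain."""
    # (a) find member boundaries: here, the entries of a ZIP container
    if zipfile.is_zipfile(io.BytesIO(data)):
        primitives = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.endswith("/"):      # skip directory entries
                    continue
                # (c) ZipFile.read decompresses each member as needed
                primitives.extend(delayer(archive.read(name)))  # (d)
        return primitives
    # (b) detect compression: here, the gzip magic number
    if data[:2] == b"\x1f\x8b":
        return delayer(gzip.decompress(data))   # (c) then (d)
    return [data]                               # primitive object


def deduplicate(files):
    """(e) correlate primitives across files, (f) store each one once."""
    store = {}      # digest -> primitive bytes
    recipes = {}    # file name -> ordered digests needed to rebuild it
    for name, data in files.items():
        recipes[name] = []
        for obj in delayer(data):
            digest = hashlib.sha256(obj).hexdigest()
            store.setdefault(digest, obj)
            recipes[name].append(digest)
    return store, recipes


def make_zip(members):
    """Build an in-memory ZIP file for the example below."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buf.getvalue()


logo = b"pretend-image-bytes" * 50
v1 = make_zip({"text.txt": b"draft one", "logo.gz": gzip.compress(logo)})
v2 = make_zip({"text.txt": b"draft two", "logo.gz": gzip.compress(logo)})
store, recipes = deduplicate({"report_v1": v1, "report_v2": v2})
```

The two versions differ as compressed byte streams, yet after delayering they share the same logo object, so the store holds three primitives instead of four.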
13. A method of managing a data storage system, comprising:
determining if a file is of a compound nature containing underlying primitive objects or of a primitive nature;
extracting the underlying primitive objects from files determined to be compound, at least one extraction comprising two layers of decoding with two different codecs;
correlating the extracted objects with the information they represent; and
reducing information redundancy between two or more primitive objects within related or versioned files.
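The two-layer extraction in claim 13 can be illustrated with a short sketch. Everything here is an assumption made for illustration: files are modeled as plain dictionaries, base64 and zlib are picked as the two different codecs, and the names `CODECS`, `is_compound`, `extract_primitives`, and `correlate` are hypothetical.

```python
import base64
import hashlib
import zlib

# Hypothetical codec table: codec name -> decoding function.
CODECS = {
    "base64": base64.b64decode,
    "zlib": zlib.decompress,
}


def is_compound(file):
    """A file is compound if it lists embedded members."""
    return bool(file.get("members"))


def extract_primitives(file):
    """Return the primitive objects of a compound or primitive file."""
    if not is_compound(file):
        return [file["data"]]
    primitives = []
    for member in file["members"]:
        data = member["payload"]
        for codec in member["layers"]:   # outermost layer first
            data = CODECS[codec](data)
        primitives.append(data)
    return primitives


def correlate(files):
    """Map each object's digest to the files that contain it."""
    index = {}
    for name, file in files.items():
        for obj in extract_primitives(file):
            digest = hashlib.sha256(obj).hexdigest()
            index.setdefault(digest, []).append(name)
    return index


picture = b"raw picture bytes " * 20

# The same picture, zlib-compressed and then base64-encoded inside a
# compound file, and stored as-is in a primitive file.
files = {
    "mail.eml": {
        "members": [{
            "layers": ["base64", "zlib"],
            "payload": base64.b64encode(zlib.compress(picture)),
        }],
    },
    "picture.png": {"data": picture},
}

index = correlate(files)
```

After both layers are decoded, the embedded attachment and the standalone picture produce the same digest, so a store built on this index would need to keep the picture only once.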
Specification