Object deduplication and application aware snapshots
Abstract
Embodiments deploy delayering techniques, and the relationships between successive versions of a rich-media file become apparent. With this, modified rich-media files suddenly present far smaller storage overhead as compared to traditional application-unaware snapshot and versioning implementations. Optimized file data is stored in suitcases. As a file is versioned, each new version of the file is placed in the same suitcase as the previous version, allowing embodiments to employ correlation techniques to enhance optimization savings.
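The storage effect described in the abstract can be illustrated with a minimal sketch. It assumes each version has already been delayered into a list of byte strings, and it uses a hypothetical `Suitcase` class with made-up method names; the patent text here does not specify the suitcase layout, so this is one possible reading rather than the patented implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Suitcase:
    """Hypothetical container that holds every version of one file."""
    objects: Dict[str, bytes] = field(default_factory=dict)  # digest -> object bytes
    versions: List[List[str]] = field(default_factory=list)  # per version: ordered digests

    def add_version(self, parts: List[bytes]) -> int:
        """Store a new version and return the number of bytes newly written."""
        written = 0
        recipe = []
        for part in parts:
            digest = hashlib.sha256(part).hexdigest()
            if digest not in self.objects:
                # Only objects not seen in earlier versions cost storage.
                self.objects[digest] = part
                written += len(part)
            recipe.append(digest)
        self.versions.append(recipe)
        return written

    def read_version(self, index: int) -> List[bytes]:
        """Rebuild a version from its recipe of digests."""
        return [self.objects[d] for d in self.versions[index]]
```

Because both versions land in the same suitcase, a second version that changes only one object adds roughly the size of that object, while the unchanged objects are referenced rather than stored again.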
20 Claims
1. A method comprising:
- parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
- recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
- correlating primitive objects within the file and across a plurality of files;
- storing the identified boundaries for use in deduplication;
- setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
- optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
- a processor; and
- memory comprising one or more programs, the one or more programs containing instructions for:
  - parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
  - recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
  - correlating primitive objects within the file and across a plurality of files;
  - storing the identified boundaries for use in deduplication;
  - setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
  - optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable medium comprising one or more programs, the one or more programs containing instructions for:
- parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
- recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
- correlating primitive objects within the file and across a plurality of files;
- storing the identified boundaries for use in deduplication;
- setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
- optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 16, 17, 18, 19, 20
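The steps recited in the independent claims can be sketched in a few lines of Python. Everything concrete below is an assumption made for illustration: the toy tag-length-value file format, the `C` tag marking a compound object, the `pack`, `primitives`, and `dedupe` helpers, and the choice of zlib as the only object-specific optimizer. The claims do not name a file format or any particular algorithm, so this is not the patented implementation.

```python
import hashlib
import struct
import zlib
from typing import Dict, Iterator, List, Optional, Tuple

HEADER = struct.Struct(">cI")  # assumed toy format: 1-byte tag, 4-byte payload length
COMPOUND = b"C"                # assumed tag for a compound (container) object
OPTIMIZERS = {b"T": zlib.compress}  # assumed: text primitives are compressed, others kept raw


def pack(tag: bytes, payload: bytes) -> bytes:
    """Build one object in the toy format (demo helper)."""
    return HEADER.pack(tag, len(payload)) + payload


def primitives(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (tag, payload_start, payload_end) for each lowest-level primitive."""
    end = len(data) if end is None else end
    pos = start
    while pos < end:
        tag, length = HEADER.unpack_from(data, pos)
        body_start = pos + HEADER.size
        body_end = body_start + length
        if tag == COMPOUND:
            # Compound object: recurse until only primitives remain.
            yield from primitives(data, body_start, body_end)
        else:
            yield tag, body_start, body_end
        pos = body_end


def dedupe(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, List[str]]]:
    """Deduplicate across files using primitive boundaries as block boundaries."""
    store: Dict[str, bytes] = {}        # digest -> optimized bytes, stored once
    recipes: Dict[str, List[str]] = {}  # file name -> ordered list of digests
    for name, data in files.items():
        recipe = []
        for tag, lo, hi in primitives(data):
            chunk = data[lo:hi]  # variable-sized block, one per primitive
            digest = hashlib.sha256(tag + chunk).hexdigest()
            if digest not in store:
                store[digest] = OPTIMIZERS.get(tag, bytes)(chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


# Two versions of a toy "deck" that share one embedded image.
image = pack(b"I", b"\x89PNG-pixels")
v1 = pack(b"C", pack(b"T", b"Title") + pack(b"C", image))
v2 = pack(b"C", pack(b"T", b"New title") + pack(b"C", image))
store, recipes = dedupe({"deck_v1": v1, "deck_v2": v2})
```

In this sketch the shared image is stored once and both recipes point at the same digest, even though the changed title shifts the image to a different byte offset in the second file. Fixed-size blocks would not generally line up after such a shift, which is the motivation for setting boundaries at primitive objects.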
Specification