Object deduplication and application aware snapshots
Abstract
Embodiments deploy delayering techniques, and the relationships between successive versions of a rich-media file become apparent. With this, modified rich-media files suddenly present far smaller storage overhead as compared to traditional application-unaware snapshot and versioning implementations. Optimized file data is stored in suitcases. As a file is versioned, each new version of the file is placed in the same suitcase as the previous version, allowing embodiments to employ correlation techniques to enhance optimization savings.
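The effect of keeping successive versions in one suitcase can be illustrated with a minimal sketch. The `Suitcase` class, its method names, and the use of SHA-256 exact matching are all assumptions made for illustration; the abstract does not specify how a suitcase is laid out or which correlation technique is applied, and exact-match sharing stands in here for whatever correlation the embodiments actually use.

```python
import hashlib


class Suitcase:
    """Hypothetical container holding every version of one file's optimized data."""

    def __init__(self):
        self.objects = {}   # digest -> object bytes, shared by all versions
        self.versions = []  # one list of digests (a recipe) per stored version

    def add_version(self, objects):
        """Store a version given as a list of already-delayered object byte strings.

        Returns the number of new bytes the suitcase had to store.
        """
        added = 0
        recipe = []
        for obj in objects:
            digest = hashlib.sha256(obj).hexdigest()
            if digest not in self.objects:
                # Only objects not seen in an earlier version cost storage.
                self.objects[digest] = obj
                added += len(obj)
            recipe.append(digest)
        self.versions.append(recipe)
        return added

    def read_version(self, index):
        """Rebuild a stored version by concatenating its objects in order."""
        return b"".join(self.objects[d] for d in self.versions[index])
```

In this sketch, a second version that changes one small object and leaves an embedded image untouched adds only the changed object's bytes, which is the kind of saving the abstract attributes to application-aware versioning.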
20 Claims
1. A method comprising:
parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
correlating primitive objects within the file and across a plurality of files;
storing the identified boundaries for use in deduplication;
setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system comprising:
a processor; and
memory comprising one or more programs, the one or more programs containing instructions for:
parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
correlating primitive objects within the file and across a plurality of files;
storing the identified boundaries for use in deduplication;
setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer readable medium comprising one or more programs, the one or more programs containing instructions for:
parsing a file to identify boundaries for a plurality of first level objects, including a first compound object, in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a primitive object is the most basic representation of a discrete data structure in the file;
recursively parsing the first compound object until a plurality of lowest level primitive objects is identified;
correlating primitive objects within the file and across a plurality of files;
storing the identified boundaries for use in deduplication;
setting deduplication boundaries at boundaries of the plurality of lowest level primitive objects such that deduplication uses variable sized blocks instead of fixed sized blocks; and
optimizing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 16, 17, 18, 19, 20.
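The parsing and boundary-setting steps recited in the independent claims can be sketched roughly as follows. This is an illustrative sketch only: the `Obj` type, the function names, and the choice of SHA-256 as the block fingerprint are assumptions, not taken from the patent. A real parser would derive the object tree from the file format, and the object specific optimization step is omitted here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical parsed object: a byte range that is primitive or compound."""
    start: int
    end: int
    children: list = field(default_factory=list)  # empty list means primitive

    @property
    def is_primitive(self) -> bool:
        return not self.children


def lowest_level_primitives(obj):
    """Recursively descend a compound object until only primitives remain."""
    if obj.is_primitive:
        return [obj]
    found = []
    for child in obj.children:
        found.extend(lowest_level_primitives(child))
    return found


def deduplicate(data: bytes, first_level, store: dict):
    """Cut `data` at primitive-object boundaries and keep each unique block once.

    `store` maps digest -> block and may be shared across many files.
    Returns the ordered list of digests for the parsed ranges.
    """
    recipe = []
    for top in first_level:
        for prim in lowest_level_primitives(top):
            block = data[prim.start:prim.end]  # variable-sized block
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            recipe.append(digest)
    return recipe
```

Because blocks are cut where the objects begin and end, an embedded object that appears twice, or in two different files sharing the same `store`, hashes identically regardless of its byte offset, which fixed-size blocking would not guarantee.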
Specification