Object deduplication and application aware snapshots
Abstract
Embodiments deploy delayering techniques, and the relationships between successive versions of a rich-media file become apparent. With this, modified rich-media files suddenly present far smaller storage overhead as compared to traditional application-unaware snapshot and versioning implementations. Optimized file data is stored in suitcases. As a file is versioned, each new version of the file is placed in the same suitcase as the previous version, allowing embodiments to employ correlation techniques to enhance optimization savings.
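The storage effect described in the abstract can be illustrated with a minimal sketch. It assumes a file has already been delayered into a list of primitive objects, and it models a suitcase as an in-memory store keyed by SHA-256 digest. The `Suitcase` class, its method names, and the use of SHA-256 are hypothetical choices made for illustration; they are not taken from the patent.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Suitcase:
    """Hypothetical container holding every version of one file."""

    # digest -> payload of a primitive object (stored once)
    objects: dict = field(default_factory=dict)
    # one entry per version: the ordered digests that rebuild it
    versions: list = field(default_factory=list)

    def add_version(self, primitives):
        """Store one file version and return the bytes newly written."""
        written = 0
        recipe = []
        for payload in primitives:
            digest = hashlib.sha256(payload).hexdigest()
            if digest not in self.objects:
                # Only objects not seen in earlier versions cost storage.
                self.objects[digest] = payload
                written += len(payload)
            recipe.append(digest)
        self.versions.append(recipe)
        return written

    def restore(self, version):
        """Rebuild the primitive objects of a stored version, in order."""
        return [self.objects[digest] for digest in self.versions[version]]
```

Because both versions land in the same suitcase, a second version that changes a single object adds only that object's bytes, while an application-unaware snapshot of the whole file would store everything again.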
16 Claims
1. A method comprising:
- parsing a file to identify boundaries for a plurality of first level objects included in the file in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a first compound object included in the file encapsulates a plurality of second level objects and a primitive object is the most basic representation of a discrete data structure in the file;
- recursively parsing the first compound object to identify boundaries for a plurality of second level objects included in the first compound object;
- determining whether each of the plurality of second level objects is compound or primitive;
- identifying a plurality of lowest level primitive objects, wherein the plurality of lowest level primitive objects are basic representations of discrete data structures, wherein metadata for each of the plurality of lowest level primitive objects is stored redundantly in a suitcase file, and wherein deduplication boundaries are set at boundaries of the plurality of lowest level primitive objects;
- decompressing the plurality of lowest level primitive objects;
- recompressing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 2, 3, 4, 5, 6, 7

8. A non-transitory computer readable medium comprising:
- computer code for parsing a file to identify boundaries for a plurality of first level objects included in the file in order to determine whether each of the plurality of first level objects is compound or primitive, wherein a first compound object included in the file encapsulates a plurality of second level objects and a primitive object is the most basic representation of a discrete data structure in the file;
- computer code for recursively parsing the first compound object to identify boundaries for a plurality of second level objects included in the first compound object;
- computer code for determining whether each of the plurality of second level objects is compound or primitive;
- computer code for identifying a plurality of lowest level primitive objects, wherein the plurality of lowest level primitive objects are basic representations of discrete data structures, wherein metadata for each of the plurality of lowest level primitive objects is stored redundantly in a suitcase file, and wherein deduplication boundaries are set at boundaries of the plurality of lowest level primitive objects;
- computer code for decompressing the plurality of lowest level primitive objects;
- computer code for recompressing the plurality of lowest level primitive objects with a plurality of object specific optimization algorithms.
Dependent claims: 9, 10, 11, 12, 13, 14

15. A system comprising:
- an interface operable to receive a file comprising a plurality of first level objects;
- a parser operable to recursively parse the file to identify a plurality of first level objects, second level objects, and lowest level objects included in the file;
- a processor configured to determine through parsing the file whether each of the plurality of first level objects is compound or primitive, wherein a first compound object included in the file encapsulates a plurality of second level objects and a primitive object is the most basic representation of a discrete data structure in the file, the processor further configured to determine whether each of the plurality of second level objects is compound or primitive, wherein the plurality of lowest level primitive objects are identified, wherein the plurality of lowest level primitive objects are basic representations of discrete data structures, wherein metadata for each of the plurality of lowest level primitive objects is stored redundantly in a suitcase file, and wherein deduplication boundaries are set at boundaries of the plurality of lowest level primitive objects;
- a decompression mechanism configured to decompress the plurality of lowest level primitive objects, wherein the plurality of lowest level primitive objects are recompressed with a plurality of object specific optimization algorithms.
Dependent claims: 16

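The steps recited in the independent claims (recursive parsing, compound versus primitive classification, deduplication at primitive boundaries, decompression, and object specific recompression) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: ZIP archives stand in for compound objects, any non-ZIP payload is treated as primitive, SHA-256 digests identify duplicates, and the extension-based choice between `lzma` and `zlib` is an invented policy. The names `build_zip`, `delayer`, `recompress`, and `optimize` are hypothetical, and the redundant metadata storage recited in the claims is not modeled.

```python
import hashlib
import io
import lzma
import zipfile
import zlib


def build_zip(members):
    """Helper for experiments: pack {name: bytes} into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def delayer(data, path=""):
    """Yield (path, payload) for each lowest level primitive object.

    Assumption: a payload that is itself a ZIP archive is compound and is
    parsed recursively; everything else is primitive.
    """
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # ZipFile.read returns the member's decompressed bytes.
                member = archive.read(info)
                yield from delayer(member, f"{path}/{info.filename}")
    else:
        yield path, data


def recompress(path, payload):
    """Pick a compressor per object type (hypothetical policy)."""
    if path.endswith(".txt"):
        return lzma.compress(payload)
    return zlib.compress(payload, 9)


def optimize(data):
    """Deduplicate at primitive boundaries, then recompress each object once."""
    store = {}   # digest -> recompressed payload
    index = []   # (path, digest) in parse order
    for path, payload in delayer(data):
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in store:
            store[digest] = recompress(path, payload)
        index.append((path, digest))
    return index, store
```

Calling `optimize` on an archive that nests another archive returns one index entry per primitive object, while identical primitives found at different nesting levels share a single stored payload. Deduplicating on the decompressed bytes is the point of the delayering step: two copies of the same object compressed inside different containers would rarely match byte for byte in their compressed form.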
Specification