MULTIMODAL OBJECT DE-DUPLICATION
First Claim
1. A method of storing an object of an object system having an object index, the method comprising:
if the size of the object is below a data size threshold, storing the object in the object system indexed according to an object de-duplication method; and
if the size of the object is not below the data size threshold:
    if the object comprises a structure, storing the object in the object system indexed according to an object segment de-duplication method based on at least one object segment defined by the structure of the object; and
    if the object does not comprise a structure, storing the object in the object system indexed according to an object chunk de-duplication method based on at least one arbitrarily defined object chunk.
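As a rough illustration of the branching recited in this claim, the selection among the three de-duplication modes might be sketched as follows. The 128-kilobyte threshold is the value recited in claim 20; the names `Mode`, `has_structure`, and `choose_mode`, and the ZIP-signature test for "structure", are hypothetical and are not taken from the patent.

```python
from enum import Enum

# 128 kilobytes is the threshold recited in claim 20; claim 1 leaves it open.
DATA_SIZE_THRESHOLD = 128 * 1024


class Mode(Enum):
    OBJECT = "object de-duplication"
    SEGMENT = "object segment de-duplication"
    CHUNK = "object chunk de-duplication"


def has_structure(data: bytes) -> bool:
    """Hypothetical structure test: treat ZIP containers as structured objects."""
    return data[:4] == b"PK\x03\x04"


def choose_mode(data: bytes, threshold: int = DATA_SIZE_THRESHOLD) -> Mode:
    """Pick a de-duplication mode from the object's size and structure."""
    if len(data) < threshold:
        return Mode.OBJECT       # small object: de-duplicate as a whole
    if has_structure(data):
        return Mode.SEGMENT      # large and structured: use its own segments
    return Mode.CHUNK            # large and unstructured: arbitrary chunks
```

An object whose size equals the threshold is "not below" it, so it falls through to the segment or chunk branch.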
Abstract
Various object de-duplication techniques may be applied to object systems (such as to files in a file store) to identify similar or identical objects or portions thereof, so that duplicate objects or object portions may be associated with one copy, and the duplicate copies may be removed. However, an object de-duplication technique that is suitable for de-duplicating one type of object may be inefficient for de-duplicating another type of object; e.g., a de-duplication method that significantly condenses sets of small objects may achieve very little condensation among sets of large objects, and vice versa. A multimodal approach to object de-duplication may be devised that analyzes an object to be stored and chooses a de-duplication technique that is likely to be effective for storing the object. The object index may be configured to support several de-duplication schemes for indexing and storing many types of objects in a space-economizing manner.
20 Claims
16. A system for storing an object of an object system having an object index, the system comprising:
an object storage component configured to store objects having a size below a data size threshold in the object system indexed according to an object de-duplication method;
an object segment storage component configured to store objects of a structure and having a size not below a data size threshold in the object system indexed according to an object segment de-duplication method based on at least one object segment defined by the structure of the object; and
an object chunk storage component configured to store objects without structure and having a size not below the data size threshold in the object system indexed according to an object chunk de-duplication method based on at least one arbitrarily defined object chunk.
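A minimal sketch of what the first of these components could look like is given below. Claim 16 does not say how whole-object duplicates are detected; the sketch assumes the signature comparison recited in claim 20 and uses SHA-256 as the signature. The class `ObjectStore` and its attribute names are hypothetical.

```python
import hashlib


class ObjectStore:
    """Whole-object de-duplication keyed by a content signature (assumed SHA-256)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # signature -> the single stored copy
        self.index: dict[str, str] = {}     # object name -> signature it refers to

    def put(self, name: str, data: bytes) -> bool:
        """Index `name`; return True only if a new copy had to be stored."""
        signature = hashlib.sha256(data).hexdigest()
        is_new = signature not in self.blobs
        if is_new:
            self.blobs[signature] = data    # first time this content is seen
        self.index[name] = signature        # duplicates become references
        return is_new

    def get(self, name: str) -> bytes:
        return self.blobs[self.index[name]]
```

Storing the same bytes under two names leaves one entry in `blobs` and two entries in `index`, which is the space saving the component is meant to provide.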
20. A method of storing an object comprising files of an object system having an object index configured to store signatures and trait sets of respective objects, the object index having a segment index configured to store signatures of respective segments, and the method comprising:
if the size of the object is below a data size threshold of 128 kilobytes, storing the object in the object system indexed according to an object de-duplication method comprising:
    generating a signature of the object;
    comparing the signature of the object with the signatures of other objects in the object system;
    upon identifying a second object having a signature equal to the signature of the object, indexing the object in the object index as a reference to the second object;
    upon failing to identify a second object having a signature equal to the signature of the object:
    storing the object in the object system, and indexing the object in the object index as a reference to the object; and
    storing the signature of the object in the object index; and
if the size of the object is not below the data size threshold:
    if the object comprises a structure, storing the object in the object system indexed according to an object segment de-duplication method based on at least one object segment defined by the structure of the object, the method comprising:
        segmenting the object according to the structure of the object;
        for respective segments of the object:
        generating a signature of the segment;
        comparing the signature of the segment with the signatures of other segments in the object system;
        upon identifying a second segment having a signature equal to the signature of the segment, indexing the segment in the segment index as a reference to the second segment;
        upon failing to identify a second segment having a signature equal to the signature of the segment:
        storing the segment in the object system, and indexing the segment in the segment index as a reference to the segment;
        indexing the object in the object index as a reference to the segments of the object indexed in the segment index; and
        storing the signature of the segment in the segment index; and
    if the object does not comprise a structure, storing the object in the object system indexed according to an object chunk de-duplication method based on at least one arbitrarily defined object chunk, the method comprising:
        detecting at least zero fingerprints in the object of a fingerprint size of 32 bits and matching a fingerprint value comprising a random value associated with the object index, the fingerprints computed according to a fingerprint detection method comprising:
        setting a sliding window of the fingerprint size at a start position of the object; and
        while the sliding window is within the object:
        computing the Rabin fingerprint hash of the sliding window;
        if the Rabin fingerprint hash of the sliding window equals the fingerprint value, defining a chunk from one of the position of a preceding chunk and the start position to the position of the sliding window; and
        incrementing the sliding window by a window increment size of eight bits;
        dividing the object into chunks according to the fingerprints of the object;
        computing a trait set of the object comprising at least one trait relating to the chunks of the object, respective traits associated with a trait hash function, and the computing comprising:
        for respective traits of the trait set:
        calculating a trait hash for respective chunks of the object with the trait hash function;
        selecting a lowest trait hash having a lowest value among the trait hashes of the chunks; and
        selecting the trait comprising an arbitrary selection of bits of the lowest trait hash according to the mathematical formula:
            $$T_t = \mathrm{select}_{(t-1)b \,\ldots\, tb-1}\, H_t$$
        wherein:
            $t$ represents a trait number 1 ... n among n traits;
            $H_t$ represents the lowest trait hash among the trait hashes of the chunks computed according to trait hash function $t$;
            $b$ represents the bit size of a trait, wherein $nb = \mathrm{size}(H_t)$; and
            $T_t$ represents the trait computed for trait number $t$;
        computing trait set similarities between the trait set of the object and the trait sets of other objects in the object system;
        upon identifying a second object having a trait set similarity greater than a similarity threshold:
        computing a data delta between the object and the second object, and storing the data delta in the object system, and indexing the object in the object index as a reference to the second object and the data delta;
        upon failing to identify a second object having a trait set similarity greater than the similarity threshold:
        storing the object in the object system, and indexing the object in the object index as a reference to the object; and
        storing the trait set of the object in the object index.
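The chunking and trait steps of the unstructured branch can be sketched roughly as below. This is an illustration under stated assumptions, not the claimed method: `zlib.crc32` stands in for the Rabin fingerprint hash, only the low 6 bits of the hash are compared with the fingerprint value so that a small demo input produces several chunks, a salted SHA-256 truncated to 32 bits stands in for each trait hash function, and bits are numbered from the least significant end. The values `MASK`, `FINGERPRINT_VALUE`, `N_TRAITS`, and `TRAIT_BITS`, and all function names, are hypothetical. The data delta step is not shown.

```python
import hashlib
import zlib

WINDOW = 4                # fingerprint size: 32 bits
STEP = 1                  # window increment: eight bits
MASK = 0x3F               # assumption: compare low 6 bits only, to keep demo chunks small
FINGERPRINT_VALUE = 0x2A  # assumption: stands in for the random value tied to the index
N_TRAITS = 4              # n
TRAIT_BITS = 8            # b, chosen so that n * b = 32 = size(H_t)


def chunk(data: bytes) -> list[bytes]:
    """Cut `data` wherever the hash of the sliding window matches the fingerprint value."""
    chunks, start, pos = [], 0, 0
    while pos + WINDOW <= len(data):            # window still within the object
        window_hash = zlib.crc32(data[pos:pos + WINDOW])  # stand-in for a Rabin hash
        if (window_hash & MASK) == FINGERPRINT_VALUE:
            end = pos + WINDOW                  # assumption: cut at the window's end
            chunks.append(data[start:end])
            start = end
        pos += STEP
    if start < len(data):
        chunks.append(data[start:])             # trailing bytes form the last chunk
    return chunks


def trait_hash(t: int, piece: bytes) -> int:
    """Trait hash function t: a 32-bit value from SHA-256 salted with t (assumption)."""
    return int.from_bytes(hashlib.sha256(bytes([t]) + piece).digest()[:4], "big")


def trait_set(chunks: list[bytes]) -> tuple[int, ...]:
    """T_t = bits (t-1)b .. tb-1 of H_t, where H_t is the lowest trait hash over all chunks."""
    if not chunks:
        raise ValueError("an object with no chunks has no traits")
    traits = []
    for t in range(1, N_TRAITS + 1):
        h_t = min(trait_hash(t, c) for c in chunks)
        traits.append((h_t >> ((t - 1) * TRAIT_BITS)) & ((1 << TRAIT_BITS) - 1))
    return tuple(traits)


def similarity(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Fraction of trait positions on which two trait sets agree (one possible measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Because each trait depends only on the minimum hash over the chunks, two objects that share most of their chunks tend to share most of their traits, which is what makes the trait set usable as a cheap similarity test before computing a delta.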
Specification