
SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES USING HIGHER-LEVEL HASH STRUCTURES

  • US 20130318053A1
  • Filed: 03/19/2013
  • Published: 11/28/2013
  • Est. Priority Date: 11/16/2010
  • Status: Active Grant
First Claim

1. A method of storing deduplicated images of a data object that changes over time in a deduplicating content store, the deduplicating content store having a local cache and a global cache, said method comprising:

    organizing the content of the data object for a first temporal state of the data object as a plurality of content segments and storing the plurality of content segments in a data store;

    creating a content structure representing content of the data object as a hierarchical arrangement of hash structures in the data store, wherein each hash structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein a higher-level hash structure in the hierarchical arrangement aggregates a set of lower-level hash structures, such that a logical organization of the content structure represents the organization of the content segments as they are represented within the data object;

    receiving difference information for the data object, said difference information indicating changed content for the data object for a second temporal state of the data object relative to the first temporal state, and said difference information indicating a location of the changed content within the data object;

    receiving the changed content for the data object at the deduplicating content store;

    forming a hash signature for each of a set of changed lower-level hash structures associated with the changed content;

    forming a hash signature for a changed higher-level hash structure aggregating a plurality of the set of changed lower-level hash structures;

    determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store before attempting to search for the hash signatures for each of the set of changed lower-level hash structures;

    storing any changed content that is unique in the data store as content segments;

    modifying the organized arrangement of hash structures to incorporate new structures for the content segment corresponding to at least one hash signature for the changed content, and incorporating the new structures in the organized arrangement of structures at a position corresponding to the location of the changed content within the data object as indicated within said difference information, thereby using the higher-level hash signature for the changed content without unnecessary searching for hash signatures for the lower-level hash structures.
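
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a two-level hierarchy, SHA-1 signatures, tiny fixed-size segments, and a single in-memory dictionary standing in for the global cache. All names (`ContentStore`, `put_block`, `apply_diff`, `SEG_SIZE`, `FANOUT`) are hypothetical and chosen only for illustration.

```python
import hashlib

SEG_SIZE = 4   # bytes per content segment (assumed; tiny for illustration)
FANOUT = 2     # lower-level hashes aggregated per higher-level structure (assumed)


def h(data: bytes) -> str:
    """Hash signature of a byte string."""
    return hashlib.sha1(data).hexdigest()


class ContentStore:
    def __init__(self):
        self.segments = {}     # lower-level hash -> segment bytes (the data store)
        self.parents = {}      # higher-level hash -> tuple of lower-level hashes
        self.leaf_lookups = 0  # number of lower-level hash searches performed

    def put_block(self, block: bytes) -> str:
        """Store one block and return its higher-level hash signature."""
        segs = [block[i:i + SEG_SIZE] for i in range(0, len(block), SEG_SIZE)]
        leaves = [h(s) for s in segs]
        parent = h("".join(leaves).encode())
        # Search for the higher-level signature first.
        if parent in self.parents:
            return parent  # hit: no lower-level searches are needed
        # Miss: fall back to searching each lower-level signature.
        for seg, leaf in zip(segs, leaves):
            self.leaf_lookups += 1
            if leaf not in self.segments:
                self.segments[leaf] = seg  # store only unique content
        self.parents[parent] = tuple(leaves)
        return parent

    def ingest(self, data: bytes) -> list:
        """First temporal state: returns a list of higher-level hashes."""
        size = SEG_SIZE * FANOUT
        return [self.put_block(data[i:i + size])
                for i in range(0, len(data), size)]

    def apply_diff(self, image: list, diffs: list) -> list:
        """Later temporal state: diffs is a list of (block index, new bytes)."""
        new_image = list(image)  # the earlier image stays intact
        for index, block in diffs:
            new_image[index] = self.put_block(block)  # splice at diff location
        return new_image

    def read(self, image: list) -> bytes:
        """Reassemble the data object from an image."""
        return b"".join(self.segments[leaf]
                        for parent in image
                        for leaf in self.parents[parent])


store = ContentStore()
v1 = store.ingest(b"AAAABBBBCCCCDDDD")          # two blocks, four segments
v2 = store.apply_diff(v1, [(1, b"AAAABBBB")])   # changed block already known
v3 = store.apply_diff(v1, [(0, b"AAAAEEEE")])   # changed block partly new
```

In this sketch, creating `v2` performs no lower-level searches because the higher-level signature of the changed block is already present, while creating `v3` misses at the higher level and stores only the one segment that is actually new. Each image is just a list of higher-level signatures, so earlier temporal states remain readable after later ones are created.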

  • 8 Assignments