Data processing method and apparatus in cluster system

  • US 8,892,529 B2
  • Filed: 12/24/2013
  • Issued: 11/18/2014
  • Est. Priority Date: 12/12/2012
  • Status: Active Grant
First Claim

1. A method of data de-duplication performed by a first processing node in storage system having a plurality of processing nodes each maintaining multiple data containers for storing de-duplicated data chunks, comprising:

    receiving a data stream to be stored after de-duplication;

    dividing a segment of the data stream into a plurality of super-chunks, each super-chunk including multiple data chunks;

    deriving a first super-chunk identification (SID) for a super-chunk of the segment;

    identifying a second processing node of the storage system that corresponds to the first SID;

    querying the second processing node for a first data container that corresponds to the first SID, wherein the first data container is maintained by a third processing node of the storage system;

    obtaining fingerprints of data chunks stored in the first data container that corresponds to the first SID;

    based on a comparison between fingerprints of data chunks in the super-chunk and the obtained fingerprints to identify new data chunks whose signatures are not found in the obtained fingerprints;

    storing the new data chunks in a local buffer of the first processing node;

    selecting, according to a preset storage policy, a second data container of the storage system to write data in the local buffer;

    deriving a second SID for data of the local buffer;

    identifying, by the same way for identifying the second processing node, a fourth processing node of the storage system that corresponds to the second SID for data of the local buffer; and

    storing correspondence between the second SID for data of the local buffer and the second data container in the fourth processing node.
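The steps recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative, in-memory reading of the claim, not the patented implementation. The names `Cluster`, `Node`, `derive_sid`, `node_for_sid` and `store_super_chunk` are hypothetical, and several choices are assumptions the claim does not specify: SHA-1 as the fingerprint, the minimum chunk fingerprint as the SID, a modulo mapping from SID to node, and a storage policy that always opens a new container on the receiving node. The sketch starts from a super-chunk that has already been divided into data chunks.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(chunk: bytes) -> str:
    """Chunk fingerprint. SHA-1 is an assumption; the claim names no hash."""
    return hashlib.sha1(chunk).hexdigest()


def derive_sid(chunks: list[bytes]) -> str:
    """Super-chunk ID. Using the minimum chunk fingerprint is an assumption."""
    return min(fingerprint(c) for c in chunks)


@dataclass
class Node:
    # container id -> {fingerprint: chunk bytes}
    containers: dict = field(default_factory=dict)
    # SID -> (index of the node holding the container, container id)
    sid_index: dict = field(default_factory=dict)


class Cluster:
    def __init__(self, n: int):
        self.nodes = [Node() for _ in range(n)]

    def node_for_sid(self, sid: str) -> int:
        # Assumed mapping; the same rule locates the "second" and "fourth" nodes.
        return int(sid, 16) % len(self.nodes)

    def store_super_chunk(self, first: int, chunks: list[bytes]) -> list[bytes]:
        """Process one super-chunk on node `first`; return the chunks stored as new."""
        sid1 = derive_sid(chunks)
        second = self.nodes[self.node_for_sid(sid1)]

        # Query the second node for the container recorded under the first SID.
        known = {}
        entry = second.sid_index.get(sid1)
        if entry is not None:
            third, container_id = entry
            known = self.nodes[third].containers[container_id]

        # Compare fingerprints; unseen chunks go into the local buffer.
        buffer = {}
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in known and fp not in buffer:
                buffer[fp] = chunk
        if not buffer:
            return []

        # Assumed storage policy: write the buffer to a new container on `first`.
        container_id = f"c{len(self.nodes[first].containers)}"
        self.nodes[first].containers[container_id] = buffer

        # Derive the second SID and record SID -> container on the fourth node.
        sid2 = derive_sid(list(buffer.values()))
        fourth = self.nodes[self.node_for_sid(sid2)]
        fourth.sid_index[sid2] = (first, container_id)
        return list(buffer.values())
```

In this sketch, storing the same super-chunk a second time, even through a different receiving node, finds the container via the SID index and writes nothing new. A real system would also need persistence, remote calls between nodes, and handling of a super-chunk whose SID changes when its contents change slightly.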
