Method and system for data backup

US 9,465,699 B2
Filed: 06/04/2013
Issued: 10/11/2016
Est. Priority Date: 07/30/2012
Status: Expired due to Fees

Abstract

The present invention relates to a method, system, and computer program product for data backup, the method comprising: performing first chunking on current data by using the same chunking method as that used by original backup data to obtain a current chunk; calculating a hash value of the current chunk; and acquiring, from a hash value table of the original backup data, an identifier of a matched chunk whose hash value is the same as the calculated hash value of the current chunk, and incrementing the number of continuous matched chunks by one. Because the correlation between the original backup data and the current data is exploited as fully as possible, the performance of the de-duplication method can be improved efficiently.
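The three steps named in the abstract (chunk, hash, look up and count) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy boundary rule in `cdc_chunks` (a byte sum over a 4-byte window), the use of SHA-1, the use of a chunk's index as its identifier, and the function names. Resetting the counter on a miss follows the "clearing" step in the claims rather than the abstract.

```python
import hashlib

def cdc_chunks(data, window=4, mask=0x07, max_len=64):
    """Toy content-defined chunking (hypothetical rule): cut after any position
    where the byte sum of the trailing `window` bytes has its low bits clear."""
    start = 0
    while start < len(data):
        end = min(len(data), start + max_len)
        for i in range(start + window - 1, end):
            if (sum(data[i - window + 1:i + 1]) & mask) == 0:
                end = i + 1
                break
        yield data[start:end]
        start = end

def chunk_hash(chunk):
    """Hash value of one chunk (SHA-1 is an illustrative choice)."""
    return hashlib.sha1(chunk).hexdigest()

def build_hash_table(original):
    """Map each chunk hash of the original backup to an identifier (its index)."""
    table = {}
    for ident, chunk in enumerate(cdc_chunks(original)):
        table.setdefault(chunk_hash(chunk), ident)
    return table

def match_current(current, table):
    """Chunk the current data the same way and look each chunk up.

    Returns (identifiers, run), where an identifier is None for an unmatched
    chunk and `run` is the number of continuous matched chunks at the end."""
    idents, run = [], 0
    for chunk in cdc_chunks(current):
        ident = table.get(chunk_hash(chunk))
        run = run + 1 if ident is not None else 0
        idents.append(ident)
    return idents, run
```

Because the same chunking rule is applied to both data sets, unchanged regions of the current data tend to produce the same chunks, and therefore the same hash values, as the original backup.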

29 Citations

5 Claims

1. A method for data backup, wherein there is original backup data and current data to be backed up, the method comprising:
- performing first chunking on the current data by using the same chunking method as that used by the original backup data to obtain a current chunk, wherein the original backup data is content-defined chunking data;
  
  calculating a hash value of the current chunk, wherein whether a number of continuous matched chunks exceeds a threshold is determined based on the calculated hash value, wherein the threshold is a preset value, wherein, if the number of continuous matched chunks exceeds the preset threshold, time for chunking the data is saved, and wherein the data blocks of the current chunk of data are then equal to the matched block of the original backup data;
  
  in response to the number of continuous matched chunks exceeding the threshold, acquiring the length of a data block that corresponds to an identifier of a next chunk of the matched chunk;
  
  acquiring, from a hash value table of the original backup data, an identifier of a matched chunk whose hash value is the same as the calculated hash value of the current chunk, and incrementing the number of continuous matched chunks, based on the exceeded threshold; and
  
  clearing the number of continuous matched chunks, whereby the hash value table of the original backup data and the identifier of a matched chunk whose hash value is the same as the calculated hash value of the current chunk are returned in response to the number of continuous matched chunks exceeding the threshold.
- Dependent claims (2, 3, 4, 5)
  - 2. The method according to claim 1, further comprising:
    - in response to the number of continuous matched chunks not exceeding the threshold, continuing to perform second chunking on the current data by using the same chunking method as that used by the original backup data to obtain a new current chunk; and
      
      calculating a hash value of the new current chunk.
  - 3. The method according to claim 2, further comprising:
    - acquiring a hash value of a next chunk of the matched chunk from the hash value table of the original backup data;
      
      comparing the hash value of the new current chunk with the hash value of the next chunk of the matched chunk;
      
      in response to the hash value of the new current chunk being the same as the hash value of the next chunk of the matched chunk:
      
      incrementing the number of continuous matched chunks by one;
      
      taking the next chunk of the matched chunk as a new matched chunk;
      
      returning to the step of determining whether the number of continuous matched chunks exceeds a threshold;
      
      in response to the hash value of the new current chunk being different from the hash value of the next chunk of the matched chunk:
      
      clearing the number of continuous matched chunks; and
      
      returning to the step of acquiring, from a hash value table of the original backup data, an identifier of a matched chunk whose hash value is the same as the calculated hash value of the current chunk.
  - 4. The method according to claim 1, further comprising:
    - determining whether the number of continuous matched chunks exceeds a threshold; and
      
      in response to exceeding the threshold, acquiring a length of a data block corresponding to an identifier of a next chunk of the matched chunk.
  - 5. The method according to claim 4, further comprising:
    - continuing to perform third chunking on the current data by using the acquired length of a data block corresponding to an identifier of a next chunk of the matched chunk to obtain a new current chunk;
      
      calculating a hash value of the new current chunk;
      
      acquiring a hash value of the next chunk of the matched chunk from a hash value table of the original backup data;
      
      comparing the hash value of the new current chunk with the hash value of the next chunk of the matched chunk;
      
      in response to the hash value of the new current chunk being the same as the hash value of the next chunk of the matched chunk:
      
      taking the next chunk of the matched chunk as a new matched chunk;
      
      returning to the step of acquiring an identifier of the next chunk of the matched chunk from a hash value table of the original backup data;
      
      in response to the hash value of the new current chunk being different from the hash value of the next chunk of the matched chunk:
      
      clearing the number of continuous matched chunks; and
      
      returning to the step of acquiring, from a hash value table of the original backup data, an identifier of a matched chunk whose hash value is the same as the calculated hash value of the current chunk.
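
Read together, claims 1 to 5 describe a loop that switches between content-defined chunking and a cheaper fixed-length cut once enough consecutive chunks have matched. The Python sketch below is one possible reading under stated assumptions, not the patented implementation. Assumptions: the toy boundary rule in `cdc_cut`, SHA-1 as the hash, chunk identifiers that are sequential indices (so the "next chunk" of chunk `i` is chunk `i + 1`), a threshold of 3, the `('ref', id)` / `('new', bytes)` plan format, and falling back to a fresh content-defined cut when a length-based prediction fails. All function names are hypothetical.

```python
import hashlib

def cdc_cut(data, start, window=4, mask=0x07, max_len=64):
    """End offset of the toy content-defined chunk that begins at `start`."""
    end = min(len(data), start + max_len)
    for i in range(start + window - 1, end):
        if (sum(data[i - window + 1:i + 1]) & mask) == 0:
            return i + 1
    return end

def sha(chunk):
    return hashlib.sha1(chunk).hexdigest()

def index_backup(original):
    """Per-chunk hashes and lengths of the original backup, plus hash -> identifier."""
    hashes, lengths, table = [], [], {}
    pos = 0
    while pos < len(original):
        end = cdc_cut(original, pos)
        h = sha(original[pos:end])
        table.setdefault(h, len(hashes))
        hashes.append(h)
        lengths.append(end - pos)
        pos = end
    return hashes, lengths, table

def backup(current, hashes, lengths, table, threshold=3):
    """Return (plan, cdc_calls); cdc_calls counts content-defined boundary searches."""
    plan, cdc_calls = [], 0
    pos, run, matched = 0, 0, None
    while pos < len(current):
        nxt = matched + 1 if matched is not None else None
        has_next = nxt is not None and nxt < len(hashes)
        if run > threshold and has_next:
            # Claims 4-5: cut by the stored length of the next original chunk.
            end = min(len(current), pos + lengths[nxt])
            if sha(current[pos:end]) == hashes[nxt]:
                plan.append(('ref', nxt))
                matched, pos = nxt, end
                continue
            run = 0  # prediction failed: clear the counter, fall back to CDC
        end = cdc_cut(current, pos)
        cdc_calls += 1
        h = sha(current[pos:end])
        if run > 0 and has_next and h == hashes[nxt]:
            # Claim 3: the new chunk equals the next original chunk.
            run, matched = run + 1, nxt
        else:
            # Claim 1: look the hash up in the table; start or clear the run.
            matched = table.get(h)
            run = 1 if matched is not None else 0
        plan.append(('ref', matched) if matched is not None
                    else ('new', current[pos:end]))
        pos = end
    return plan, cdc_calls

def restore(plan, original, lengths):
    """Rebuild the current data from a plan (only used to check the sketch)."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    out = bytearray()
    for kind, val in plan:
        out += original[offsets[val]:offsets[val + 1]] if kind == 'ref' else val
    return bytes(out)
```

In this sketch the saving claimed for the threshold shows up in `cdc_calls`: when the current data equals the original backup, only the first `threshold + 1` chunks need a boundary search, and every later chunk is cut directly by the stored length and confirmed with a single hash comparison.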

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Li, Ya J.; Li, Yu M.; Sisco, Michael G.; Xiong, Yin X.
Primary Examiner(s): Alam, Hosain
Assistant Examiner(s): Allen, Nicholas E
Application Number: US13/909,370
Publication Number: US 20140032499A1
Time in Patent Office: 1,225 Days
Field of Search: 707/646
US Class Current: 1/1
CPC Class Codes:
G06F 11/1451   by selection of backup cont...
G06F 11/1453   using de-duplication of the...
G06F 11/1469   Backup restoration techniques
G06F 16/215   Improving data quality; Dat...
