Virtual machine snapshot backup based on multilayer De-duplication

US 9,460,098 B2
Filed: 08/15/2013
Issued: 10/04/2016
Est. Priority Date: 08/15/2012
Status: Active Grant

Abstract

The present disclosure provides an example method and system for virtual machine backup based on multilayer de-duplication. A virtual machine snapshot is divided into multiple child data blocks. Each child data block is divided into multiple data segments. Multilayer de-duplication is applied to the virtual machine snapshot to exclude data causing duplicate backup in the virtual machine snapshot. The remaining virtual machine snapshot data after the processing of the multilayer de-duplication is stored.
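
As a rough illustration of the two-level division described in the abstract, the sketch below splits a snapshot's bytes into child data blocks and each block into data segments, recording a fingerprint and size for each. The fixed block and segment sizes, the use of SHA-256 as the fingerprint, and the names `divide_snapshot`, `split`, and `fingerprint` are assumptions made for illustration only; the record above does not specify them.

```python
import hashlib

BLOCK_SIZE = 8    # hypothetical child data block size in bytes (tiny for demonstration)
SEGMENT_SIZE = 4  # hypothetical data segment size in bytes


def fingerprint(data: bytes) -> str:
    """Return a content fingerprint; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def split(data: bytes, size: int) -> list:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def divide_snapshot(snapshot: bytes) -> list:
    """Divide a snapshot into child data blocks, and each block into segments."""
    blocks = []
    for block in split(snapshot, BLOCK_SIZE):
        segments = [
            {"fingerprint": fingerprint(seg), "size": len(seg), "data": seg}
            for seg in split(block, SEGMENT_SIZE)
        ]
        blocks.append({
            "fingerprint": fingerprint(block),
            "size": len(block),
            "segments": segments,
        })
    return blocks
```

Fixed-size cutting is used here only to keep the sketch short; claim 8 also contemplates segments of variant lengths or sizes.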

18 Claims

1. A computer-implemented method for virtual machine snapshot backup, comprising:
- dividing a virtual machine snapshot of a virtual machine into one or more child data blocks;
  
  dividing a respective child data block into one or more data segments;
  
  applying multilayer de-duplication to the virtual machine snapshot;
  
  periodically scanning the backup storage file system; and
  
  based on one or more data repetition characteristics of data storage, extracting data whose repetition rate is higher than a preset threshold into a public data set.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The computer-implemented method as recited in claim 1, further comprising storing remaining data in the virtual machine snapshot after applying the multilayer de-duplication.
  - 3. The computer-implemented method as recited in claim 1, wherein the applying the multilayer de-duplication to the virtual machine snapshot comprises:
    - applying a child data block de-duplication to the virtual machine snapshot;
      
      applying a data segment de-duplication to the virtual machine snapshot; and
      
      applying a public data set de-duplication to the virtual machine snapshot.
  - 4. The computer-implemented method as recited in claim 3, wherein the applying the child data block de-duplication to the virtual machine snapshot comprises:
    - determining whether there has been any change of the respective child data block since a preceding backup;
      
      in response to determining that there has not been any change of the respective child data block since the preceding backup, excluding the respective child data block; and
      
      in response to determining that there has been a change of the respective child data block since the preceding backup, keeping the respective child data block.
  - 5. The computer-implemented method as recited in claim 3, wherein the applying the data segment de-duplication to the virtual machine snapshot comprises:
    - determining whether there has been any change of a respective data segment in the respective child data block that is remaining after applying the child data block de-duplication since a preceding backup;
      
      in response to determining that there has not been any change of the respective data segment, excluding the respective data segment; and
      
      in response to determining that there has been a change of the respective child data block since the preceding backup, keeping the respective data segment.
  - 6. The computer-implemented method as recited in claim 3, wherein the applying the public data set de-duplication to the virtual machine snapshot comprises:
    - comparing one or more characteristics of a respective data segment that is remaining after applying the data segment de-duplication with one or more characteristics of data in the public data set; and
      
      determining whether the respective data segment exists in the public data set;
      
      in response to determining that the respective data segment exists in the public data set, excluding the respective data segment; and
      
      in response to determining that the respective data segment does not exist in the public data set, keeping the respective data segment.
  - 7. The computer-implemented method as recited in claim 6, wherein the public data set stores one or more data segments whose repetition rates are higher than a preset threshold in a backup storage file system.
  - 8. The computer-implemented method as recited in claim 1, wherein dividing the respective child data block into one or more data segments comprises dividing the respective child data block into one or more data segments with variant lengths or sizes.
  - 9. The computer-implemented method as recited in claim 1, wherein the virtual machine snapshot comprises data fingerprints, sizes, and data points of the one or more child data blocks.
  - 10. The computer-implemented method as recited in claim 1, wherein the respective child data block comprises data fingerprints, sizes, and data points of the one or more data segments.
  - 11. The computer-implemented method as recited in claim 1, further comprising with respect to data excluded by the multilayer de-duplication, using an index of corresponding data in a preceding virtual machine snapshot in a backup of the virtual machine snapshot.
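
The three layers recited in claims 3 through 6 can be read as successive filters. The following is a minimal sketch under stated assumptions: "change since a preceding backup" is modeled as a fingerprint not appearing in the preceding backup's fingerprint set, the public data set is modeled as a set of segment fingerprints, and SHA-256 is an assumed fingerprint. The names `multilayer_dedup` and `fp` are hypothetical and do not come from the patent.

```python
import hashlib


def fp(data: bytes) -> str:
    """Assumed fingerprint function (SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def multilayer_dedup(blocks, prev_block_fps, prev_segment_fps, public_fps):
    """Return the segments that remain after all three layers.

    blocks: list of child data blocks, each given as a list of segment bytes.
    prev_block_fps / prev_segment_fps: fingerprints from the preceding backup.
    public_fps: fingerprints of segments held in the public data set.
    """
    kept = []
    for segments in blocks:
        # Layer 1: child data block unchanged since the preceding backup.
        if fp(b"".join(segments)) in prev_block_fps:
            continue
        for seg in segments:
            seg_fp = fp(seg)
            # Layer 2: data segment unchanged since the preceding backup.
            if seg_fp in prev_segment_fps:
                continue
            # Layer 3: data segment already present in the public data set.
            if seg_fp in public_fps:
                continue
            kept.append(seg)
    return kept
```

Each layer only sees what the previous layer kept, which matches the "remaining after applying" wording in claims 5 and 6.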

12. A computer-implemented method for virtual machine snapshot backup, comprising:
- dividing a virtual machine snapshot of a virtual machine into one or more child data blocks;
  
  dividing a respective child data block into one or more data segments;
  
  applying multilayer de-duplication to the virtual machine snapshot; and
  
  performing a rollover of the virtual machine snapshot, the performing including:
  
  reading an index of child data blocks from a backup storage file system according to an index of a to-be-rollover virtual machine snapshot;
  
  reading data segments according to the index of child data blocks;
  
  forming the read data segments into one or more child data blocks; and
  
  forming the formed one or more child data blocks into the to-be-rollover virtual machine snapshot.
- Dependent claims (13)
  - 13. The computer-implemented method as recited in claim 12, further comprising:
    - by reference to revision information of a current virtual machine mirror file and index information of the to-be-rollover virtual machine snapshot, determining common data in the current virtual machine mirror file and the to-be-rollover virtual machine snapshot, the common data not being read from the backup storage file system.
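
The rollover steps in claims 12 and 13 can be sketched as an index walk. This is an illustrative reading, not the patented implementation: the three dictionaries (`snapshot_index`, `block_index`, `segment_store`), the `local_blocks` argument standing in for common data already present in the current mirror file, and the function name `rollover` are all assumptions.

```python
def rollover(snapshot_id, snapshot_index, block_index, segment_store, local_blocks=None):
    """Rebuild a snapshot from its indexes.

    snapshot_index: snapshot id -> ordered list of child data block ids.
    block_index:    child data block id -> ordered list of segment fingerprints.
    segment_store:  segment fingerprint -> segment bytes (the backup storage).
    local_blocks:   block id -> bytes already present locally (common data),
                    which are therefore not read from the backup storage.
    Returns the rebuilt snapshot bytes and the number of segments read.
    """
    local_blocks = local_blocks or {}
    segments_read = 0
    blocks = []
    for block_id in snapshot_index[snapshot_id]:
        if block_id in local_blocks:
            # Common data: reuse the local copy instead of reading storage.
            blocks.append(local_blocks[block_id])
            continue
        segments = [segment_store[seg_fp] for seg_fp in block_index[block_id]]
        segments_read += len(segments)
        blocks.append(b"".join(segments))  # form segments into a child data block
    return b"".join(blocks), segments_read  # form blocks into the snapshot
```

Returning the read count is only a convenience for seeing how much storage traffic the common-data shortcut avoids.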

14. A computer-implemented method for virtual machine snapshot backup, comprising:
- dividing a virtual machine snapshot of a virtual machine into one or more child data blocks;
  
  dividing a respective child data block into one or more data segments;
  
  applying multilayer de-duplication to the virtual machine snapshot; and
  
  deleting the virtual machine snapshot, the deleting including:
  
  writing deletion information of an index of a to-be-deleted virtual machine snapshot of the virtual machine into a log;
  
  when a volume of the deletion information in the log is larger than a preset threshold, scanning backup data of the virtual machine to find a child data block or a data segment that has not been referenced for a threshold period of time; and
  
  deleting the child data block or the data segment.
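
One way to picture the deletion flow of claim 14 is a log that defers physical cleanup until enough deletions accumulate. The sketch below makes several assumptions that the claim does not state: the "volume" of deletion information is counted as the number of log entries, time is passed in explicitly as a number, and a chunk (child data block or data segment) counts as referenced up to the moment its last referencing snapshot is deleted. The class `BackupStore` and its methods are hypothetical.

```python
class BackupStore:
    def __init__(self, log_threshold: int, max_unreferenced_age: float):
        self.log_threshold = log_threshold
        self.max_unreferenced_age = max_unreferenced_age
        self.snapshots = {}        # snapshot id -> set of chunk ids it references
        self.last_referenced = {}  # chunk id -> last time it was known referenced
        self.deletion_log = []     # deletion information awaiting a scan

    def add_snapshot(self, snapshot_id, chunk_ids, now):
        self.snapshots[snapshot_id] = set(chunk_ids)
        for chunk_id in chunk_ids:
            self.last_referenced[chunk_id] = now

    def delete_snapshot(self, snapshot_id, now):
        """Log the deletion; scan only when the log exceeds the threshold."""
        self.deletion_log.append(snapshot_id)
        for chunk_id in self.snapshots.pop(snapshot_id, set()):
            self.last_referenced[chunk_id] = now  # referenced until this moment
        if len(self.deletion_log) > self.log_threshold:
            return self._scan(now)
        return []

    def _scan(self, now):
        """Delete chunks unreferenced for longer than the allowed age."""
        live = set().union(*self.snapshots.values())
        removed = []
        for chunk_id in list(self.last_referenced):
            if chunk_id in live:
                self.last_referenced[chunk_id] = now
            elif now - self.last_referenced[chunk_id] > self.max_unreferenced_age:
                del self.last_referenced[chunk_id]
                removed.append(chunk_id)
        self.deletion_log.clear()
        return sorted(removed)
```

Deferring the scan keeps snapshot deletion cheap, and the age check avoids removing data that was referenced only moments ago.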

15. A system for virtual machine snapshot backup, comprising:
- a processor; and
  
  a memory coupled to the processor for storing computer programs to be executed by the processor, wherein the processor is configured to:
  
  divide a virtual machine snapshot of a virtual machine into one or more child data blocks and to divide a respective child data block into one or more data segments;
  
  apply multilayer de-duplication to the virtual machine snapshot;
  
  store remaining data in the virtual machine snapshot after applying the multilayer de-duplication;
  
  periodically scan the backup storage file system; and
  
  based on one or more data repetition characteristics of data storage, extract data whose repetition rate is higher than a preset threshold into the public data set.
- Dependent claims (16, 17, 18)
  - 16. The system as recited in claim 15, wherein the processor is further configured to:
    - determine whether there has been any change of the respective child data block since a preceding backup;
      
      in response to determining that there has not been any change of the respective child data block since the preceding backup, exclude the respective child data block; and
      
      in response to determining that there has been a change of the respective child data block since the preceding backup, keep the respective child data block.
  - 17. The system as recited in claim 15, wherein the processor is further configured to:
    - determine whether there has been any change of a respective data segment in the respective child data block that is remaining after processing of the child data block de-duplication module since a preceding backup;
      
      in response to determining that there has not been any change of the respective data segment, exclude the respective data segment; and
      
      in response to determining that there has been a change of the respective child data block since the preceding backup, keep the respective data segment.
  - 18. The system as recited in claim 15, wherein the processor is further configured to:
    - compare one or more characteristics of a respective data segment that is remaining after applying the data segment de-duplication with one or more characteristics of data in a public data set; and
      
      determine whether the respective data segment exists in the public data set;
      
      in response to determining that the respective data segment exists in the public data set, exclude the respective data segment; and
      
      in response to determining that the respective data segment does not exist in the public data set, keep the respective data segment.
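
The periodic scan that populates the public data set (recited in claims 1 and 15) might look like the following sketch. The claims do not define "repetition rate", so here it is assumed to be the fraction of scanned backups in which a segment fingerprint appears; that definition, the input shape, and the name `extract_public_data_set` are illustrative assumptions.

```python
from collections import Counter


def extract_public_data_set(backups, threshold):
    """Return fingerprints whose repetition rate exceeds `threshold`.

    backups: list of backups, each an iterable of segment fingerprints.
    threshold: preset repetition-rate threshold between 0 and 1.
    """
    if not backups:
        return set()
    counts = Counter()
    for fingerprints in backups:
        # Count each fingerprint at most once per backup.
        counts.update(set(fingerprints))
    total = len(backups)
    return {f for f, n in counts.items() if n / total > threshold}
```

For example, with three backups that all contain an operating-system segment, that segment has a repetition rate of 1.0 and would be extracted for any threshold below 1.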

Current Assignee: Alibaba Group Holding Ltd.
Original Assignee: Alibaba Group Holding Ltd.
Inventors: Zhang, Wei; Tang, Hong; Jiang, Hao; Zeng, Yue; Li, Xiaogang
Primary Examiner(s): Mofiz, Apu
Assistant Examiner(s): Nguyen, Cindy
Application Number: US13/967,939
Publication Number: US 20140052692A1
Time in Patent Office: 1,146 Days
Field of Search: 707/639
US Class Current: 1/1
CPC Class Codes:
G06F 11/1451   by selection of backup cont...
G06F 11/1453   using de-duplication of the...
G06F 11/1464   for networked environments
G06F 11/1466   to make the backup process ...
G06F 16/11   File system administration,...
G06F 16/128   Details of file system snap...
G06F 16/1748   De-duplication implemented ...
G06F 2201/80   Database-specific techniques
G06F 2201/805   Real-time
G06F 2201/815   Virtual middleware or OS fu...
G06F 2201/82   Solving problems relating t...
G06F 2201/84   Using snapshots, i.e. a log...
