Deduplicating snapshots associated with a backup operation

US 9,990,156 B1
Filed: 06/13/2014
Issued: 06/05/2018
Est. Priority Date: 06/13/2014
Status: Active Grant

Abstract

Deduplicating snapshots associated with a backup operation is disclosed, including: performing a backup operation including by generating a plurality of snapshots; maintaining, at a source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation; and using the deduplication data to deduplicate backup data across the plurality of snapshots.

22 Claims

1. A system, comprising:
- a processor configured to:
  
  receive an indication to perform a backup operation on a plurality of storage areas of a source system;
  
  in response to the indication, perform the backup operation including by generating a plurality of snapshots corresponding to respective ones of the plurality of storage areas associated with the backup operation, wherein a snapshot corresponds to a point-in-time state of a corresponding storage area;
  
  maintain, at the source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation, wherein the deduplication data comprises a plurality of identifiers corresponding to respective ones of data blocks that have already been written to backup media; and
  
  use the deduplication data, at the source system, to deduplicate backup data across the plurality of snapshots associated with the backup operation, wherein to use the deduplication data comprises to compare, at the source system, an identifier associated with a data block to back up in a first snapshot of the plurality of snapshots to the plurality of identifiers, wherein in response to a first determination that a matching identifier is not found in the plurality of identifiers:
  
  determine, at the source system, that the data block has not already been written to backup media;
  
  send, from the source system, to a backup storage underlying data of the data block to be stored as an entry associated with the data block in the first snapshot at the backup media at the backup storage;
  
  send, from the source system, to the backup storage a metadata block corresponding to the data block, wherein the metadata block is to be stored in the first snapshot at the backup media at the backup storage, wherein the metadata block is configured to be used to determine to which file or directory, or both, the data block belongs; and
  
  update, at the source system, the deduplication data to include the identifier associated with the data block;
  
  wherein in response to a second determination that the matching identifier is found in the plurality of identifiers:
  
  determine, at the source system, that the data block has already been written to the backup media; and
  
  send, from the source system, to the backup storage a representation of the data block to be stored as the entry associated with the data block in the first snapshot on the backup media at the backup storage, wherein the representation of the data block comprises associating data to a location at the backup media to which the data block was previously written, wherein the representation of the data block is determined based at least in part on information stored in the deduplication data, wherein the data block was previously written to the location at the backup media for a second snapshot of the plurality of snapshots; and
  
  a memory coupled to the processor and configured to store the deduplication data.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 18, 19)
- - 2. The system of claim 1, wherein the plurality of snapshots is configured to be stored at the backup media.
  - 3. The system of claim 1, wherein the identifier associated with the data block comprises a disk block number.
  - 4. The system of claim 1, wherein the backup operation comprises a full backup.
  - 5. The system of claim 1, wherein the backup operation comprises an incremental backup.
  - 6. The system of claim 1, wherein the deduplication data is configured to be deleted subsequent to completion of the backup operation.
  - 7. The system of claim 1, wherein the processor is further configured to restore the first snapshot, including by reading the entry associated with the data block included in the first snapshot:
    - in response to a first determination that the entry includes underlying data of the data block:
      
      restore stored data associated with the entry associated with the data block to the source system; and
      
      use the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs; and
      
      in response to a second determination that the entry associated with the data block includes the representation of the data block:
      
      use the representation included in the entry to locate the location at the backup media to restore data stored at the location at the backup media to the source system; and
      
      use the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs.
  - 8. The system of claim 1, wherein the representation of the data block comprises at least one of a hard link or a soft link.
  - 9. The system of claim 1, wherein the metadata block includes a snapshot identifier to identify its membership in a snapshot.
  - 18. The system of claim 1, wherein the metadata block comprises an inode.
  - 19. The system of claim 1, wherein the processor is configured to:
    - receive the location at the backup media to which the data block was previously written from the backup storage; and
      
      store the location at the backup media to which the data block was previously written from the backup storage in the deduplication data.
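The source-side flow recited in claim 1 and its dependent claims can be illustrated with a minimal Python sketch. It is not the patented implementation: the `BackupStorage` class, the `run_backup` function, the dictionary-based entries, and the `(snapshot id, entry index)` form of a location are all hypothetical names and layouts chosen for illustration. The sketch assumes the block identifier is a disk block number (claim 3), that the backup storage returns the location of each newly written block (claim 19), and that the deduplication data is discarded once the backup operation completes (claim 6).

```python
from dataclasses import dataclass, field


@dataclass
class BackupStorage:
    """Hypothetical stand-in for the backup storage and its backup media."""
    snapshots: dict = field(default_factory=dict)  # snapshot id -> list of entries

    def store_data(self, snap_id, block_id, data, metadata):
        # Store the underlying data plus its metadata block as a new entry
        # and report where it landed (assumed location format).
        entries = self.snapshots.setdefault(snap_id, [])
        entries.append({"block": block_id, "kind": "data",
                        "data": data, "meta": metadata})
        return (snap_id, len(entries) - 1)

    def store_reference(self, snap_id, block_id, location):
        # Store only a representation pointing at the earlier write.
        entries = self.snapshots.setdefault(snap_id, [])
        entries.append({"block": block_id, "kind": "ref", "location": location})


def run_backup(snapshots, storage):
    """Back up all snapshots of one backup operation with source-side dedup.

    snapshots maps snapshot id -> {block identifier: (data, metadata block)}.
    Returns the number of blocks whose underlying data was actually sent.
    """
    dedup = {}  # deduplication data kept at the source: identifier -> location
    blocks_sent = 0
    for snap_id, blocks in snapshots.items():
        for block_id, (data, metadata) in blocks.items():
            location = dedup.get(block_id)
            if location is None:
                # No matching identifier: send data and metadata, then record it.
                dedup[block_id] = storage.store_data(snap_id, block_id, data, metadata)
                blocks_sent += 1
            else:
                # Matching identifier: send only a reference to the prior location.
                storage.store_reference(snap_id, block_id, location)
    dedup.clear()  # deduplication data is not kept after the operation
    return blocks_sent
```

With two snapshots that share disk block 7, only three of the four blocks are sent in full; the second snapshot's entry for block 7 is a reference to the location written for the first snapshot.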

10. A method, comprising:
- receiving an indication to perform a backup operation on a plurality of storage areas of a source system;
  
  in response to the indication, performing the backup operation including by generating a plurality of snapshots corresponding to respective ones of the plurality of storage areas associated with the backup operation, wherein a snapshot corresponds to a point-in-time state of a corresponding storage area;
  
  maintaining, at the source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation, wherein the deduplication data comprises a plurality of identifiers corresponding to respective ones of data blocks that have already been written to backup media; and
  
  using the deduplication data, at the source system, to deduplicate backup data across the plurality of snapshots associated with the backup operation, wherein using the deduplication data comprises comparing, at the source system, an identifier associated with a data block to back up in a first snapshot of the plurality of snapshots to the plurality of identifiers, wherein in response to a first determination that a matching identifier is not found in the plurality of identifiers:
  
  determining, at the source system, that the data block has not already been written to backup media;
  
  sending, at the source system, to a backup storage underlying data of the data block to be stored as an entry associated with the data block in the first snapshot;
  
  sending, from the source system, to the backup storage a metadata block corresponding to the data block, wherein the metadata block is to be stored in the first snapshot at the backup media at the backup storage, wherein the metadata block is configured to be used to determine to which file or directory, or both, the data block belongs; and
  
  updating at the backup storage, the deduplication data to include the identifier associated with the data block;
  
  wherein in response to a second determination that the matching identifier is found in the plurality of identifiers:
  
  determining, at the source system, that the data block has already been written to the backup media; and
  
  sending, at the source system, to the backup storage a representation of the data block to be stored as the entry associated with the data block in the first snapshot on the backup media at the backup storage, wherein the representation of the data block comprises associating data to a location at the backup media to which the data block was previously written, wherein the representation of the data block is determined based at least in part on information stored in the deduplication data, wherein the data block was previously written to the location at the backup media for a second snapshot of the plurality of snapshots.
- View Dependent Claims (11, 12, 13, 14, 15, 16)
- - 11. The method of claim 10, wherein the plurality of snapshots is configured to be stored at the backup media.
  - 12. The method of claim 10, wherein the identifier associated with the data block comprises a disk block number.
  - 13. The method of claim 10, wherein the deduplication data is configured to be deleted subsequent to completion of the backup operation.
  - 14. The method of claim 10, further comprising restoring the first snapshot including by reading the entry included in the first snapshot:
    - in response to a first determination that the entry includes underlying data of the data block:
      
      restoring stored data associated with the entry associated with the data block to the source system; and
      
      using the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs; and
      
      in response to a second determination that the entry associated with the data block includes the representation of the data block:
      
      using the representation included in the entry to locate the location at the backup media to restore data stored at the location at the backup media to the source system; and
      
      using the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs.
  - 15. The method of claim 10, wherein the representation of the data block comprises at least one of a hard link or a soft link.
  - 16. The method of claim 10, wherein the metadata block includes a snapshot identifier to identify its membership in a snapshot.
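The restore steps of claim 14 can be sketched in the same spirit. This is an illustrative reading only: the `restore_snapshot` function, the dictionary entry layout, the `(snapshot id, entry index)` location format, and the `path` field of the metadata block are hypothetical. The sketch further assumes that the metadata block is stored alongside the entry holding the underlying data and is reused when a reference entry is resolved.

```python
def restore_snapshot(snap_id, backup):
    """Restore one snapshot from a backup laid out as snapshot id -> entries.

    Each entry is either {"kind": "data", "data": ..., "meta": ...} or
    {"kind": "ref", "location": (snapshot id, entry index)}.
    Returns {block identifier: (data, owning path)}.
    """
    restored = {}
    for entry in backup[snap_id]:
        if entry["kind"] == "data":
            # The entry holds the underlying data directly.
            source = entry
        else:
            # The entry is a representation: follow it to the location
            # where the block was previously written for another snapshot.
            other_snap, index = entry["location"]
            source = backup[other_snap][index]
        # The metadata block tells which file or directory the block belongs to.
        restored[entry["block"]] = (source["data"], source["meta"]["path"])
    return restored
```

Restoring a snapshot therefore never requires the deduplication data itself; the references stored in the snapshot entries are sufficient to find the data on the backup media.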

17. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
- receiving an indication to perform a backup operation on a plurality of storage areas of a source system;
  
  in response to the indication, performing the backup operation including by generating a plurality of snapshots corresponding to respective ones of the plurality of storage areas associated with the backup operation, wherein a snapshot corresponds to a point-in-time state of a corresponding storage area;
  
  maintaining, at the source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation, wherein the deduplication data comprises a plurality of identifiers corresponding to respective ones of data blocks that have already been written to backup media; and
  
  using the deduplication data, at the source system, to deduplicate backup data across the plurality of snapshots associated with the backup operation, wherein using the deduplication data comprises comparing, at the source system, an identifier associated with a data block to back up in a first snapshot of the plurality of snapshots to the plurality of identifiers, wherein in response to a first determination that a matching identifier is not found in the plurality of identifiers:
  
  determining, at the source system, that the data block has not already been written to backup media;
  
  sending, from the source system, to a backup storage underlying data of the data block to be stored as an entry associated with the data block in the first snapshot at the backup storage;
  
  sending, from the source system, to the backup storage a metadata block corresponding to the data block, wherein the metadata block is to be stored in the first snapshot at the backup media at the backup storage, wherein the metadata block is configured to be used to determine to which file or directory, or both, the data block belongs; and
  
  updating, at the source system, the deduplication data to include the identifier associated with the data block;
  
  wherein in response to a second determination that the matching identifier is found in the plurality of identifiers:
  
  determining, at the source system, that the data block has already been written to the backup media; and
  
  sending, from the source system, to the backup storage a representation of the data block to be stored as the entry associated with the data block in the first snapshot on the backup media at the backup storage, wherein the representation of the data block comprises associating data to a location at the backup media to which the data block was previously written, wherein the representation of the data block is determined based at least in part on information stored in the deduplication data, wherein the data block was previously written to the location at the backup media for a second snapshot of the plurality of snapshots.
- View Dependent Claims (20, 21, 22)
- - 20. The computer program product of claim 17, further comprising restoring the first snapshot including by reading the entry included in the first snapshot:
    - in response to a first determination that the entry includes underlying data of the data block:
      
      restoring stored data associated with the entry associated with the data block to the source system; and
      
      using the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs; and
      
      in response to a second determination that the entry associated with the data block includes the representation of the data block:
      
      using the representation included in the entry to locate the location at the backup media to restore data stored at the location at the backup media to the source system; and
      
      using the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs.
  - 21. The computer program product of claim 17, wherein the representation of the data block comprises at least one of a hard link or a soft link.
  - 22. The computer program product of claim 17, wherein the metadata block includes a snapshot identifier to identify its membership in a snapshot.

Current Assignee
EMC IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee
EMC IP Holding Company LLC (Dell Technologies Inc.)
Inventors
Kandamuthan, Nirmala
Primary Examiner(s)
Thammavong, Prasith
Assistant Examiner(s)
Kwong, Edmund H

Application Number

US14/304,616
Time in Patent Office

1,453 Days
Field of Search

711/162
US Class Current
CPC Class Codes

G06F 11/1453   using de-duplication of the...

G06F 16/1748   De-duplication implemented ...

G06F 3/0608   Saving storage space on sto...

G06F 3/0641   De-duplication techniques

G06F 3/065   Replication mechanisms

G06F 3/0673   Single storage device
