MANAGING DELETIONS FROM A DEDUPLICATION DATABASE

US 20150261776A1
Filed: 03/17/2014
Published: 09/17/2015
Est. Priority Date: 03/17/2014
Status: Active Grant

Abstract

An information management system can manage the removal of data block entries in a deduplicated data store using working copies of the data block entries residing in a local data store of a secondary storage computing device. The system can use the working copies to identify data blocks for removal. Once the deduplication database is updated with the changes to the working copies (e.g., using a transaction based update scheme), the system can query the deduplication database for the database entries identified for removal. Once identified, the system can remove the database entries identified for pruning and/or the corresponding deduplication data blocks from secondary storage.
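The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names (`BlockEntry`, `DedupStore`, `LocalStore`), the method names, and the use of in-memory dictionaries in place of a real deduplication database are all assumptions made for illustration. The sketch shows reference counts being decremented on working copies only, the database being updated from those copies, and the database then being queried for prunable blocks.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    signature: str   # deduplication signature (e.g. a hash of the block)
    ref_count: int   # number of instances of the block across stored files


class DedupStore:
    """Hypothetical stand-in for the deduplication database plus stored blocks."""

    def __init__(self):
        self.entries = {}  # signature -> BlockEntry
        self.blocks = {}   # signature -> block bytes

    def add_block(self, signature, data, ref_count):
        self.entries[signature] = BlockEntry(signature, ref_count)
        self.blocks[signature] = data

    def merge(self, working_copies):
        # Apply the modified working copies to the authoritative entries.
        for signature, copy in working_copies.items():
            self.entries[signature].ref_count = copy.ref_count

    def query_prunable(self, threshold=1):
        # Signatures whose reference count has fallen below the threshold.
        return [s for s, e in self.entries.items() if e.ref_count < threshold]

    def prune(self, signatures):
        # Remove both the block data and its database entry.
        for signature in signatures:
            del self.blocks[signature]
            del self.entries[signature]


class LocalStore:
    """Hypothetical in-memory working copies on the secondary storage computing device."""

    def __init__(self, store):
        self.store = store
        self.working = {}  # signature -> BlockEntry (working copy)

    def release(self, signature):
        # Copy the entry on first touch, then decrement the copy only.
        if signature not in self.working:
            entry = self.store.entries[signature]
            self.working[signature] = BlockEntry(entry.signature, entry.ref_count)
        self.working[signature].ref_count -= 1

    def flush(self):
        # Push all pending changes to the database, then discard the copies.
        self.store.merge(self.working)
        self.working.clear()


store = DedupStore()
store.add_block("sig-a", b"alpha", ref_count=1)
store.add_block("sig-b", b"beta", ref_count=2)

local = LocalStore(store)
local.release("sig-a")  # working copy goes 1 -> 0; the database still holds 1
local.release("sig-b")  # working copy goes 2 -> 1

before_flush = store.query_prunable()  # nothing is prunable until the update
local.flush()
after_flush = store.query_prunable()
store.prune(after_flush)
remaining = sorted(store.blocks)
```

In this sketch the query finds nothing to prune until `flush()` has run, which mirrors the ordering in the abstract: the database is queried only after it has been updated with the changes made to the working copies.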

20 Claims

1. A method for removing information from a deduplication data store maintained in a secondary storage subsystem, the method comprising:
- identifying a first data block for removal from a data store of a secondary storage subsystem, the data store containing a set of data blocks including the first data block and corresponding to a set of files formed from the set of data blocks and stored in deduplicated fashion, the data store further including a set of data block entries, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to the number of instances of the respective data block included in the set of files;
  
  generating a modified version of a working copy of a first data block entry of the set of data block entries, the first data block entry associated with the first data block, the working copy being separate from the data store and residing in a local data store in memory local to the secondary storage computing device;
  
  updating the data store based on the modified version of the working copy to include information sufficient to indicate that the first data block should be removed from the data store;
  
  subsequent to said updating, querying the data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the data store, the group including the first data block; and
  
  removing the group of one or more data blocks from the data store.
  - 2. The method of claim 1, wherein said updating comprises updating the value of the reference count of the working copy of the first data block entry to generate a modified reference count value.
  - 3. The method of claim 2, wherein said querying comprises identifying the first data block for removal from the data store based on the modified reference count value indicating that the number of instances of the first data block is below a threshold value.
  - 4. The method of claim 3, wherein the threshold value is one and said updating comprises decrementing the reference count of the first data block entry from one to zero.
  - 5. The method of claim 1, wherein said updating comprises setting a flag of the working copy of the first data block entry.
  - 6. The method of claim 1, wherein said updating is initiated based on a detection of at least one of:
    - expiration of a time threshold since a previous update to the data store based on the contents of the local data store; and
    - a size threshold of the local data store being exceeded.
  - 7. The method of claim 1, wherein said updating comprises merging the modified version of the working copy with the first data block entry contained in the data store.
  - 8. The method of claim 1, wherein the local data store includes a plurality of working copies including the working copy corresponding to the first data block entry and a plurality of additional working copies corresponding to additional data block entries in the set of data block entries, and wherein said updating comprises merging the plurality of working copies with the corresponding plurality of data block entries in the set of data block entries.
  - 9. The method of claim 1, wherein said removing the group of one or more data blocks from the data store comprises, for at least one data block of the data blocks in the group, removing a copy of the data block from the data store and additionally removing one or more pointers to the copy of the data block from the data store.
  - 10. The method of claim 1, further comprising, subsequent to said querying, removing at least a portion of the first data block entry from the data store.
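The two update triggers named in claim 6 (a time threshold since the previous update, and a size threshold on the local data store) can be sketched as a small policy object. This is an illustrative assumption, not the patented implementation: the class name `FlushPolicy`, its parameters, and the injectable `clock` are hypothetical.

```python
import time


class FlushPolicy:
    """Hypothetical policy deciding when working copies are pushed to the data store."""

    def __init__(self, max_age_seconds, max_entries, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds  # time threshold between updates
        self.max_entries = max_entries          # size threshold of the local store
        self.clock = clock                      # injectable so the policy can be tested
        self.last_flush = clock()

    def should_flush(self, pending_entries):
        # Either condition alone is enough to trigger an update.
        too_old = self.clock() - self.last_flush >= self.max_age_seconds
        too_big = pending_entries > self.max_entries
        return too_old or too_big

    def mark_flushed(self):
        # Restart the timer after a successful update.
        self.last_flush = self.clock()


# Usage with a fake clock so that the behaviour is deterministic.
now = [0.0]
policy = FlushPolicy(max_age_seconds=60, max_entries=3, clock=lambda: now[0])

small_and_fresh = policy.should_flush(pending_entries=1)
size_exceeded = policy.should_flush(pending_entries=4)
now[0] = 61.0
time_expired = policy.should_flush(pending_entries=0)
policy.mark_flushed()
after_reset = policy.should_flush(pending_entries=0)
```

Passing the clock in as a parameter is a design choice of the sketch; it keeps the time-based trigger testable without sleeping.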

11. A system for pruning a deduplication database, comprising:
- a data store contained in one or more storage devices of a secondary storage subsystem, the data store including a set of data blocks corresponding to a set of files formed from the set of data blocks, the set of files stored in deduplicated fashion, the data store further including a set of data block entries, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to the number of instances of the respective data block included in the set of files; and
  
  a secondary storage computing device residing in the secondary storage subsystem and comprising a local data store that is separate from the deduplicated data store and resides in memory local to the secondary storage computing device, the secondary storage computing device further comprising computer hardware configured to:
  
  identify a first data block for removal from the data store;
  
  generate a modified version of a working copy contained in the local data store that corresponds to a first data block entry of the set of data block entries, the first data block entry associated with the first data block;
  
  update the data store based on the modified version of the working copy to include information sufficient to indicate that the first data block should be removed from the data store;
  
  query the data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the data store, the group including the first data block; and
  
  remove the group of one or more data blocks from the data store.
  - 12. The system of claim 11, wherein the secondary storage computing device is configured to modify the value of the reference count of the working copy of the first data block entry to generate a modified reference count value.
  - 13. The system of claim 12, wherein the secondary storage computing device is configured to identify the first data block for removal from the data store based on the modified reference count value indicating that the number of instances of the first data block is below a threshold value.
  - 14. The system of claim 13, wherein the threshold value is one.
  - 15. The system of claim 11, wherein the secondary storage computing device is configured to set a flag of the working copy of the first data block entry as part of the modification of the working copy.
  - 16. The system of claim 11, wherein the update of the data store is initiated based on detection of at least one of:
    - expiration of a time threshold since a previous update of the data store; and
    - a size threshold of the local data store being exceeded.
  - 17. The system of claim 11, wherein the secondary storage computing device is configured to merge the contents of the working copy of the first data block entry with the first data block entry contained in the data store to perform the update of the data store.
  - 18. The system of claim 11, wherein said local data store includes a plurality of working copies including the working copy corresponding to the first data block entry and a plurality of additional working copies corresponding to additional data block entries in the set of data block entries, and wherein the secondary storage computing device is configured to update each of the plurality of working copies during the update.
  - 19. The system of claim 11, wherein for at least one data block of the data blocks in the group, the secondary storage computing device is configured to remove a copy of the data block from the data store and additionally remove one or more pointers to the copy of the data block.
  - 20. The system of claim 11, wherein the secondary storage computing device is further configured, subsequent to said querying, to remove at least a portion of the first database entry from the data store.
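Claims 17 to 20 describe merging working copies into the data store, querying it, and then removing pointers and entries. One way this could look against a relational store is sketched below using Python's standard `sqlite3` module. The schema (`block_entries`, `block_pointers`), the sample rows, and the choice of SQLite are assumptions for illustration only; the patent does not specify a database engine. Removal of the block data itself from secondary storage is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE block_entries (
        signature TEXT PRIMARY KEY,
        ref_count INTEGER NOT NULL
    );
    CREATE TABLE block_pointers (
        file_name TEXT,
        signature TEXT
    );
""")
conn.executemany(
    "INSERT INTO block_entries VALUES (?, ?)",
    [("sig-a", 1), ("sig-b", 3)],
)
conn.execute("INSERT INTO block_pointers VALUES (?, ?)", ("old.bak", "sig-a"))
conn.commit()

# Hypothetical working copies held in local memory: signature -> new reference count.
working_copies = {"sig-a": 0, "sig-b": 2}

# Merge every working copy in one transaction: `with conn` commits on
# success and rolls back if any statement raises.
with conn:
    conn.executemany(
        "UPDATE block_entries SET ref_count = ? WHERE signature = ?",
        [(count, sig) for sig, count in working_copies.items()],
    )

# After the update, query the store for entries below the threshold of one.
prunable = [
    row[0]
    for row in conn.execute(
        "SELECT signature FROM block_entries WHERE ref_count < 1 ORDER BY signature"
    )
]

# Remove the pointers and the entries for the prunable blocks together.
with conn:
    for sig in prunable:
        conn.execute("DELETE FROM block_pointers WHERE signature = ?", (sig,))
        conn.execute("DELETE FROM block_entries WHERE signature = ?", (sig,))

remaining = conn.execute(
    "SELECT signature, ref_count FROM block_entries"
).fetchall()
pointer_count = conn.execute("SELECT COUNT(*) FROM block_pointers").fetchone()[0]
```

Batching all working copies into a single transaction means the store never exposes a partially applied update to the pruning query, which is the property the sketch is meant to highlight.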

Current Assignee: CommVault Systems Incorporated
Original Assignee: CommVault Systems Incorporated
Inventors: Attarde, Deepak Raghunath; Vijayan, Manoj Kumar

Granted Patent: US 10,380,072 B2
US Class Current: 1/1
CPC Class Codes

G06F 11/1453   using de-duplication of the...

G06F 16/1748   De-duplication implemented ...

G06F 3/0641   De-duplication techniques
