MANAGING DELETIONS FROM A DEDUPLICATION DATABASE
Abstract
An information management system can manage the removal of data block entries in a deduplicated data store using working copies of the data block entries residing in a local data store of a secondary storage computing device. The system can use the working copies to identify data blocks for removal. Once the deduplication database is updated with the changes to the working copies (e.g., using a transaction based update scheme), the system can query the deduplication database for the database entries identified for removal. Once identified, the system can remove the database entries identified for pruning and/or the corresponding deduplication data blocks from secondary storage.
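The flow in the abstract (flush working-copy changes to the deduplication database, then query it for prunable entries, then remove them) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes an in-memory SQLite table stands in for the deduplication database, that a reference count of zero is the "information sufficient to indicate" removal, and that a plain dictionary stands in for secondary storage. The names `open_dedup_db`, `prune`, `block_entries`, `working_copies`, and `block_storage` are hypothetical.

```python
import sqlite3


def open_dedup_db():
    """Create an in-memory stand-in for the deduplication database (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE block_entries ("
        "signature TEXT PRIMARY KEY, ref_count INTEGER NOT NULL)"
    )
    return db


def prune(db, working_copies, block_storage):
    """Flush modified working copies, then query for and remove prunable blocks."""
    # Step 1: apply every working-copy change in one transaction, so either
    # all reference counts are updated or none are.
    with db:
        db.executemany(
            "UPDATE block_entries SET ref_count = ? WHERE signature = ?",
            [(count, sig) for sig, count in working_copies.items()],
        )
    # Step 2: only after the update, ask the database which entries to prune.
    rows = db.execute(
        "SELECT signature FROM block_entries "
        "WHERE ref_count <= 0 ORDER BY signature"
    ).fetchall()
    doomed = [sig for (sig,) in rows]
    # Step 3: remove the data blocks and their database entries.
    with db:
        for sig in doomed:
            block_storage.pop(sig, None)
            db.execute("DELETE FROM block_entries WHERE signature = ?", (sig,))
    return doomed


# Usage: two stored blocks; the working copies drop "sig-b" to zero references.
db = open_dedup_db()
with db:
    db.executemany(
        "INSERT INTO block_entries VALUES (?, ?)", [("sig-a", 2), ("sig-b", 1)]
    )
block_storage = {"sig-a": b"AAAA", "sig-b": b"BBBB"}
working_copies = {"sig-a": 1, "sig-b": 0}
removed = prune(db, working_copies, block_storage)
remaining = db.execute("SELECT signature, ref_count FROM block_entries").fetchall()
```

Separating the update from the query means the set of blocks to prune is always derived from the database itself rather than from the in-memory working copies, which is the ordering the abstract describes.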
20 Claims
1. A method for removing information from a deduplication data store maintained in a secondary storage subsystem, the method comprising:
identifying a first data block for removal from a data store of a secondary storage subsystem, the data store containing a set of data blocks including the first data block and corresponding to a set of files formed from the set of data blocks and stored in deduplicated fashion, the data store further including a set of data block entries, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to the number of instances of the respective data block included in the set of files;
generating a modified version of a working copy of a first data block entry of the set of data block entries, the first data block entry associated with the first data block, the working copy being separate from the data store and residing in a local data store in memory local to the secondary storage computing device;
updating the data store based on the modified version of the working copy to include information sufficient to indicate that the first data block should be removed from the data store;
subsequent to said updating, querying the data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the data store, the group including the first data block; and
removing the group of one or more data blocks from the data store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
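The data block entry recited in claim 1 (a deduplication signature plus a reference count), and the idea of modifying a working copy rather than the stored entry, might look like the sketch below. It is an assumption-laden illustration, not the claimed method: SHA-256 is assumed as the signature function (the claim does not name one), and `BlockEntry`, `build_entries`, and `mark_file_deleted` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockEntry:
    signature: str   # deduplication signature of the block (SHA-256 assumed)
    ref_count: int   # number of instances of the block across the file set


def build_entries(files):
    """Build entries from a dict mapping file name -> list of block payloads."""
    entries = {}
    for blocks in files.values():
        for payload in blocks:
            sig = hashlib.sha256(payload).hexdigest()
            entry = entries.setdefault(sig, BlockEntry(sig, 0))
            entry.ref_count += 1  # one more instance of this block
    return entries


def mark_file_deleted(entries, working_copies, blocks):
    """Decrement counts on working copies only; stored entries stay untouched."""
    for payload in blocks:
        sig = hashlib.sha256(payload).hexdigest()
        # Create the working copy from the stored entry the first time it is needed.
        copy = working_copies.setdefault(
            sig, BlockEntry(sig, entries[sig].ref_count)
        )
        copy.ref_count -= 1
    return working_copies


# Usage: "hello" is shared by two files, "world" appears in only one.
files = {"a.txt": [b"hello", b"world"], "b.txt": [b"hello"]}
entries = build_entries(files)
working = mark_file_deleted(entries, {}, files["a.txt"])
hello = hashlib.sha256(b"hello").hexdigest()
world = hashlib.sha256(b"world").hexdigest()
```

After deleting `a.txt`, the working copy for the shared block still has one reference, while the working copy for the unshared block drops to zero and becomes a candidate for removal once the data store is updated.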
11. A system for pruning a deduplication database, comprising:
a data store contained in one or more storage devices of a secondary storage subsystem, the data store including a set of data blocks corresponding to a set of files formed from the set of data blocks, the set of files stored in deduplicated fashion, the data store further including a set of data block entries, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to the number of instances of the respective data block included in the set of files; and
a secondary storage computing device residing in the secondary storage subsystem and comprising a local data store that is separate from the deduplicated data store and resides in memory local to the secondary storage computing device, the secondary storage computing device further comprising computer hardware configured to:
identify a first data block for removal from the data store;
generate a modified version of a working copy contained in the local data store that corresponds to a first data block entry of the set of data block entries, the first data block entry associated with the first data block;
update the data store based on the modified version of the working copy to include information sufficient to indicate that the first data block should be removed from the data store;
query the data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the data store, the group including the first data block; and
remove the group of one or more data blocks from the data store.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification