Managing deletions from a deduplication database
Abstract
An information management system can manage the removal of data block entries in a deduplicated data store using working copies of the data block entries residing in a local data store of a secondary storage computing device. The system can use the working copies to identify data blocks for removal. Once the deduplication database is updated with the changes to the working copies (e.g., using a transaction based update scheme), the system can query the deduplication database for the database entries identified for removal. Once identified, the system can remove the database entries identified for pruning and/or the corresponding deduplication data blocks from secondary storage.
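The flow described in the abstract can be sketched with an in-memory SQLite database standing in for the deduplication database. This is only an illustration under stated assumptions: the table name `entries`, the `release` helper, the `dirty` set, and the rule that a reference count of zero marks an entry for pruning are all hypothetical and are not taken from the patent text.

```python
import sqlite3

# Hypothetical stand-in for the deduplication database:
# one row per data block entry (signature plus reference count).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE entries (signature TEXT PRIMARY KEY, ref_count INTEGER NOT NULL)"
)
db.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("sig-a", 2), ("sig-b", 1), ("sig-c", 1)],
)
db.commit()

# Working copies held in local memory; changes land here first.
working = dict(db.execute("SELECT signature, ref_count FROM entries"))
dirty = set()  # signatures whose working copy differs from the database


def release(signature):
    """Drop one reference on the working copy only (assumed semantics)."""
    working[signature] -= 1
    dirty.add(signature)


release("sig-a")
release("sig-b")

# Transaction-based update: every changed working copy is written back
# in one transaction, so the database sees all changes or none.
with db:
    db.executemany(
        "UPDATE entries SET ref_count = ? WHERE signature = ?",
        [(working[s], s) for s in sorted(dirty)],
    )
dirty.clear()

# Only after the update is the database queried for entries to prune.
prunable = [
    row[0]
    for row in db.execute(
        "SELECT signature FROM entries WHERE ref_count = 0 ORDER BY signature"
    )
]

# Remove the identified entries (the matching data blocks would be
# deleted from secondary storage at this point as well).
with db:
    db.executemany("DELETE FROM entries WHERE signature = ?", [(s,) for s in prunable])

remaining = [
    row[0] for row in db.execute("SELECT signature FROM entries ORDER BY signature")
]
```

Querying the database rather than the working copies means pruning decisions are based on committed state, which is one plausible reason for ordering the steps this way.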
15 Claims
1. A method for removing information from a deduplication data store maintained in a secondary storage subsystem, the method comprising:
- reviewing a plurality of working copies of data block entries residing in memory local to a secondary storage computing device to identify a first data block entry and a first data block corresponding to the first data block entry associated with a secondary storage operation, each of the plurality of working copies of data block entries corresponding to a data block entry stored in a first data store of a secondary storage subsystem that is distinct from the memory local to the secondary storage computing device, the first data block being stored in a second data store of the secondary storage subsystem, the second data store storing a set of data blocks including the first data block and corresponding to a set of files formed from the set of data blocks and stored in deduplicated fashion, the first data block entry being stored in the first data store of the secondary storage subsystem, the first data store storing a set of data block entries including the first data block entry, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to a number of instances of the respective data block included in the set of files;
- modifying a working copy of the first data block entry residing in the memory local to the secondary storage computing device and corresponding to the first data block entry and the first data block;
- updating the first data block entry stored in the first data store based on the modified working copy to indicate that the first data block should be removed from the second data store;
- subsequent to said updating, querying the first data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the second data store, the group including the first data block;
- removing the group of one or more data blocks from the second data store; and
- removing a group of one or more data block entries that correspond to the group of one or more data blocks from the first data store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
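The six steps of claim 1 can be walked through with plain dictionaries. This is a minimal sketch, not the claimed implementation: `BlockEntry`, its `prune` flag, and `prune_for_operation` are hypothetical names, and treating a reference count of zero as the removal indication is an assumption. `first_store` plays the first data store (entries) and `second_store` the second data store (blocks).

```python
from dataclasses import dataclass, replace


@dataclass
class BlockEntry:
    signature: str       # deduplication signature of the data block
    ref_count: int       # instances of the block across the set of files
    prune: bool = False  # hypothetical "should be removed" indication


def prune_for_operation(first_store, second_store, working_copies, op_signatures):
    # Review the working copies to find entries tied to the operation.
    touched = [sig for sig in op_signatures if sig in working_copies]

    # Modify the working copies in local memory only.
    for sig in touched:
        copy = working_copies[sig]
        copy.ref_count -= 1
        copy.prune = copy.ref_count == 0  # assumed removal rule

    # Update the first data store from the modified working copies.
    for sig in touched:
        first_store[sig] = replace(working_copies[sig])

    # Subsequent to the update, query the first data store for removals.
    group = [sig for sig, entry in first_store.items() if entry.prune]

    # Remove the data blocks from the second data store.
    for sig in group:
        second_store.pop(sig, None)

    # Remove the matching entries from the first data store.
    for sig in group:
        del first_store[sig]
    return group


first_store = {"s1": BlockEntry("s1", 1), "s2": BlockEntry("s2", 3)}
second_store = {"s1": b"block-1", "s2": b"block-2"}
working_copies = {sig: replace(entry) for sig, entry in first_store.items()}

removed = prune_for_operation(first_store, second_store, working_copies, ["s1", "s2"])
```

In this run `s1` drops to zero references and is removed from both stores, while `s2` stays with a count of 2. The sketch leaves the stale working copy of `s1` in place, since the claim does not say what happens to it.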
9. A system for pruning a deduplication database, comprising:
- a data block data store contained in one or more storage devices of a secondary storage subsystem, the data block data store storing a set of data blocks corresponding to a set of files formed from the set of data blocks, the set of files stored in deduplicated fashion;
- a deduplication data store storing a set of data block entries, each entry of the set of data block entries corresponding to a respective data block of the set of data blocks and comprising at least a deduplication signature corresponding to the respective data block and a reference count corresponding to a number of instances of the respective data block included in the set of files; and
- a secondary storage computing device residing in the secondary storage subsystem and comprising a local data store that is separate from the deduplication data store and resides in memory local to the secondary storage computing device, the local data store storing working copies of at least a subset of the set of data block entries, the secondary storage computing device further comprising computer hardware configured to:
  - review the working copies of the at least a subset of the set of data block entries to identify a first data block entry and a first data block corresponding to the first data block entry that are associated with a secondary storage operation, the first data block entry being stored in the deduplication data store and the first data block being stored in the data block data store;
  - modify a working copy of the first data block entry stored on the local data store that corresponds to the first data block entry and the first data block;
  - cause the deduplication data store to be updated based on the modified working copy to indicate that the first data block is to be removed from the data block data store;
  - query the deduplication data store to identify a group of one or more data blocks in the set of data blocks that are to be removed from the data block data store, the group including the first data block;
  - cause the group of one or more data blocks to be removed from the data block data store; and
  - cause a group of one or more data block entries that correspond to the group of one or more data blocks to be removed from the deduplication data store.
Dependent claims: 10, 11, 12, 13, 14, 15
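The three components named in claim 9 can be modeled as small classes to show how they might be wired together. All class and method names below are hypothetical, entries are reduced to a signature mapped to a reference count, and a count of zero is assumed to mean "to be removed"; none of this is specified by the claim.

```python
class DataBlockStore:
    """Stand-in for the data block data store (signature -> block bytes)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def remove(self, signatures):
        for sig in signatures:
            self.blocks.pop(sig, None)


class DeduplicationStore:
    """Stand-in for the deduplication data store (signature -> ref count)."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def apply(self, updates):
        self.entries.update(updates)

    def removable(self):
        # Assumed rule: an entry with no remaining references is prunable.
        return sorted(s for s, count in self.entries.items() if count == 0)

    def remove(self, signatures):
        for sig in signatures:
            self.entries.pop(sig, None)


class SecondaryStorageComputingDevice:
    def __init__(self, dedup_store, block_store, subset):
        self.dedup_store = dedup_store
        self.block_store = block_store
        # Local data store: working copies of only a subset of the entries.
        self.local_store = {s: dedup_store.entries[s] for s in subset}

    def prune(self, op_signatures):
        changed = {}
        for sig in op_signatures:
            if sig in self.local_store:       # review the working copies
                self.local_store[sig] -= 1    # modify the working copy
                changed[sig] = self.local_store[sig]
        self.dedup_store.apply(changed)       # cause the update
        group = self.dedup_store.removable()  # query for removals
        self.block_store.remove(group)        # cause block removal
        self.dedup_store.remove(group)        # cause entry removal
        return group


dedup = DeduplicationStore({"x": 1, "y": 2, "z": 1})
blocks = DataBlockStore({"x": b"bx", "y": b"by", "z": b"bz"})
device = SecondaryStorageComputingDevice(dedup, blocks, subset=["x", "y"])

pruned = device.prune(["x", "y", "z"])
```

Because the local data store holds working copies for only `x` and `y`, the entry for `z` is untouched in this run even though the operation names it. How a real system would handle entries without a working copy is not addressed by this sketch.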
Specification