Managing deletions from a deduplication database
Abstract
An information management system can manage the removal of data block entries in a deduplicated data store using working copies of the data block entries residing in a local data store of a secondary storage computing device. The system can use the working copies to identify data blocks for removal. Once the deduplication database is updated with the changes to the working copies (e.g., using a transaction-based update scheme), the system can query the deduplication database for the database entries identified for removal. The system can then remove the database entries identified for pruning and/or the corresponding deduplication data blocks from secondary storage.
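As an illustration of the transaction-based update and the follow-up query, the following is a minimal sketch using Python's standard `sqlite3` module. The table name `block_entries`, the helper names, and the convention that a reference count of zero marks an entry for removal are assumptions made for this example only; the abstract does not specify a schema or a marking convention.

```python
import sqlite3

# Hypothetical schema: one row per data block entry in the deduplication database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block_entries ("
    " signature TEXT PRIMARY KEY,"
    " ref_count INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO block_entries VALUES (?, ?)",
    [("sig-a", 2), ("sig-b", 1), ("sig-c", 1)],
)
conn.commit()

# Working copies held in local memory, already modified by a pruning pass
# (signature -> new reference count). Values are illustrative.
working_copies = {"sig-a": 1, "sig-b": 0}


def flush_working_copies(conn, working_copies):
    """Apply every working-copy change to the database in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "UPDATE block_entries SET ref_count = ? WHERE signature = ?",
            [(count, sig) for sig, count in working_copies.items()],
        )


def query_prunable(conn):
    """Return signatures of entries marked for removal (assumed: ref_count == 0)."""
    rows = conn.execute(
        "SELECT signature FROM block_entries WHERE ref_count = 0 ORDER BY signature"
    )
    return [sig for (sig,) in rows]


flush_working_copies(conn, working_copies)
prunable = query_prunable(conn)
```

Grouping the updates into one transaction means the database never reflects a partially applied set of working-copy changes, so the later query sees either all of them or none.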
20 Claims
1. A non-transitory computer-readable medium comprising instructions, which when executed by a computing device comprising one or more processors and computer memory, cause the computing device to perform a method for removing information from a deduplication data store, the method comprising:
reviewing working copies of data block entries residing in memory local to the computing device to identify a first data block entry and a first data block corresponding to the first data block entry associated with a secondary storage operation, wherein each of the working copies of data block entries corresponds to a respective data block entry stored in a first data store of a secondary storage subsystem that is distinct from the memory local to the computing device;
the first data block being stored in a second data store of the secondary storage subsystem, the second data store storing a set of data blocks including the first data block, wherein a set of files formed from the set of data blocks are stored in deduplicated fashion;
wherein the first data store comprises a set of data block entries including the first data block entry, each entry in the set of data block entries corresponding to a respective data block in the set of data blocks and comprising at least:
(i) a deduplication signature for the respective data block, and (ii) a reference count for a number of instances of the respective data block included in the set of files;
modifying a working copy of the first data block entry residing in the memory local to the computing device and corresponding to the first data block entry and the first data block;
updating the first data block entry stored in the first data store based on the modified working copy to indicate that the first data block should be removed from the second data store;
after the updating, querying the first data store to identify a group of one or more data blocks in the set of data blocks that should be removed from the second data store, the group including the first data block;
removing the group of one or more data blocks from the second data store; and
removing from the first data store one or more data block entries that correspond to the group of one or more data blocks removed from the second data store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
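The sequence of steps recited in claim 1 can be sketched in memory as follows. This is a hedged illustration, not the patented implementation: the `BlockEntry` class, the `prune` flag, the `prune_after_operation` function, and the rule that an entry is marked once its reference count reaches zero are all hypothetical choices made for the example. Plain dictionaries stand in for the first data store (entries), the second data store (blocks), and the local working copies.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    signature: str       # deduplication signature for the data block
    ref_count: int       # number of instances of the block in the set of files
    prune: bool = False  # hypothetical marker: block should be removed


def prune_after_operation(first_store, second_store, working_copies, released):
    """Sketch of the claimed sequence.

    first_store:    signature -> BlockEntry (deduplication entries)
    second_store:   signature -> bytes (data blocks)
    working_copies: signature -> BlockEntry (copies in local memory)
    released:       signatures released by a secondary storage operation,
                    one item per released instance (assumed input)
    """
    # Review the working copies to find entries tied to the operation.
    touched = [sig for sig in released if sig in working_copies]

    # Modify the working copies only; the stored entries are untouched so far.
    for sig in touched:
        copy = working_copies[sig]
        copy.ref_count -= 1
        if copy.ref_count <= 0:
            copy.prune = True  # assumed marking rule

    # Update the stored entries based on the modified working copies.
    for sig in set(touched):
        copy = working_copies[sig]
        first_store[sig] = BlockEntry(copy.signature, copy.ref_count, copy.prune)

    # After the update, query the first data store for blocks to remove.
    group = sorted(sig for sig, entry in first_store.items() if entry.prune)

    # Remove the blocks from the second data store, then the matching entries.
    for sig in group:
        second_store.pop(sig, None)
    for sig in group:
        del first_store[sig]
        working_copies.pop(sig, None)  # housekeeping, not recited in the claim
    return group


first_store = {"s1": BlockEntry("s1", 1), "s2": BlockEntry("s2", 2)}
second_store = {"s1": b"block one", "s2": b"block two"}
working_copies = {
    sig: BlockEntry(e.signature, e.ref_count) for sig, e in first_store.items()
}
removed = prune_after_operation(first_store, second_store, working_copies, ["s1", "s2"])
```

In this usage, block `s1` loses its only reference and is removed together with its entry, while `s2` stays in both stores with its reference count reduced to 1.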
12. An information management system for pruning a deduplication database, the system comprising:
a data block data store contained in one or more storage devices of the information management system, wherein the data block data store stores a set of data blocks that form a set of files stored in deduplicated fashion;
a deduplication data store storing a set of data block entries, wherein each entry in the set of data block entries corresponds to a respective data block in the set of data blocks and comprises at least:
(i) a deduplication signature for the respective data block, and (ii) a reference count for a number of instances of the respective data block that are included in the set of files;
a computing device residing in the information management system and comprising a local data store that is separate from the deduplicated data store and resides in memory local to the computing device, wherein the local data store stores working copies of at least a subset of the set of data block entries, and wherein the computing device further comprises computer processing hardware; and
wherein the computing device is configured to:
identify within the working copies a first data block entry and a first data block corresponding to the first data block entry that are associated with a secondary storage operation, wherein the first data block entry is stored in the deduplication data store, and wherein the first data block is stored in the data block data store,
modify a working copy of the first data block entry stored on the local data store that corresponds to the first data block entry and the first data block,
based on the modified working copy, cause the deduplication data store to indicate that the first data block is to be removed from the data block data store,
identify, based on a query of the deduplication data store, a group of one or more data blocks in the set of data blocks that are to be removed from the data block data store, wherein the group includes the first data block,
cause the group of one or more data blocks to be removed from the data block data store, and
cause one or more data block entries that correspond to the group of one or more data blocks to be removed from the deduplication data store.
Dependent claims: 13, 14, 15, 16, 17
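To make the two stores in claim 12 concrete, the sketch below shows one way a data block data store and a deduplication data store could be populated so that each entry carries a signature and a reference count. The fixed 4-byte block size, the use of SHA-256 as the deduplication signature, the dictionary layout, and the `store_file` helper are assumptions for illustration; the claim does not prescribe any of them.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def store_file(data: bytes, block_store: dict, dedup_store: dict) -> list:
    """Store a file in deduplicated fashion and return its list of signatures."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # SHA-256 stands in for the deduplication signature (assumption).
        signature = hashlib.sha256(block).hexdigest()
        if signature not in dedup_store:
            # First instance: keep the block once and create its entry.
            block_store[signature] = block
            dedup_store[signature] = {"signature": signature, "ref_count": 0}
        # Every instance of the block in any file bumps the reference count.
        dedup_store[signature]["ref_count"] += 1
        recipe.append(signature)
    return recipe


block_store, dedup_store = {}, {}
recipe_a = store_file(b"AAAABBBB", block_store, dedup_store)
recipe_b = store_file(b"AAAACCCC", block_store, dedup_store)

# A file is rebuilt by concatenating its blocks in recipe order.
rebuilt_a = b"".join(block_store[sig] for sig in recipe_a)
```

Here the block `AAAA` appears in both files but is stored once, and its entry ends up with a reference count of 2, which is the quantity a later pruning pass would decrement.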
18. A non-transitory computer-readable medium comprising instructions, which when executed by a computing device comprising one or more processors and computer memory, cause the computing device to perform a method for removing information from a deduplication data store, the method comprising:
identifying within working copies a first data block entry and a first data block corresponding to the first data block entry that are associated with a secondary storage operation, wherein the first data block entry is stored in a deduplication data store, and wherein the first data block is stored in a data block data store;
wherein the data block data store is contained in one or more storage devices of an information management system, and wherein the data block data store stores a set of data blocks that form a set of files stored in deduplicated fashion;
wherein the deduplication data store stores a set of data block entries, wherein each entry in the set of data block entries corresponds to a respective data block in the set of data blocks, including the first data block, and comprises at least:
a deduplication signature for the respective data block and a reference count for a number of instances of the respective data block that are included in the set of files;
modifying a first working copy of the first data block entry stored on the local data store that corresponds to the first data block entry and the first data block;
based on the modified first working copy, causing the deduplication data store to indicate that the first data block is to be removed from the data block data store;
identifying, based on a query of the deduplication data store, a group of one or more data blocks in the set of data blocks that are to be removed from the data block data store, wherein the group includes the first data block;
causing the group of one or more data blocks to be removed from the data block data store; and
causing one or more data block entries that correspond to the group of one or more data blocks to be removed from the deduplication data store.
Dependent claims: 19, 20
Specification