Multitier deduplication systems and methods
First Claim
1. A system for deduplicating a backup archive, the system comprising:
a computer system comprising computer hardware, the computer system programmed to implement:
a deduplication module configured to:
access one or more block directories associated with one or more backup archives at an archive data store, wherein the one or more block directories include fingerprints of data blocks associated with the one or more backup archives, and wherein at least one of the one or more backup archives is associated with a data store;
create a composite block map based at least in part on the one or more block directories, wherein the composite block map includes fingerprints of each data block stored at the archive data store; and
access one or more data blocks from the data store and for each of the one or more data blocks, the deduplication module is further configured to:
create a fingerprint for the data block;
determine whether the fingerprint exists in the composite block map of the archive data store;
in response to determining that the fingerprint does not exist in the composite block map, determine whether the fingerprint exists in a global deduplication data store, wherein the global deduplication data store is separate from the archive data store and the composite block map; and
in response to determining that the fingerprint does not exist in the global deduplication data store, identify the data block for backup storage; and
a backup module configured to:
backup each of the data blocks identified for backup storage as a target archive at the archive data store; and
store the fingerprint associated with each of the data blocks identified for backup storage at a target block directory associated with the target archive.
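The per-block decision recited in the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256 stands in for the fingerprint function (the claim names none), the block directories and the global deduplication data store are modeled as in-memory sets, and the names `fingerprint`, `build_composite_block_map`, and `deduplicate` are hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Fingerprint a data block (SHA-256 is an assumption; the claim names no hash)."""
    return hashlib.sha256(block).hexdigest()


def build_composite_block_map(block_directories):
    """Merge the fingerprints from every block directory at the archive data store."""
    composite = set()
    for directory in block_directories:
        composite.update(directory)
    return composite


def deduplicate(blocks, block_directories, global_store):
    """Return (blocks to back up, target block directory) after the two-tier check."""
    composite = build_composite_block_map(block_directories)
    to_backup = []
    target_directory = []
    for block in blocks:
        fp = fingerprint(block)
        if fp in composite:
            continue  # tier 1: fingerprint already present at the archive data store
        if fp in global_store:
            continue  # tier 2: fingerprint known to the global deduplication data store
        to_backup.append(block)
        target_directory.append(fp)
        # Assumption beyond the claim text: remember blocks queued in this run
        # so that a repeated block in the same source is only stored once.
        composite.add(fp)
    return to_backup, target_directory
```

In this sketch the global store is consulted only when the composite block map misses, which mirrors the order of the two "in response to determining" steps. How a block found in the global store is later referenced by the target archive is not addressed here.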
Abstract
Multitier deduplication can reduce the amount of bandwidth and storage resources used during deduplication. In certain embodiments, the system can determine if a data block is stored in a first archive data storage. If so, the system can skip the data block. If not, the system can determine if the data block is stored or identified in a second archive data storage. In various implementations, the first archive data storage can be local to the system and the second archive data storage can be a global archive that may be remote from the system. The system can create a map of a plurality of backups stored at the first archive enabling the system to quickly check multiple archives. The multitier data deduplication can filter out inactive data blocks during or before performing the deduplication process.
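To illustrate why the order of the tiers can save bandwidth, the sketch below counts how many lookups actually reach the global archive. It is an assumption-laden model rather than the described system: the global archive is a local stand-in class (`GlobalArchive`) whose `contains` call represents one network round trip, SHA-256 is an assumed fingerprint function, and inactive blocks are marked with a hypothetical `active` flag because the abstract does not say how inactivity is determined.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    active: bool = True  # hypothetical flag; the abstract does not define "inactive"


class GlobalArchive:
    """Stand-in for a remote global archive that counts the lookups it receives."""

    def __init__(self, fingerprints):
        self._fingerprints = set(fingerprints)
        self.lookups = 0

    def contains(self, fp: str) -> bool:
        self.lookups += 1  # each call models one remote query
        return fp in self._fingerprints


def select_blocks(blocks, local_map, global_archive):
    """Filter inactive blocks, check the local map, then fall back to the global archive."""
    selected = []
    for block in blocks:
        if not block.active:
            continue  # filtered out before any fingerprinting
        fp = hashlib.sha256(block.data).hexdigest()
        if fp in local_map:
            continue  # resolved against the local map, no remote query needed
        if not global_archive.contains(fp):
            selected.append(block)
    return selected
```

With four input blocks of which one is inactive and one is already in the local map, only two lookups reach the global archive in this model; the remaining blocks are resolved without any remote traffic.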
20 Claims
-
9. A method of deduplicating a backup archive, the method comprising:
by a computer system comprising computer hardware:
accessing one or more block directories associated with one or more archives at an archive data store, wherein the one or more block directories include fingerprints of data blocks associated with the one or more archives, and wherein at least one of the one or more archives is associated with a data store;
creating a composite block map based at least in part on the one or more block directories, wherein the composite block map includes fingerprints of each data block stored at the archive data store; and
accessing one or more data blocks from the data store and for each of the one or more data blocks:
creating a fingerprint for the data block;
determining whether the fingerprint exists in the composite block map of the archive data store;
in response to determining that the fingerprint does not exist in the composite block map, determining whether the fingerprint exists in a global deduplication data store, wherein the global deduplication data store is separate from the archive data store and the composite block map;
in response to determining that the fingerprint does not exist in the global deduplication data store, backing up the data block as part of a target archive at the archive data store; and
storing the fingerprint associated with the data block storage at a target block directory associated with the target archive.
- View Dependent Claims (10, 11, 12, 13, 14)
-
15. A non-transitory computer-readable storage medium comprising computer-executable instructions configured to implement a method of deduplicating a backup archive, the method comprising:
accessing one or more block directories associated with one or more archives at a first archive data store, wherein the one or more block directories include fingerprints of data blocks associated with the one or more archives, and wherein at least one of the one or more archives is associated with a data store;
creating a composite block map based at least in part on the one or more block directories, wherein the composite block map includes fingerprints of each data block stored at the first archive data store; and
accessing one or more data blocks from the data store and for each of the one or more data blocks:
creating a fingerprint for the data block;
determining whether the fingerprint exists in the composite block map of the first archive data store;
in response to determining that the fingerprint does not exist in the composite block map, determining whether the fingerprint exists in a second archive data store, wherein the second archive data store is separate from the first archive data store and the composite block map;
in response to determining that the fingerprint does not exist in the second archive data store, backing up the data block as part of a target archive at the first archive data store; and
storing the fingerprint associated with the data block storage at a target block directory associated with the target archive.
- View Dependent Claims (16, 17, 18, 19, 20)
Specification