System and method for real-time deduplication utilizing an electronic storage medium
First Claim
1. A method comprising:
chunking a data set, received at a storage system, into at least one block;
generating a signature of one of the at least one blocks;
determining if the generated signature is stored in a signature database in an electronic storage medium of the storage system;
in response to determining that the generated signature is stored in the signature database, replacing the block associated with the generated signature with a pointer to a previously stored block associated with the signature stored in the signature database;
setting one of a lowest and a highest clear bit in a bit field associated with a counter in a block reference counter file stored in the electronic storage medium to indicate that an additional reference to the block has been made; and
setting one of a lowest and a highest clear bit in the bit field to indicate that a reference to the block has been deleted, whereby deduplication of the data set occurs in real time.
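The chunking, signature generation, and signature lookup steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the fixed 4096-byte block size, the use of SHA-256 as the signature, the in-memory dictionary standing in for the signature database, and the names `chunk` and `DedupStore` are all assumptions made for illustration, and signature collisions are ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; the claim does not specify one


def chunk(data, block_size=BLOCK_SIZE):
    """Split a data set into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class DedupStore:
    """Illustrative store; a dict stands in for the signature database."""

    def __init__(self):
        self.blocks = []      # physically stored blocks
        self.signatures = {}  # signature -> index of the stored block

    def write(self, data):
        """Store a data set and return one pointer (block index) per block."""
        pointers = []
        for block in chunk(data):
            signature = hashlib.sha256(block).digest()
            index = self.signatures.get(signature)
            if index is None:
                # Signature not in the database: store the block itself.
                index = len(self.blocks)
                self.blocks.append(block)
                self.signatures[signature] = index
            # Otherwise the block is replaced by a pointer to the stored copy.
            pointers.append(index)
        return pointers

    def read(self, pointers):
        """Rebuild the original data set from its pointers."""
        return b"".join(self.blocks[i] for i in pointers)
```

Under these assumptions, `DedupStore().write(b"A" * 4096 + b"B" * 4096 + b"A" * 4096)` would store two physical blocks and return the pointer list `[0, 1, 0]`, because the third block has the same signature as the first.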
Abstract
The present invention provides a system and method for eliminating duplicate data (de-duplication) in substantially real time using an electronic storage medium.
17 Claims
11. A system for real time deduplication of a data stream comprising:
a network;
a storage system coupled to the network; and
a virtual tape library coupled to the network and configured to receive the data stream transmitted over the network from the storage system in accordance with a backup operation, the virtual tape library further configured to generate a signature for each block of the data stream and to determine whether the generated signature is stored in a signature database within an electronic storage medium of the virtual tape library;
wherein the signature database comprises a plurality of entries each of which is associated with one signature, each entry comprising a counter field and a bit field wherein the counter field is utilized to store a count of a number of references to a block associated with the signature associated with the entry and wherein one of a lowest and a highest clear bit of the bit field is set to record an increase to the number of references since a last repack operation and one of a lowest and a highest clear bit is set to record a decrease to the number of references since the last repack operation.
13. A method comprising:
in response to adding a reference to a previously stored block setting one of a lowest and a highest clear bit in a bit field of an entry associated with a signature for the previously stored block and in response to removing a reference to the previously stored block setting one of a lowest and a highest clear bit in the bit field of the entry associated with the signature for the previously stored block, the entry stored in a signature database in an electronic storage medium of the storage system;
determining whether a last clear bit in the bit field is set; and
in response to determining that the last clear bit in the bit field is set, performing a repack operation.
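The bit-field bookkeeping and repack trigger recited in claim 13 can be sketched as follows. This is one possible reading, not the patent's implementation: the claims say only "one of a lowest and a highest clear bit", so the sketch assumes that an added reference sets the lowest clear bit and a removed reference sets the highest clear bit, and that a repack folds the recorded changes into the counter and clears the field. The 8-bit width and the names `SignatureEntry`, `add_reference`, `remove_reference`, and `live_count` are hypothetical.

```python
from dataclasses import dataclass

WIDTH = 8  # hypothetical bit-field width; the claims do not fix a size


@dataclass
class SignatureEntry:
    counter: int = 1  # reference count as of the last repack
    bits: int = 0     # changes recorded since the last repack
    repacks: int = 0  # illustration only: number of repacks performed

    def _clear_positions(self):
        """Positions of clear bits, lowest first (never empty between calls)."""
        return [i for i in range(WIDTH) if not (self.bits >> i) & 1]

    def live_count(self):
        """Counter adjusted by the changes recorded in the bit field."""
        clear = self._clear_positions()
        adds = clear[0]                  # set bits below the lowest clear bit
        removes = WIDTH - 1 - clear[-1]  # set bits above the highest clear bit
        return self.counter + adds - removes

    def _record(self, pick, delta):
        clear = self._clear_positions()
        new_count = self.live_count() + delta
        self.bits |= 1 << pick(clear)
        if len(clear) == 1:
            # The last clear bit was just set: repack by folding the
            # recorded changes into the counter and clearing the field.
            self.counter, self.bits = new_count, 0
            self.repacks += 1

    def add_reference(self):
        self._record(lambda clear: clear[0], +1)   # assumed: lowest clear bit

    def remove_reference(self):
        self._record(lambda clear: clear[-1], -1)  # assumed: highest clear bit
```

With this layout a single field records both kinds of change, and the counter itself is rewritten only once per `WIDTH` reference changes.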
15. A computer program product for real time data deduplication of a data set comprising:
computer code, configured for execution on a storage system, that chunks the data set into at least one block;
computer code, configured for execution on a storage system, that generates a signature of one of the at least one blocks;
computer code, configured for execution on a storage system, that determines if the generated signature is stored in a signature database in an electronic storage medium of the storage system;
computer code, configured for execution on a storage system, that, in response to determining that the generated signature is stored in the signature database, replaces the block associated with the generated signature with a pointer to a previously stored block associated with the signature stored in the signature database;
computer code that sets one of a lowest and a highest clear bit in a bit field associated with a counter in a block reference counter file to indicate that an additional reference to the block has been made;
computer code that sets one of a lowest and a highest clear bit in the bit field to indicate that a reference to the block has been deleted; and
a non-transitory computer readable medium that stores the computer codes.
Specification