Self-referential deduplication
First Claim
1. A method comprising:
- deduplicating data, wherein the deduplicating comprises;
calculating a first value as a function of data in an nth data block of a backup copy comprising a number, z, of data blocks, where n is a number from 1 to z;
comparing the first value with each of a plurality of values in a plurality of entries, respectively, of a first data structure, the plurality of entries comprising a plurality of pointers, respectively, that identify a plurality of locations, respectively, where a plurality of data blocks, respectively, are physically stored in a storage system;
if the first value compares equally to a value contained in one of the plurality of entries of the first data structure, then a copy of the nth data block was previously stored in the storage system and as a result the method further comprises adding a pointer to the one entry to an nth entry of a second data structure, wherein the pointer identifies where the copy of the nth data block is stored in the storage system;
if the first value does not compare equally with any value contained in the plurality of entries of the first data structure, (1) storing the nth data block in the storage system, and (2) adding a first pointer to the nth entry of the second data structure, wherein the first pointer identifies where the nth data block is stored in the storage system;
storing the second data structure in the storage system after adding the pointer or the first pointer to the nth entry;
wherein the first pointer enables a computer system to retrieve the nth data block from the storage system without referencing the first data structure during data reflation, andwherein the pointer enables the computer system to retrieve the copy of the nth data block from the storage system without referencing the first data structure during reflation.
7 Assignments
0 Petitions
Accused Products
Abstract
A first value is calculated as a function of data in an nth data block of a backup copy. The first value is then compared with each of a plurality of values in a plurality of entries, respectively, of a first data structure. The plurality of entries in the first data structure include a plurality of pointers, respectively, that correspond to a plurality of data blocks, respectively, in a storage system. If the first value compares equally to a value contained in one of the plurality of entries of the first data structure, a pointer of the one entry is added to an nth entry of a second data structure. This pointer corresponds to a copy of the nth data block that is stored in the storage system. If the first value does not compare equally with any value contained in the plurality of entries of the first data structure, (1) the nth data block is stored in the storage system, and (2) a first pointer is added to the nth entry of the second data structure. The first pointer corresponds to the nth data block that is stored in the storage system. Eventually the second data structure is stored in the storage system after adding the pointer or the first pointer to the nth entry.
-
Citations
20 Claims
-
1. A method comprising:
deduplicating data, wherein the deduplicating comprises; calculating a first value as a function of data in an nth data block of a backup copy comprising a number, z, of data blocks, where n is a number from 1 to z; comparing the first value with each of a plurality of values in a plurality of entries, respectively, of a first data structure, the plurality of entries comprising a plurality of pointers, respectively, that identify a plurality of locations, respectively, where a plurality of data blocks, respectively, are physically stored in a storage system; if the first value compares equally to a value contained in one of the plurality of entries of the first data structure, then a copy of the nth data block was previously stored in the storage system and as a result the method further comprises adding a pointer to the one entry to an nth entry of a second data structure, wherein the pointer identifies where the copy of the nth data block is stored in the storage system; if the first value does not compare equally with any value contained in the plurality of entries of the first data structure, (1) storing the nth data block in the storage system, and (2) adding a first pointer to the nth entry of the second data structure, wherein the first pointer identifies where the nth data block is stored in the storage system; storing the second data structure in the storage system after adding the pointer or the first pointer to the nth entry; wherein the first pointer enables a computer system to retrieve the nth data block from the storage system without referencing the first data structure during data reflation, and wherein the pointer enables the computer system to retrieve the copy of the nth data block from the storage system without referencing the first data structure during reflation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
11. A non-transitory computer readable medium comprising program instructions, wherein the program instructions are executable by one or more processors to implement a method, the method comprising:
deduplicating data, wherein the deduplicating comprises; calculating a first value as a function of data in an nth data block of a backup copy that comprises a number, z, of data blocks, where n is a number from 1 to z; if the first value compares equally to a value contained in one of a plurality of entries of a first data structure, adding a pointer to the one entry to an nth entry of a second data structure, wherein the pointer identifies where a copy of the nth data block is physically stored in a storage system; if the first value does not compare equally with any value contained in the plurality of entries of the first data structure, (1) storing the nth data block in the storage system, and (2) adding a first pointer to the nth entry of the second data structure, wherein the first pointer identifies where the nth data block is physically stored in the storage system; storing the second data structure in the storage system after adding the pointer or the first pointer to the nth entry; wherein the first pointer enables a computer system to retrieve the nth data block from the storage system without referencing the first data structure during reflation, and wherein the pointer enables the computer system to retrieve the copy of the nth data block from the storage system without referencing the first data structure during reflation. - View Dependent Claims (12, 13, 14, 15)
-
16. An apparatus comprising:
-
a first computer system coupled to a second computer system via a communication link, wherein; the first computer system is configured to calculate a first value as a function of data in an nth data block of a backup copy that comprises a number, z, of data blocks, where n is a number from 1 to z; the second computer system is configured to compare the first value with each of a plurality of values in a plurality of entries, respectively, of a first data structure, the plurality of entries comprising a plurality of pointers, respectively, that identify a plurality of locations, respectively, where a plurality of data blocks, respectively, are physically stored in a storage system; if the first value compares equally to a value contained in one of the plurality of entries of the first data structure, the first computer system is configured to add a pointer to the one entry to an nth entry of a second data structure, wherein the pointer identifies where a copy of the nth data block is stored in the storage system; if the first value does not compare equally with any value contained in the plurality of entries of the first data structure, the first computer system is configured to (1) store the nth data block in the storage system, and (2) add a first pointer to the nth entry of the second data structure, wherein the first pointer identifies where the nth data block is stored in the storage system; the first computer system is configured to store the second data structure in the storage system after adding the pointer or the first pointer to the nth entry; wherein the first pointer enables a computer system to retrieve the nth data block from the storage system without referencing the first data structure during reflation, and wherein the pointer enables the computer system to retrieve the copy of the nth data block from the storage system without referencing the first data structure during reflation. - View Dependent Claims (17, 18, 19, 20)
-
Specification