Highly scalable and distributed data de-duplication
Abstract
This disclosure relates to systems and methods for maintaining referential integrity within a data storage system and for freeing unused storage in the system, without the need to maintain reference counts to the blocks of storage used to represent and store the data.
25 Claims
1. A method of maintaining data blocks in a data storage system, the method comprising:
- maintaining a plurality of data blocks in the data storage system;
- representing a plurality of data files by associating each file identifier of a plurality of file identifiers with at least one of the plurality of data blocks;
- maintaining a first set of timestamps, each data block in the plurality of data blocks being associated with one of the first set of timestamps, each of the first set of timestamps indicating a time when a respective data block was verified to have been associated with at least one of the plurality of file identifiers;
- maintaining a second set of timestamps, different than the first set of timestamps, each file identifier of the plurality of file identifiers being associated with one of the second set of timestamps, each of the second set of timestamps indicating a time when a respective file identifier was verified to have been associated with at least one of the plurality of data blocks; and
- deleting a given data block when a timestamp of the first set of timestamps associated with the given data block indicates an earlier time than each of the second set of timestamps.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A data storage system configured for maintaining data blocks, comprising:
- a data store comprising software; and
- a processor in data communication with the data store and configured to execute the software in order to cause the data storage system to:
  - maintain a plurality of data blocks;
  - represent a plurality of data files by associating each file identifier of a plurality of file identifiers with at least one of the plurality of data blocks;
  - maintain a first set of timestamps, each data block in the plurality of data blocks being associated with one of the first set of timestamps, each of the first set of timestamps indicating a time when a respective data block was verified to have been associated with at least one of the plurality of file identifiers;
  - maintain a second set of timestamps, different than the first set of timestamps, each file identifier of the plurality of file identifiers being associated with one of the second set of timestamps, each of the second set of timestamps indicating a time when a respective file identifier was verified to have been associated with at least one of the plurality of data blocks; and
  - delete a given data block when a timestamp of the first set of timestamps associated with the given data block indicates an earlier time than each of the second set of timestamps.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A non-transitory, computer-readable medium comprising computer-executable instructions that, when executed by a processor, cause a computing device to perform a method, the method comprising:
- maintaining a plurality of data blocks in a data storage system;
- representing a plurality of data files by associating each file identifier of a plurality of file identifiers with at least one of the plurality of data blocks;
- maintaining a first set of timestamps, each data block in the plurality of data blocks being associated with one of the first set of timestamps, each of the first set of timestamps indicating a time when a respective data block was verified to have been associated with at least one of the plurality of file identifiers;
- maintaining a second set of timestamps, different than the first set of timestamps, each file identifier of the plurality of file identifiers being associated with one of the second set of timestamps, each of the second set of timestamps indicating a time when a respective file identifier was verified to have been associated with at least one of the plurality of data blocks; and
- deleting a given data block when a timestamp of the first set of timestamps associated with the given data block indicates an earlier time than each of the second set of timestamps.
Dependent claims: 19, 20, 21, 22, 23, 24, 25
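The deletion rule shared by independent claims 1, 10 and 18 can be illustrated with a small in-memory sketch. This is not the patented implementation. The class `BlockStore`, its method names, the use of SHA-256 content hashes as block identifiers, and the caller-supplied `now` values are all hypothetical choices made for illustration. The sketch also assumes that "verifying" a file means stamping the file and every block it references with the same time, and that with no files left the deletion condition holds vacuously.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockStore:
    """Toy in-memory store; every name here is hypothetical."""
    blocks: dict = field(default_factory=dict)    # block id -> bytes
    block_ts: dict = field(default_factory=dict)  # first set: block id -> time
    files: dict = field(default_factory=dict)     # file id -> list of block ids
    file_ts: dict = field(default_factory=dict)   # second set: file id -> time

    def put_file(self, file_id, chunks, now):
        ids = []
        for chunk in chunks:
            # Assumption: identical chunks share one block via a content hash.
            block_id = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(block_id, chunk)
            ids.append(block_id)
        self.files[file_id] = ids
        self.verify_file(file_id, now)

    def verify_file(self, file_id, now):
        # Stamp each referenced block, then the file itself. No reference
        # counts are kept anywhere.
        for block_id in self.files[file_id]:
            self.block_ts[block_id] = max(self.block_ts.get(block_id, now), now)
        self.file_ts[file_id] = now

    def delete_file(self, file_id):
        # Blocks are left in place; a later sweep decides their fate.
        del self.files[file_id]
        del self.file_ts[file_id]

    def sweep(self):
        # A block is deleted when its timestamp is earlier than each file
        # timestamp, i.e. earlier than the minimum of the second set.
        # Assumption: with no files, the condition holds for every block.
        cutoff = min(self.file_ts.values(), default=float("inf"))
        stale = [b for b, ts in self.block_ts.items() if ts < cutoff]
        for block_id in stale:
            del self.blocks[block_id]
            del self.block_ts[block_id]
        return stale


store = BlockStore()
store.put_file("a", [b"x", b"shared"], now=1)
store.put_file("b", [b"shared", b"y"], now=2)
store.delete_file("a")
store.verify_file("b", now=3)
removed = store.sweep()  # only the block unique to file "a" is removed
```

In this sketch the block holding `b"shared"` survives because re-verifying file `"b"` refreshed its timestamp, while the block unique to the deleted file keeps its old timestamp and falls below the cutoff.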
Specification