Highly Scalable and Distributed Data De-Duplication
Abstract
This disclosure relates to systems and methods for both maintaining referential integrity within a data storage system, and freeing unused storage in the system, without the need to maintain reference counts to the blocks of storage used to represent and store the data.
24 Claims
1. A method comprising:
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
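The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; it only shows how two kinds of timestamps can stand in for reference counts. The names `Store`, `put`, `validate`, and `sweep` are hypothetical, as are the fixed block size, the use of SHA-256 content hashes as block keys, and the caller-supplied integer clock. The handling of never-validated entries is also an assumption: a file identifier with no last-validation timestamp blocks all removal, and a block with no last-reference-check timestamp is treated as older than any cutoff.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    data: bytes
    last_reference_check: Optional[int] = None  # None: never validated


@dataclass
class FileId:
    block_keys: List[str]
    last_validation: Optional[int] = None  # None: never validated


class Store:
    """Hypothetical in-memory store; no reference counts are kept."""

    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}
        self.files: Dict[str, FileId] = {}

    def put(self, name: str, data: bytes, block_size: int = 4) -> None:
        # Partition the data into blocks; identical blocks share one key
        # (content hashing is an assumption made for this sketch).
        keys = []
        for i in range(0, len(data), block_size):
            chunk = data[i:i + block_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(key, Block(chunk))
            keys.append(key)
        self.files[name] = FileId(keys)

    def validate(self, name: str, now: int) -> None:
        # Confirm every block the file identifier references, stamping
        # each block and then the file identifier itself.
        file_id = self.files[name]
        for key in file_id.block_keys:
            self.blocks[key].last_reference_check = now
        file_id.last_validation = now

    def sweep(self) -> List[str]:
        # Remove blocks whose last reference check predates the earliest
        # last-validation timestamp among all file identifiers.
        stamps = [f.last_validation for f in self.files.values()]
        if not stamps or any(s is None for s in stamps):
            return []  # assumption: no safe cutoff exists, so remove nothing
        cutoff = min(stamps)
        doomed = sorted(
            key for key, block in self.blocks.items()
            if block.last_reference_check is None
            or block.last_reference_check < cutoff
        )
        for key in doomed:
            del self.blocks[key]
        return doomed


store = Store()
store.put("a", b"aaaabbbb")   # blocks "aaaa" and "bbbb"
store.put("b", b"bbbbcccc")   # shares "bbbb", adds "cccc"
store.validate("a", now=10)
store.validate("b", now=11)
del store.files["a"]          # drop a file identifier without touching blocks
removed = store.sweep()       # only the block unique to "a" is now stale
```

Deleting a file identifier requires no bookkeeping on its blocks. The shared block survives because validating the remaining file refreshed its timestamp, while the block unique to the deleted file falls behind the cutoff and is removed by the sweep.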
9. A system comprising:
- a memory capable of storing data; and
- a processor configured for;
  - partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
  - maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
  - maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
  - removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. Logic encoded in one or more non-transient media that includes code for execution and when executed by a processor is operable to perform operations comprising:
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification