Highly Scalable and Distributed Data De-Duplication
First Claim
1. A method comprising:
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Abstract
This disclosure relates to systems and methods that both maintain referential integrity within a data storage system and free unused storage in the system, without needing to maintain reference counts for the blocks of storage used to represent and store the data.
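The timestamp comparison recited in the independent claims can be illustrated with a small sketch. This is not the patented implementation, only one possible reading of the claim language under stated assumptions: the names `Store`, `Block`, `FileRecord`, `put`, `validate`, `delete`, and `sweep` are hypothetical, blocks are keyed by a SHA-256 digest of their content, time is an integer logical clock supplied by the caller, writing a file is assumed to count as a validation, and a sweep removes nothing when there are no files or when any file has never been validated.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # tiny block size, chosen only to keep the example readable


@dataclass
class Block:
    data: bytes
    last_reference_check: Optional[int] = None  # None means "never validated"


@dataclass
class FileRecord:
    block_ids: List[str]
    last_validation: Optional[int] = None  # None means "never validated"


class Store:
    """Hypothetical in-memory store; no per-block reference counts are kept."""

    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}
        self.files: Dict[str, FileRecord] = {}

    def put(self, file_id: str, payload: bytes, now: int) -> None:
        """Partition payload into blocks; identical blocks are stored once."""
        ids = []
        for i in range(0, len(payload), BLOCK_SIZE):
            chunk = payload[i:i + BLOCK_SIZE]
            bid = sha256(chunk).hexdigest()
            block = self.blocks.setdefault(bid, Block(chunk))
            block.last_reference_check = now  # assumption: a write validates
            ids.append(bid)
        self.files[file_id] = FileRecord(ids, last_validation=now)

    def validate(self, file_id: str, now: int) -> None:
        """Confirm every block the file references, then stamp the file."""
        record = self.files[file_id]
        for bid in record.block_ids:
            self.blocks[bid].last_reference_check = now
        record.last_validation = now

    def delete(self, file_id: str) -> None:
        """Drop the file identifier only; its blocks are left for sweep()."""
        del self.files[file_id]

    def sweep(self) -> List[str]:
        """Remove blocks whose stamp predates the earliest file validation."""
        stamps = [rec.last_validation for rec in self.files.values()]
        # Assumption: be conservative when no cutoff can be computed.
        if not stamps or any(s is None for s in stamps):
            return []
        cutoff = min(stamps)
        stale = [
            bid for bid, blk in self.blocks.items()
            if blk.last_reference_check is not None
            and blk.last_reference_check < cutoff
        ]
        for bid in stale:
            del self.blocks[bid]
        return stale
```

In this sketch, deleting a file touches only its `FileRecord`. A block it alone referenced keeps an old stamp, and once every remaining file has been validated more recently than that stamp, `sweep` reclaims the block. A block shared with another file is re-stamped whenever that file is validated, so it survives.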
24 Claims
1. A method comprising:
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- a memory capable of storing data; and
- a processor configured for;
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. Logic encoded in one or more non-transient media that includes code for execution and when executed by a processor is operable to perform operations comprising:
- partitioning, in a data storage system, each of a plurality of instances of digital data into a respective plurality of blocks, wherein each instance of digital data is represented by a file identifier, the file identifier referencing each of the respective plurality of blocks; and
- maintaining a last-reference-check timestamp for each of the blocks within each of the pluralities of blocks such that each last-reference-check timestamp indicates a last time, if ever, the block was validated to confirm that the block was referenced within the system;
- maintaining a last-validation timestamp for each file identifier such that each last-validation timestamp indicates when, if ever, each block referenced by the file identifier had been validated to confirm that the file identifier referenced the respective block;
- removing a block from the data storage system when the last-reference-check timestamp associated with the block is earlier than the earliest last-validation timestamp in the system.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification