Method for cleaning a delta storage system
Abstract
A computer-implemented method and system for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and selects a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in the file metadata corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are then discarded.
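The traversal described in the abstract can be illustrated with a minimal mark-and-sweep sketch. Everything named here is a hypothetical illustration rather than the patented implementation: the `Chunk` class, its `base_id` field (standing in for the delta reference held in chunk metadata), the in-memory `store` dictionary, and the `collect_garbage` function. The sketch also assumes that a base chunk may itself be delta encoded, so it follows delta references transitively.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Chunk:
    """Hypothetical stored chunk; base_id models the delta reference in its metadata."""
    data: bytes
    base_id: Optional[str] = None


def collect_garbage(store: Dict[str, Chunk], recipes: Dict[str, List[str]]) -> Set[str]:
    """Build the live set from every file recipe, then discard all other chunks."""
    live: Set[str] = set()
    for chunk_ids in recipes.values():
        for chunk_id in chunk_ids:
            current: Optional[str] = chunk_id
            # Mark the chunk, then follow delta references so its base stays live too
            # (assumption: bases may themselves be delta chunks).
            while current is not None and current not in live:
                live.add(current)
                current = store[current].base_id
    # Sweep: any chunk not identified by the live set is dead and is discarded.
    for chunk_id in set(store) - live:
        del store[chunk_id]
    return live
```

In this sketch a base chunk survives even when no file recipe lists it directly, because it is reached through the delta reference of a chunk that a recipe does list.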
18 Claims
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
selecting a file recipe identifying a plurality of data chunks within the data storage system that can be joined to reconstruct a file, where each of the plurality of data chunks is different and includes a plurality of bytes;
selecting a chunk identifier from the file recipe, where the chunk identifier is an identifier of a specific one of the plurality of data chunks;
adding the chunk identifier to a set of live data chunks;
adding a base chunk that is identified by a delta reference to the set of live data chunks, wherein the delta reference is stored in metadata of the specific data chunk;
discarding dead data chunks in the data storage system, where the dead data chunks are not identified by the set of live data chunks; and
sanitizing the dead data chunks by decompressing live data chunks referencing the dead data chunks.
Dependent claims: 2, 3, 4, 5, 6
7. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a method, the method for performing garbage collection in delta compressed data storage system, the method comprising:
selecting a file recipe identifying a plurality of data chunks within the data storage system that can be joined to reconstruct a file, where each of the plurality of data chunks is different and includes a plurality of bytes;
selecting a chunk identifier from the file recipe, where the chunk identifier is an identifier of a specific one of the plurality of data chunks;
adding the chunk identifier to a set of live data chunks;
adding a base chunk that is identified by a delta reference to the set of live data chunks, wherein the delta reference is stored in metadata of the specific data chunk;
discarding dead data chunks in the data storage system, where the dead data chunks are not identified by the set of live data chunks; and
sanitizing the dead data chunks by decompressing live data chunks referencing the dead data chunks.
Dependent claims: 8, 9, 10, 11, 12
13. A delta compression system, comprising:
a delta processing module to delta compress a first set of data chunks;
a data storage system to store a second set of data chunks;
a garbage collection module coupled to the delta processing module and the data storage system, the garbage collection module operable to traverse each of a plurality of data chunk identifiers in each file recipe in the data storage system, wherein each data chunk identifier identifies a data chunk, wherein each file recipe identifies a plurality of data chunks that can be joined to reconstruct a file, wherein each of the plurality of data chunks is different and includes a plurality of bytes, the garbage collection module further operable to generate a set of live data chunks, the set of live data chunks including delta references, each delta reference identifying a base chunk, each delta reference retrieved from metadata of a data chunk corresponding to one of the plurality of data chunk identifiers, the garbage collection module further operable to discard dead data chunks not referenced in the set of live data chunks; and
a sanitization module to sanitize the dead data chunks by decompressing live data chunks referencing the dead data chunks.
Dependent claims: 14, 15, 16, 17, 18
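The sanitizing element recited in the claims above, decompressing live chunks that reference dead chunks, can be sketched as follows. This is one possible reading under loudly stated assumptions, not the patented implementation: the toy delta format (a list of `(offset, replacement)` byte patches applied to the base), the `Chunk` fields, and the `materialize` and `sanitize` functions are all hypothetical, and the set of chunks to purge is assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Chunk:
    """Hypothetical chunk: full bytes in data, or patches against a base chunk."""
    data: bytes = b""
    base_id: Optional[str] = None
    patches: List[Tuple[int, bytes]] = field(default_factory=list)


def materialize(store: Dict[str, Chunk], chunk_id: str) -> bytes:
    """Decode a chunk to its full bytes, resolving its base recursively."""
    chunk = store[chunk_id]
    if chunk.base_id is None:
        return chunk.data
    buf = bytearray(materialize(store, chunk.base_id))
    for offset, replacement in chunk.patches:
        # Toy delta: overwrite bytes of the base in place (length is preserved).
        buf[offset:offset + len(replacement)] = replacement
    return bytes(buf)


def sanitize(store: Dict[str, Chunk], to_purge: Set[str]) -> None:
    """Decode surviving chunks that depend on purged bases, then drop the purged chunks."""
    for chunk_id, chunk in list(store.items()):
        if chunk_id not in to_purge and chunk.base_id in to_purge:
            # Store the chunk in full form so it no longer references the dead base.
            store[chunk_id] = Chunk(data=materialize(store, chunk_id))
    for chunk_id in to_purge:
        store.pop(chunk_id, None)
```

Decoding before deletion matters in this sketch: once the base is gone, a chunk that is still delta encoded against it could no longer be reconstructed.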
Specification