Method for cleaning a delta storage system
First Claim
Patent Images
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
- selecting each of a plurality of file recipes stored in the data storage system to traverse to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe and locations of the data chunks within the corresponding file, wherein each chunk identifier includes a fingerprint of an associated data chunk;
selecting each chunk identifier included in each of the file recipes;
adding each of the chunk identifiers to a set of live data chunks;
for each of the data chunks identified by the chunk identifiers of the file recipes, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk, the index including a plurality of entries corresponding to the plurality of data chunks stored in the data storage system respectively, wherein each entry of the index stores a fingerprint of a corresponding data chunk, a storage location of the corresponding data chunk, and a delta reference if the corresponding data chunk has been delta encoded;
adding delta references for the file recipes corresponding to the chunk identifiers that are obtained from the index to the set of live data chunks, wherein the index is stored separately from the file recipes; and
discarding data chunks in a data storage system not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system distinct from the listing of data chunks in the data storage system and the file recipes.
9 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented method for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in the file recipe corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are discarded.
48 Citations
18 Claims
-
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
-
selecting each of a plurality of file recipes stored in the data storage system to traverse to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe and locations of the data chunks within the corresponding file, wherein each chunk identifier includes a fingerprint of an associated data chunk; selecting each chunk identifier included in each of the file recipes; adding each of the chunk identifiers to a set of live data chunks; for each of the data chunks identified by the chunk identifiers of the file recipes, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk, the index including a plurality of entries corresponding to the plurality of data chunks stored in the data storage system respectively, wherein each entry of the index stores a fingerprint of a corresponding data chunk, a storage location of the corresponding data chunk, and a delta reference if the corresponding data chunk has been delta encoded; adding delta references for the file recipes corresponding to the chunk identifiers that are obtained from the index to the set of live data chunks, wherein the index is stored separately from the file recipes; and discarding data chunks in a data storage system not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system distinct from the listing of data chunks in the data storage system and the file recipes. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform operations of garbage collection in delta compressed data storage system, the operations comprising:
-
selecting each of a plurality of file recipes stored in the data storage system to traverse to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe and locations of the data chunks within the corresponding file, wherein each chunk identifier includes a fingerprint of an associated data chunk; selecting each chunk identifier included in each of the file recipes; adding each of the chunk identifiers to a set of live data chunks; for each of the data chunks identified by the chunk identifiers of the file recipes, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk, the index including a plurality of entries corresponding to the plurality of data chunks stored in the data storage system respectively, wherein each entry of the index stores a fingerprint of a corresponding data chunk, a storage location of the corresponding data chunk, and a delta reference if the corresponding data chunk has been delta encoded; adding delta references for the file recipes corresponding to the chunk identifiers that are obtained from the index to the set of live data chunks, wherein the index is sstored separately from the file recipes; and discarding data chunks in a data storage system not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system distinct from the listing of data chunks in the data storage system and the file recipes. - View Dependent Claims (8, 9, 10, 11, 12)
-
-
13. A delta compression system, comprising:
-
a processor; a memory; and a garbage collection module loaded in the memory and executed by the processor to perform operations of garbage collection, the operations including selecting each of a plurality of file recipes stored in the data storage system to traverse to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe and locations of the data chunks within the corresponding file, wherein each chunk identifier includes a fingerprint of an associated data chunk, selecting each chunk identifier included in each of the file recipes, adding each of the chunk identifiers to a set of live data chunks, for each of the data chunks identified by the chunk identifiers of the file recipes, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk, the index including a plurality of entries corresponding to the plurality of data chunks stored in the data storage system respectively, wherein each entry of the index stores a fingerprint of a corresponding data chunk, a storage location of the corresponding data chunk, and a delta reference if the corresponding data chunk has been delta encoded, adding delta references for the file recipes corresponding to the chunk identifiers that are obtained from the index to the set of live data chunks, wherein the index is stored separately from the file recipes, and discarding data chunks in a data storage system not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system distinct from the listing of data chunks in the data storage system and the file recipes. - View Dependent Claims (14, 15, 16, 17, 18)
-
Specification