Method for cleaning a delta storage system
First Claim
Patent Images
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
- traversing a plurality of file recipes stored in the data storage system to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe;
for each of the file recipes,adding chunk identifiers contained in the file recipe to a set of live data chunks, andfor each of the chunk identifiers in the file recipe,performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk identifier, wherein each entry of the index stores a chunk identifier of a corresponding data chunk and a delta reference when the corresponding data chunk has been delta encoded from another data chunk,traversing the index using the chunk identifiers referenced in each of the file recipes to determine matches between the chunk identifiers and entries in the index,selecting base data chunk identifiers associated with matched entries as the delta references, andadding the delta reference corresponding to the chunk identifier to the set of live data chunks; and
discarding data chunks in a data storage system that are not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system.
2 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented method for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in the file recipe corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are discarded.
-
Citations
21 Claims
-
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
-
traversing a plurality of file recipes stored in the data storage system to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe; for each of the file recipes, adding chunk identifiers contained in the file recipe to a set of live data chunks, and for each of the chunk identifiers in the file recipe, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk identifier, wherein each entry of the index stores a chunk identifier of a corresponding data chunk and a delta reference when the corresponding data chunk has been delta encoded from another data chunk, traversing the index using the chunk identifiers referenced in each of the file recipes to determine matches between the chunk identifiers and entries in the index, selecting base data chunk identifiers associated with matched entries as the delta references, and adding the delta reference corresponding to the chunk identifier to the set of live data chunks; and discarding data chunks in a data storage system that are not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations of garbage collection in a delta compressed data storage system, the operations comprising:
-
traversing a plurality of file recipes stored in the data storage system to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe; for each of the file recipes, adding chunk identifiers contained in the file recipe to a set of live data chunks, and for each of the chunk identifiers in the file recipe, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk identifier, wherein each entry of the index stores a chunk identifier of a corresponding data chunk and a delta reference when the corresponding data chunk has been delta encoded from another data chunk, traversing the index using the chunk identifiers referenced in each of the file recipes to determine matches between the chunk identifiers and entries in the index, selecting base data chunk identifiers associated with matched entries as the delta references, and adding the delta reference corresponding to the chunk identifier to the set of live data chunks; and discarding data chunks in a data storage system that are not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A data processing system, comprising:
-
a processor; a memory coupled to the processor; and a garbage collection module loaded in the memory and executed by the processor to perform operations of garbage collection, the operations including traversing a plurality of file recipes stored in the data storage system to identify a plurality of data chunks, the plurality of file recipes representing a plurality of files, wherein each file recipe includes a plurality of chunk identifiers identifying a plurality of data chunks that constitute a corresponding file represented by the file recipe, for each of the file recipes, adding chunk identifiers contained in the file recipe to a set of live data chunks, and for each of the chunk identifiers in the file recipe, performing a lookup operation in an index of the storage system to determine whether there is a delta reference associated with the data chunk identifier, wherein each entry of the index stores a chunk identifier of a corresponding data chunk and a delta reference when the corresponding data chunk has been delta encoded from another data chunk, traversing the index using the chunk identifiers referenced in each of the file recipes to determine matches between the chunk identifiers and entries in the index, selecting base data chunk identifiers associated with matched entries as the delta references, and adding the delta reference corresponding to the chunk identifier to the set of live data chunks, and discarding data chunks in a data storage system that are not identified by the set of live data chunks, wherein the set of live data chunks is a listing that identifies live data chunks in the data storage system. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
Specification