Method for cleaning a delta storage system
First Claim
Patent Images
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
- selecting a file recipe for each file of a plurality of files in the delta compressed data storage system, wherein each file in the plurality of files has been segmented into data chunks, and each file recipe comprises a plurality of identifiers of data chunks that comprise the file;
traversing the selected file recipe to determine the plurality of data chunk identifiers of the data chunks of the file;
for each data chunk identifier in the selected file recipe;
adding the data chunk identifier to a set of live data chunks;
determining whether the data chunk identifier comprises a base chunk identifier identifying a base chunk and whether the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base chunk;
adding the delta reference to the set of live data chunks, in response to determining that the data chunk identifier comprises a base chunk identifier and the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base data chunk; and
discarding data chunks in the delta compressed data storage system that are not identified by the set of live data chunks.
9 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented method and system for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and selects a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in an entry of an index corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are then discarded.
64 Citations
18 Claims
-
1. A computer-implemented method for performing garbage collection in a delta compressed data storage system, the method comprising:
-
selecting a file recipe for each file of a plurality of files in the delta compressed data storage system, wherein each file in the plurality of files has been segmented into data chunks, and each file recipe comprises a plurality of identifiers of data chunks that comprise the file; traversing the selected file recipe to determine the plurality of data chunk identifiers of the data chunks of the file; for each data chunk identifier in the selected file recipe; adding the data chunk identifier to a set of live data chunks; determining whether the data chunk identifier comprises a base chunk identifier identifying a base chunk and whether the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base chunk; adding the delta reference to the set of live data chunks, in response to determining that the data chunk identifier comprises a base chunk identifier and the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base data chunk; and discarding data chunks in the delta compressed data storage system that are not identified by the set of live data chunks. - View Dependent Claims (2, 3, 4, 5, 16)
-
-
6. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform operations for performing garbage collection in delta compressed data storage system, the operations comprising:
-
selecting a file recipe for each file of a plurality of files in the delta compressed data storage system, wherein each file in the plurality of files has been segmented into data chunks, and each file recipe comprises a plurality of identifiers of data chunks that comprise the file; traversing the selected file recipe to determine the plurality of data chunk identifiers of the data chunks of the file; for each data chunk identifier in the file recipe; adding the data chunk identifier to a set of live data chunks; determining whether the data chunk identifier comprises a base chunk identifier identifying a base chunk and whether the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base chunk; adding the delta reference to the set of live data chunks, in response to determining that the chunk identifier contains a delta reference identifying a data chunk that is delta encoded to the data chunk; and discarding data chunks in the delta compressed data storage system that are not identified by the set of live data chunks. - View Dependent Claims (7, 8, 9, 10, 17)
-
-
11. A delta compression system, comprising:
-
a delta processing module to delta compresses data chunks; a data storage system to store delta compressed data chunks; and a garbage collection module coupled to the data storage system configured to; select a file recipe for each file of a plurality of files in the delta compression system, wherein each file in the plurality of files has been segmented into data chunks, and each file recipe comprises a plurality of identifiers of data chunks that comprise the file; traverse the selected file recipe to determine the plurality of data chunk identifiers of the data chunks of the file; for each data chunk identifier in the selected file recipe; add the data chunk to a set of live data chunks, determine whether the chunk identifier comprises a base chunk identifier identifying a base chunk and whether the file recipe contains a delta reference identifying a data chunk that is delta encoded relative to the base chunk, adding the delta reference to the set of live data chunks in response to determining that the data chunk identifier contains a reference identifying a data chunk that is delta encoded relative to the data chunk, the garbage collection module is configured to discard all data chunks that in the delta compression system that are not identified by the set of live data chunks. - View Dependent Claims (12, 13, 14, 15, 18)
-
Specification