Preferential selection of candidates for delta compression
Abstract
A computer-implemented method and system for improving efficiency in a delta compression process in a data storage system selects a data chunk to delta compress and generates a sketch for the selected data chunk. The method and system search for a set of candidate data chunks with a matching sketch and rank the set of candidate data chunks by degree of sketch matching. The set of candidate data chunks is tie-broken using location status data for each candidate, and the selected data chunk is delta compressed with a selected candidate data chunk. The delta compressed selected data chunk is then stored in a data storage system.
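The sketch generation and ranking steps summarized in the abstract can be illustrated with a minimal Python example. Nothing here is taken from the patent's specification: the sketch is assumed to be a tuple of max-hash features over sliding windows, the "degree of sketch matching" is assumed to be the count of equal features, and the names `make_sketch`, `match_degree`, and `rank_candidates` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Sketch = Tuple[int, ...]


def make_sketch(chunk: bytes, num_features: int = 4, window: int = 8) -> Sketch:
    """Assumed sketch: for each salted hash, keep the maximum value seen
    over all sliding windows of the chunk (a resemblance-style feature)."""
    if len(chunk) < window:
        windows = [chunk]
    else:
        windows = [chunk[i:i + window] for i in range(len(chunk) - window + 1)]
    features = []
    for salt in range(num_features):
        best = max(
            int.from_bytes(
                hashlib.blake2b(w, digest_size=8, salt=bytes([salt])).digest(),
                "big",
            )
            for w in windows
        )
        features.append(best)
    return tuple(features)


def match_degree(a: Sketch, b: Sketch) -> int:
    """Assumed degree of sketch matching: number of equal features."""
    return sum(1 for x, y in zip(a, b) if x == y)


def rank_candidates(target: Sketch, index: Dict[str, Sketch]) -> List[Tuple[str, int]]:
    """Return (chunk_id, degree) pairs with at least one matching feature,
    ordered from highest to lowest degree of match."""
    scored = [(cid, match_degree(target, s)) for cid, s in index.items()]
    matches = [(cid, d) for cid, d in scored if d > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Candidates that come out of `rank_candidates` with the same degree are exactly the ones the abstract says are then tie-broken using location status data.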
17 Claims
1. A computer-implemented method for improving efficiency in a delta compression process in a data storage system, the method comprising:
selecting a data chunk to delta compress;
generating a sketch for the selected data chunk;
searching for a set of candidate data chunks using the sketch;
ranking the set of candidate data chunks by degree of sketch matching;
tie-breaking the set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, including ranking a candidate data chunk stored in a cache higher than a candidate data chunk stored on a storage device, and, ranking a decompressed candidate data chunk higher than a compressed candidate data chunk having a same location status as the decompressed candidate data chunk;
delta compressing the selected data chunk with a selected candidate data chunk of the ranked set of candidate data chunks; and
storing the delta compressed selected data chunk in the data storage system.
Dependent claims: 2, 3, 4, 5, 7
6. A computer-implemented method for improving efficiency in a delta compression process in a data storage system, the method comprising:
selecting a data chunk to delta compress;
generating a sketch for the selected data chunk;
searching for a set of candidate data chunks using the sketch;
ranking the set of candidate data chunks by degree of sketch matching;
tie-breaking the set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, using location status data for each candidate data chunk,
wherein the location status data indicates one or more of the location of a corresponding candidate data chunk in a set of memory devices or a compression status of the corresponding candidate data chunk,
wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status,
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a decompressed in a cache status over a compressed in a cache status, and
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status;
delta compressing the selected data chunk with a selected candidate data chunk; and
storing the delta compressed selected data chunk in the data storage system.
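The preference order recited in claim 6 (decompressed in cache, then compressed in cache, then compressed in data storage) can be expressed as a sort key. The following is only an illustrative sketch: the `LocationStatus` enum, the `Candidate` record, and the function names are hypothetical, and the degree of sketch matching is assumed to be available as an integer.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class LocationStatus(IntEnum):
    # Lower value = preferred when the degree of sketch matching is equal.
    DECOMPRESSED_IN_CACHE = 0
    COMPRESSED_IN_CACHE = 1
    COMPRESSED_IN_STORAGE = 2


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    match_degree: int          # assumed integer degree of sketch matching
    status: LocationStatus


def rank_with_tiebreak(candidates: List[Candidate]) -> List[Candidate]:
    """Order by degree of match (highest first); among equal degrees,
    order by location status preference."""
    return sorted(candidates, key=lambda c: (-c.match_degree, c.status))


def select_base(candidates: List[Candidate]) -> Candidate:
    """Pick the top-ranked candidate as the base for delta compression."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return rank_with_tiebreak(candidates)[0]
```

Because the degree of match is the first element of the key, location status only decides between candidates whose sketches match equally well; a better-matching chunk on storage still outranks a weaker match in the cache.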
8. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a method, the method for improving efficiency in a delta compression process, the method comprising:
selecting a data chunk to delta compress;
generating a sketch for the selected data chunk;
searching for a set of candidate data chunks using the sketch;
ranking the set of candidate data chunks by degree of sketch matching;
tie-breaking the set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, including ranking a candidate data chunk stored in a cache higher than a candidate data chunk stored on a storage device, and, ranking a decompressed candidate data chunk higher than a compressed candidate data chunk having a same location status as the decompressed candidate data chunk;
delta compressing the selected data chunk with a selected candidate data chunk of the ranked candidate data chunks; and
storing the delta compressed selected data chunk in a data storage system.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a method, the method for improving efficiency in a delta compression process, the method comprising:
selecting a data chunk to delta compress;
generating a sketch for the selected data chunk;
searching for a set of candidate data chunks using the sketch;
ranking the set of candidate data chunks by degree of sketch matching;
tie-breaking the set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, using location status data for each candidate data chunk,
wherein the location status data indicates one or more of the location of a corresponding candidate data chunk in a set of memory units or a compression status of the corresponding candidate data chunk,
wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status,
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a decompressed in a cache status over a compressed in a cache status,
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status;
delta compressing the selected data chunk with a selected candidate data chunk; and
storing the delta compressed selected data chunk in a data storage system.
15. A delta compression system, comprising:
a delta processing module to delta compress a first set of data chunks; and
a cache to store a second set of data chunks;
a data storage system to store a third set of data chunks;
a preferential selection module coupled to the cache, data storage system and delta processing module, the preferential selection module to identify a candidate to serve as a base chunk for delta compression by ranking a set of candidate base chunks from the second set of data chunks and third set of data chunks by similarity to a data chunk from the first set of data chunks then tie-breaking a ranked set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, tie-breaking comprises ranking a candidate data chunk stored in a cache higher than a candidate data chunk stored on a storage device, and, ranking a decompressed candidate data chunk higher than a compressed candidate data chunk having a same location status as the decompressed candidate data chunk;
the delta compression module configured to:
    select a data chunk to compress from the first set of data chunks;
    delta compress the selected data chunk with the identified candidate to serve as a base chunk for delta compression; and
    store the delta compressed selected data chunk in the data storage system.
Dependent claims: 16
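Claim 15 has the delta processing module compress a selected chunk against the identified base chunk, but the claims do not fix a delta format. As a rough illustration only, the sketch below assumes a simple copy/insert encoding derived from Python's standard `difflib.SequenceMatcher`; the operation tuples and the names `delta_encode` and `delta_decode` are hypothetical and not the patent's method.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", offset_in_base, length) or ("insert", literal_bytes)
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def delta_encode(base: bytes, target: bytes) -> List[Op]:
    """Describe target as copies from base plus literal inserts."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": bytes present only in base need no output
    return ops


def delta_decode(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target chunk from the base chunk and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Decoding needs the base chunk in decompressed form, which is one plausible reason a decompressed, cached candidate is the cheapest base to use when sketch matches are otherwise equal.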
17. A delta compression system comprising:
a delta processing module to delta compress a first set of data chunks; and
a cache to store a second set of data chunks;
a data storage system to store a third set of data chunks;
a preferential selection module coupled to the cache, data storage system and delta processing module, the preferential selection module to identify a candidate to serve as a base chunk for delta compression by ranking a set of candidate base chunks from the second set of data chunks and third set of data chunks by similarity to a data chunk from the first set of data chunks then tie-breaking a ranked set of candidate data chunks, where the set of candidate data chunks has an equal degree of sketch matching, using location status information,
wherein the location status information indicates one or more of the location of a corresponding candidate data chunk in a set of memory devices or a compression status of the corresponding candidate data chunk,
wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status,
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a decompressed in a cache status over a compressed in a cache status,
wherein tie-breaking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status.
Specification