Preferential selection of candidates for delta compression
Abstract
A computer-implemented method and system for improving efficiency in a delta compression process selects a data chunk to delta compress and generates a sketch for the selected data chunk. It then searches for a set of candidate data chunks with a matching sketch. The candidate data chunks with at least a minimum degree of similarity are ranked by location status data. Ties among the ranked candidates are broken using the degree of sketch similarity for each candidate, and the selected data chunk is delta compressed with a selected candidate data chunk.
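The ranking and tie-breaking steps can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `LocationStatus` enum, the `Candidate` record, the `similarity` measure (a count of matching sketch features) and the `min_similarity` threshold are assumptions made for illustration. The single three-way ordering is also an assumption; it combines the two pairwise preferences recited in the claims (decompressed in cache over compressed in cache, and compressed in cache over compressed in data storage).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence, Tuple


class LocationStatus(IntEnum):
    # Lower value = more preferred. The combined ordering is an assumption.
    DECOMPRESSED_IN_CACHE = 0
    COMPRESSED_IN_CACHE = 1
    COMPRESSED_IN_STORAGE = 2


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    sketch: Tuple[int, ...]   # hypothetical sketch: a tuple of feature values
    status: LocationStatus


def similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Assumed similarity measure: number of sketch features that match."""
    return sum(1 for x, y in zip(a, b) if x == y)


def select_base(
    target_sketch: Sequence[int],
    candidates: Sequence[Candidate],
    min_similarity: int = 1,
) -> Optional[Candidate]:
    """Rank by location status first, then break ties by sketch similarity."""
    eligible = [
        c for c in candidates
        if similarity(target_sketch, c.sketch) >= min_similarity
    ]
    if not eligible:
        return None
    # Primary key: location status. Secondary key: higher similarity wins.
    return min(
        eligible,
        key=lambda c: (c.status, -similarity(target_sketch, c.sketch)),
    )


candidates = [
    Candidate("a", (1, 2, 3, 4), LocationStatus.COMPRESSED_IN_STORAGE),
    Candidate("b", (1, 2, 9, 9), LocationStatus.COMPRESSED_IN_CACHE),
    Candidate("c", (1, 2, 3, 9), LocationStatus.COMPRESSED_IN_CACHE),
    Candidate("d", (7, 7, 7, 7), LocationStatus.DECOMPRESSED_IN_CACHE),
]
best = select_base((1, 2, 3, 4), candidates)
```

In this example candidate "a" is the closest match but sits in data storage, so the cached candidates outrank it; between "b" and "c", which share a location status, the more similar "c" is chosen.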
15 Claims
1. A computer-implemented method for improving efficiency in a delta compression process, the method comprising:
- selecting a data chunk to delta compress;
- generating a sketch for the selected data chunk;
- searching for a set of candidate data chunks using the sketch;
- ranking the set of candidate data chunks with at least a minimum degree of similarity by location status data, wherein the location status data indicates a location and a compression status of a candidate data chunk, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status;
- tie-breaking the set of candidate data chunks using degree of sketch similarity for each candidate; and
- delta compressing the selected data chunk with a selected candidate data chunk.
Dependent claims: 2, 3, 5.

4. A computer-implemented method for improving efficiency in a delta compression process, the method comprising:
- selecting a data chunk to delta compress;
- generating a sketch for the selected data chunk;
- searching for a set of candidate data chunks using the sketch;
- ranking the set of candidate data chunks with at least a minimum degree of similarity by location status data, wherein the location status data indicates a location and a compression status of a candidate data chunk, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status;
- tie-breaking the set of candidate data chunks using degree of sketch similarity for each candidate; and
- delta compressing the selected data chunk with a selected candidate data chunk.

6. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform operations for improving efficiency in a delta compression process, the operations comprising:
- selecting a data chunk to delta compress;
- generating a sketch for the selected data chunk;
- searching for a set of candidate data chunks using the sketch;
- ranking the set of candidate data chunks with at least a minimum degree of similarity by location status data, wherein the location status data indicates a location and a compression status of a candidate data chunk, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a decompressed in a cache status over a compressed in a cache status;
- tie-breaking the set of candidate data chunks using degree of sketch similarity for each candidate; and
- delta compressing the selected data chunk with a selected candidate data chunk.
Dependent claims: 7, 8, 10.

9. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform operations for improving efficiency in a delta compression process, the operations comprising:
- selecting a data chunk to delta compress;
- generating a sketch for the selected data chunk;
- searching for a set of candidate data chunks using the sketch;
- ranking the set of candidate data chunks with at least a minimum degree of similarity by location status data, wherein the location status data indicates a location and a compression status of a candidate data chunk, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status;
- tie-breaking the set of candidate data chunks using degree of sketch similarity for each candidate; and
- delta compressing the selected data chunk with a selected candidate data chunk.

11. A delta compression system, comprising:
- a delta processing module to delta compress a first set of data chunks;
- a cache to store a second set of data chunks;
- a data storage system to store a third set of data chunks; and
- a preferential selection module coupled to the cache, data storage system and delta processing module, the preferential selection module to identify a candidate to serve as a base chunk for delta compression by ranking a set of candidate base chunks from the second set of data chunks and third set of data chunks by location status information that indicates a location and a compression status of a candidate base chunk then tie-breaking a ranked set of candidates by degree of similarity, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status.
Dependent claims: 12, 13, 15.

14. A delta compression system, comprising:
- a delta processing module to delta compress a first set of data chunks;
- a cache to store a second set of data chunks;
- a data storage system to store a third set of data chunks; and
- a preferential selection module coupled to the cache, data storage system and delta processing module, the preferential selection module to identify a candidate to serve as a base chunk for delta compression by ranking a set of candidate base chunks from the second set of data chunks and third set of data chunks by location status information that indicates a location and a compression status of a candidate base chunk then tie-breaking a ranked set of candidates by degree of similarity, wherein the location status data indicates the location and status of the candidate data chunk as in any one of a compressed in a cache status, a decompressed in a cache status, or a compressed in a data storage status, and wherein ranking the set of candidate data chunks using location status data for each candidate prefers a compressed in a cache status over a compressed in a data storage status.

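The system claims can be pictured with a small Python sketch of a cache, a data storage system and a delta processing step. This is an illustration under stated assumptions, not the patented implementation: the `DeltaSystem` class and its dictionaries are hypothetical, and delta compression is approximated with zlib's preset-dictionary feature (`zdict`), which is a stand-in technique chosen for brevity. The sketch shows one plausible reason the location status matters: a base chunk that is decompressed in the cache can be used directly, while the other two statuses require a decompression step first.

```python
import zlib
from typing import Dict


class DeltaSystem:
    """Hypothetical stand-in for the cache, data storage and delta modules."""

    def __init__(self) -> None:
        self.cache_decompressed: Dict[str, bytes] = {}
        self.cache_compressed: Dict[str, bytes] = {}
        self.storage_compressed: Dict[str, bytes] = {}

    def location_status(self, chunk_id: str) -> str:
        """Report where a candidate base chunk lives and in what form."""
        if chunk_id in self.cache_decompressed:
            return "decompressed in cache"
        if chunk_id in self.cache_compressed:
            return "compressed in cache"
        if chunk_id in self.storage_compressed:
            return "compressed in data storage"
        raise KeyError(chunk_id)

    def load_base(self, chunk_id: str) -> bytes:
        """Return the raw bytes of a base chunk, decompressing if needed."""
        status = self.location_status(chunk_id)
        if status == "decompressed in cache":
            return self.cache_decompressed[chunk_id]
        if status == "compressed in cache":
            return zlib.decompress(self.cache_compressed[chunk_id])
        return zlib.decompress(self.storage_compressed[chunk_id])

    def delta_compress(self, chunk: bytes, base_id: str) -> bytes:
        # Assumption: zlib with the base chunk as a preset dictionary
        # approximates delta compression against that base.
        comp = zlib.compressobj(zdict=self.load_base(base_id))
        return comp.compress(chunk) + comp.flush()

    def delta_decompress(self, delta: bytes, base_id: str) -> bytes:
        # The same base chunk must be supplied to reverse the encoding.
        decomp = zlib.decompressobj(zdict=self.load_base(base_id))
        return decomp.decompress(delta) + decomp.flush()
```

A caller would first pick `base_id` with a ranking step such as the one described in the claims, then pass it to `delta_compress`.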
Specification