PREDICTING DATA COMPRESSIBILITY USING DATA ENTROPY ESTIMATION
Abstract
The subject disclosure is directed towards predicting the compressibility of a data block and using that prediction to determine whether the data block, if compressed, will shrink enough to justify compression. In one aspect, data of the data block is processed to obtain an entropy estimate of the data block, e.g., based upon distinct value estimation. The compressibility prediction may be used in conjunction with a chunking mechanism of a data deduplication system.
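The idea of treating a block as compressible unless its entropy estimate is high can be illustrated with a minimal sketch. It computes the exact empirical Shannon entropy over byte values rather than the distinct-value-based estimate the abstract mentions, and the names `byte_entropy`, `predict_compressible`, and the 7.0 bits-per-byte cutoff are assumptions chosen for illustration, not values taken from the disclosure.

```python
import math
import os
from collections import Counter

# Hypothetical cutoff in bits per byte; the source does not specify a value.
HIGH_ENTROPY_THRESHOLD = 7.0


def byte_entropy(block: bytes) -> float:
    """Empirical Shannon entropy of the block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each distinct byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def predict_compressible(block: bytes,
                         threshold: float = HIGH_ENTROPY_THRESHOLD) -> bool:
    """Return True when the entropy estimate is not high."""
    return byte_entropy(block) < threshold


if __name__ == "__main__":
    text_like = b"the quick brown fox jumps over the lazy dog " * 100
    random_like = os.urandom(4096)
    print("text-like block compressible:", predict_compressible(text_like))
    print("random block compressible:", predict_compressible(random_like))
```

In practice a predictor of this kind would likely examine only a sample of the block, since the point of the prediction is to cost less than attempting the compression itself.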
20 Claims
- 1. In a computing environment, a method, comprising, processing data of a data block to predict compressibility of the data block, including obtaining an entropy estimate corresponding to the data block, determining whether the entropy estimate of the data block is high, and if not, outputting compressibility information that indicates that the data block is predicted to be sufficiently compressible.
- 11. In a computing environment, a system comprising, a chunking mechanism of a deduplication system, the chunking mechanism configured to chunk data for storage in a chunk store, the chunking mechanism coupled to or incorporating a compression prediction mechanism, the compression prediction mechanism configured to process at least some of the data in a chunk to obtain an estimate of compressibility of the chunk based upon a data entropy estimation.
- 16. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising, estimating compressibility of a data block, including hashing at least some of the data of the data block into values in a data structure, using the data structure to obtain an estimated data entropy of the data block, and using the estimated data entropy to determine whether to compress the data block.
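The steps recited in claim 16 (hash data into values in a data structure, derive an entropy estimate from that structure, then decide whether to compress) might be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: the fixed-size bucket table, the use of CRC-32 as the hash, the 4-byte window and stride, and the 0.9 cutoff are all hypothetical choices.

```python
import math
import os
import zlib

NUM_BUCKETS = 1024            # hypothetical size of the counting table
WINDOW = 4                    # hypothetical number of bytes hashed at a time
STRIDE = 4                    # hypothetical step between hashed windows
HIGH_ENTROPY_FRACTION = 0.9   # hypothetical cutoff on normalized entropy


def estimate_normalized_entropy(block: bytes) -> float:
    """Hash windows of the block into buckets and return the entropy of the
    bucket histogram, scaled to the range 0.0 to 1.0."""
    counts = [0] * NUM_BUCKETS
    samples = 0
    for i in range(0, len(block) - WINDOW + 1, STRIDE):
        bucket = zlib.crc32(block[i:i + WINDOW]) % NUM_BUCKETS
        counts[bucket] += 1
        samples += 1
    if samples < 2:
        return 0.0  # too little data to estimate anything
    entropy = -sum((c / samples) * math.log2(c / samples)
                   for c in counts if c)
    # The histogram cannot hold more entropy than this upper bound.
    max_entropy = math.log2(min(samples, NUM_BUCKETS))
    return entropy / max_entropy


def should_compress(block: bytes) -> bool:
    """Compress only when the estimated entropy is not high."""
    return estimate_normalized_entropy(block) < HIGH_ENTROPY_FRACTION


if __name__ == "__main__":
    print("repetitive block:", should_compress(b"abcd" * 4096))
    print("random block:", should_compress(os.urandom(65536)))
```

In a deduplication setting such as the one in claim 11, a check like `should_compress` could be called on each chunk before it is written to the chunk store, so that chunks predicted to be incompressible are stored as-is.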
Specification