Variable bit-length reiterative lossless compression system and method
First Claim
Patent Images
1. A computer-implemented method of performing lossless compression of a digital data set, the method comprising:
- performing a compression process including;
analyzing at least a part of the data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols;
if it has been determined that the N symbols can be further compressed, encoding the N symbols using the model and storing the encoded data in an iteration store;
if it has been determined that the N symbols cannot be compressed, storing the N symbols in the iteration store;
determining whether any part of the digital data set remains to be processed, and if so, then repeating the compression process for an additional part of the digital data set; and
if not, then substituting the contents of the iteration store for the data set and repeating the compression process on the data set thus updated until a specified end condition has been met, and then providing an output from the iteration store.
1 Assignment
0 Petitions
Accused Products
Abstract
A computer-implemented method of performing lossless compression of a digital data set uses an iterative compression process in which the number of symbols N and bit length per symbol n may vary on successive iterations. The process includes analyzing at least a part of the data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols.
17 Citations
12 Claims
-
1. A computer-implemented method of performing lossless compression of a digital data set, the method comprising:
-
performing a compression process including; analyzing at least a part of the data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols; if it has been determined that the N symbols can be further compressed, encoding the N symbols using the model and storing the encoded data in an iteration store; if it has been determined that the N symbols cannot be compressed, storing the N symbols in the iteration store; determining whether any part of the digital data set remains to be processed, and if so, then repeating the compression process for an additional part of the digital data set; and
if not, then substituting the contents of the iteration store for the data set and repeating the compression process on the data set thus updated until a specified end condition has been met, and then providing an output from the iteration store. - View Dependent Claims (11)
-
-
2. A computer-implemented method of analyzing a digital data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols, the method comprising:
-
in a computer process, storing for later retrieval a set of values for a natural number n that is greater than 0; performing an analysis loop process including; retrieving a distinct one of the stored values for n, for each symbol of length n in the data set, performing an entropy calculation including; receiving such symbol from the digital data set at an input; for each model in a set of models, computing an entropy value and adding the computed entropy value to an entropy accumulator for such model; and using the value in the entropy accumulator, computing, for such symbol, a compression score and an updated value of partition size N for such model; using the values in the entropy accumulators for each of the models, identifying the model having the best compression score; and
storing the identified model, its corresponding value of N, and its compression score;determining whether a processing end point has been reached, and if not, then repeating the analysis loop process for an additional value of n;
otherwiseevaluating the stored compression scores for each value of n to identify the value of n having the best compression score; and
returning the identified value of n, and its corresponding model and its corresponding value of N. - View Dependent Claims (3, 4, 5, 6, 7, 8)
-
-
9. A computer-implemented method of performing lossless compression of a digital data set, the method comprising:
-
performing an analysis process including; analyzing at least a part of the data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols; storing, in an analysis FIFO, a set of values of n, N, and the model for the just-analyzed portion of the digital data set; determining whether any part of the digital data set remains to be processed, and if so, then repeating the analysis process for an additional part of the digital data set; for each set of values in the analysis FIFIO, performing an encoding process including; retrieving, from the analysis FIFO, such set of values of n, N, and the model; if it has been determined that the N symbols can be further compressed, encoding the N symbols using the retrieved model and retrieved value of n and storing the compressed data in an iteration store; if it has been determined that the N symbols cannot be compressed, storing the N symbols in the iteration store; substituting the contents of the iteration store for the data set and repeating the analysis process and the compression process on the data set thus updated until a specified end condition has been met, and then providing an output from the iteration store. - View Dependent Claims (10)
-
-
12. A non-volatile storage medium in which is stored a compressed digital data set, the compressed data set comprising:
a sentinel indicating whether or not the results of decompressing said compressed digital data set results in an endpoint condition; a set of partitions, wherein at least one of the partitions includes compressed digital data, and each partition includes; a partition header, the header including; a type field indicating whether the partition contains compressed or uncompressed data; a sentinel indicating whether the partition is the last partition in the compressed digital data set; if the type field indicates that the partition contains uncompressed data, a length field indicating the number of bits of uncompressed data in the partition; if the type field indicates that the partition contains compressed data, a symbol field containing a value of little-n in the partition, a model field containing a value identifying the model used in the partition, and if the last partition, a discard field indicating a number of bits in the last decompressed symbol from the partition that must be discarded; and data of the partition, such data having content and format characterized by the partition header.
Specification