Optimal sequential (de)compression of digital data
First Claim
1. In a computing system environment, a method of utilizing a computing device for compressing original data arranged as a plurality of symbols, comprising:
- recursively and sequentially advancing through the plurality of symbols to determine a most frequently occurring tuple of the plurality of symbols, wherein a tuple comprises at least two symbols;
recursively replacing, in the original data, the determined most frequently occurring tuple by a new symbol to generate a new compressed data stream;
recursively comparing a size of a most recent new compressed data stream and attendant overhead to a size of an immediately preceding new compressed data stream and attendant overhead to determine if a compression goal for the original data has been achieved, wherein the attendant overhead indicates an amount of computing resources required to operate a corresponding compressing operation; and
terminating the recursively determining, replacing, and comparing on determining that the compression goal has been achieved.
15 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus involve an original data stream arranged as a plurality of symbols. Of those symbols, all possible tuples are identified and the highest or most frequently occurring tuple is determined. A new symbol is created and substituted for each instance of the highest occurring tuple, which results in a new data stream. The new data stream is encoded and its size determined. Also, a size of a dictionary carrying all the original and new symbols is determined. The encoding size, the size of the dictionary and sizes of any other attendant overhead is compared to a size of the original data to see if compression has occurred, and by how much. Upon reaching pre-defined objectives, compression ceases. Decompression occurs oppositely. Other features include resolving ties between equally occurring tuples, path weighted Huffman coding, storing files, decoding structures, and computing arrangements and program products, to name a few.
114 Citations
19 Claims
-
1. In a computing system environment, a method of utilizing a computing device for compressing original data arranged as a plurality of symbols, comprising:
-
recursively and sequentially advancing through the plurality of symbols to determine a most frequently occurring tuple of the plurality of symbols, wherein a tuple comprises at least two symbols; recursively replacing, in the original data, the determined most frequently occurring tuple by a new symbol to generate a new compressed data stream; recursively comparing a size of a most recent new compressed data stream and attendant overhead to a size of an immediately preceding new compressed data stream and attendant overhead to determine if a compression goal for the original data has been achieved, wherein the attendant overhead indicates an amount of computing resources required to operate a corresponding compressing operation; and terminating the recursively determining, replacing, and comparing on determining that the compression goal has been achieved. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method utilizing a computing device for compressing original data arranged as a plurality of symbols, comprising:
-
recursively and sequentially advancing through the original data to determine all possible two-adjoining symbols of the plurality of symbols; recursively replacing in the original data a new symbol for a determined pair of most frequently occurring two-adjoining symbols of the plurality of symbols to generate a new compressed data stream; recursively comparing a size of a most recent new compressed data stream and attendant overhead to a size of an immediately preceding new compressed data stream and attendant overhead to determine if a compression goal for the original data has been achieved, wherein the attendant overhead indicates an amount of computing resources required to operate a corresponding compressing operation; and terminating the recursively determining, replacing, and comparing on determining that the compression goal has been achieved. - View Dependent Claims (10, 11, 12, 13)
-
-
14. In a computing system environment, a method utilizing a computing device for compressing original data arranged as a plurality of symbols, comprising:
-
recursively advancing through the plurality of symbols sequentially to determine all possible tuples of the plurality of symbols, wherein a tuple comprises two or more symbols; recursively determining most frequently occurring tuple of the identified all possible tuples; recursively replacing in the original data a new symbol for the determined most frequently occurring tuple to generate a new compressed data stream; recursively encoding the new compressed data stream; recursively comparing a size of a most recent encoded new compressed data stream and attendant overhead to a size of an immediately preceding encoded new compressed data stream and attendant overhead to determine if a compression goal for the original data has been achieved; and terminating the recursively advancing, determining, replacing, encoding, and comparing on determining that the compression goal has been achieved. - View Dependent Claims (15, 16, 17, 18, 19)
-
Specification