Compression using small dictionaries with applications to network packets
First Claim
1. A method for encoding input data in a dictionary based compression/decompression system comprising:
- storing unique multiple character strings from the input data as dictionary entries in the compression/decompression system;
encoding multi-character strings from the input data into codewords according to the address of dictionary entries matching the character strings;
identifying single-character strings from the input data that have not previously been stored in the compression/decompression dictionary;
encoding each single-character string into a special code and a partial code, each special and partial code representing a selectable portion of the associated single-character string; and
outputting a compressed data stream from the compression/decompression system having both the codewords corresponding to the encoded single-character strings and the codewords corresponding to the encoded multiple character strings, each special code and associated partial code uniquely identifying a single-character string and representing a subset of the total number of unique single-character strings that can exist in the input data.
3 Assignments
0 Petitions
Accused Products
Abstract
The invention is a dictionary initialization scheme adaptive to changes in the type and structure of input data. The compression ratio is increased by minimizing the number of data entries used to represent single characters in the input data. By using fewer codes than what is normally used to represent characters in an array of input data, the dictionary can have fewer entries than the alphabet size. A further aspect of the invention implements a type of run-length encoding in the LZ methodology which exploits the redundant structure existing in the compressed stream in the presence of a long run. Some of the codewords in the compressed stream are deleted but can be recovered at the decompression site. The foregoing LZE method is used alone, or used in combination with other methods to form a compression scheme that is especially useful for transmitting network packets.
157 Citations
23 Claims
-
1. A method for encoding input data in a dictionary based compression/decompression system comprising:
-
storing unique multiple character strings from the input data as dictionary entries in the compression/decompression system; encoding multi-character strings from the input data into codewords according to the address of dictionary entries matching the character strings; identifying single-character strings from the input data that have not previously been stored in the compression/decompression dictionary; encoding each single-character string into a special code and a partial code, each special and partial code representing a selectable portion of the associated single-character string; and outputting a compressed data stream from the compression/decompression system having both the codewords corresponding to the encoded single-character strings and the codewords corresponding to the encoded multiple character strings, each special code and associated partial code uniquely identifying a single-character string and representing a subset of the total number of unique single-character strings that can exist in the input data. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A method for encoding input data in a dictionary based compression/decompression system comprising:
-
encoding character strings from the input data and storing unique character strings as data entries in the compression/decompression dictionary; outputting a compressed data stream from the compression/decompression system, the compressed data stream made up of codewords representing the encoded character strings; detecting a sequence of codewords in the compressed data stream that represent a run of input data characters, the run representing input data characters having the same value and processed by the compression/decompression system in sequential order; and compressing the compressed data stream by disabling the compression/decompression system from outputting part of the codeword run in the compressed data stream;
the value of the first codeword sent following the disabled codewords allowing for reconstruction of the sequence of disabled codewords. - View Dependent Claims (16, 17, 18)
-
-
19. A circuit for encoding input data in a dictionary based compression/decompression system comprising:
-
a compression/decompression engine for encoding single and multiple character strings from the input data into codewords and outputting the codewords as a compressed data stream; means for separating single-character strings from the input data into first and second code fields; means for encoding the first code field to identify single-character strings in the compressed data stream, the first code field encoded to be within a predefined range of compression/decompression engine code values; and means for generating single-character strings from the compressed data stream by decoding the first code field and combining it with the second code field, the first and second code fields reducing the bit-length of encoded character strings by representing a subset of all single-character strings that can possibly occur in the input data. - View Dependent Claims (20, 21, 22, 23)
-
Specification