Apparatus and method for very high data rate-compression incorporating lossless data compression and expansion utilizing a hashing technique
First Claim
1. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:
- a. initializing a hash table and initializing an SRC pointer;
b. processing input data in the order in which the characters in the data appear and hashing input data subblocks of the minimum compression size selected;
c. maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, obtaining the previous SRC pointer contained in that entry, and comparing the two strings of data;
d. if the two strings of data match on at least the size of the subblock, then generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data;
e. if the two strings of data do not match, then storing the string of data as incompressible data; and
f. continuing steps b. through e. until the entire input data has been processed.
0 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus for compressing digital data that is represented as a sequence of characters drawn from an alphabet. An input data block is processed into an output data block composed of sections of variable length. Unlike most prior art methods which emphasize the creation of a dictionary comprised of a tree with nodes or a set of strings, the present invention creates its own pointers from the sequence characters previously processed and emphasizes the highest priority on maximizing the data rate-compression factor product. The use of previously input data acting as the dictionary combined with the use of a hashing algorithm to find candidates for string matches and the absence of a traditional string matching table and associated search time allows the compressor to very quickly process the input data block. Therefore, the result is a high data rate-compression factor product achieved due to the absence of any string storage table and matches being tested only against one string.
-
Citations
15 Claims
-
1. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:
-
a. initializing a hash table and initializing an SRC pointer; b. processing input data in the order in which the characters in the data appear and hashing input data subblocks of the minimum compression size selected; c. maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, obtaining the previous SRC pointer contained in that entry, and comparing the two strings of data; d. if the two strings of data match on at least the size of the subblock, then generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data; e. if the two strings of data do not match, then storing the string of data as incompressible data; and f. continuing steps b. through e. until the entire input data has been processed. - View Dependent Claims (2, 3)
-
- 4. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data string to be compressed, said compression method comprising the creation of a hash table, hashing each occurrence of a string of input data and subsequently searching for identical strings of input data and if such an identical string of input data is located whose string size is at least equal to the minimum compression size selected, compressing the second and all subsequent occurrences of such identical string of data, if a string of data is located which does not match to a previously compressed string of data, storing such data as uncompressed data, and for each input string after each hash is used to find a possible previous match location for the string, the location of the string is stored in the hash table, thereby using the previously processed data to act as a compression dictionary.
-
6. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:
-
a. creating an identifier header having a count value for data to follow and marking the identifier header with a code to designate whether incompressible or compressible data follows; b. reading input data in the order in which the input data appears and hashing subblocks and comparing them to previous subblocks with the same hash until a match to a previous subblock occurs which is a match on a string up to the maximum count value in the identifier header, and then generating a backwards pointer to designate the second occurrence of a character string equal to a previous identical string of characters of at least minimum match size and thereby compressing the second occurrence of the string; and c. continuing generating identifier headers and data until the entire input data stream has been processed. - View Dependent Claims (7, 8)
-
-
9. In a data compression method, having defined at least one variable length compression match output token which includes a count field containing a maximum count value and an incompressible data indicator field which together form an identification header, a backwards pointer to a previously encountered data field, and a size of backwards pointer field, having further defined at least one variable length incompressible data output token which consists of the identification header followed by incompressible data of length specified by the count field, and having further defined a subblock size of a minimum length for performing hash computations that cannot result in the compressed data being larger than the uncompressed data, a method for compressing input data into compressed output data, said compression method comprising the steps of:
-
a. initializing a hash table, a source pointer and a destination pointer; b. initializing a count value and initializing and storing an identification header at the location of the destination pointer and then incrementing the destination pointer; c. reading input subblocks pointed to by the source pointer, and computing hash values for the subblocks in the order in which they appear, and comparing the string located by the previous entry for this hash value to determine potential matches and unconditionally replacing the hash table entry with the current source pointer; d. if a hash match does not occur, the data character pointed to by the source pointer is copied to the data area pointed to by the destination pointer, and the source pointer, destination pointer, and count value are incremented such that if the count value then is equal to the maximum count value, the maximum count value is stored in the current identification header and processing resumes with step b.; e. if a hash match occurs, the match length of the data pointed to by the current hash table value and the data pointed to by the prior identical hash table value is determined; f. if the match length so computed is less than the minimum match length, step d. is performed as if no hash value match had occurred; g. if the match length so computed is equal to or greater than the minimum match length, the current identification header is updated with the count value to complete a prior packet of uncompressible data and an output match token is then stored at the location pointed to by the destination point consisting of a count field containing the match count, a raw data indicator field indicating that a backwards pointer follows, and a variable length backwards pointer that points to the most recent previous occurrence of the matching data and then the source pointer is incremented by the match count and the destination pointer is incremented by the output match size; and h. repeating steps b. through g. until the source data has been processed.
-
-
10. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression apparatus comprising:
-
a. means for initializing a hash table and means for initializing an SRC pointer; b. means for processing input data in the order in which the characters in the data appear and means for hashing input data subblocks of the minimum compression size selected; c. means for maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, means for obtaining the previous SRC pointer contained in that entry, and means for comparing the two strings of data; d. if the two strings of data match on at least the size of the subblock, then means for generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data; and e. if the two strings of data do not match, then means for storing the string of data as incompressible data. - View Dependent Claims (11)
-
- 12. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data string to be compressed, said compression apparatus comprising the creation of a hash table, means for hashing each occurrence of a string of input data and subsequently searching for identical strings of input data and if such an identical string of input data is located whose string size is at least equal to the minimum compression size selected, means for compressing the second and all subsequent occurrences of such identical string of data, if a string of data is located which does not match to a previously compressed string of data, means for storing such data as uncompressed data, and for each input string after each hash is used to find a possible previous match location for the string, means for storing the location of the string in the hash table, thereby using the previously processed data to act as a compression dictionary.
-
14. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression apparatus comprising:
-
a. means for creating an identifier header having a count value for data to follow and means for marking the identifier header with a code to designate whether incompressible or compressible data follows; and b. means for reading input data in the order in which the input data appears and means for hashing subblocks and comparing them to previous subblocks with the same hash until a match to a previous subblock occurs which is a match on a string up to the maximum count value in the identifier header, and then means for generating a backwards pointer to designate the second occurrence of a character string equal to a previous identical string of characters of at least minimum match size and thereby compressing the second occurrence of the string. - View Dependent Claims (15)
-
Specification