Data compression using hashing
First Claim
1. A method of compressing a stream of input data into a compressed stream of output data comprising:
- (a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys,(b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches,(c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, and(d) if a hash match occurs in at least one hash table, outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data.
2 Assignments
0 Petitions
Accused Products
Abstract
Compressing a sequence of characters drawn from an alphabet uses string substitution with no a priori information. An input data block is processed into an output data block comprised of variable length incompressible data sections and variable length compressed token sections. Multiple hash tables are used based on different subblock sizes for string matching, and this improves the compression ratio and rate of compression. The plurality of uses of the multiple hash tables allows for selection of an appropriate compression data rate and/or compression factor in relation to the input data. Using multiple hashing tables with a recoverable hashing method further improves compression ratio and compression rate. Each incompressible data section contains means to distinguish it from compressed token sections.
-
Citations
91 Claims
-
1. A method of compressing a stream of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, and (d) if a hash match occurs in at least one hash table, outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method of compressing a stream of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, and (e) determining at least part of a hash key for one of the hash tables from the hash key of another of the hash tables. - View Dependent Claims (7, 8, 9, 10, 11)
-
-
12. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, and (e) if a hash match occurs in at least one hash table, outputting the subsequent occurrence of the string as compressed data, such that when a hash match occurs in a table related to a larger subblock size and a hash match occurs in a table related to a smaller subblock size, the hash match in the hash table of the larger subblock size is selected for the compressed data output. - View Dependent Claims (13, 14, 15, 16)
-
-
17. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, and (e) if a hash match occurs in a hash table, outputting the subsequent occurrence of the string as compressed data, such that where a hash match occurs for more than one hash table, determining the longer match length of the string of input data for each respective hash table, and outputting the longer match length as the compressed data output. - View Dependent Claims (20, 21, 22, 23)
-
-
18. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, (e) if a hash match occurs in a hash table, outputting the subsequent occurrence of the string as compressed data, such that where a hash match occurs in a table related to a larger subblock size relative to a hash match in a table related to a smaller subblock size, the hash match of the larger subblock size is selected as the compressed data output, and (f) wherein the hash keys for the larger subblock size are computed using at least part of the value of hash keys from the smaller subblock size. - View Dependent Claims (24, 25, 26, 27)
-
-
19. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, (e) wherein when a hash match occurs in a table related to a larger subblock size relative to a hash match in a table related to a smaller subblock size, selecting the hash match of the larger subblock size as the compressed data output, and (f) if a hash match occurs in a hash table, outputting the subsequent occurrence of the string as compressed data, such that when a larger subblock size does not match on a minimum match length, selecting the smaller subblock size hash value if such smaller subblock value matches for its respective minimum length. - View Dependent Claims (28, 29, 30, 31)
-
-
32. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, compressing and outputting the subsequent occurrence of the string in accordance with the hash match, and if a hash miss occurs, outputting such data as uncompressed data, and (e) when a hash miss occurs in the hash tables, hashing is effected for at least one of the hash tables by employing a hash key from the hash miss whereby a reduced number of operations is required for hashing. - View Dependent Claims (33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43)
-
-
44. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match for each of the subblocks does not occur in any hash table, outputting at least the first character of an input subblock as uncompressed data, and (e) if a hash match occurs outputting the subsequent occurrence of the string as compressed data, such that when a hash match occurs in a table related to a larger subblock size relative to a hash match in a table related to a smaller subblock size, selectively applying the hash match from either the larger subblock size, thereby to obtain a higher compression rate or applying the longest match of both the hash matches thereby to obtain a high compression ratio. - View Dependent Claims (45, 46, 47, 48)
-
-
49. A method of compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining a hash table for a data subblock size, the hash table having hash keys, (b) hashing a string of input characters of the input data for the data subblock size and entering the hash information into hash table entries addressed by hash keys, (c) hashing, for the subblock size, subsequent strings of data and searching for a match of prior strings related to the hashed information addressed by the hash keys in the hash table, (d) if a hash match occurs, compressing and outputting the subsequent occurrence of the string as compressed data, and if a hash miss occurs, outputting at least the first character of the subblock as uncompressed data, and (e) when a hash miss occurs in the hash table, hashing is effected for a next key for the hash table by employing the key from the hash miss whereby a reduced number of operations is required for hashing. - View Dependent Claims (50)
-
-
51. Apparatus for compressing a stream of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and means for using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, and (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data. - View Dependent Claims (52, 53, 54, 55)
-
-
56. Apparatus for compressing a stream of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and means for using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) means for determining at least part of hash key for one of the hash tables from the hash key of the second of the hash tables. - View Dependent Claims (57)
-
-
58. Apparatus for compressing a steam of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and means for using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) means, when a hatch match occurs for different tables, for selecting a larger subblock size for the compressed data output of step (d). - View Dependent Claims (59, 60, 61)
-
-
62. Apparatus for compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) wherein where a hash match occurs for more than one of the hash tables, means for determining the longer match length of the string of input data for each respective hash table, and means for outputting the longer match length as the compressed data output of step (d). - View Dependent Claims (65, 66, 67, 68)
-
-
63. Apparatus for compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, (e) wherein where a hash match occurs in a table related to a larger subblock size relative to a hash match in a table related to a smaller subblock size, means for selecting the hash match of the larger subblock size for the compressed data output of step (d), and (f) wherein when the one subblock size is larger than the second subblock size, means for computing hash keys for the larger subblock size, using at least part of the value of hash keys from the smaller subblock size. - View Dependent Claims (69, 70, 71, 72)
-
-
64. Apparatus for compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) wherein when a larger subblock size does not match on a minimum match length, means for selecting the smaller subblock size hash match if such smaller subblock value matches.
-
-
73. Apparatus for compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) when a hash miss occurs in at least one of the hash tables, means for effecting hashing for an increased subblock size for the hash table, by employing hash keys from the hash miss whereby a reduced number of operations is required for hashing. - View Dependent Claims (74, 75, 76, 77, 78)
-
-
79. Apparatus for compressing a stream of input data into a compressed stream of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs in at least one hash table, means for outputting the subsequent occurrence of the string as compressed data, and if a hash match for each of the subblocks does not occur in any hash table, means for outputting at least the first character of an input subblock as uncompressed data, and (e) wherein, when a hash match occurs in a table related to a larger subblock size relative to a hash match and a table related to a smaller subblock size, means for selectively applying the hash match from either the larger subblock size or the smaller subblock size. - View Dependent Claims (80, 81, 82, 83)
-
-
84. Apparatus for compressing a stream of input data into a compressed string of output data comprising:
-
(a) maintaining multiple hash tables, each respectively for a different data subblock size, and each hash table having entries having respective hash keys, (b) means for hashing a string of input characters of the input data for each of the different data subblock sizes to obtain the hash keys, and using these hash keys to address hash table entries containing hash information to facilitate location of string matches, (c) means for hashing, for each of the subblock sizes, subsequent strings of data and searching for a match of prior strings related to the information addressed by hash keys in at least one of the hashing tables, (d) if a hash match occurs, means for compressing and outputting the subsequent occurrence of the string as compressed data, and if a hash miss occurs, means for outputting such data as uncompressed data, and (e) when a hash miss occurs in the hash table, means for effecting hashing for a next key for the hash table, by employing hash keys from the hash miss whereby a reduced number of operations is required for hashing. - View Dependent Claims (85)
-
-
86. A generator for hashing a stream of input data comprising:
-
(a) interfaces to multiple hash tables designating a different data subblock size for each hash table, (b) means for hashing strings of input characters of the input data for each of the different data subblock sizes to produce hash keys, and means for entering the hash information into hash entries of the hash table addressed by hash keys, (c) means for producing multiple hash keys for each of the subblock sizes, and (d) means for producing a hash key for a first subblock size from a hash key of a second subblock size. - View Dependent Claims (87, 88)
-
-
89. A generator for hashing a stream of data comprising:
-
(a) interface means to a hash table with a designated input data subblock size for the hash table, and the hash table having hash keys, (b) means for hashing strings of input characters of the input data for each input data subblocks, and means for entering the hashed information into hash entries of the hash table addressed by the hash keys, and (c) means for producing a next hash key from a prior hash key on occurrence of a hash miss. - View Dependent Claims (90, 91)
-
Specification