General purpose, hash-based technique for single-pass lossless data compression
First Claim
1. A method of compressing digital data of bytes of K bits each received from a data stream to form and to output a compressed data stream, comprising the steps of:
- (a) receiving the first byte of the data stream and storing the first byte in a compressor window;
(b) receiving the next byte of the data stream, storing the byte in a compressor window, and hashing the byte with the immediately preceding byte to obtain a hash value;
(c) addressing a reference table with the hash value obtained in step (b) to obtain any reference pointers to bytes within the compressor window stored at that reference-table address and to store at that reference-table address a new reference pointer to the current byte in the compressor window;
(d) for each reference pointer, if any, obtained in step (c), comparing the current byte with the byte in the compressor window pointed to by that reference pointer;
(e) in the event no reference pointers were obtained in step (c) or the current byte does not match the byte in the compressor window pointed to by any of the reference pointers obtained in step (c), then outputting within the compressed data stream a flag indicating an alphabet token and a compressed representation of the byte preceding the current byte;
(f) in the event the current byte matches the byte in the compressor window pointed to by any of the reference pointers addressed in step (c), then(1) receiving one or more additional bytes from the data stream and, for each, comparing the additional byte with the byte in the compressor window immediately after each match to determine if any match is extended by the additional byte, until the additional byte does not further extend any match, and(2) outputting, as part of the compressed data stream, a flag indicating string-substitution compression and a representation of the location within the compressor window where the match was found and a representation of the number of bytes in the match;
(g) repeating steps (b) through (f) to receive each byte of the data stream subsequent to the first byte.
5 Assignments
0 Petitions
Accused Products
Abstract
A general-purpose, single-pass, adaptive, and lossless data compression invention implements an LZ1-like method using a hash-based architecture. It is suitable for use in data storage and data communications applications. Implementation efficiency, in terms of required memory and logic gates relative to the typical compression ratio achieved, is highly optimized. An easy-to-implement and quick-to-verify hash function is used. Differential copy lengths may be used to reduce the number of bits required to encode the copy-length field within copy tokens. That is, if multiple matches to a sequence of input bytes are found in the current window, then the length of the copy may be encoded as the difference between the lengths of the longest and the second-longest match, which results in a smaller copy length which likely has a shorter encoded representation. To further increase the compression achieved, literals are not used, but rather input bytes without window matches are mapped into alphabet tokens of variable length using a unary-length code. Other unary-length codes are used to represent the copy-length field and the displacement field within copy tokens.
-
Citations
96 Claims
-
1. A method of compressing digital data of bytes of K bits each received from a data stream to form and to output a compressed data stream, comprising the steps of:
-
(a) receiving the first byte of the data stream and storing the first byte in a compressor window; (b) receiving the next byte of the data stream, storing the byte in a compressor window, and hashing the byte with the immediately preceding byte to obtain a hash value; (c) addressing a reference table with the hash value obtained in step (b) to obtain any reference pointers to bytes within the compressor window stored at that reference-table address and to store at that reference-table address a new reference pointer to the current byte in the compressor window; (d) for each reference pointer, if any, obtained in step (c), comparing the current byte with the byte in the compressor window pointed to by that reference pointer; (e) in the event no reference pointers were obtained in step (c) or the current byte does not match the byte in the compressor window pointed to by any of the reference pointers obtained in step (c), then outputting within the compressed data stream a flag indicating an alphabet token and a compressed representation of the byte preceding the current byte; (f) in the event the current byte matches the byte in the compressor window pointed to by any of the reference pointers addressed in step (c), then (1) receiving one or more additional bytes from the data stream and, for each, comparing the additional byte with the byte in the compressor window immediately after each match to determine if any match is extended by the additional byte, until the additional byte does not further extend any match, and (2) outputting, as part of the compressed data stream, a flag indicating string-substitution compression and a representation of the location within the compressor window where the match was found and a representation of the number of bytes in the match; (g) repeating steps (b) through (f) to receive each byte of the data stream subsequent to the first byte. - View Dependent Claims (2, 3, 48)
-
-
4. An adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
window memory means of size M bytes to store the up-to-M most recent bytes of said input stream; hash means to generate hash values, each based on the H most-recent bytes of said input stream; reference table means, addressable by said hash values, to hold pointers into said window memory means, each pointer pointing to a sequence of H bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify matches between said H most-recent bytes and the window byte sequences pointed to by any pointers having a reference-table address given by the hash value of said H most-recent bytes, longest-match determining means, operable in the case where one or more said matches are verified, to identify which of said window byte sequences results in the longest match between the current input-byte sequence and that window byte sequence, and to determine the length of said longest match; copy-token encoding means, operable whenever said longest-match determining means has identified said longest match and has determined its length, said copy-token encoding means to output from said compression apparatus a copy token comprising a copy-token identifier field, a displacement field pointing to said longest match within said window memory means and a copy-length field representing the length of said longest match; alphabet-token encoding means, operable in the case where no matches are verified by said comparator means, to output from said compression apparatus an alphabet token comprising an alphabet-token identifier field and a field of variable length representing the oldest byte within said H most-recent bytes. - View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 71)
-
-
16. In an adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
window memory means of size M bytes to store up-to-M most recent bytes of said input stream; hash means to generate hash values, each based on the H most-recent bytes of said input stream; reference table means, addressable by said hash values, to hold pointers into the window memory means, each pointer pointing to a sequence of H bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify matches between said H most-recent bytes and the window byte sequences pointed to by any pointers having a reference-table address given by the hash value of said H most-recent bytes, longest-match determining means, operable in the case where one or more said matches are verified, to identify which of said window byte sequences results in the longest match between the current input-byte sequence and that window byte sequence, and to determine the length of said matches; copy-token encoding means, operable whenever said longest-match determining means has identified said longest match, has determined its length, and has determined that said longest match either is the only verified match or is the match closest in input sequence to the most-recent byte of said input stream, said copy-token encoding means to output from said compression apparatus a copy token comprising a copy-token identifier field, a displacement field pointing to said longest match within said window memory means and a copy-length field representing the length of said longest match; differential copy-token encoding means, operable whenever said comparator means verifies more than one match and said longest-match determining means identifies said longest match as a match further in input sequence from the most-recent input byte of said input stream than one or more other verified matches, said differential copy-token encoding means to output from said compression apparatus a copy token comprising a copy-token identifier field, a displacement field pointing to said longest match within said window memory means, and a copy-length field representing the difference between the length of said longest match and the length of the longest of said one or more other verified matches. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 55, 56)
-
-
33. An adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
window memory means of size M bytes to store up-to-M most recent bytes of said input stream; hash means to generate hash values each based on the H most-recent bytes of said input stream; reference table means, addressable by said hash values, to hold pointers into said window memory means, each pointer pointing to a sequence of H bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify matches between said H most-recent bytes and the window byte sequences pointed to by any pointers having a reference table address given by the hash value of said H most-recent bytes, longest-match determining means, operable in the case where one or more said matches are verified, to identify which of said window byte sequences results in the longest match between the current input-byte sequence and that window byte sequence and to determine the length of said matches; copy-token encoding means, operable whenever said longest-match determining means has identified said longest match, has determined its length, and has determined that said longest match either is the only verified match or is the match closest in input sequence to the most-recent byte of said input stream, said copy-token encoding means to output from said compression apparatus a copy token comprising a copy-token identifier field, a displacement field pointing to said longest match within said window memory means and a copy-length field representing the length of said longest match; differential copy-token encoding means, operable whenever said comparator means verifies more than one match and said longest-match determining means identifies said longest match as a match further in input sequence from the most-recent input byte of said input stream than one or more other verified matches, said differential copy-token encoding means to outputs from said compression apparatus a copy token comprising a copy-token identifier field, a displacement field pointing to the longest match within said window, and a copy-length field representing the difference between the length of said longest match and the length of the longest of said one or more other verified matches; and alphabet-token encoding means, operable in the case where no said matches are verified by said comparator means, to output from said compression apparatus an alphabet token comprising an alphabet-token identifier field and a field of variable length representing the oldest byte within said H most-recent bytes. - View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44)
-
-
45. In an adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
window memory means of size M bytes to store up-to-M most recent bytes of said input stream; hash means to generate hash values, each based on the H most-recent bytes of said input stream; reference table means, addressable by said hash values, to hold pointers into the window memory means, each pointer pointing to a sequence of H bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify one or more matches between said current input-byte sequence and the window byte sequences pointed to by any pointers having a reference table address given by the hash value of said two most-recent bytes; said hash means generating hash values having the characteristic that for all possible values of any particular one of the H bytes, every possible value of the remaining bytes generates a unique hash value; said comparator means verifying matches between said H most-recent bytes and the window byte sequences pointed to by the pointers having a reference-table address given by the hash value of said H most-recent bytes by comparing one less than H bytes. - View Dependent Claims (46, 47, 49, 50, 51, 52)
-
-
53. In an adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
window memory means of size M bytes to store up-to-M most recent bytes of said input stream; hash means to generate hash values, each based on the H most-recent bytes of said input stream; reference table means, addressable by said hash values, to hold pointers into the window memory means, each pointer pointing to a sequence of H bytes within said window memory means that generates the hash value that is the reference-table address of that pointer, said reference table means holding a predetermined maximum number of said pointers per reference table address; for each new byte of said input stream, said hash means generates a new hash value to address said reference table and said reference table means is updated to hold a new pointer into said window memory means, and if said predetermined maximum number of pointers per reference-table address is exceeded, then said new pointer overwrites a pointer previously stored at that reference-table address on a first in, first out basis. - View Dependent Claims (54)
-
-
57. A method of decompressing a compressed data stream of alphabet tokens based on a prefix code and copy tokens into an output data stream of uncompressed K-bit bytes, comprising:
-
parsing the compressed data stream into copy tokens and alphabet tokens; decompressing each parsed token by; (i) for each alphabet token, discarding the alphabet-token identifier field, decoding the alphabet token using a prefix-code decoder to obtain an output byte, storing the output byte at the current byte within an output memory means, advancing the current byte, and outputting the output byte to the output data stream; (ii) for each copy token, discarding its copy-token identifier field, decoding its displacement field and its copy-length field to obtain copy-length and displacement values, generating a sequence of output bytes responsive to the displacement value, the copy-length value, and the contents of the output memory means, storing the sequence of output bytes starting at the current byte within the output memory means, advancing the current byte, and outputting the sequence of output bytes to the output data stream. - View Dependent Claims (60, 61)
-
-
58. A method of decompressing a compressed data stream of alphabet tokens based on a unary-length code and copy tokens into an output data stream of uncompressed K-bit bytes, comprising:
-
parsing the compressed data stream into copy tokens and alphabet tokens; decompressing each parsed token by; (i) for each alphabet token, decoding the alphabet token using a unary-length decoder to obtain an output byte, storing the output byte at the current byte within an output memory means, advancing the current byte, and outputting the output byte to the output data stream; (ii) for each copy token, decoding its displacement field and its copy-length field to obtain copy-length and displacement values, generating a sequence of output bytes responsive to the displacement value, the copy-length value, and the contents of the window memory means, storing the sequence of output bytes starting at the current byte within the output memory means, advancing the current byte, and outputting the sequence of output bytes to the output data stream.
-
-
59. A method of decompressing a compressed data stream of alphabet tokens based on a Huffman code and copy tokens into an output data stream of uncompressed K-bit bytes, comprising:
-
parsing the compressed data stream into copy tokens and alphabet tokens; decompressing each parsed token by; (i) for each alphabet token, decoding the alphabet token using a Huffman decoder to obtain an output byte, storing the output byte at the current byte within an output memory means, advancing the current byte, and outputting the output byte to the output data stream; (ii) for each copy token, decoding its displacement field and its copy-length field to obtain copy-length and displacement values, generating a sequence of output bytes responsive to the displacement value, the copy-length value, and the contents of the window memory means, storing the sequence of output bytes starting at the current byte within the output memory means, advancing the current byte, and outputting the sequence of output bytes to the output data stream.
-
-
62. A method of decompressing a compressed data stream of alphabet tokens, copy tokens, and differential copy tokens into an output data stream of uncompressed K-bit bytes, comprising:
-
parsing said compressed data stream into copy tokens and alphabet tokens; decompressing each parsed token by; (i) for each alphabet token, decoding said alphabet token to obtain an output byte, storing said output byte at a current byte within an output memory means, advancing said current byte, outputting said output byte to said output data stream, hashing the H most-recent bytes of said output stream to generate a hash value, and storing in a reference table means at an address responsive to said hash value a pointer to said H most-recent output bytes within said output memory means; (ii) for each copy token, decoding its displacement field and its copy-length field to obtain copy-length and displacement values;
generating a sequence of output bytes in response to said displacement value, to said copy-length value, to the contents of said output memory means, and to the contents of said reference table means;
storing said output-byte sequence starting at said current byte within said output memory means;
advancing said current byte; and
, for each byte of said output-byte sequence, outputting that byte to said output stream and hashing the H most-recent bytes of said output stream to generate a hash value and storing in said reference table means, at an address responsive to said hash value, a pointer to said H most-recent output bytes within said output memory means;said output-byte-sequence generating comprising; (a) locating in said output memory means in response to said displacement value a target sequence of H bytes; (b) hashing said target sequence to obtain a hash value; (c) addressing said reference table means using said hash value to obtain any pointers stored at that reference-table address pointing to any sequences of H bytes within said output memory means that are closer to said current byte than is said target sequence; (d) for each pointer obtained in step (c), comparing the sequence of H bytes pointed to by that pointer with said target sequence; (e) for each sequence of H bytes found to be identical in step (d), comparing the bytes adjacent to said target sequence with the bytes adjacent to that identical sequence to determine the length of the match and to determine the length of the longest such match; (f) if said longest match length is determined in step (e), then modifying said copy-length value responsive to said length of said longest such match; (g) generating a sequence of output bytes that includes said target sequence and that is responsive to said displacement value, said copy-length value, and the contents of said output memory means. - View Dependent Claims (63)
-
-
64. An adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
storing up-to-M most recent bytes of the input stream in a window memory means; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference table means, addressable by the hash values, pointers into the window memory means, each pointer pointing to a sequence of H bytes within the window memory means that generates the hash value that is the reference-table address of that pointer; comparing to verify matches between the H most-recent bytes and the window byte sequences pointed to by any pointers having a reference-table address given by the hash value of the H most-recent bytes; in the case where any matches are verified, identifying which window byte sequence results in the longest match between the current input-byte sequence and that window byte sequence and determining the length of the longest match; whenever the longest match has been identified and its length determined, outputting a copy token comprising a copy-token identifier field, a displacement field pointing to the longest match, and a copy-length field representing the length of the match; in the case where no matches are verified, outputting an alphabet token comprising an alphabet-token identifier field and a field of variable length to represent the oldest byte within the H most-recent bytes. - View Dependent Claims (65, 66)
-
-
67. In an adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
storing up-to-M most-recent bytes of the input stream in a window memory; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, one or more pointers into the window memory for each reference table address, each pointer pointing to a sequence of H bytes within the window memory that generates the hash value that addresses the respective reference table location of said pointers; comparing to verify one or more matches between the current input-byte sequence and any of the window byte sequences pointed to by a pointer having a reference table address given by the hash value of the H most-recent bytes, in the case where one or more matches are verified, identifying which one of the one or more window byte sequences results in the longest match between the current input-byte sequence and the window byte sequence and determining the length of the longest match; whenever the comparison verifies more than one match between the current input-byte sequence and the window byte sequences, and the longest match is identified as the match further in input sequence from the most-recent input byte than a second longest match, outputting a copy token comprising a copy-token identifier field, a displacement field for pointing to the longest match and a copy-length field representing the difference between the length of the longest match and the length of the second longest match. - View Dependent Claims (68, 69)
-
-
70. An adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, one or more pointers into the window memory for each the reference table address, each pointer pointing to a sequence of H bytes within the window memory that generates the hash value that addresses the respective reference table location of said pointers; comparing to verify one or more matches between the current input-byte sequence and any of the window byte sequences pointed to by a pointer having a reference table address given by the hash value of the H most-recent bytes, in the case where one or more matches are verified, identifying which one of the one or more window byte .sequences results in the longest match between the current input-byte sequence and the window byte sequence and determining the length of the longest match; whenever the longest match has been identified and its length determined, and the longest match either is the match closest in input sequence to the most-recent input byte in the event of more than one match or is the only match verified by the comparison of the current input-byte sequence and any of the window byte sequences pointed to by a pointer in the reference table at the hash value address location, outputting a copy-token identifier field, a displacement field for pointing to the longest match and a copy-length field representing the length of the match; whenever more than one match between the current input-byte sequence and the window byte sequences has been verified and the longest match is further in input sequence from the most-recent input byte than a second longest match, outputting a copy token comprising a copy-token identifier field, a displacement field for pointing to the longest match and a copy-length field representing the difference between the length of the longest match and the length of the second longest match. when no matches are verified between the H most-recent bytes and any of the window byte sequences, outputting an alphabet token .comprising an alphabet-token identifier field and a field of variable length to represent the oldest byte within the H most-recent bytes. - View Dependent Claims (72)
-
-
73. In an adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, multiple pointers into the window memory for each reference table address, each pointer pointing to a sequence of H bytes within the window memory that generate the hash value that addresses the respective reference table location of said pointers; generating for each new byte of said input stream, a hash value with the immediately previous H bytes of said input stream, and storing in the reference table a pointer to the sequence of the last H bytes hashed after any pointer pointing to a sequence of H bytes for the input stream within the window memory for that hash value is retrieved from the reference table means, the reference table first storing a pointer at a hash value address location not then storing a pointer for the input stream, and thereafter overwriting a previously stored pointer at that hash value address on a first in, first out basis.
-
-
74. In an adaptive, lossless, single-pass, general-purpose compression method for compressing input streams, each of finite length, an improvement comprising the steps of:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, a row validity flag and one or more pointers into the window memory for each reference table address, each pointer pointing to a sequence of H bytes within the window memory that generates the hash value that addresses the respective reference table location of said pointers; and
,after completing the compression of one input stream, resetting in a single reset operation, the row validity flags for each reference table address to invalidate pointers stored therein prior to the compression of the next input stream. - View Dependent Claims (75, 86)
-
-
76. In an adaptive, lossless, single-pass, general-purpose compression method for compressing input streams, each of finite length, an improvement comprising the steps of:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the H most-recent bytes of the input stream to generate hash values; holding in a reference rabid addressable by the hash values, a module count field and one or more pointers into the window memory for each reference table address, each pointer pointing to a sequence of H bytes within the window memory that generates the hash value that addresses the respective reference table of said pointers, the module count field holding the count of the input streams of finite length compressed; and
,during the compression of successive input streams, addressing successive reference table addresses to determine whether the respective modulo count field is equal to the current modulo count, and when the respective modulo count field is not equal to the current modulo count, clearing each pointer at that reference table address and updating the respective modulo count field to be equal to the current modulo count; the addressing of successive reference table addresses to determine whether the respective modulo count field is equal to the current modulo count occurring sufficiently frequent so that the pointers at all reference table addresses having a non-current modulo count in the modulo count field therein are cleared and the modulo count field thereof is updated before the modulo count returns to any of such non-current modulo counts. - View Dependent Claims (77)
-
-
78. An adaptive, lossless, single-pass, general-purpose compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
current input-byte register means of K-bits to receive the most-recent byte of said input stream, and to store and output the same as a current input byte; previous input-byte register means of K-bits to store the next most-recent byte of said input stream and to output the same as a previous input byte; window memory means of size M bytes to store the up-to-M most recent bytes of said input stream; hash means to generate hash values, based on said current input byte and said previous input byte; reference table means, addressable by said hash values, able to hold at each address an old pointer and a new pointer into said window memory means, each pointer pointing to a sequence of two bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify matches between said current input byte, said previous input byte, and the window byte sequences pointed to by any pointers having a reference-table address given by the hash value of said current input byte and said previous input byte; byte counter means to generate a count responsive to the number of input bytes received from said input stream and to output said count; old-pointer search counter means to address the match based on said old pointer; new-pointer search counter means to address the match based on said new pointer; longest-match determining means to determine when each of the matches addressed by said new-pointer search counter means and by said old-pointer search counter means miscompare with said input stream; copy-length counter means, responsive to said longest-match determining means, to count the number of input bytes received from said input stream that match these stored in said window memory means to determine the length of the longest match; length encoding means to encode said longest-match length into a copy-length field; displacement encoding means to determine a displacement value responsive to said old-pointer search counter means, to said new-pointer search counter means, and to said byte counter means, and to encode said value into a displacement field pointing to said longest match within said window memory means; alphabet encoding means, responsive to the absence of a match addressed either by said new-pointer search counter means or said old-pointer seach counter means, to generate an alphabet field of variable length from said previous input byte; packing means to generate an alphabet token from said alphabet field, to generate a copy token responsive to said displacement field and said copy-length field, and to pack said copy tokens and said alphabet tokens into compressed N-bit bytes. - View Dependent Claims (79, 80, 81, 87)
-
-
82. An adaptive, lossless, single-pass, general-purpose, differential copy-length compression apparatus for compressing an input stream of a finite number of K-bit digital bytes, comprising:
-
current input-byte register means of K-bits to receive the most-recent byte of said input stream, and to store and output the same as a current input byte; previous input-byte register means of K-bits to store the next most-recent byte of said input stream and to output the same as a previous input byte; window memory means of size M bytes to store the up-to-M most recent bytes of said input stream; hash means to generate hash values, based on said current input byte and said previous input byte; reference table means, addressable by said hash values, able to hold at each address an old pointer and a new pointer into said window memory means, each pointer pointing to a sequence of two bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; comparator means to verify matches between said current input byte, said previous input byte, and the window byte sequences pointed to by any pointers having a reference-table address given by the hash value of said current input byte and said previous input byte; byte counter means to generate a count responsive to the number of input bytes received from said input stream and to output said count; old-pointer search counter means to address the match based on said old pointer; new-pointer search counter means to address the match based on said new pointer; differential longest-match determining means to determine when each of the matches addressed by said new-pointer search counter means and by aid old-pointer search counter means miscompare with said input stream; copy-length counter means, responsive to said differential longest-match determining means, to determine the differential length of the longest match (i) by counting the number of input bytes received from said input stream that match these stored in said window memory means, and (ii) in the case where each of said old-pointer search counter means and said new-pointer search counter means contains an address to a match, by setting itself to an initial value when the match addressed by said new-pointer search counter means miscompares prior to the time when the match addressed by said old-pointer search counter means miscompares; length encoding means to encode said differential longest-match length into a copy-length field; displacement encoding means to determine a displacement value responsive to said old-pointer search counter means, to said new-pointer search counter means, and to said byte counter means, and to encode said value into a displacement field pointing to said longest match within said window memory means; alphabet encoding means, responsive to the absence of a match addressed either by said new-pointer search counter means or said old-pointer seach counter means, to generate an alphabet field of variable length from said previous input byte; packing means to generate an alphabet token from said alphabet field, to generate a copy token responsive to said displacement field and said copy-length field, and to pack said copy tokens and said alphabet tokens into compressed N-bit bytes. - View Dependent Claims (83, 84, 85)
-
-
88. An apparatus for decompressing a compressed data stream of alphabet tokens and copy tokens into an output data stream of decompressed K-bit bytes, comprising:
-
unpacking means to receive said compressed data stream, to parse the same into copy tokens and variable length alphabet tokens, to generate displacement fields and copy-length fields responsive to said copy tokens, and to generate alphabet fields responsive to said alphabet tokens; length decoding means to decode said copy-length field into a copy-length value; displacement decoding means to decode said displacement field into a displacement value; alphabet decoding means to decode said alphabet field into a decompressed alphabet byte; output register means of K-bits to read and to store said decompressed alphabet bytes and to output the same within said output data stream; byte counter means to generate a count responsive to the output of said output data stream and to output said count; copy-length counter means, responsive to said copy-length value, to control the number of bytes in copied output sequences; address counter means, responsive to said byte counter means and to said displacement value, to generate the address within said window memory means used in generating said copied output sequences, and to output said address; window memory means of size M bytes to store the up-to-M most recent bytes of said output data stream and to generate said copied output sequences responsive to said address counter means and to said copy-length counter means; said output register means further to read and to store said copied output sequences and to output the same within said output data stream. - View Dependent Claims (90, 91, 92)
-
-
89. An apparatus for decompressing a compressed data stream of alphabet tokens and copy tokens, including differential copy tokens, into an output data stream of decompressed K-bit bytes, comprising:
-
unpacking means to receive said compressed data stream, to parse the same into copy tokens and variable length alphabet tokens, to generate displacement fields and copy-length fields responsive to said copy tokens, and to generate alphabet fields responsive to said alphabet tokens; length decoding means to decode said copy-length field into a copy-length value; displacement decoding means to decode said displacement field into a displacement value; alphabet decoding means to decode said alphabet field into a decompressed alphabet byte; output register means of K-bits to read and to store said decompressed alphabet bytes and to output the same as a current output byte within said output data stream; byte counter means to generate a count responsive to the output of said output data stream and to output said count; address counter means, responsive to said byte counter means and to said displacement value, to generate the address within said window memory means used in generating copied output sequences, and to output said address; window memory means of size M bytes to store the up-to-M most recent bytes of said output data stream; previous output register means of K-bits to store the previous contents of said output register means and to output the same as a previous output byte; hash function means to generate hash values based on said current output byte and said previous output byte; reference table means, addressable by said hash values, able to hold at each address an old pointer and a new pointer into said window memory means, each pointer pointing to a sequence of two bytes within said window memory means that generates the hash value that is the reference-table address of that pointer; second-match counter means to address the match based on said new pointer; differential copy length detection means first to compare the contents of said second-match counter means with said address counter means and second to compare the contents of said window memory means with said current output byte to detect the presence of a differential copy length by a match of said first comparison and a mismatch of said second comparison means; differential length determining means, operative in the case where a differential copy length is present, to determine when the match addressed by second-match counter means miscompares with the match addressed by said address counter means, whereby the difference between the actual and differential copy lengths is determined; copy-length counter means, responsive to said copy-length value and to said differential length determining means, to control the number of bytes in said copied output sequences; said window memory means further to generate said copied output sequences responsive to said address counter means and to said copy-length counter means.
-
-
93. In an apparatus for hash-based data compression/decompression, an improvement wherein said apparatus includes:
-
reference table means having more than one row, each said row being addressable by a particular hash value, each said row including a used flag of a single bit, and each said row being able to store one or more window-memory pointers; reference-table clearing means, operable prior to each operation of said compression/decompression apparatus, to initialize said used bit of every reference-table row to a first predetermined value; reference-table updating means, operable when storing one or more window-memory pointers into a particular row of said reference table, including means to set the used bit of said particular row to a second predetermined value; row validity determining means, operable when reading a particular row of said reference table, including means to determine whether or not the window-memory pointers of said particular row are valid based on the value of the used bit of said particular row; whereby every window-memory pointer need not be cleared to an otherwise illegal value prior ro each operation of said compression/decompression apparatus.
-
-
94. In an apparatus for hash-based data compression/decompression, an improvement wherein said apparatus includes:
-
reference table means having more than one row, each said row being addressable by a particular hash value, each said row including a modulo count field of M bits, M being greater than one, and each said row being able to store one or more window-memory pointers; reference-table clearing means, operable prior to the first operation of said compression/decompression apparatus, to initialize said modulo count field of every reference-table row to a first predetermined value; modulo operation counter means of M bits, to generate and output a modulo operation count responsive to the number of operations of said compression/decompression apparatus; reference-table updating means, operable when storing one or more window-memory pointers into a particular row of said reference table, including means to set the modulo operation field of said particular row to said modulo operation count; row validity determining means, operable when reading a particular row of said reference table, including means to determine that the window-memory pointers of said particular row are valid based on the value of the modulo operation field of said particular row matching said modulo operation count; whereby every window-memory pointer need not be cleared to an otherwise illegal value prior ro each operation of said compression/decompression apparatus.
-
-
95. In an adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the most-recent two bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, at least one pointer for the input stream for each reference table address, each pointer pointing to a sequence of two bytes within the window memory that generates the hash value that addresses the respective reference table location of said pointers; the hash values having the characteristic that for any possible value of a first of the two bytes, every possible value of the second of the two bytes will hash to a unique table address; comparing to verify one or more matches between the current input-byte sequence and any of the window byte sequences pointed to by a pointer having a reference table address given by the hash value of the most recent two bytes by comparing the first of the two bytes with the corresponding byte of each byte sequence pointed to by a pointer having a reference table address given by the hash value.
-
-
96. In an adaptive, lossless, single-pass, general-purpose compression method for compressing an input stream of a finite number of K-bit digital bytes, an improvement comprising:
-
storing up-to-M most recent bytes of the input stream in a window memory; hashing the most-recent two bytes of the input stream to generate hash values; holding in a reference table addressable by the hash values, at least one pointer for the input stream for each reference table address, each pointer pointing to a sequence of two bytes within the window memory that generates the hash value that addresses the respective reference table location of said pointers; the hash values having the characteristic that for any possible value of a first of the two bytes, every possible value of the second of the two bytes will hash to a unique table address; comparing to verify one or more matches between the current input-byte sequence and any of the window byte sequences pointed to by a pointer having a reference table address given by the hash value of the most recent two bytes by comparing the second of the two bytes of the input stream to be be received with the corresponding byte of each byte sequence pointed to by a pointer having a reference table address given by the hash value.
-
Specification