Lempel- Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases
First Claim
1. A data compression method for compressing a text containing a sequence of characters, comprising the steps of:
- (a) selecting a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters;
(b) initializing a pointer to a first character in said sequence of characters for compression;
(c) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and determining a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary;
(d) storing a dictionary pointer to said longest match in said pre-filled data compression dictionary in a memory as a compressed representation of said characters making up said longest match;
(e) moving said pointer to a character in said sequence of characters which follows said longest match; and
(f) repeating steps c-e for all characters in said sequence of characters to be compressed.
3 Assignments
0 Petitions
Accused Products
Abstract
An adaptive compression technique which is an improvement to Lempel-Ziv (LZ) compression techniques, both as applied for purposes of reducing required storage space and for reducing the transmission time associated with transferring data from point to point. Pre-filled compression dictionaries are utilized to address the problem with prior Lempel-Ziv techniques in which the compression software starts with an empty compression dictionary, whereby little compression is achieved until the dictionary has been filled with sequences common in the data being compressed. In accordance with the invention, the compression dictionary is pre-filled, prior to the beginning of the data compression, with letter sequences, words and/or phrases frequent in the domain from which the data being compressed is drawn. The letter sequences, words, and/or phrases used in the pre-filled compression dictionary may be determined by statistically sampling text data from the same genre of text. Multiple pre-filled dictionaries may be utilized by the compression software at the beginning of the compression process, where the most appropriate dictionary for maximum compression is identified and used to compress the current data. These modifications are made to any of the known Lempel-Ziv compression techniques based on the variants detailed in 1977 and 1978 articles by Ziv and Lempel.
286 Citations
68 Claims
-
1. A data compression method for compressing a text containing a sequence of characters, comprising the steps of:
-
(a) selecting a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; (b) initializing a pointer to a first character in said sequence of characters for compression; (c) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and determining a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary; (d) storing a dictionary pointer to said longest match in said pre-filled data compression dictionary in a memory as a compressed representation of said characters making up said longest match; (e) moving said pointer to a character in said sequence of characters which follows said longest match; and (f) repeating steps c-e for all characters in said sequence of characters to be compressed. - View Dependent Claims (2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15)
-
-
7. A method as in step 6, wherein said step of selecting said one pre-filled data compression dictionary comprises the step of selecting an empty data compression dictionary in the event that it is determined while performing steps b-e for said subset of characters of said sequence of data characters to be compressed that starting with an empty data compression dictionary would allow for the most compression of said sequence of data characters.
-
16. A data compression method for compressing a text containing a sequence of characters, comprising the steps of:
-
(a) selecting a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; (b) initializing a data specific data compression dictionary; (c) initializing a pointer to a first character in said sequence of characters for compression; (d) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and said data specific data compression dictionary and determining a dictionary entry number of a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary and said data specific data compression dictionary; (e) storing said dictionary entry number and an extension character in a memory as a compressed representation of said characters making up said longest match and said extension character, said extension character being that character in said sequence of characters to be compressed which occurs after said longest match starting at said pointer; (f) selectively storing said characters making up said longest match and said extension character in said data specific data compression dictionary as a new dictionary entry; (g) moving said pointer to a character in said sequence of characters which follows said extension character; and (h) repeating steps d-g for all characters in said sequence of characters to be compressed. - View Dependent Claims (17, 18, 19, 20, 21, 22)
-
-
23. A data compression method for compressing a text containing a sequence of characters, comprising the steps of:
-
(a) initializing a character window which contains a predetermined number of characters; (b) appending a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters to said character window; (c) initializing a pointer to a first character in said sequence of characters for compression; (d) comparing characters starting at said pointer with sequences of characters in said character window with said pre-filled data compression dictionary appended thereto and determining a window pointer to and a length of a longest match of said characters starting at said pointer with said sequences of characters in said character window with said pre-filled data compression dictionary appended thereto; (e) storing said window pointer and said length of said longest match in a memory as a compressed representation of said characters making up said longest match; (f) updating said character window to include the characters making up said longest match; (g) moving said pointer to a character in said sequence of characters which follows said longest match; and (h) repeating steps d-g for all characters in said sequence of characters to be compressed. - View Dependent Claims (24, 25)
-
-
26. A data compression method for compressing a text containing a sequence of characters, comprising the steps of:
-
(a) selecting a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; (b) initializing a character window which contains a predetermined number of characters; (c) initializing a pointer to a first character in said sequence of characters for compression; (d) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and with sequences of characters in said character window and determining longest matches of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary and with sequences of characters in said character window; (e) determining whether greater compression will be obtained by representing said characters starting at said pointer as a dictionary entry number of a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary or by representing said characters starting at said pointer as a window pointer to and a length of a longest match of said characters starting at said pointer with said sequences of characters in said character window; (f) storing said window pointer and said length of said longest match in a memory as a compressed representation of said characters making up said longest match when it is determined in step e that greater compression will be obtained by representing said characters starting at said pointer as said window pointer to and said length of said longest match of said characters starting at said window pointer with said sequences of characters in said character window;
otherwise storing said dictionary entry number of said longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary;(g) updating said character window to include the characters making up said longest match; (h) moving said pointer to a character in said sequence of characters which follows said longest match; and (I) repeating steps d-h for all characters in said sequence of characters to be compressed. - View Dependent Claims (27)
-
-
28. A method of decompressing a compressed representation of a sequence of characters, said compressed representation comprising dictionary pointers to respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters stored in a pre-filled data compression dictionary, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said decompressing method comprising the steps of:
-
(a) initializing a pointer to a first dictionary pointer in said compressed representation of said characters; (b) retrieving a dictionary entry from said pre-filled data compression dictionary using said dictionary pointer pointed to by said pointer; (c)storing said dictionary entry as a decompressed representation of the characters making up said longest match; (d) moving said pointer to a next dictionary pointer in said compressed representation of said characters; and (e) repeating steps b-d for all dictionary pointers in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (29)
-
-
30. A method of decompressing a compressed representation of a sequence of characters, said compressed representation comprising extension characters and dictionary entry numbers of respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters stored in a pre-filled data compression dictionary and a data specific data compression dictionary, said extension character being that character in the sequence of characters which occurs after the longest match starting at said pointer, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said decompressing method comprising the steps of:
-
(a) initializing a data specific data decompression dictionary; (b) initializing a pointer to a first dictionary entry number in said compressed representation of said characters; (c) retrieving a dictionary entry and an extension character from one of said pre-filled data compression dictionary and said data specific data decompression dictionary using a dictionary entry number pointed to by said pointer; (d) selectively storing said characters making up said longest match and said extension character into said data specific data decompression dictionary; (e) moving said pointer to a next dictionary entry number in said compressed representation of said characters; and (f) repeating steps c-e for all dictionary entry numbers and extension characters in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (31)
-
-
32. A method of decompressing a compressed representation of a sequence of characters, said compressed representation comprising window pointers and lengths of respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters in a character window of a predetermined size with a pre-filled data compression dictionary appended thereto, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said decompressing method comprising the steps of:
-
(a) initializing a pointer to a first window pointer and length in said compressed representation of said characters; (b) retrieving a number of characters determined by said length starting at a character within a current character window pointed to by a window pointer which is pointed to by said pointer; (c) storing said retrieved characters as a decompressed representation of the characters making up said longest match; (d) moving said pointer to a next window pointer and length in said compressed representation of said characters; and (e) repeating steps b-d for all window pointers and lengths in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (33)
-
-
34. A method of decompressing a compressed representation of a sequence of characters, said compressed representation comprising (1) window pointers and lengths of respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters in a character window of a predetermined size and (2) dictionary entry numbers of respective longest matches of sequences of characters starting at said pointer to particular characters within a pre-filled data decompression dictionary, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said decompressing method comprising the steps of:
-
(a) initializing a pointer to a first entry in said compressed representation of said characters; (b) determining whether a current entry in said compressed representation of said characters pointed to by said pointer is (1) a window pointer and a length or (2) a dictionary entry number; (c) if said current entry is a window pointer and a length, retrieving a number of characters determined by said length starting at a character within a current character window pointed to by said window pointer; (d) if said current entry is a dictionary entry number, retrieving characters at a dictionary entry in said pre-filled data decompression dictionary identified by said dictionary entry number; (e) storing characters retrieved in steps c or d as a decompressed representation of the characters making up the longest match for the current entry; (f) moving said pointer to a next entry in said compressed representation of said characters; and (g) repeating steps b-f for all entries in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (35)
-
-
36. A data compression system for compressing a text containing a sequence of characters, comprising:
-
a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; a memory which stores said text after it has been compressed; and compressing means for performing the steps of (a) initializing a pointer to a first character in said sequence of characters for compression, (b) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and determining a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary, (c) storing a dictionary pointer to said longest match in said pre-filled data compression dictionary in said memory as a compressed representation of said characters making up said longest match, (d) moving said pointer to a character in said sequence of characters which follows said longest match, and (e) repeating steps b-d for all characters in said sequence of characters to be compressed. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
-
-
49. A data compression system for compressing a text containing a sequence of characters, comprising:
-
a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; a data specific data compression dictionary; a memory which stores said text after it has been compressed; and compressing means for performing the steps of (a) initializing a pointer to a first character in said sequence of characters for compression, (b) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and said data specific data compression dictionary and determining a dictionary entry number of a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary and said data specific data compression dictionary, (c) storing said dictionary entry number and an extension character in said memory as a compressed representation of said characters making up said longest match and said extension character, said extension character being that character in said sequence of characters to be compressed which occurs after said longest match starting at said pointer, (d) selectively storing said characters making up said longest match and said extension character in said data specific data compression dictionary as a new dictionary entry, (e) moving said pointer to a character in said sequence of characters which follows said extension character, and (f) repeating steps b-e for all characters in said sequence of characters to be compressed. - View Dependent Claims (50, 51, 52, 53, 54, 55)
-
-
56. A data compression system for compressing a text containing a sequence of characters, comprising:
-
a dictionary memory which stores a character window which contains a predetermined number of characters and a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; a compressed data memory which stores said text after it has been compressed; and compressing means for performing the steps of (a) initializing a pointer to a first character in said sequence of characters for compression, (b) comparing characters starting at said pointer with sequences of characters in said dictionary memory and determining a window pointer to and a length of a longest match of said characters starting at said pointer with said sequences of characters in said pre-filled data compression dictionary and said character window, (c) storing said window pointer and said length of said longest match in said compressed data memory as a compressed representation of said characters making up said longest match, (d) updating said character window to include the characters making up said longest match, (e) moving said pointer to a character in said sequence of characters which follows said longest match, and (f) repeating steps b-e for all characters in said sequence of characters to be compressed. - View Dependent Claims (57, 58)
-
-
59. A data compression system for compressing a text containing a sequence of characters, comprising:
-
a pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters; a character window which contains a predetermined number of characters; a compressed data memory which stores said text after it has been compressed; and compressing means for performing the steps of (a) initializing a pointer to a first character in said sequence of characters for compression, (b) comparing characters starting at said pointer with sequences of characters stored in said pre-filled data compression dictionary and with sequences of characters in said character window and determining longest matches of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary and with sequences of characters in said character window, (c) determining whether greater compression will be obtained by representing said characters starting at said pointer as a dictionary entry number of a longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary or by representing said characters starting at said pointer as a window pointer to and a length of a longest match of said characters starting at said pointer with said sequences of characters in said character window, (d) storing said window pointer and said length of said longest match in said compressed data memory as a compressed representation of said characters making up said longest match when it is determined in step c that greater compression will be obtained by representing said characters starting at said pointer as said window pointer to and said length of said longest match of said characters starting at said window pointer with said sequences of characters in said character window;
otherwise storing said dictionary entry number of said longest match of said characters starting at said pointer with said sequences of characters stored in said pre-filled data compression dictionary, (e) updating said character window to include the characters making up said longest match, (f) moving said pointer to a character in said sequence of characters which follows said longest match, and (g) repeating steps b-f for all characters in said sequence of characters to be compressed. - View Dependent Claims (60)
-
-
61. A data decompression system which decompresses a compressed representation of a sequence of characters, said compressed representation comprising dictionary pointers to respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters stored in a pre-filled data compression dictionary, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said decompressing system comprising:
-
a compressed data memory for storing said compressed representation of said sequence of characters; a decompressed data memory for storing said sequence of characters after decompression; and decompressing means for performing the steps of;
(a) initializing a pointer to a first dictionary pointer in said compressed representation of said characters in said compressed data memory, (b) retrieving a dictionary entry from said pre-filled data compression dictionary using said dictionary pointer pointed to by said pointer, (c) storing said dictionary entry in said decompressed data memory as a decompressed representation of the characters making up said longest match, (d) moving said pointer to a next dictionary pointer in said compressed representation of said characters, and (e) repeating steps b-d for all dictionary pointers in said compressed representation of said characters in said compressed data memory until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (62)
-
-
63. A data decompression system which decompresses a compressed representation of a sequence of characters, said compressed representation comprising extension characters and dictionary entry numbers of respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters stored in a pre-filled data compression dictionary and a data specific data compression dictionary, said extension character being that character in the sequence of characters which occurs after the longest match starting at said pointer, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said data decompression system comprising:
-
a compressed data memory for storing said compressed representation of said sequence of characters; a decompressed data memory for storing said sequence of characters after decompression; and decompressing means for performing the steps of;
(a) initializing a data specific data decompression dictionary, (b) initializing a pointer to a first dictionary entry number in said compressed representation of said characters in said compressed data memory, (c) retrieving a dictionary entry and an extension character from one of said pre-filled data compression dictionary and said data specific data decompression dictionary using a dictionary entry number pointed to by said pointer, (d) storing said dictionary entry in said decompressed data memory as a decompressed representation of the characters making up said longest match,(e) selectively storing said characters making up said longest match and said extension character into said data specific data decompression dictionary, (f) moving said pointer to a next dictionary entry number in said compressed representation of said characters in said compressed data memory, and (g) repeating steps c-f for all dictionary entry numbers and extension characters in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (64)
-
-
65. A data decompression system which decompresses a compressed representation of a sequence of characters, said compressed representation comprising window pointers and lengths of respective longest matches of sequences of characters in a character window having a pre-filled data compression dictionary appended thereto with a sequence of characters to be compressed, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said data decompression system comprising:
-
a compressed data memory for storing said compressed representation of said sequence of characters; a decompressed data memory for storing said sequence of characters after decompression; and decompressing means for performing the steps of;
(a) initializing a pointer to a first window pointer and length in said compressed representation of said characters in said compressed data memory, (b) retrieving a number of characters determined by said length starting at a character within a current character window pointed to by a window pointer which is pointed to by said pointer, (c) storing said retrieved characters as a decompressed representation of the characters making up said longest match in said decompressed data memory, (d) moving said pointer to a next window pointer and length in said compressed representation of said characters in said compressed data memory, and (e) repeating steps b-d for all window pointers and lengths in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (66)
-
-
67. A data decompression system which decompresses a compressed representation of a sequence of characters, said compressed representation comprising (1) window pointers and lengths of respective longest matches of sequences of characters starting at a pointer to particular characters within said sequence of characters with sequences of characters in a character window of a predetermined size and (2) dictionary entry numbers of respective longest matches of sequences of characters starting at said pointer to particular characters within a pre-filled data decompression dictionary, said pre-filled data compression dictionary containing predetermined combinations of characters likely to occur frequently in said sequence of characters, said data decompression system comprising:
-
a compressed data memory for storing said compressed representation of said sequence of characters; a decompressed data memory for storing said sequence of characters after decompression; and decompressing means for performing the steps of;
(a) initializing a pointer to a first entry in said compressed representation of said characters in said compressed data memory, (b) determining whether a current entry in said compressed representation of said characters pointed to by said pointer is (1) a window pointer and a length or (2) a dictionary entry number, (c) if said current entry in said compressed representation is a window pointer and a length, retrieving a number of characters determined by said length starting at a character within a current character window pointed to by said window pointer, (d) if said current entry in said compressed representation is a dictionary entry number, retrieving characters at a dictionary entry in said pre-filled data decompression dictionary identified by said dictionary entry number, (e) storing characters retrieved in steps c or d in said decompressed data memory as a decompressed representation of the characters making up the longest match for the current entry, (f) moving said pointer to a next entry in said compressed representation of said characters in said compressed data memory, and (g) repeating steps b-f for all entries in said compressed representation of said characters until all compressed characters in said sequence of characters have been decompressed. - View Dependent Claims (68)
-
Specification