Text compression and expansion method and apparatus
First Claim
1. A method of data compression, comprising steps of:
separating an uncompressed, coded data stream into units;
comparing said units against at least one user-selected dictionary of units having compressed code equivalents for each unit stored in association with the uncompressed encoded units; and
outputting a header for compressed data comprising indications for defining the identity of each of said user-selected dictionaries used in compressing said data;
outputting the compressed code equivalents for incoming units for which a true comparison is found in said comparing step and outputting the uncompressed encoded character stream for any said unit for which no true comparison is found; and
arranging the entries in said dictionaries for said units in an order of merit which is the weighted frequency of use established by multiplying the average frequency of occurrence of each said unit in the language in which it is used by its length in characters and arranging the resulting products of said multiplying in decreasing order of magnitude.
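The compression steps recited in claim 1 can be illustrated with a minimal sketch. It assumes that units are whitespace-separated words and that the header is simply the list of selected dictionary names; the dictionary contents, the token layout, and the names `DICTIONARIES` and `compress` are hypothetical and are not taken from the patent.

```python
# Hypothetical dictionaries: each maps an uncompressed unit (a word)
# to its compressed code equivalent. Contents are invented for illustration.
DICTIONARIES = {
    "general": {"the": 0, "compression": 1, "data": 2},
    "legal": {"claim": 0, "patent": 1},
}


def compress(text, selected):
    """Return (header, tokens) for `text` using the user-selected dictionaries."""
    # The header identifies every dictionary used in compressing the data.
    header = list(selected)
    tokens = []
    # Assumption: a "unit" is a whitespace-separated word.
    for unit in text.split():
        for dict_index, name in enumerate(selected):
            code = DICTIONARIES[name].get(unit)
            if code is not None:
                # True comparison found: output the compressed code equivalent.
                tokens.append(("code", dict_index, code))
                break
        else:
            # No true comparison: output the unit uncompressed.
            tokens.append(("raw", unit))
    return header, tokens


header, tokens = compress("the patent claim xyz", ["general", "legal"])
```

In this sketch a unit found in none of the selected dictionaries passes through unchanged, mirroring the claim's fallback to the uncompressed character stream.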
Abstract
A text compression method and apparatus are disclosed that enable overall compression ratios of more than six or eight to one for normal language text. Plural multiple-word dictionaries that are specialized for the particular field of use are employed together with a header transmission format that identifies which dictionaries are to be used. In addition, entries in these dictionaries are categorized by a weighted frequency of use ranking in which the product of the word length in characters and the frequency of occurrence of that word in the text is taken as the weighted figure of merit for ranking words to be placed in the individual dictionaries.
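The weighted figure of merit described above (word length in characters multiplied by frequency of occurrence, sorted in decreasing order) can be sketched as follows. The frequency values are invented for illustration, and the function name `rank_by_merit` is hypothetical.

```python
def rank_by_merit(frequencies):
    """Order words by (frequency * length in characters), largest product first."""
    merit = {word: freq * len(word) for word, freq in frequencies.items()}
    return sorted(merit, key=lambda word: merit[word], reverse=True)


# Made-up frequencies (occurrences per 1000 words), for illustration only.
sample = {"the": 60, "of": 30, "compression": 10, "dictionary": 3}
ranked = rank_by_merit(sample)
# Products: the = 180, compression = 110, of = 60, dictionary = 30
```

With these assumed figures, "compression" ranks above "of" even though it occurs less often, because replacing a long word with a short code saves more characters per occurrence.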
3 Claims
3. A data compression and decompression system, comprising:
a transmitter and a receiver and means for connecting said transmitter to said receiver for the transmission and reception of compressed data;
said transmitter including means for accepting an incoming uncompressed encoded data stream and for separating said stream into units;
said transmitter also comprising means for comparing said units against at least one user selected compiled dictionary of units having compressed code equivalents for each unit stored in association with the uncompressed encoded units and further comprising;
means for outputting a compressed data header comprising indications for defining the identity of each said user selected dictionary used in compressing said data; and
means for outputting the compressed code equivalents for incoming units for which a true comparison is found in said comparing step and outputting the uncompressed encoded data stream for any said unit for which no true comparison is found;
said receiver including means for receiving and analyzing said header for defining the identity of each of said user selected dictionaries used in compressing said compressed data stream and;
means for separating said incoming compressed encoded data stream into units;
means for comparing said units against at least one of said user selected and header identified compiled dictionary of units having compressed code equivalents for each unit stored in association with the uncompressed encoded units; and
means for outputting uncompressed equivalents for incoming units for which a true comparison is found in said comparing step and for outputting said data stream for any said unit for which no true comparison is found.
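The receiver side of claim 3 can be sketched in the same spirit. This is an illustration under stated assumptions, not the patented apparatus: the header is assumed to be a list of dictionary names, each incoming unit is assumed to be either a `("code", dictionary_index, code)` tuple or a `("raw", text)` tuple, and the dictionary contents and the name `receive` are hypothetical.

```python
# Hypothetical dictionaries shared by transmitter and receiver.
# Contents are invented for illustration.
DICTIONARIES = {
    "general": {"the": 0, "data": 1},
    "medical": {"patient": 0, "diagnosis": 1},
}


def receive(header, tokens):
    """Expand a compressed stream using the dictionaries named in the header."""
    # Analyze the header: build one code-to-unit lookup table per identified
    # dictionary, in the order the header lists them.
    inverse = [
        {code: unit for unit, code in DICTIONARIES[name].items()}
        for name in header
    ]
    units = []
    for token in tokens:
        if token[0] == "code":
            # Compressed unit: look up its uncompressed equivalent.
            _, dict_index, code = token
            units.append(inverse[dict_index][code])
        else:
            # Unit that was sent uncompressed: pass it through unchanged.
            units.append(token[1])
    # Assumption: units are words rejoined with single spaces.
    return " ".join(units)


incoming_header = ["general", "medical"]
incoming_tokens = [("code", 0, 0), ("code", 1, 0), ("raw", "unknown")]
restored = receive(incoming_header, incoming_tokens)
```

Because the header names the dictionaries, the receiver only needs copies of the same compiled dictionaries; no dictionary contents travel with the compressed data in this sketch.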
Specification