Apparatus and method for compressing data signals and restoring the compressed data signals
First Claim
1. A data compression and decompression system for compressing digital data and recovering the original data from the compressed data, said system including means for reading the digital data from a medium to provide a stream of data symbol signals, said data symbol signals being members of an alphabet of symbol signals comprising a predetermined number of symbol signals, said system including compression apparatus responsive to said stream of data symbol signals for providing a compressed stream of code signals corresponding thereto and applying said code signals to a medium, said system further including decompression apparatus responsive to said code signals for recovering the original data therefrom, said compression apparatus comprising:
- means for parsing said stream of data symbol signals into segments, each segment comprising a prefix and the next data symbol signal occurring in said stream after said prefix, said prefix comprising the longest match with a previous segment, said parsing means defining a first search tree having nodes and branches emanating from said nodes, each node having at most one incoming branch, the outgoing branches emanating from a node being representative of said symbol signals of said alphabet, respectively, said nodes having respective node labels associated therewith and respective addresses associated therewith, said first search tree having a unique root node without an incoming branch and leaf nodes without outgoing branches,said parsing means comprising means for tracing said stream of data symbol signals through said first search tree along a path from said root node to a leaf node, including means responsive to said stream of data symbol signals for determining the branches of said path in accordance with said symbol signals respectively,said parsing means comprisingfirst storage means comprising a plurality of storage locations corresponding respectively to said addresses of said nodes of said tree, said storage locations being utilized for storing said node labels, a node label stored at a particular storage location identifying the node whose address corresponds to said particular storage location,address generator means for generating an address signal and for providing said address signal to said first storage means for accessing said storage locations thereof, said first storage means providing, in response thereto, the node label stored in the storage location accessed by said address signal, said address generator means being responsive to a node label provided by said first storage means and to a data symbol signal from said stream of data symbol signal for generating said address signal in accordance therewith,first detecting means for detecting when an accessed storage location of said first storage means does not contain a node label, said storage location not containing a node label defining a leaf node of said first search tree, andnode counting means for inserting a node label into said accessed storage location of said first storage means detected by said first detecting means as not containing a node label;
said address generator means comprising means for sequentially generating said address signals and for providing said address signals to said first storage means, the node label provided by said first storage means in response to an address signal providing the node label for generating the next address signal, said address generator means sequentially generating said address signals in accordance with the node labels sequentially provided by said first storage means respectively and in accordance with successive symbol signals of said stream of data symbol signals respectively, said address generator means sequentially generating said address signals until said first detecting means detects an accessed storage location of said first storage means without a node label; and
means responsive to the address of said accessed storage location not containing a node label for generating a pointer signal in accordance therewith for each said segment, said pointer signal being representative of said previous segment matching said prefix and of said next occurring data symbol signal after said prefix,thereby providing pointer signals for segments of said stream of data symbol signals, respectively,said pointer signals comprising said compressed stream of code signals.
2 Assignments
0 Petitions
Accused Products
Abstract
A compressor parses the input data stream into segments where each segment comprises a prefix and the next symbol in the data stream following the prefix. The prefix of a segment is the longest match with a previously parsed segment of the data stream. The compressor constructs a search tree data base to effect the parsing and to generate a pointer for each segment pointing to the previous segment matching the prefix. The search tree comprises internal nodes including a root and external nodes denoted as leaves. The nodes are interconnected by branches representative of symbols of the alphabet. Each parsed segment of the input data is represented by a path from the root to a leaf. The tree is adaptively constructed from the input data such that as each new segment is parsed, one new internal node of the tree is created from a leaf and new leaves are defined, one for each symbol already encountered by the encoder plus an additional branch to represent all potential but unseen symbols. The compressor transmits a leaf pointer signal for each parsed segment representative of the prefix thereof and the suffixed symbol of the alphabet. A decompressor constructs an identical search tree in response to the received leaf pointers so as to reconstitute the original data stream.
348 Citations
244 Claims
-
1. A data compression and decompression system for compressing digital data and recovering the original data from the compressed data, said system including means for reading the digital data from a medium to provide a stream of data symbol signals, said data symbol signals being members of an alphabet of symbol signals comprising a predetermined number of symbol signals, said system including compression apparatus responsive to said stream of data symbol signals for providing a compressed stream of code signals corresponding thereto and applying said code signals to a medium, said system further including decompression apparatus responsive to said code signals for recovering the original data therefrom, said compression apparatus comprising:
-
means for parsing said stream of data symbol signals into segments, each segment comprising a prefix and the next data symbol signal occurring in said stream after said prefix, said prefix comprising the longest match with a previous segment, said parsing means defining a first search tree having nodes and branches emanating from said nodes, each node having at most one incoming branch, the outgoing branches emanating from a node being representative of said symbol signals of said alphabet, respectively, said nodes having respective node labels associated therewith and respective addresses associated therewith, said first search tree having a unique root node without an incoming branch and leaf nodes without outgoing branches, said parsing means comprising means for tracing said stream of data symbol signals through said first search tree along a path from said root node to a leaf node, including means responsive to said stream of data symbol signals for determining the branches of said path in accordance with said symbol signals respectively, said parsing means comprising first storage means comprising a plurality of storage locations corresponding respectively to said addresses of said nodes of said tree, said storage locations being utilized for storing said node labels, a node label stored at a particular storage location identifying the node whose address corresponds to said particular storage location, address generator means for generating an address signal and for providing said address signal to said first storage means for accessing said storage locations thereof, said first storage means providing, in response thereto, the node label stored in the storage location accessed by said address signal, said address generator means being responsive to a node label provided by said first storage means and to a data symbol signal from said stream of data symbol signal for generating said address signal in accordance therewith, first detecting means for detecting when an accessed storage location of said first storage means does not contain a node label, said storage location not containing a node label defining a leaf node of said first search tree, and node counting means for inserting a node label into said accessed storage location of said first storage means detected by said first detecting means as not containing a node label; said address generator means comprising means for sequentially generating said address signals and for providing said address signals to said first storage means, the node label provided by said first storage means in response to an address signal providing the node label for generating the next address signal, said address generator means sequentially generating said address signals in accordance with the node labels sequentially provided by said first storage means respectively and in accordance with successive symbol signals of said stream of data symbol signals respectively, said address generator means sequentially generating said address signals until said first detecting means detects an accessed storage location of said first storage means without a node label; and means responsive to the address of said accessed storage location not containing a node label for generating a pointer signal in accordance therewith for each said segment, said pointer signal being representative of said previous segment matching said prefix and of said next occurring data symbol signal after said prefix, thereby providing pointer signals for segments of said stream of data symbol signals, respectively, said pointer signals comprising said compressed stream of code signals. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 239)
-
13. The compression apparatus of claim 1 in which said first storage means includes a directly addressed portion and a hash addressed portion and said address generator means further comprises
hashing limit comparator means for comparing said address signal to a predetermined hashing limit, and hashing means for hashing said address signal thereby providing a hashed address signal, said address generator means including means for providing either said address signal or said hashed address signal to said first storage means in accordance with the comparison between said address signal and said hashing limit. -
14. The compression apparatus of claim 1 in which said means defining a first search tree comprises means defining a first search tree having alpha-branches emanating from said nodes representative of all of the symbol signals of said alphabet not yet encountered during said compressing of said stream of data symbol signals.
-
15. The compression apparatus of claim 10 in which said means defining a first search tree comprises means defining a first seach tree having alpha-branches emanating from said nodes representative of all of the symbol signals of said alphabet not yet encountered during said compressing of said stream of data symbol signals, an alpha-branch of said first search tree being encountered when said second detecting means detects an accessed storage location of said second storage means not containing a symbol position signal.
-
16. The compression apparatus of claim 10 in which said means for generating a pointer signal includes leaves-only converter means responsive to said address signals from said address generator means and said node labels from said first storage means for providing a leaves-only pointer signal in accordance therewith.
-
17. The compression apparatus of claim 16 in which said leaves-only converter means comprises means for providing said leaves-only pointer signal in accordance with the difference between said address signals and said node labels, respectively.
- 18. The compression apparatus of claim 17 in which said leaves-only converter means comprises means for providing said leaves-only pointer signal in accordance with
- space="preserve" listing-type="equation">D=C-E-1
where D comprises said leaves-only pointer signal, C comprises said address signal and E comprises said node label.
-
-
19. The compression apparatus of claim 17 in which said leaves-only converter means includes means for inhibiting taking said difference between an address signal and a node label whenever the symbol signal that resulted in the address signal is a predetermined symbol signal of said alphabet.
-
20. The compression apparatus of claim 17 in which said leaves-only converter means includes means for inhibiting taking said difference between an address signal and a node label whenever the symbol signal that resulted in the addressing signal is the first symbol signal encountered by said encoding apparatus in compressing said stream of data symbol signals.
-
21. The compression apparatus of claim 17 in which said leaves-only converter means includes means responsive to said symbol position signals for inhibiting taking said difference between an address signal and a node label whenever the symbol position signal resulting in the address signal is that symbol position signal assigned to a predetermined symbol signal.
-
22. The compression apparatus of claim 17 in which said leaves-only converter means includes means responsive to said symbol position signals for inhibiting taking said difference between an address signal and a node label whenever the symbol position signal resulting in the address signal is that symbol position signal assigned to the first symbol signal of said alphabet encountered by said encoding apparatus in compressing said stream of data symbol signals.
-
23. The compression apparatus of claim 16 in which said means for generating a pointer signal further includes green-leaves converter means responsive to said leaves-only pointer signal for generating a green-leaves pointer signal corresponding thereto in accordance with the number of symbol signals of said alphabet not yet encountered by said encoding apparatus during said compressing of said stream of data symbol signals.
-
24. The compression apparatus of claim 16 in which said means for generating a pointer signal further includes green-leaves converter means responsive to said leaves-only pointer signal for generating a green-leaves pointer signal corresponding thereto in accordance with the number of symbol signals of said alphabet currently encountered by said encoding apparatus during said compressing of said stream of data symbol signals.
-
25. The compression apparatus of claim 23 in which greenleaves converter means includes means for generating said green-leaves pointer signal further in accordance with said number of symbol signals of said alphabet.
- 26. The compression apparatus of claim 25 in which said green-leaves converter means comprises means for generating said green-leaves pointer signal in accordance with
- space="preserve" listing-type="equation">D.sub.g
space="preserve" listing-type="equation">=D+(1-NSYM)(D/(|A|-1))where Dg comprises said green-leaves pointer signal, D comprises said leaves-only pointer signal, NSYM comprises a signal representative of the number of symbol signals of said alphabet not yet encountered by said encoding apparatus during said compressing of said stream of data symbol signals, |A| comprises the number of symbol signals of said alphabet and Dg is generated utilizing integer arithmetic.
- space="preserve" listing-type="equation">TB=2H-KB
space="preserve" listing-type="equation">Converted D.sub.g =(2D.sub.g
space="preserve" listing-type="equation">-TB)/2
- space="preserve" listing-type="equation">TA=2NSYM-KA
space="preserve" listing-type="equation">Converted R=(2R-TA)/2
- space="preserve" listing-type="equation">node label=1+(C-2)/|A|
- space="preserve" listing-type="equation">address signal=i|A|-|A|+2
- space="preserve" listing-type="equation">p=|A|·
Cl-C+2
- space="preserve" listing-type="equation">TB=2H-KB
space="preserve" listing-type="equation">Converted Y.sub.k =(Y.sub.k +TB)/2
- space="preserve" listing-type="equation">D=D.sub.g +CSYM((D.sub.g -1)/BSYM)
- space="preserve" listing-type="equation">address signal=D+(D+2|A|-3)/(|A|-1)
- space="preserve" listing-type="equation">TA=2NSYM-KA
space="preserve" listing-type="equation">new R=(R+TA)/2
- space="preserve" listing-type="equation">TA=2NSYM-KA
space="preserve" listing-type="equation">converted R=(R+TA)/2
-
123. A data compression and decompression method for compressing digital data and recovering the original data from the compressed data, said method including reading the digital data from a medium to provide a stream of data symbol signals, said data symbol signals being members of an alphabet of symbol signals comprising a predetermined number of symbol signals, said data compression and decompression method including a compression method for providing a compressed stream of code signals corresponding to said stream of data symbol signals and applying said code signals to a medium, said data compression and decompression method further including a decompression method for recovering the original data from said code signals, said compression method comprising:
-
parsing said stream of data symbol signals into segments, each segment comprising a prefix and the next data symbol signal occurring in said stream after said prefix, said prefix comprising the longest match with a previous segment, said parsing step defining a first search tree having nodes and branches emanating from said nodes, each node having at most one incoming branch, the outgoing branches emanating from a node being representative of said symbol signals of said alphabet respectively, said nodes having respective node labels associated therewith and respective addresses associated therewith, said first search tree having a unique root node without an incoming branch and leaf nodes without outgoing branches, said parsing step comprising the step of tracing said stream of data symbol signals through said first search tree along a path from said root node to a leaf node, including the step of determining the branches of said path in accordance with said symbol signals, respectively, of said stream of data symbol signals, said parsing step comprising the steps of storing said node labels at respective storage locations of a first storage means, said storage locations of said first storage means corresponding respectively to said addresses of said nodes of said tree, a node label stored at a particular storage location identifying the node whose address corresponds to said particular storage location, generating an address signal, accessing said storage locations of said first storage means with said address signal, said first storage means providing, in response thereto, the node label stored in the storage location accessed by said address signal, said step of generating said address signal comprising the step of generating said address signal in accordance with a node label provided by said first storage means and a data symbol from said stream of data symbols, detecting when an accessed storage location of said first storage means does not contain a node label, said storage location not containing a node label defining a leaf node of said first search tree, and inserting a node label into said accessed storage location of said first storage means detected as not containing a node label; said step of generating said address signal comprising the steps of sequentially generating address signals, providing said address signals to said first storage means, the node label provided by said first storage means in response to an address signal providing the node label for generating the next address signal, said step of sequentially generating said address signals comprising the step of sequentially generating said address signals in accordance with the node labels sequentially provided by said first storage means, respectively, and in accordance with successive symbols of said stream of data symbols, respectively, said address signals being sequentially generated in response to said sequentially provided node labels until an accessed storage location of said first storage means is detected without a node label; and generating a pointer signal, for each said segment, in accordance with the address of said accessed storage location of said first storage means not containing a node label, said pointer signal being representative of said previous segment matching said prefix and of said next occurring data symbol signal after said prefix, thereby providing pointer signals for segments of said stream of data symbol signals, respectively, said pointer signals comprising said compressed stream of code signals. - View Dependent Claims (124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 240, 241, 242, 243, 244)
-
135. The compression method of claim 123 in which said first storage means includes a directly addressed portion and a hash addressed portion, said step of generating said address signal further comprising the steps of
comparing said address signal to a predetermined hashing limit, hashing said address signal thereby providing a hashed address signal, and providing either said address signal or said hashed address signal to said first storage means in accordance with the comparison between said address signal and said hashing limit. -
136. The compression method of claim 123 in which said step of defining a first search tree comprises the step of defining a first search tree having alpha-branches emanating from said nodes representative of all of the symbol signals of said alphabet not yet encountered during said parsing step.
-
137. The compression method of claim 132 in which said step of defining a first search tree comprises the step of defining a first search tree having alpha-branches emanating from said nodes representative of all of the symbol signals of said alphabet not yet encountered during said parsing step, an alpha-branch of said first search tree being encountered when an accessed storage location of said second storage means is detected as not containing a symbol position signal.
-
138. The compression method of claim 132 in which said step of generating a pointer signal includes the step of generating a leaves-only pointer signal in accordance with said node labels from said first storage means and said address signals.
-
139. The compression method of claim 138 in which said step of generating said leaves-only pointer signal comprises the step of generating said leaves-only pointer signal in accordance with the difference between said address signals and said node labels, respectively.
- 140. The compression method of claim 139 in which said step of generating said leaves-only pointer signal comprises the step of generating said leaves-only pointer signal in accordance with
- space="preserve" listing-type="equation">D=C-E-1
where D comprises said leaves-only pointer signal, C comprises said address signal and E comprises said node label.
-
-
141. The compression method of claim 139 in which said step of generating said leaves-only pointer signal includes the step of inhibiting taking said difference between an address signal and a node label whenever the symbol that resulted in the address signal is a predetermined symbol of said alphabet.
-
142. The compression method of claim 139 in which said step of generating said leaves-only pointer signal includes the step of inhibiting taking said difference between an address signal and a node label whenever the symbol signal that resulted in the address signal is the first symbol signal of said alphabet encountered during the performance of said encoding method.
-
143. The compression method of claim 139 in which said step of generating said leaves-only pointer signal includes the step of inhibiting taking said difference between an address signal and a node label whenever the symbol position signal resulting in the address signal is that symbol position signal assigned to a predetermined symbol.
-
144. The compression method of claim 139 in which said step of generating said leaves-only pointer signal includes the step of inhibiting taking said difference between an address signal and a node label whenever the symbol position signal resulting in the address signal is that symbol position signal assigned to the first symbol signal of said alphabet encountered during the performance of said encoding method.
-
145. The compression method of claim 138, in which said step of generating a pointer signal further includes the step of generating a green-leaves pointer signal corresponding to said leaves-only pointer signal in accordance with the number of symbols of said alphabet not yet encountered during the performance of said encoding method.
-
146. The compression method of claim 138 in which said step of generating a pointer signal further includes the step of generating a green-leaves pointer signal corresponding to said leaves-only pointer signal in accordance with the number of symbol signals of said alphabet currently encountered during said encoding method.
-
147. The compression method of claim 145 in which said step of generating said green-leaves pointer signal further includes the step of generating said green-leaves pointer signal in accordance with said number of symbol signals of said alphabet.
- 148. The compression method of claim 147 in which said step of generating said green-leaves pointer signal comprises the step of generating said green-leaves pointer signal in accordance with
- space="preserve" listing-type="equation">D.sub.g =D+(1-NSYM)(D/ (|A|-1))
where Dg comprises said green-leaves pointer signal, D comprises said leaves-only pointer signal, NSYM comprises a signal representative of the number of symbol signals of said alphabet not yet encountered during the performance of said encoding method, |A| comprises the number of symbol signals of said alphabet and Dg is generated utilizing integer arithemtic.
- space="preserve" listing-type="equation">TB=2H-KB
space="preserve" listing-type="equation">converted D.sub.g =(2D.sub.g -TB)/2
- space="preserve" listing-type="equation">TA=2 NSYM-KA
space="preserve" listing-type="equation">converted R=(2R-TA)/2
- space="preserve" listing-type="equation">node label=1+(C-2)/|A|
- space="preserve" listing-type="equation">address signal=i|A|-|A|+2
- space="preserve" listing-type="equation">p+|A|·
C1-C+2
- space="preserve" listing-type="equation">TB=2H-KB
space="preserve" listing-type="equation">converted Y.sub.k =(Y.sub.k +TB)/2
- space="preserve" listing-type="equation">D=D.sub.g +CSYM((D.sub.g -1)/BSYM)
- space="preserve" listing-type="equation">address signal=D+(D+2|A|-3)/(|A|-1)
Specification