Polynomial-time, sequential, adaptive system and method for lossy data compression
First Claim
1. A lossy data compression method for converting a source sequence over a source alphabet to a distorted sequence over a reproduction alphabet by intentionally introducing errors, the method comprising:
- parsing the source sequence into a plurality of source phrases;
mapping each source phrase to a distorted phrase of equal length and that includes a distortion that is less than an allowed per-symbol distortion budget; and
synthesizing the distorted sequence by concatenating the distorted phrases.
5 Assignments
0 Petitions
Accused Products
Abstract
A system and method are provided for lossy compression of finite alphabet source sequences subject to an average-per-letter distortion constraint. The source sequence is sequentially parsed into phrases and each source phrase is mapped to a distorted phrase such that average per-letter distortion between the two phrases does not exceed the desired distortion. The present system adaptively maintains a codebook as the collection of all one-letter extensions of previously emitted distorted phrases. The present system uses approximate string matching and carries out a sequential procedure by iterating the following steps: (i) given the current codebook find the longest source phrase that can be transmitted at a given distortion, (ii) from all codewords that match the source phrase carefully choose that which is most likely to be useful in the future. For every new source phrase, the present system judiciously selects one of the many approximately matching codewords to balance between the code rate for the current phrase versus the code rate from resulting codebooks for the future source phrases. The present system outputs a distorted sequence that can be naturally losslessly compressed using the Lempel-Ziv algorithm or any variation thereof. Such judicious codeword selection is intended to iteratively improve the codebook quality. The entire present sequence can be implemented in quadratic-time in the length of the source sequence. The present system is sequential and adaptive.
32 Citations
60 Claims
-
1. A lossy data compression method for converting a source sequence over a source alphabet to a distorted sequence over a reproduction alphabet by intentionally introducing errors, the method comprising:
-
parsing the source sequence into a plurality of source phrases;
mapping each source phrase to a distorted phrase of equal length and that includes a distortion that is less than an allowed per-symbol distortion budget; and
synthesizing the distorted sequence by concatenating the distorted phrases. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
wherein computing the number of bits required to encode the entire resulting processed distorted sequence subject to the entire processed source sequence as side information comprises using a lossless compression code with a side information sequence.
-
-
20. The method of claim 19, wherein the lossless compression code with a side information sequence is a Lempel-Ziv code with side information.
-
21. The method of claim 19, wherein the lossless compression code with a side information sequence is an adaptive arithmetic code with side information.
-
22. The method of claim 19, wherein the lossless compression code with a side information sequence is an dynamic Huffman code with side information.
-
23. The method of claim 19, wherein the lossless compression code with a side information sequence is an Lempel-Ziv-Welch code with side information.
-
24. The method of claim 1, further comprising losslessly compressing the distorted sequence.
-
25. The method of claim 24, further comprising decompressing the compressed distorted sequence.
-
26. The method of claim 25, wherein losslessly compressing and decompressing the distorted sequence comprises using a Lempel-Ziv code.
-
27. The method of claim 25, wherein losslessly compressing and decompressing the distorted sequence comprises using an adaptive arithmetic code.
-
28. The method of claim 25, wherein losslessly compressing and decompressing the distorted sequence comprises using a Lempel-Ziv-Welch code.
-
29. The method of claim 25, wherein losslessly compressing and decompressing the distorted sequence comprises using a dynamic Huffman code.
-
30. The method of claim 25, wherein losslessly compressing and decompressing the distorted sequence comprises using a locally adaptive move-to-front code.
-
31. A computer program product having instruction codes for converting a source sequence over a source alphabet to a distorted sequence over a reproduction alphabet by intentionally introducing errors, the computer program product comprising:
-
a first set of instruction codes for parsing the source sequence into a plurality of source phrases;
a second set of instruction codes for mapping each source phrase to a distorted phrase of equal length and that includes a distortion that is less than an allowed per-symbol distortion budget; and
a third set of instruction codes for synthesizing the distorted sequence by concatenating the distorted phrases. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
wherein the second set of instruction codes further computes the number of bits required to encode the entire resulting processed distorted sequence subject to the entire processed source sequence as side information using a lossless compression code with a side information sequence.
-
-
50. The computer program product of claim 33, wherein the fourth set of instruction codes losslessly compresses the distorted sequence, and further decompresses the compressed distorted sequence.
-
51. A system for converting a source sequence over a source alphabet to a distorted sequence over a reproduction alphabet by intentionally introducing errors, the system comprising:
-
means for parsing the source sequence into a plurality of source phrases;
means for mapping each source phrase to a distorted phrase of equal length and that includes a distortion that is less than an allowed per-symbol distortion budget; and
means for synthesizing the distorted sequence by concatenating the distorted phrases. - View Dependent Claims (52, 53, 54, 55, 56, 57, 58, 59, 60)
wherein a new distorted phrase is a shortest phrase that is not the same as any of previously processed distorted phrases;
wherein a new source phrase is a prefix of an unprocessed source sequence that matches at least one codeword in the codebook within the allowed per-symbol distortion budget; and
wherein each new source phrase is a longest prefix of the unprocessed source sequence that matches at least one codeword in the codebook within the allowed per-symbol distortion budget.
-
-
56. The system of claim 55, wherein the mapping means maps the source phrase to a codeword in the codebook that matches the source phrase within the allowed per-symbol distortion budget;
-
wherein the mapping means further maps the source phrase to the codeword that has a smallest per-symbol distortion amongst all matching codewords;
wherein the mapping means further maps the source phrase to the codeword that is a most recent codeword amongst all matching codewords;
wherein the mapping means further maps the source phrase to a codeword amongst all matching codewords so as to maximize a number of bits required to encode an entire processed source sequence subject to an entire resulting processed distorted sequence as side information; and
wherein the mapping means further computes the number of bits required to encode an entire processed source sequence subject to the entire resulting processed distorted sequence as side information using a lossless compression code with a side information sequence.
-
-
57. The system of claim 56, wherein the lossless compression code with a side information sequence is any one of:
-
a Lempel-Ziv code with side information;
an adaptive arithmetic code with side information;
a dynamic Huffman code with side information;
ora Lempel-Ziv-Welch code with side information.
-
-
58. The system of claim 56, wherein the mapping means maps the source phrase to a codeword amongst all matching codewords so that a difference between the number of bits required to encode the entire processed source sequence subject to an entire resulting processed distorted sequence as side information and the number of bits required to encode an entire resulting processed distorted sequence subject to an entire processed source sequence as side information is maximized.
-
59. The system of claim 58, wherein the mapping means computes the number of bits required to encode the entire processed source sequence subject to the entire resulting processed distorted sequence as side information using a lossless compression code with a side information sequence;
- and
wherein the mapping means further computes the number of bits required to encode the entire resulting processed distorted sequence subject to the entire processed source sequence as side information using a lossless compression code with a side information sequence.
- and
-
60. The system of claim 53, wherein the means losslessly compressing compresses the distorted sequence, and further decompresses the compressed distorted sequence.
Specification