Method and apparatus for adaptive data compression
First Claim
1. A method for adaptively compressing an input string comprising the steps of:
- searching an encoder dictionary for each symbol received in the input string, a symbol width selected dependent on a type of data in the input string;
upon detecting a symbol is not stored in the encoder dictionary, learning the symbol by storing the symbol at a next sequential index in the encoder dictionary and transmitting the symbol in a code word to a decoder, the code word including an identifier, the state of the identifier indicating the code word includes the symbol to be learned;
upon detecting the symbol is stored in the encoder dictionary, transmitting in the code word a single index at which a previously learned symbol is stored in the encoder dictionary, the state of the identifier indicating the code word includes the single index; and
upon detecting a compression ratio based on the current symbol width is less than a threshold compression ratio, modifying the symbol width and communicating a new symbol width to the decoder.
4 Assignments
0 Petitions
Accused Products
Abstract
We present a method and apparatus for performing adaptive data compression. An alphabet and vocabulary in the encoder and decoder is built adaptively and stored in a dictionary as symbols are to be encoded and decoded. Each time an unknown symbol is to be encoded by the encoder, the encoder adds the symbol to the dictionary and transmits it in plain in the encoded string. The code words transmitted by the encoder include symbols and indexes. The state of a prefix bit preceding the code word indicates whether the code word is a plain symbol or an index of a symbol or string of symbols stored in the dictionary. The decoder examines the prefix bit of each code word as it is received to determine if the code word stores a symbol in plain or in index. If the code word stores a symbol in plain, the decoder learns the symbol by adding a sequence of symbols resulting from the concatenation of previously decoded symbols and the first symbol of the currently decoded symbol and by adding the symbol to its dictionary. If the code word stores an index, the decoder decodes the code word by extracting the symbol or sequence of symbols stored in the dictionary at the respective index in the dictionary.
-
Citations
29 Claims
-
1. A method for adaptively compressing an input string comprising the steps of:
-
searching an encoder dictionary for each symbol received in the input string, a symbol width selected dependent on a type of data in the input string;
upon detecting a symbol is not stored in the encoder dictionary, learning the symbol by storing the symbol at a next sequential index in the encoder dictionary and transmitting the symbol in a code word to a decoder, the code word including an identifier, the state of the identifier indicating the code word includes the symbol to be learned;
upon detecting the symbol is stored in the encoder dictionary, transmitting in the code word a single index at which a previously learned symbol is stored in the encoder dictionary, the state of the identifier indicating the code word includes the single index; and
upon detecting a compression ratio based on the current symbol width is less than a threshold compression ratio, modifying the symbol width and communicating a new symbol width to the decoder. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
upon detecting that a sequence of symbols is not stored in the dictionary, transmitting an index at which a longest prefix match for the sequence of symbols is stored in the dictionary, the state of the identifier indicating the code word includes the index.
-
-
3. The method as claimed in claim 1 wherein the index is a variable width.
-
4. The method as claimed in claim 3 wherein the width of the index is dependent on a number of learned items stored in the dictionary.
-
5. The method as claimed in claim 1 wherein the step of searching searches for a sequence of symbols, the step of learning learns the sequence of symbols by storing the sequence of symbols in the dictionary, and the step of transmitting transmits in the code word an index at which a longest prefix match for the sequence of symbols is stored.
-
6. The method as claimed in claim 1 wherein the symbol is a plurality of bytes.
-
7. The method as claimed in claim 1 where in the symbol width is modified by including a symbol width change notification in the code word.
-
8. The method as claimed in claim 7 further comprising the step of:
-
initializing the encoder dictionary and decoder dictionary after the symbol width is modified; and
learning symbols of the modified width.
-
-
9. The method as claimed in claim 7 further comprising the step of:
-
flushing all symbols and strings stored in the encoder dictionary and decoder dictionary having widths which are not multiples of the modified width; and
learning symbols of the modified width.
-
-
10. An apparatus for adaptively compressing an input string comprising:
-
an encoder dictionary for storing each symbol received in the string of symbols a symbol width selected dependent on the type of data in the input string; and
control logic which searches the dictionary for each symbol, transmits the symbol in a code word to a decoder upon detecting the symbol is not stored in the dictionary and learns the symbol by storing the symbol at a next sequential index in the dictionary, the code word including an identifier, the state of the identifier indicating the code word includes the symbol to be learned, upon detecting the symbol is stored in the encoder dictionary, transmits in the code word a single index at which the stored symbol is stored in the encoder dictionary, the state of the identifier indicating the code word includes the single index and upon detecting a compression ratio based on the current symbol width is less than a threshold compression ratio, modifies the symbol width and communicates a new symbol width to the decoder. - View Dependent Claims (11, 12, 13, 14)
-
-
15. An apparatus for adaptively compressing an input string comprising:
-
an encoder dictionary for storing each symbol received in the string of symbols, a symbol width selected dependent on a type of data in the input string;
means for searching the dictionary for each symbol;
means for transmitting the symbol to a decoder in a code word upon detecting the symbol is not stored in the encoder dictionary, the code word including an identifier, the state of the identifier indicating the code word includes the symbol to be learned; and
means for learning the symbol by storing the symbol at a next sequential index the dictionary;
upon detecting the symbol is stored in the encoder dictionary, means for transmitting in the code word a single index at which the stored symbol is stored in the encoder dictionary, the state of the identifier indicating the code word includes the single index; and
upon detecting a compression ratio based on the current symbol width is less than a threshold compression ratio, means for modifying the symbol width and communicating a new symbol width to the decoder. - View Dependent Claims (16, 17)
-
-
18. A method for adaptively compressing an input string comprising the steps of:
-
searching a dictionary for a longest prefix match for a sequence of symbols received in the string of symbols, a symbol width selected dependent on a type of data in the input string;
upon detecting a symbol in the string of symbols is not stored in the encoder dictionary, transmitting the symbol in a code word, the code word including an identifier, the state of the identifier identifying whether the code word is a plain symbol to be learned or is an index, and learning the symbol by storing the symbol at a next sequential index in the encoder dictionary;
upon detecting that the sequence of symbols is stored in the dictionary, transmitting in the code word, the index at which the sequence of symbols is stored in the dictionary, the width of the index dependent on a number of learned items stored in the dictionary;
upon detecting the symbol is stored in the encoder dictionary, transmitting in the code word a single index at which the stored symbol is stored in the encoder dictionary, the state of the identifier indicating the code word includes the single index; and
upon detecting a compression ratio based on the current symbol width is less than a threshold compression ratio, modifying the symbol width and communicating a new symbol width to the decoder.
-
-
19. A method for decompressing a sequence of code words comprising the steps of:
-
receiving a code word, the code word including an identifier, the width of the code word dependent on a symbol width selected dependent on a type of data in the input string;
upon detecting from the state of the identifier that a symbol is stored in the code word, learning the symbol by, storing the symbol at a next sequential index in a decoder dictionary; and
providing the symbol as decoded data. - View Dependent Claims (20, 21, 22, 23)
learning a sequence of symbols by concatenating a previously learned symbol with the symbol; and
storing the sequence of symbols at a next sequential index in the decoder dictionary.
-
-
21. The method as claimed in claim 19 further comprising the step of:
upon detecting a dictionary index stored in the code word, providing a translation of the index.
-
22. The method as claimed in claim 21 wherein the index is a variable width.
-
23. The method as claimed in claim 21 wherein the translation of the index is a symbol.
-
24. An apparatus for decompressing a sequence of code words comprising:
-
a dictionary for storing at a next sequential index, a plain symbol received in a code word, the code word including an identifier, the state of the identifier indicating the code word includes the symbol to be learned; and
logic which receives a code word, detects the plain symbol stored in the code word, stores the symbol in the dictionary and provides the symbol as decoded data, upon detecting the symbol is stored in the encoder dictionary, transmits in the code word a single index at which the stored symbol is stored in both the encoder dictionary and the decoder dictionary, the state of the identifier indicating the code word includes the single index and upon detecting a compression ratio based on a current symbol width is less than a threshold compression ratio, modifies the current symbol width and communicates a new symbol width to the decoder. - View Dependent Claims (25, 26, 27, 28, 29)
-
Specification