System and method for lossless data compression and decompression
DC CAFCFirst Claim
1. A method for compressing input data comprising a plurality of data blocks, the method comprising the steps of:
- detecting if the input data comprises a run-length sequence of data blocks;
outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected;
maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string;
building a data block string from at least one data block in the input data that is not part of a run-length sequence;
searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and
outputting the code word representing the built data block string.
1 Assignment
Litigations
3 Petitions
Accused Products
Abstract
Systems and methods for providing lossless data compression and decompression are disclosed which exploit various characteristics of run-length encoding, parametric dictionary encoding, and bit packing to comprise an encoding/decoding process having an efficiency that is suitable for use in real-time lossless data compression and decompression applications. In one aspect, a method for compressing input data comprising a plurality of data blocks comprises the steps of: detecting if the input data comprises a run-length sequence of data blocks; outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected; maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string; building a data block string from at least one data block in the input data that is not part of a run-length sequence; searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and outputting the code word representing the built data block string.
-
Citations
30 Claims
-
1. A method for compressing input data comprising a plurality of data blocks, the method comprising the steps of:
-
detecting if the input data comprises a run-length sequence of data blocks;
outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected;
maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string;
building a data block string from at least one data block in the input data that is not part of a run-length sequence;
searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and
outputting the code word representing the built data block string. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
receiving an input data block;
identifying a run-length sequence if at least the next s successive data blocks in the input data are similar to the input data block.
-
-
3. The method of claim 2, wherein the step of outputting an encoded run-length sequence comprises the step of consecutively outputting a first control code word indicating a run-length sequence, a code word in the dictionary having a unique data block string associated therewith that corresponds to the input data block, and a word corresponding to the number of successive data blocks that are similar to the input data block.
-
4. The method of claim 1, wherein the step of maintaining a dictionary comprises the steps of:
-
dynamically generating a new code word corresponding to a built data block string, if the built data block string does not match a unique data block string in the dictionary; and
adding the new code word in the dictionary.
-
-
5. The method of claim 4, wherein the step of maintaining the dictionary further comprises the step of initializing the dictionary if the number of code words exceeds a predetermined threshold.
-
6. The method of claim 5, wherein the step of initializing the dictionary comprises the steps of:
-
resetting the dictionary to include all possible code words corresponding to a unique data block string comprising a single data block; and
outputting a control code word indicating that the dictionary has been initialized.
-
-
7. The method of claim 1, wherein the code words in the dictionary further comprises at least one control code word representing one of dictionary initialization, a run-length encoded sequence, an end of the input data, and a combination thereof.
-
8. The method of claim 1, wherein each code word in the dictionary comprises a dictionary index.
-
9. The method of claim 1, further comprising the step of bit-packing encoded run-length sequences and code words that are output.
-
10. The method of claim 1, wherein the step of building a data block string comprises the steps of:
-
(a) iteratively storing in a first data structure, a next successive data block in the input data to build a current data block string; and
(b) for each iteration in step (a), updating a previous code word stored in a second data structure to a current code word corresponding to the current data block string in the first data structure, if the code word for the current data block string in the first data structure is found in the dictionary; and
further wherein the step of outputting the code word representing the built data block string comprises the steps of outputting the previous code word stored in the second data structure, if a code word is not found in the dictionary corresponding to the current data block string in the first data structure.
-
-
11. The method of claim 10, further comprising the step of adding the current data block string to the dictionary.
-
12. The method of claim 11, further comprising the steps of:
-
storing, in a third data structure, the last data block input in the first data structure, if the current data block string is not found in the dictionary; and
repeating steps (a) and (b) starting with the data block in the third data structure, if the data block in the third data structure is not part of a run-length sequence.
-
-
13. The method of claim 1, further comprising the step of maintaining a hash table comprising a plurality of arrays, wherein each array comprises all code words in the dictionary that are associated with a unique data block having a first data block whose value corresponds with an index of the array, and wherein the hash table is used for the step of searching for a code word in the dictionary.
-
14. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for compressing input data comprising a plurality of data blocks, the method comprising the steps of:
-
detecting if the input data comprises a run-length sequence of data blocks;
outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected;
maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string;
building a data block string from at least one data block in the input data that is not part of a run-length sequence;
searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and
outputting the code word representing the built data block string. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
receiving an input data block;
identifying a run-length sequence if at least the next s successive data blocks in the input data are-similar to the input data block.
-
-
16. The program storage device of claim 15, wherein the instructions for performing the step of outputting an encoded run-length sequence comprise instructions for performing the step of consecutively outputting a first control code word indicating a run-length sequence, a code word in the dictionary having a unique data block string associated therewith that corresponds to the input data block, and a word corresponding to the number of successive data blocks that are similar to the input data block.
-
17. The program storage device of claim 14, wherein the instructions for performing the step of maintaining a dictionary comprise instructions for performing the steps of:
-
dynamically generating a new code word corresponding to a built data block string, if the built data block string does not match a unique data block string in the dictionary; and
adding the new code word in the dictionary.
-
-
18. The program storage device of claim 17, wherein the instructions for performing the step of maintaining the dictionary comprise instructions for performing the step of initializing the dictionary if the number of code words exceeds a predetermined threshold.
-
19. The program storage device of claim 18, wherein the instructions for performing the step of initializing the dictionary comprise instructions for performing the steps of:
-
resetting the dictionary to include all possible code words corresponding to a unique data block string comprising a single data block; and
outputting a control code word indicating that the dictionary has been initialized.
-
-
20. The program storage device of claim 14, wherein the code words in the dictionary further comprise at least one control code word representing one of dictionary initialization, a run-length encoded sequence, an end of the input data, and a combination thereof.
-
21. The program storage device of claim 14, wherein each code word in the dictionary comprises a dictionary index.
-
22. The program storage device of claim 14, further comprising instructions for performing the step of bit-packing encoded run-length sequences and code words that are output.
-
23. The program storage device of claim 14, wherein the instructions for performing the step of building a data block string comprise instructions for performing the steps of:
-
(a) iteratively storing in a first data structure, a next successive data block in the input data to build a current data block string; and
(b) for each iteration in step (a), updating a previous code word stored in a second data structure to a current code word corresponding to the current data block string in the first data structure, if the code word for the current data block string in the first data structure is found in the dictionary; and
further wherein the instructions for performing the step of outputting the code word representing the built data block string comprise instructions for performing the step of outputting the previous code word stored in the second data structure, if a code word is not found in the dictionary corresponding to the current data block string in the first data structure.
-
-
24. The program storage device of claim 23, further comprising instructions for performing the step of adding the current data block string to the dictionary.
-
25. The program storage device of claim 24, further comprising instructions for performing the steps of:
-
storing, in a third data structure, the last data block input in the first data structure, if the current data block string is not found in the dictionary; and
repeating steps (a) and (b) starting with the data block in the third data structure, if the data block in the third data structure is not part of a run-length sequence.
-
-
26. The program storage device of claim 14, further comprising instructions for performing the step of maintaining a hash table comprising a plurality of arrays, wherein each array comprises all code words in the dictionary that are associated with a unique data block having a first data block whose value corresponds with an index of the array, and wherein the hash table is used for the step of searching for a code word in the dictionary.
-
27. A method for decompressing an encoded data stream comprising a plurality of code words, the method comprising the steps of:
-
maintaining a dictionary comprising a plurality of code words utilized to generate the encoded data stream, wherein the code words in the dictionary comprise control code words and code words that are each associated with a unique data block string;
decoding and outputting a run-length sequence of data blocks associated with an input code word of the encoded data stream, if the input code word is a control code word in the dictionary that indicates an encoded run-length sequence;
outputting a unique data block string in the dictionary that is associated with an input code word of the encoded data stream, if the input code word is found in the dictionary; and
if the input code word is not found in the dictionary, building a new data block string comprising (1) the unique data block string associated with a previous control word found in the dictionary and (2) the first data block of the unique data block string, adding the new string to the dictionary and outputting the new string.
-
-
28. A system for compressing input data comprising a plurality of data blocks, the system comprising:
-
a dictionary comprising a plurality of code words, wherein the code words comprise control code words and code words that are each mapped to a unique data block string;
a run-length encoder for encoding a sequence of similar data blocks in the input data using at least one code word in the dictionary; and
a dictionary encoder for encoding a data block string comprising at least one data block in the input data using a code word in the dictionary, wherein output of the run-length encoder and dictionary encoder are combined to form an encoded data stream. - View Dependent Claims (29, 30)
a dictionary comprising a plurality of code words utilized to generate the encoded data stream, wherein the code words in the dictionary comprise control code words and code words that are each associated with a unique data block string;
a run-length decoder for decoding and outputting a run-length sequence of data blocks associated with an input code word of the encoded data stream, if the input code word is a control code word in the dictionary that indicates an encoded run-length sequence;
a dictionary decoder for outputting a unique data block string in the dictionary that is associated with an input code word of the encoded data stream, if the input code word is found in the dictionary; and
if the input code word is not found in the dictionary, building a new data block string comprising (1) the unique data block string associated with a previous control word found in the dictionary and (2) the first data block of the unique data block string, adding the new string to the dictionary and outputting the new string.
-
-
30. The system of claim 29, wherein the compression and decompression systems are employed for accelerated data storage and retrieval.
Specification