Method and apparatus for enhanced decompressor parsing

US 6,310,563 B1
Filed: 05/12/2000
Issued: 10/30/2001
Est. Priority Date: 05/12/2000
Status: Expired due to Fees

First Claim

Patent Images

1. A data compression/decompression system comprising:

a compression engine for receiving uncompressed information units comprising raw characters and working strings and compressing them into compressed information units, said compressor engine having an associated dictionary table including entries comprising raw characters and reference strings, each reference string representing matched sequence of two or more bytes of data from an input working string; and

, a decompression engine for receiving compressed information units and generating said uncompressed information units in decompression engine cycles, said decompression engine including a parser device for extracting consecutive data phrases of compressed information units, each data phrase comprising one of a predetermined set of characters, compressed strings or combinations thereof, wherein a data phrase includes a variable bit-length compression code for representing a matched sequence of bytes defining a compressed string, each variable bit-length compression code including a primary length field and a secondary length field for defining a string'"'"'s length, and a position length field for encoding a starting address of the reference string in said dictionary to which a starting position of a working string refers, said parser device parsing said primary length fields and secondary length fields in different decompression engine cycles to enhance parser device throughput.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

In a high-speed computer system using multiple compression and decompression engines, a method and apparatus for coding and parsing compressed data in the decompressor in order to avoid bottlenecks within the decompressor that prevent it from achieving optimum latency and throughput acceptable to the system processor.

Citations

46 Claims

1. A data compression/decompression system comprising:
- a compression engine for receiving uncompressed information units comprising raw characters and working strings and compressing them into compressed information units, said compressor engine having an associated dictionary table including entries comprising raw characters and reference strings, each reference string representing matched sequence of two or more bytes of data from an input working string; and
  
  , a decompression engine for receiving compressed information units and generating said uncompressed information units in decompression engine cycles, said decompression engine including a parser device for extracting consecutive data phrases of compressed information units, each data phrase comprising one of a predetermined set of characters, compressed strings or combinations thereof, wherein a data phrase includes a variable bit-length compression code for representing a matched sequence of bytes defining a compressed string, each variable bit-length compression code including a primary length field and a secondary length field for defining a string'"'"'s length, and a position length field for encoding a starting address of the reference string in said dictionary to which a starting position of a working string refers, said parser device parsing said primary length fields and secondary length fields in different decompression engine cycles to enhance parser device throughput.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
- - 2. The data compression/decompression system as claimed in claim 1, wherein for statistically probable matching strings comprising up to three bytes, said secondary length field is unused and said primary length field comprises data of minimum bit length, thereby enabling said parser to process said data phrase in a single decompression engine cycle.
  - 3. The data compression/decompression system as claimed in claim 2, wherein said minimum bit length comprises a total of two bits for identifying a string length.
  - 4. The data compression/decompression system as claimed in claim 3, wherein a data phrase includes a fixed bit-length compression code for representing a single byte character, said parser differentiating a data phrase of compressed character type or compressed string type by a single flag bit.
  - 5. The data compression/decompression system as claimed in claim 4, wherein said position length field comprises a first dictionary set field representing a dictionary in an engine where said reference string for said data phrase is found and a variable-width sector address field representing said starting address of the reference string in said dictionary.
  - 6. The data compression/decompression system as claimed in claim 5, wherein for matching strings of up to 256 bytes in length, said position length field comprises a range from 2 to 10 bits.
  - 7. The data compression/decompression system as claimed in claim 1, wherein for statistically probable matching strings comprising up to three bytes, a corresponding data phrase for parsing ranges anywhere from 5 bits to 13 bits.
  - 8. The data compression/decompression system as claimed in claim 2, wherein said compression engine employs Ziv-Lempel compression algorithm.
  - 9. The data compression/decompression system as claimed in claim 2, wherein said decompression engine further comprises a tokenizer device for converting each data phrase produced by said parser into fixed-bit length tokens, said tokens including an indicator for identifying the phrase as comprising one of a raw character or a string.
  - 10. The data compression/decompression system as claimed in claim 9, wherein said tokenizer device converts a phrase comprising a string of two or more bytes into consecutive position tokens, said position tokens based on a string position and length fields.
  - 11. The data compression/decompression system as claimed in claim 10, wherein said position token further includes an identifier for enabling said decompression engine to distinguish between consecutive position tokens belonging to a single stream, and consecutive position tokens belonging to different streams.
  - 12. The data compression/decompression system as claimed in claim 10, wherein said tokenizer device further comprises a queue mechanism for temporarily storing generated tokens before transferring them downstream to said decompressor engine, said tokenizer queue mechanism isolating said parser and enabling continued parser operation during occurrence of decompressor stall cycles.
  - 13. The data compression/decompression system as claimed in claim 12, wherein each of said predetermined set of characters, compressed strings or combinations extracted by said parser optimally meet and sustain decompressor output bandwidth requirements.
  - 14. The data compression/decompression system as claimed in claim 13, wherein a first predetermined set of data phrases includes phrases that are able to be converted into two or more tokens per decompression engine cycle.
  - 15. The data compression/decompression system as claimed in claim 12, wherein said first optimum set of data phrases includes phrases having a maximum bit length ranging up to 23 bits.
  - 16. The data compression/decompression system as claimed in claim 15, wherein a second optimal set of data phrases includes phrases representing two raw characters, a single raw character, a string'"'"'s primary length field, and a string'"'"'s secondary length field.
  - 17. The data compression/decompression system as claimed in claim 16, wherein said second optimal set of data phrases includes phrases having a maximum bit length ranging up to 18 bits.

18. A data decompression system for decompressing information units received in compressed form, said data decompression system comprising:
- a parser device for extracting consecutive data phrases of compressed information units at a parser device engine rate, each data phrase comprising one of a predetermined set of characters, compressed strings or combinations thereof;
  
  a tokenizer device associated with said parser device for receiving said data phrases and converting each data phrase into fixed bit-length tokens of shortened bit-length, each token including an indicator for identifying its corresponding phrase as comprising one of a raw character or a string; and
  
  , a decompression engine for receiving said tokens and generating corresponding uncompressed information units for output thereof at a predetermined data bus rate, whereby said parser device, tokenizer device and decompression engine operate at engine rates equal to or greater than said predetermined data bus rate for maximizing decompression throughput and reducing decompression time.
- View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
- - 19. The data decompression system as claimed in claim 18, further comprising a dictionary table associated with a decompression engine having entries comprising raw characters and reference strings, each reference string representing a matched sequence of two or more bytes of data, said decompression engine accessing said dictionary table for generating uncompressed information units for output thereof.
  - 20. The data decompression system as claimed in claim 19, wherein said parser device retrieves data phrases comprising either one of compressed code type data of fixed bit-length for representing a compressed character, and a compressed code type data of variable bit-length type for representing a matched sequence of bytes for representing an incoming working string of uncompressed information units, each variable bit-length type including a primary length field and a secondary length field for defining a string'"'"'s length, and a position length field for encoding a starting address of the reference string in said dictionary to which a starting position of a working string refers.
  - 21. The data decompression system as claimed in claim 20, wherein for statistically probable matching strings comprising up to three bytes, said secondary length field is unused and said primary length field comprises data of minimum bit length thereby enabling said parser to process said data phrase in a single decompression engine cycle.
  - 22. The data decompression system as claimed in claim 20, wherein for matching strings comprising more than three bytes, said parser device parsing said primary length fields and secondary length fields of a data phrase in different decompression engine cycles.
  - 23. The data decompression system as claimed in claim 21, wherein said minimum bit length comprises a total of two bits for identifying said string length code.
  - 24. The data decompression system as claimed in claim 20, wherein a data phrase of compressed code type data or compressed string code type is differentiated by said parser by a single flag bit.
  - 25. The data decompression system as claimed in claim 24, wherein said position length field comprises a first dictionary set field representing a dictionary in an engine where said reference string for said data phrase is found and a variable-width sector address field representing said starting address of the reference string in said dictionary.
  - 26. The data decompression system as claimed in claim 25, wherein for matching strings of up to 256 bytes in length, said position length field length total bits ranging from 2 to 10 bits.
  - 27. The data decompression system as claimed in claim 20, wherein for statistically probable matching strings comprising up to three bytes, a corresponding data phrase for parsing ranges anywhere from 5 bits to 13 bits.
  - 28. The data decompression system as claimed in claim 18, wherein said tokenizer device converts a phrase comprising a string of two or more bytes into consecutive position tokens, said position tokens based on a string position and length fields.
  - 29. The data decompression system as claimed in claim 28, wherein said position token further includes an identifier for enabling said decompression engine to distinguish between consecutive position tokens belonging to a single stream, and consecutive position tokens belonging to different streams.
  - 30. The data decompression system as claimed in claim 28, wherein said tokenizer device further comprises a queue mechanism for temporarily storing generated tokens before transferring them to said decompressor engine, said tokenizer queue mechanism isolating said parser and enabling continued parser operation during occurrence of decompressor stall cycles.
  - 31. The data decompression system as claimed in claim 30, wherein each of said predetermined set of characters, compressed strings or combinations extracted by said parser optimally meet and sustain decompressor output bandwidth requirements.
  - 32. The data decompression system as claimed in claim 31, wherein a first optimum set of data phrases includes phrases that are able to be converted into two or more tokens per decompression engine cycle.
  - 33. The data decompression system as claimed in claim 32, wherein a second optimal set of data phrases includes phrases representing two raw characters, a single raw character, a string'"'"'s primary length field, and a string'"'"'s secondary length field.

34. A method for decompressing compressed data information units generated by a data compression engine, said method comprising the steps of:
- a) parsing said compressed information units and extracting consecutive data phrases at a parsing engine rate, each data phrase comprising one of a predetermined set of characters, compressed strings or combinations thereof;
  
  b) converting each extracted data phrase into fixed bit-length tokens of shortened bit-length, each said token including an indicator for identifying its corresponding phrase as comprising one of a compressed raw character or a compressed string;
  
  c) implementing a decompression engine for receiving said fixed bit-length tokens; and
  
  , d) processing said tokens at a decompression engine rate for outputting corresponding uncompressed information units at a predetermined data bus rate, wherein said parsing engine rate, tokenizing rate and decompression engine rate are equal to or greater than said predetermined data bus rate for maximizing decompression throughput and reducing decompression time.
- View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
- - 35. The method as claimed in claim 34, wherein prior to step a), the step of:
36. The method as claimed in claim 35, wherein a data phrase comprises either one of a compressed code type data of fixed bit-length for representing a compressed character, and a compressed code type data of variable bit-length type for representing a sequence of bytes defining a compressed string, said method including representing a string'"'"'s length in each variable bit-length type as a primary length field and a secondary length field and a position length field for encoding a starting address of the reference string in said dictionary to which a starting position of a working string refers.
37. The method as claimed in claim 36, wherein said representing step comprises the steps of:
- determining statistically probable matching strings comprising three bytes of less, and forming a corresponding data phrase for said strings to include a primary length field comprising data of minimum bit length and a position length field, whereby said parsing step for such statistically probable matching strings is performed in a single decompression engine cycle.
38. The method as claimed in claim 36, wherein for matching strings comprising more than three bytes, the step of forming a corresponding data phrase for said strings to include a primary length field, secondary length fields and a position length field, whereby said parsing step includes parsing said primary length fields and secondary length fields of a data phrase in different decompression engine cycles.
39. The method as claimed in claim 37, wherein said minimum bit length comprises a total of two bits for identifying said string length code.
40. The method as claimed in claim 37, further said parsing step includes:
- differentiating a data phrase of compressed code type data or compressed string code type according to a single flag bit.
41. The method as claimed in claim 36, wherein said converting step b) includes converting a data phrase representing a string of two or more bytes into consecutive position tokens, said position tokens based on a string position and length fields.
42. The method as claimed in claim 41, wherein said converting step b) includes providing an identifier in each token for enabling a decompression engine to distinguish between consecutive position tokens belonging to a single stream, and consecutive position tokens belonging to different streams.
43. The method as claimed in claim 42, wherein prior to step d) the step of temporarily storing generated tokens before transferring them to said decompressor engine, whereby said continued parsing steps are enabled during occurrence of a decompressor stall cycles.
44. The method as claimed in claim 43, wherein each data phrase of predetermined set of characters, compressed strings or combinations thereof optimally meet and sustain predetermined data bus rate requirements.
45. The method as claimed in claim 44, wherein a first optimum set of data phrases includes phrases that are able to be converted into two or more tokens per decompression engine cycle.
46. The method as claimed in claim 45, wherein a second optimal set of data phrases includes phrases representing two raw characters, a single raw character, a string'"'"'s primary length field, and a string'"'"'s secondary length field.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Schulz, Charles O., Har, David
Primary Examiner(s)
PHAN, TRONG Q

Application Number

US09/569,558
Time in Patent Office

536 Days
Field of Search

341/50, 341/51, 341/67
US Class Current

341/50
CPC Class Codes

H03M 7/3086 employing a sliding window,...

Method and apparatus for enhanced decompressor parsing

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

46 Claims

Specification

Solutions

Use Cases

Quick Links

Method and apparatus for enhanced decompressor parsing

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

46 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links