Block-wise adaptive statistical data compressor

US 6,075,470 A
Filed: 02/26/1998
Issued: 06/13/2000
Est. Priority Date: 02/26/1998
Status: Expired due to Term

Abstract

A block-wise adaptive statistical data compressor is disclosed that operates by replacing characters in a data block with super-character codewords comprising a variable length prefix and a fixed length index. The codewords are determined by treating a plurality of groups of characters as super-character groups and then adapting the codewords, for each data block, based upon the actual frequency of occurrence of the characters in each group. The super-character prefix value identifies the group to which a particular character belongs, and the index value identifies the individual character of the group. By grouping and indexing the characters into these super-character groups, the present invention models a particular data block using a fraction of the information generally required by a fixed statistical compressor. Also disclosed are multi-stage lossless block data compressors that include the block-wise adaptive statistical compressor and also include a clustering stage and a reordering stage. The clustering stage clusters like characters into similar locations within the data block, and the reordering stage reorders the data to generate an expected skew in the frequency distribution of characters in the data block so that the block can be more efficiently compressed by the block-wise adaptive statistical compressor.

29 Claims

1. A method of compressing data blocks having a plurality of characters, wherein the plurality of characters form an alphabet of N characters, comprising the steps of:
- assigning the N characters of the alphabet into M super-character groups based upon the expected frequency of occurrence of each of the N characters in the data block, wherein M is less than N;
  
  accumulating statistics in the M super-character groups regarding the frequency of occurrence of each character in the data block;
  
  generating a plurality of super-character codewords that model the frequencies of occurrence for each character, wherein each super-character codeword includes a variable length prefix value that identifies the super-character group and a fixed length index value that identifies the particular character in the group; and
  
  replacing the characters with the super-character codewords to form a compressed data block.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1, wherein the assignment of characters to particular super-character groups results in certain super-characters being assigned a low number of frequently occurring characters and other super-characters being assigned a high number of infrequently occurring characters.
  - 3. The method of claim 1, wherein the super-character codewords are Huffman codes.
  - 4. The method of claim 1, further including the step of:
    - normalizing the accumulated statistics to a predetermined value.
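
The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the group layout `GROUP_SIZES` (power-of-two sizes, so each group has a fixed index width), the function names, and the bit-string output are all hypothetical. It also assumes that low symbol values are the frequent ones, so that small groups hold frequent symbols as in claim 2, and it uses a Huffman code for the prefix as in claim 3.

```python
import heapq

# Hypothetical layout: M = 9 groups covering an alphabet of N = 256 symbols.
# Small groups come first and are assumed to hold the most frequent symbols.
GROUP_SIZES = [1, 1, 2, 4, 8, 16, 32, 64, 128]


def build_groups(sizes):
    """Return per-symbol group and index tables, plus each group's index width."""
    group_of, index_of, index_bits = [], [], []
    for g, size in enumerate(sizes):
        index_bits.append(size.bit_length() - 1)  # log2(size) for powers of two
        for i in range(size):
            group_of.append(g)
            index_of.append(i)
    return group_of, index_of, index_bits


def huffman_prefixes(counts):
    """Build a Huffman prefix (as a bit string) for each group from its count."""
    # Heap entries are (weight, unique tiebreak, {group: prefix so far}).
    heap = [(c, g, {g: ""}) for g, c in enumerate(counts)]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][1]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:
        w0, _, a = heapq.heappop(heap)
        w1, _, b = heapq.heappop(heap)
        merged = {g: "0" + p for g, p in a.items()}
        merged.update({g: "1" + p for g, p in b.items()})
        heapq.heappush(heap, (w0 + w1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode_block(block, sizes=GROUP_SIZES):
    """Replace each symbol with a variable-length prefix plus a fixed-length index."""
    group_of, index_of, index_bits = build_groups(sizes)
    # Accumulate statistics per super-character group, not per symbol.
    counts = [0] * len(sizes)
    for sym in block:
        counts[group_of[sym]] += 1
    prefixes = huffman_prefixes(counts)
    bits = []
    for sym in block:
        g = group_of[sym]
        width = index_bits[g]
        index = format(index_of[sym], "b").zfill(width) if width else ""
        bits.append(prefixes[g] + index)
    return "".join(bits), prefixes
```

Because only M group counts are modeled rather than N symbol counts, the table that must accompany each block stays small, which is the trade-off the abstract describes.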

5. A block-wise adaptive statistical compressor for compressing a data block having a plurality of characters, the plurality of characters forming an alphabet of N characters, comprising:
- means for assigning the N characters to M super-character groups based upon the expected frequency of occurrence for each character, wherein M is less than N;
  
  means for accumulating statistics in the super-character groups regarding the frequency of occurrence of each character in the data block;
  
  means for generating a plurality of super-character codewords that model the frequencies of occurrence for each character; and
  
  means for replacing the characters with the super-character codewords to form a compressed data block.
- Dependent claims (6, 7, 8, 9, 10, 11, 12):
  - 6. The block-wise adaptive statistical compressor of claim 5, wherein the super-character codewords include a variable length prefix that identifies the super-character group to which a particular character has been assigned, and a fixed index that identifies the particular character in the group.
  - 7. The block-wise adaptive statistical compressor of claim 6, wherein the super-character codewords are provided to a decompression device along with the compressed data block to enable decompression.
  - 8. The block-wise adaptive statistical compressor of claim 5, wherein the means for assigning characters to particular super-character groups assigns a low number of frequently occurring characters to particular super-character groups and a high number of infrequently occurring characters to other super-character groups.
  - 9. The block-wise adaptive statistical compressor of claim 5, wherein the compressor is programmed into a mobile data communication device.
  - 10. The block-wise adaptive statistical compressor of claim 9, wherein the mobile data communication device communicates via a packet data network.
  - 11. The block-wise adaptive statistical compressor of claim 9, wherein the compressor is permanently stored within the memory of the mobile data communication device.
  - 12. The block-wise adaptive statistical compressor of claim 9, wherein the mobile data communication device is a two-way paging computer.
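
Claims 6 and 7 imply that a decompressor can rebuild the block from the bit stream plus the codeword table. A rough Python sketch of that decoding side is shown here. The function name `decode_block`, the bit-string representation, and the assumption that both sides share a power-of-two group layout `sizes` are illustrative choices, not details taken from the patent.

```python
def decode_block(bits, prefixes, sizes):
    """Decode a bit string using the group prefixes sent alongside the block.

    bits     -- string of '0'/'1' characters (hypothetical wire format)
    prefixes -- {group number: prefix bit string}, a prefix-free code
    sizes    -- power-of-two group sizes shared by both sides (assumption)
    """
    # First symbol value and fixed index width of each group.
    starts, widths, start = [], [], 0
    for size in sizes:
        starts.append(start)
        widths.append(size.bit_length() - 1)
        start += size

    by_prefix = {p: g for g, p in prefixes.items()}
    out, pos, current = [], 0, ""
    while pos < len(bits):
        current += bits[pos]
        pos += 1
        if current in by_prefix:
            g = by_prefix[current]
            width = widths[g]
            # Read the fixed-length index that follows the prefix.
            index = int(bits[pos:pos + width], 2) if width else 0
            pos += width
            out.append(starts[g] + index)
            current = ""
    return out


# Three groups holding symbols {0}, {1}, and {2, 3}.
symbols = decode_block("0101101110", {0: "0", 1: "10", 2: "11"}, [1, 1, 2])
```

In this example the call yields the symbols 0, 1, 2, 3, 0: the prefix selects the group, and the single index bit after `11` selects between symbols 2 and 3.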

13. A multi-stage data compressor for compressing a data file, comprising:
- means for partitioning the data file into blocks of characters;
  
  a clustering stage for transforming each data block into a clustered block;
  
  a reordering stage for reordering each clustered block into a reordered block; and
  
  a block-wise adaptive statistical data compressor for compressing the reordered blocks of data, comprising:
  
  means for assigning the characters to a plurality of super-character groups based upon the expected frequency of occurrence for each character, wherein at least one of the super-character groups is assigned a plurality of characters;
  
  means for accumulating statistics in the super-character groups regarding the frequency of occurrence of each character in the data block;
  
  means for generating a plurality of super-character codewords that model the frequencies of occurrence for each character; and
  
  means for replacing the characters with the super-character codewords to form a compressed data block.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25):
  - 14. The multi-stage data compressor of claim 13, wherein the reordering stage is a move to the front reordering stage.
  - 15. The multi-stage data compressor of claim 14, wherein the move to the front reordering stage replaces the individual characters in the data block with numerical values having a skewed frequency distribution.
  - 16. The multi-stage data compressor of claim 15, wherein the move to the front reordering stage comprises:
    - means for providing an initial queue containing an alphabet of available characters, each character being assigned a numerical value that corresponds to its order in the queue; and
      
      for each character in the clustered data block, means for replacing each character in the clustered block with the current numerical value of the particular character in the queue and for moving the particular character to the front of the queue.
  - 17. The multi-stage data compressor of claim 16, wherein the initial queue is preordered with an alphabet of characters based on the projected frequency of occurrence of the characters in the clustered data block.
  - 18. The multi-stage data compressor of claim 13, wherein the compressor is programmed into a mobile data communication device.
  - 19. The multi-stage data compressor of claim 18, wherein the mobile data communication device communicates via a packet data network.
  - 20. The multi-stage data compressor of claim 18, wherein the compressor is permanently stored within the memory of the mobile data communication device.
  - 21. The multi-stage data compressor of claim 18, wherein the mobile data communication device is a two-way paging computer.
  - 22. The multi-stage data compressor of claim 13, wherein the super-character codewords include a variable length prefix that identifies the super-character group to which a particular character has been assigned, and a fixed index that identifies the particular character in the group.
  - 24. The multi-stage data compressor of claim 22, wherein the super-character codewords are provided to a decompression device along with the compressed data block to enable decompression.
  - 25. The multi-stage data compressor of claim 13, wherein the clustering stage utilizes the Burrows-Wheeler transform to transform the data characters in a data block so that like characters are clustered together in certain locations of the data block, thereby forming the clustered block.
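
The move-to-front reordering described in claims 14 to 17 can be illustrated with a short Python sketch. The function name and the default queue of byte values 0 to 255 are assumptions for illustration. Passing a preordered `alphabet` corresponds to the initial queue of claim 17.

```python
def move_to_front(block, alphabet=None):
    """Replace each symbol with its current queue position, then move it to the front."""
    # Default queue: all byte values in numeric order (an assumption).
    queue = list(alphabet) if alphabet is not None else list(range(256))
    out = []
    for ch in block:
        pos = queue.index(ch)           # current numerical value of the symbol
        out.append(pos)
        queue.insert(0, queue.pop(pos)) # move the symbol to the front
    return out


print(move_to_front(b"aaabbb"))               # [97, 0, 0, 98, 0, 0]
print(move_to_front(b"aab", alphabet=b"ba"))  # [1, 0, 1]
```

Runs of identical symbols become runs of zeros, which is the skewed distribution toward small values that the later statistical stage relies on.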

23. The multi-stage data compressor of claim 13, wherein the means for assigning characters to particular super-character groups assigns a low number of frequently occurring characters to particular super-character groups and a high number of infrequently occurring characters to other super-character groups.

26. A method of compressing a data file comprising the steps of:
- partitioning the data file into a plurality of data blocks, wherein each data block includes a plurality of characters;
  
  clustering the characters in the data block so that similar characters are grouped together within the block;
  
  reordering the data block by replacing the characters with N numerical values having a skewed frequency distribution; and
  
  adaptively compressing each data block by accumulating the numerical values into M super-character groups, wherein M is less than N, and the N numerical values are assigned to the M super-character groups based upon their expected frequency of occurrence in the data block, and generating super-character codewords that replace the numerical values in order to compress the data block.
- Dependent claims (27):
  - 27. The method of claim 26, wherein the super-character codewords include a variable length prefix value that identifies the particular super-character group to which the numerical value is assigned and a fixed length index that identifies the particular numerical value within the super-character group.
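
For the clustering step of claim 26, claim 25 names the Burrows-Wheeler transform as one option. The following is a naive Python sketch of the forward transform, written for clarity rather than speed (it sorts full rotations, which is impractical for large blocks). The function name and the choice to return the primary index instead of appending a sentinel symbol are assumptions of this example.

```python
def bwt(block: bytes):
    """Naive forward Burrows-Wheeler transform.

    Returns the last column of the sorted rotation matrix and the row
    number at which the original block appears (needed for inversion).
    """
    n = len(block)
    # Sort rotation start offsets by the rotation each one produces.
    rotations = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last byte of the rotation starting at i is the byte just before i.
    last = bytes(block[(i - 1) % n] for i in rotations)
    primary = rotations.index(0)
    return last, primary


print(bwt(b"banana"))  # (b'nnbaaa', 3)
```

The output groups like characters together ("nn", "aaa"), which is what lets a subsequent reordering stage produce many small values.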

28. A method of compressing a data file, comprising:
- partitioning the data file into data blocks containing N bytes;
  
  for each data block in the data file:
  
  clustering the N bytes in the data block using a clustering algorithm;
  
  reordering the N bytes of data in the clustered data block;
  
  adaptively compressing the reordered data block using a statistical coder;
  
  determining whether the adaptively compressed data block is smaller than the original data block; and
  
  if the adaptively compressed data block is smaller than the original data block, then outputting a header byte indicating that the data block is compressed along with the compressed data block, else outputting a header byte indicating that the data block is not compressed along with the original data block.
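
The size check and header byte at the end of claim 28 can be sketched as follows. This example makes several assumptions: the header values `COMPRESSED` and `STORED` are hypothetical, and `zlib` is used purely as a stand-in for the clustering, reordering, and statistical coding stages so that the sketch runs on its own.

```python
import zlib  # stand-in compressor only; not the coder described in the claims

# Hypothetical header byte values.
COMPRESSED, STORED = 0x01, 0x00


def pack_block(block: bytes, compress=zlib.compress) -> bytes:
    """Emit the compressed block only when it is smaller than the original."""
    candidate = compress(block)
    if len(candidate) < len(block):
        return bytes([COMPRESSED]) + candidate
    return bytes([STORED]) + block


def unpack_block(packed: bytes, decompress=zlib.decompress) -> bytes:
    """Reverse pack_block by inspecting the header byte."""
    header, body = packed[0], packed[1:]
    return decompress(body) if header == COMPRESSED else body
```

With this scheme a block never grows by more than the one header byte, even when the data is incompressible.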

29. A method of compressing a data block comprising a plurality of characters, wherein the plurality of characters are associated with an alphabet of N characters, comprising the steps of:
- forming a super-character counting array having M elements, wherein each element of the super-character counting array is a super-character group associated with one or more characters in the alphabet, and wherein M is less than N;
  
  accumulating statistics in the super-character counting array regarding the frequency of occurrence of the characters in the data block by incrementing the elements in the array based on the occurrence of a particular character in the data block that is associated with a particular super-character group;
  
  selecting a normalization value for one of the elements in the array and normalizing the other elements in the array based on the normalization value;
  
  generating super-character codewords for each character associated with the super-character groups by selecting a variable length prefix value and a fixed length index value for each character, wherein the length of the variable length prefix value when combined with the length of the fixed length index value is less than the length of an uncompressed character; and
  
  compressing the data block by replacing the characters with the super-character codewords.
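
The counting array and normalization steps of claim 29 might look like the Python sketch below. The claim does not say how the normalization value is chosen, so this example assumes one possible reading: the largest element is scaled to a fixed `target`, the others are scaled in proportion, and nonzero counts are kept at 1 or more. The function names and the `group_of` lookup table are hypothetical.

```python
def count_groups(block, group_of, m):
    """Accumulate occurrences into an M-element super-character counting array."""
    counts = [0] * m
    for sym in block:
        counts[group_of[sym]] += 1
    return counts


def normalize(counts, target=255):
    """Scale counts so the largest equals `target` (assumed normalization rule)."""
    peak = max(counts)
    if peak == 0:
        return list(counts)
    # Keep nonzero counts nonzero so used groups still register in the model.
    return [0 if c == 0 else max(1, c * target // peak) for c in counts]


# Alphabet of N = 4 symbols mapped to M = 3 groups: {0}, {1}, {2, 3}.
group_of = [0, 1, 2, 2]
block = [0] * 510 + [1] * 100 + [2, 3]
counts = count_groups(block, group_of, 3)
print(counts)             # [510, 100, 2]
print(normalize(counts))  # [255, 50, 1]
```

Under this reading, normalizing bounds every count to a single byte, so the statistics for a block fit in M bytes.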

Current Assignee: Blackberry Limited
Original Assignee: Research In Motion Limited (Blackberry Limited)
Inventors: Hind, Hugh R.; Little, Herb A.
Primary Examiner(s): Young, Brian
Assistant Examiner(s): Kost, Jason L
Application Number: US09/031,418
Time in Patent Office: 838 Days
Field of Search: 341/106, 341/107
US Class Current: 341/107
CPC Class Codes:
H03M 7/3086   employing a sliding window,...
H03M 7/40   Conversion to or from varia...
H03M 7/46   Conversion to or from run-l...
