Apparatus and method for very high data rate-compression incorporating lossless data compression and expansion utilizing a hashing technique

US 5,049,881 A
Filed: 06/18/1990
Issued: 09/17/1991
Est. Priority Date: 06/18/1990
Status: Expired due to Fees

First Claim

Patent Images

1. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:

a. initializing a hash table and initializing an SRC pointer;

b. processing input data in the order in which the characters in the data appear and hashing input data subblocks of the minimum compression size selected;

c. maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, obtaining the previous SRC pointer contained in that entry, and comparing the two strings of data;

d. if the two strings of data match on at least the size of the subblock, then generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data;

e. if the two strings of data do not match, then storing the string of data as incompressible data; and

f. continuing steps b. through e. until the entire input data has been processed.

View all claims

0 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method and apparatus for compressing digital data that is represented as a sequence of characters drawn from an alphabet. An input data block is processed into an output data block composed of sections of variable length. Unlike most prior art methods which emphasize the creation of a dictionary comprised of a tree with nodes or a set of strings, the present invention creates its own pointers from the sequence characters previously processed and emphasizes the highest priority on maximizing the data rate-compression factor product. The use of previously input data acting as the dictionary combined with the use of a hashing algorithm to find candidates for string matches and the absence of a traditional string matching table and associated search time allows the compressor to very quickly process the input data block. Therefore, the result is a high data rate-compression factor product achieved due to the absence of any string storage table and matches being tested only against one string.

Citations

15 Claims

1. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:
- a. initializing a hash table and initializing an SRC pointer;
  
  b. processing input data in the order in which the characters in the data appear and hashing input data subblocks of the minimum compression size selected;
  
  c. maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, obtaining the previous SRC pointer contained in that entry, and comparing the two strings of data;
  
  d. if the two strings of data match on at least the size of the subblock, then generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data;
  
  e. if the two strings of data do not match, then storing the string of data as incompressible data; and
  
  f. continuing steps b. through e. until the entire input data has been processed.
- View Dependent Claims (2, 3)
- - 2. The method in accordance with claim 1 wherein said backwards pointer is encoded with a variable number of bits and storing the backwards pointer to the most recent occurrence of a string, thereby creating a short backwards distance to repeated strings.
  - 3. A method for expanding the stream of compressed data which has been compressed in accordance with claim 1 back into the original stream of data, the expansion method comprising the steps of:
    - a. determining for each segment of the compressed input data, whether it comprises incompressible data or is a back pointer reference;
      
      b. if the input data segment comprises incompressible data, copying the input data segment to the output data stream;
      
      c. if the input data segment comprises a back pointer reference, computing the location of the previously expanded data referred to, and copying that previously expanded data to the output stream; and
      
      d. continuing steps a. through c. until the entire compressed input data has been processed.

4. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data string to be compressed, said compression method comprising the creation of a hash table, hashing each occurrence of a string of input data and subsequently searching for identical strings of input data and if such an identical string of input data is located whose string size is at least equal to the minimum compression size selected, compressing the second and all subsequent occurrences of such identical string of data, if a string of data is located which does not match to a previously compressed string of data, storing such data as uncompressed data, and for each input string after each hash is used to find a possible previous match location for the string, the location of the string is stored in the hash table, thereby using the previously processed data to act as a compression dictionary.
- View Dependent Claims (5)
- - 5. A method for expanding the stream of compressed data which has been compressed in accordance with claim 4 back into the original stream of data, the expansion method comprising sequentially analyzing each string of compressed data and if the string contains incompressible data then duplication of the string, and if the string references previously expanded data then duplication of the previously expanded data and repeating the steps until the entire compressed input data stream has been expanded back to the original data.

6. A compression method for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression method comprising the steps of:
- a. creating an identifier header having a count value for data to follow and marking the identifier header with a code to designate whether incompressible or compressible data follows;
  
  b. reading input data in the order in which the input data appears and hashing subblocks and comparing them to previous subblocks with the same hash until a match to a previous subblock occurs which is a match on a string up to the maximum count value in the identifier header, and then generating a backwards pointer to designate the second occurrence of a character string equal to a previous identical string of characters of at least minimum match size and thereby compressing the second occurrence of the string; and
  
  c. continuing generating identifier headers and data until the entire input data stream has been processed.
- View Dependent Claims (7, 8)
- - 7. The method in accordance with claim 6 wherein said backwards pointer is encoded with a variable number of bits and storing the backwards pointer to the most recent occurrence of a string, thereby creating a short backwards distance to repeated strings.
  - 8. The method in accordance with claim 6 further comprising a method for expanding the stream of compressed data back into the original stream of data, the expansion method comprising the steps of:
    - a. evaluating each section sequentially and determining if it contains incompressible data or a backwards pointer immediately following the identifier header;
      
      b. if incompressible data is found, then the count value from the identifier header determines the amount of data to be copied from the input stream;
      
      c. if a backwards pointer is found, then the count value from the identifier header determines the amount of data to be copied from the previously expanded data referred to by the backwards pointer; and
      
      d. repeating steps a. through c. until the entire compressed input stream has been expanded back to the original data.

9. In a data compression method, having defined at least one variable length compression match output token which includes a count field containing a maximum count value and an incompressible data indicator field which together form an identification header, a backwards pointer to a previously encountered data field, and a size of backwards pointer field, having further defined at least one variable length incompressible data output token which consists of the identification header followed by incompressible data of length specified by the count field, and having further defined a subblock size of a minimum length for performing hash computations that cannot result in the compressed data being larger than the uncompressed data, a method for compressing input data into compressed output data, said compression method comprising the steps of:
- a. initializing a hash table, a source pointer and a destination pointer;
  
  b. initializing a count value and initializing and storing an identification header at the location of the destination pointer and then incrementing the destination pointer;
  
  c. reading input subblocks pointed to by the source pointer, and computing hash values for the subblocks in the order in which they appear, and comparing the string located by the previous entry for this hash value to determine potential matches and unconditionally replacing the hash table entry with the current source pointer;
  
  d. if a hash match does not occur, the data character pointed to by the source pointer is copied to the data area pointed to by the destination pointer, and the source pointer, destination pointer, and count value are incremented such that if the count value then is equal to the maximum count value, the maximum count value is stored in the current identification header and processing resumes with step b.;
  
  e. if a hash match occurs, the match length of the data pointed to by the current hash table value and the data pointed to by the prior identical hash table value is determined;
  
  f. if the match length so computed is less than the minimum match length, step d. is performed as if no hash value match had occurred;
  
  g. if the match length so computed is equal to or greater than the minimum match length, the current identification header is updated with the count value to complete a prior packet of uncompressible data and an output match token is then stored at the location pointed to by the destination point consisting of a count field containing the match count, a raw data indicator field indicating that a backwards pointer follows, and a variable length backwards pointer that points to the most recent previous occurrence of the matching data and then the source pointer is incremented by the match count and the destination pointer is incremented by the output match size; and
  
  h. repeating steps b. through g. until the source data has been processed.

10. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression apparatus comprising:
- a. means for initializing a hash table and means for initializing an SRC pointer;
  
  b. means for processing input data in the order in which the characters in the data appear and means for hashing input data subblocks of the minimum compression size selected;
  
  c. means for maintaining a hash table which contains at each entry, an SRC pointer which points to a previous subblock which hashed to this hash table entry, such that the possibility of any string of data previously occurring in the input block may be tested by hashing the current subblock to a hash table entry, means for obtaining the previous SRC pointer contained in that entry, and means for comparing the two strings of data;
  
  d. if the two strings of data match on at least the size of the subblock, then means for generating a backwards pointer to the previous occurrence of the same string of data and thereby compressing the second occurrence of the string of data; and
  
  e. if the two strings of data do not match, then means for storing the string of data as incompressible data.
- View Dependent Claims (11)
- - 11. Means for expanding the stream of compressed data which has been compressed in accordance with claim 10 back into the original stream of data, the expansion means comprising:
    - a. means for determining for each segment of the compressed input data, whether it comprises incompressible data or is a back pointer reference;
      
      b. if the input data segment comprises incompressible data, means for copying the input data segment to the output data stream; and
      
      c. if the input data segment comprises a back pointer reference, means for computing the location of the previously expanded data referred to, and copying that previously expanded data to the output stream.

12. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data string to be compressed, said compression apparatus comprising the creation of a hash table, means for hashing each occurrence of a string of input data and subsequently searching for identical strings of input data and if such an identical string of input data is located whose string size is at least equal to the minimum compression size selected, means for compressing the second and all subsequent occurrences of such identical string of data, if a string of data is located which does not match to a previously compressed string of data, means for storing such data as uncompressed data, and for each input string after each hash is used to find a possible previous match location for the string, means for storing the location of the string in the hash table, thereby using the previously processed data to act as a compression dictionary.
- View Dependent Claims (13)
- - 13. An apparatus for expanding the stream of compressed data which has been compressed in accordance with claim 12 back into the original stream of data, the expansion apparatus comprising means for sequentially analyzing each string of compressed data and if the string contains incompressible data then means for duplication of the string, and if the string references previously expanded data then means for duplication of the previously expanded data.

14. A compression apparatus for compressing a stream of input data into a compressed stream of output data based on a minimum number of characters in each input data subblock to be compressed, said compression apparatus comprising:
- a. means for creating an identifier header having a count value for data to follow and means for marking the identifier header with a code to designate whether incompressible or compressible data follows; and
  
  b. means for reading input data in the order in which the input data appears and means for hashing subblocks and comparing them to previous subblocks with the same hash until a match to a previous subblock occurs which is a match on a string up to the maximum count value in the identifier header, and then means for generating a backwards pointer to designate the second occurrence of a character string equal to a previous identical string of characters of at least minimum match size and thereby compressing the second occurrence of the string.
- View Dependent Claims (15)
- - 15. The apparatus in accordance with claim 14 further comprising an apparatus for expanding the stream of compressed data back into the original stream of data, the expansion apparatus comprising:
    - a. means for evaluating each section sequentially and determining if it contains incompressible data or a backwards pointer immediately following the identifier header;
      
      b. if incompressible data is found, then using means for copying data from the input stream according to the count value in the identifier header; and
      
      c. if a backwards pointer is found, then using means for copying data from the previously expanded data referred to by the backwards pointer according to the count value in the identifier header.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Intersecting Concepts Incorporated
Original Assignee
Intersecting Concepts Incorporated
Inventors
Graybill, Mark D., Gibson, Dean K.
Primary Examiner(s)
Pellinen, A. D.
Assistant Examiner(s)
Young, Brian K.

Application Number

US07/539,880
Time in Patent Office

456 Days
Field of Search

341/51, 341/55, 341/60, 341/79, 341/81, 341/87, 341/65, 341/67, 341/95, 341/107, 364/200, 364/260.6, 364/900, 364/951.3, 383/56, 358/261.2
US Class Current

341/95
CPC Class Codes

H03M 7/3086 employing a sliding window,...

Apparatus and method for very high data rate-compression incorporating lossless data compression and expansion utilizing a hashing technique

First Claim

0 Assignments

0 Petitions

Accused Products

Abstract

Citations

15 Claims

Specification

Solutions

Use Cases

Quick Links

Apparatus and method for very high data rate-compression incorporating lossless data compression and expansion utilizing a hashing technique

First Claim

0 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

15 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links