Method and apparatus for data compression utilizing efficient pattern discovery
First Claim
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for compression of a sequence of characters, said method steps comprising:
- identifying a set of proper templates;
identifying a first set of patterns based on said set of proper templates and said sequence of characters, wherein each pattern within said first set of patterns is contained within said sequence of characters; and
combining patterns within said first set of patterns to form a second set of patterns, wherein each pattern within said second set of patterns is contained within said sequence of characters;
selecting a subset of said second set of patterns; and
generating compressed data representing said sequence of characters, said compressed data comprising first data and second data, said first data representing each selected pattern of said subset, and second data representing said sequence of characters wherein occurrences of each selected pattern within said sequence of characters is replaced by a reference to first data corresponding to the selected pattern.
1 Assignment
0 Petitions
Accused Products
Abstract
The method of the present invention discovers patterns in an sequence of characters in two phases. In a sampling phase, preferably proper templates corresponding to the sequence of characters are generated. Patterns are then generated corresponding to the templates and stored in memory. In a convolution phase, the patterns stored in memory are combined to identify a set of maximal patterns. A subset of the maximal patterns is selected. Compressed data representing the sequence of characters is generated. The compressed data includes first data data representing each selected pattern of the subset, and second data representing the sequence of characters wherein occurrences of each selected pattern within the sequence of characters is replaced by a reference to first data corresponding to the selected pattern. The method is useful in compressing information stored in a database or compressing information communicated over a communication link.
-
Citations
46 Claims
-
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for compression of a sequence of characters, said method steps comprising:
-
identifying a set of proper templates; identifying a first set of patterns based on said set of proper templates and said sequence of characters, wherein each pattern within said first set of patterns is contained within said sequence of characters; and combining patterns within said first set of patterns to form a second set of patterns, wherein each pattern within said second set of patterns is contained within said sequence of characters; selecting a subset of said second set of patterns; and generating compressed data representing said sequence of characters, said compressed data comprising first data and second data, said first data representing each selected pattern of said subset, and second data representing said sequence of characters wherein occurrences of each selected pattern within said sequence of characters is replaced by a reference to first data corresponding to the selected pattern. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for compression of a sequence of characters, said method steps comprising:
-
identifying a first set of patterns, wherein each pattern within said first set of patterns is contained within said sequence of characters; and combining convolvable patterns within said first set of patterns to form a second set of patterns, wherein each pattern within said second set of patterns is contained within said sequence of characters; selecting a subset of said second set of patterns; and generating compressed data representing said sequence of characters, said compressed data comprising first data and second data, said first data representing each selected pattern of said subset, and second data representing said sequence of characters wherein occurrences of each selected pattern within said sequence of characters is replaced by a reference to first data corresponding to the selected pattern. - View Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
-
Specification