Data compression using a nested hierarchy of fixed phrase length dictionaries
First Claim
Patent Images
1. A method for compressing a stream of symbols, comprising:
- dividing the stream into fixed-length blocks;
for each of the fixed-length blocks, searching entries in a plurality of dictionaries for fixed-length phrases obtained from the each of the fixed-length blocks;
choosing one of a plurality of partitions of the each of the fixed-length blocks based on the results of the step of searching and on a specified plurality of allowed partitions, wherein the one of the plurality of partitions comprises a plurality of non-overlapping component phrases, and wherein a concatenation of the plurality of non-overlapping component phrases comprises the each of the fixed-length blocks; and
for each of the non-overlapping component phrases, obtaining one of a pointer and a literal to represent the each of the non-overlapping component phrases.
2 Assignments
0 Petitions
Accused Products
Abstract
Exemplary embodiments are described herein whereby blocks of data are losslessly compressed and decompressed using a nested hierarchy of fixed phrase length dictionaries. The dictionaries may be built using information related to the manner in which data is commonly organized in computer systems for convenient retrieval, processing, and storage. This results in low cost designs that give significant compression. Further, the methods can be implemented very efficiently in hardware.
71 Citations
30 Claims
-
1. A method for compressing a stream of symbols, comprising:
-
dividing the stream into fixed-length blocks;
for each of the fixed-length blocks, searching entries in a plurality of dictionaries for fixed-length phrases obtained from the each of the fixed-length blocks;
choosing one of a plurality of partitions of the each of the fixed-length blocks based on the results of the step of searching and on a specified plurality of allowed partitions, wherein the one of the plurality of partitions comprises a plurality of non-overlapping component phrases, and wherein a concatenation of the plurality of non-overlapping component phrases comprises the each of the fixed-length blocks; and
for each of the non-overlapping component phrases, obtaining one of a pointer and a literal to represent the each of the non-overlapping component phrases. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A method for compressing a stream of symbols in parallel, comprising:
-
dividing the stream into collections of fixed-length blocks, wherein each item in the collections comprises one fixed-length block;
for the each item, searching in parallel entries in a plurality of dictionaries for fixed-length phrases obtained from the each item;
for the each item, choosing one of a plurality of partitions based on (a) the results of the step of searching and (b) on a specified plurality of allowed partitions, wherein the one of the plurality of partitions comprises a plurality of non-overlapping component phrases, and wherein a concatenation of the plurality of non-overlapping component phrases comprises the each item; and
for the each item and for each component phrase of the one of the plurality of partitions, obtaining one of a pointer and a literal to represent the each component phrase. - View Dependent Claims (20, 21, 22, 23)
-
-
24. A method for hierarchically aligning a stream of symbols in which the length of phrases of smaller length divide the length of phrases of longer length, comprising:
for a given length, the given length comprising each incrementally longer length starting from the smallest length, (a) maintaining separate dictionaries for different alignments associated with the given length;
(b) counting the number of times a phrase is not found in each of the dictionaries and (c) choosing one of the different alignments based on the result of the step of counting.- View Dependent Claims (25, 26, 27)
-
28. In a system comprising a hierarchical data structure, wherein the hierarchical data structure comprises a first dictionary and a second dictionary, wherein the first dictionary comprises at least one first phrase of a first fixed-length, wherein the second dictionary comprises at least one second phrase of a second fixed-length differing from the first phrase length, wherein each of the at least one first phrase and at least one second, phrase is associated with a unique hash key, a method for compressing a block of data using the dictionary, comprising:
-
(a) segmenting the block into first plurality of subblocks, wherein the size of each of the first plurality of subblocks is the first fixed-length;
(b) segmenting the block into a second plurality of subblocks, wherein the size of each of the second plurality of subblocks is the second fixed-length;
(c) querying the first dictionary for each of the first plurality of subblocks to find a at least one first match;
(d) querying the second dictionary for each of the second plurality of subblocks to find at least one second match;
(e) if at least one of the first match is found in the dictionary, encoding the first match using a first unique pointer associated with the at least one first match; and
(f) if at least one of the second match is found in the dictionary, encoding the at least one second match using a second unique pointer associated with the at least one second match. - View Dependent Claims (29, 30)
-
Specification