Data compression method and apparatus
First Claim
1. A method for improving compression of a stream of data comprising:
transforming the data in accordance with a schema to form a first portion and a second portion; and
separately transforming the first portion and the second portion to form a transformed output that includes the transformed first portion and the transformed second portion.
Abstract
An improved data compression method and apparatus are provided, particularly for compressing data in tabular form, such as database records. The present invention achieves improved compression ratios by using metadata to transform the data in a manner that optimizes known compression techniques. In one embodiment of the invention, a schema is generated and used to reorder and partition the data into low-entropy and high-entropy portions, which are compressed separately by conventional compression methods. The high-entropy portion is further reordered and partitioned to take advantage of row and column dependencies in the data. The present invention enables not only much greater compression ratios but also greater speed than is achieved by compressing the untransformed data.
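As an illustration of the idea in the abstract, the following is a minimal sketch and not the patented implementation. It assumes fixed-width records and a hypothetical schema, `low_cols`, that lists the byte offsets treated as low entropy. The function names and the choice of `zlib` as the conventional compressor are likewise assumptions made for the example.

```python
import zlib

def transform(records, low_cols):
    """Split fixed-width records into two column-major byte streams.

    `low_cols` is a hypothetical schema: the byte offsets assumed to be
    low entropy (nearly constant from record to record).
    """
    width = len(records[0])
    low_set = set(low_cols)
    high_cols = [c for c in range(width) if c not in low_set]
    # Column-major order places the values of one field next to each other.
    low = bytes(r[c] for c in low_cols for r in records)
    high = bytes(r[c] for c in high_cols for r in records)
    return low, high

def compress(records, low_cols):
    """Compress the two portions separately with a conventional compressor."""
    low, high = transform(records, low_cols)
    return zlib.compress(low), zlib.compress(high)

def decompress(blob_low, blob_high, low_cols, width):
    """Invert compress(): rebuild the original row-major records."""
    low = zlib.decompress(blob_low)
    high = zlib.decompress(blob_high)
    low_set = set(low_cols)
    high_cols = [c for c in range(width) if c not in low_set]
    n = (len(low) + len(high)) // width  # number of records
    rows = [bytearray(width) for _ in range(n)]
    for i, c in enumerate(low_cols):
        for j in range(n):
            rows[j][c] = low[i * n + j]
    for i, c in enumerate(high_cols):
        for j in range(n):
            rows[j][c] = high[i * n + j]
    return [bytes(r) for r in rows]
```

The transform is lossless: `decompress` restores the records exactly, so any gain in compression ratio comes only from reordering and partitioning the bytes before the compressor sees them.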
16 Claims
1. A method for improving compression of a stream of data comprising:
transforming the data in accordance with a schema to form a first portion and a second portion; and
separately transforming the first portion and the second portion to form a transformed output that includes the transformed first portion and the transformed second portion.
Dependent claims: 2, 3, 4, 5, 11, 12, 13, 14, 15, 16
6. A method for generating a schema for improving compression of a stream of data comprising:
separating a sample of the data into a first portion of low entropy and a second portion of high entropy;
partitioning the second portion into columns;
identifying combinations of columns that minimize the compressed size of the sample.
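The column-combination step of claim 6 can be sketched as follows. This is an assumption-laden illustration and not the method disclosed in the specification. It uses a greedy pairwise merge, which is a heuristic and may not find the true minimum, and it takes the high-entropy columns of the sample as already separated. The names `group_bytes`, `csize`, and `greedy_groups` are hypothetical, and `zlib` stands in for whatever compressor is used.

```python
import zlib

def group_bytes(rows, cols):
    """Serialize one column group row by row, keeping a row's
    (possibly dependent) column values adjacent to each other."""
    return bytes(r[c] for r in rows for c in cols)

def csize(rows, cols):
    """Compressed size in bytes of one column group of the sample."""
    return len(zlib.compress(group_bytes(rows, cols)))

def greedy_groups(rows, cols):
    """Greedily merge column groups while a merge shrinks the total size.

    Starts with every column in its own group, then repeatedly applies
    the pairwise merge with the largest saving until none helps.
    """
    groups = [[c] for c in cols]
    sizes = [csize(rows, g) for g in groups]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = groups[i] + groups[j]
                gain = sizes[i] + sizes[j] - csize(rows, merged)
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, j, merged)
        if best is None:
            break  # no remaining merge reduces the compressed size
        _, i, j, merged = best
        groups[i] = merged
        sizes[i] = csize(rows, merged)
        del groups[j]
        del sizes[j]
    return groups
```

The returned groups could then serve as the part of a schema that says which high-entropy columns to compress together. Because each merge is accepted only when it reduces the measured size, the result is never larger than compressing every column on its own.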
7. An apparatus for improved compression of a stream of data comprising:
means for transforming the data in accordance with a schema to form a first portion and a second portion; and
means for compressing the first portion separately and for separately compressing the second portion.
Dependent claims: 8, 9, 10
Specification