Data compression for reducing storage requirements in a database system
First Claim
1. A method for reducing data storage requirements in a database system, comprising:
- identifying, by a computing device, a data candidate of column data specified to be of a fixed length data type property for compression in a row of data having an uncompressed row format of column positioning based on:
  - a predetermined threshold configured to identify compressible column data of fixed length data types according to data type properties; and
  - a boundary of compression;
- providing, by the computing device, an offset bitmap for the identified data candidate within the row, according to the boundary of compression, wherein the offset bitmap indicates lengths of compressed columns of the row following compression and variable length columns of the row; and
- storing, by a computing device, the row containing the offset bitmap for the identified data candidate and the identified data candidate as compressed data with a compressed row format of column positioning within a database system, wherein the compressed row format of the row is repositioned from the uncompressed row format based on the identified data candidate.
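The three steps recited in the claim (identify candidates, provide a length map, store a repositioned row) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `THRESHOLD`, `compress_row`, and `decompress_row` are hypothetical, "compression" is assumed to mean dropping trailing zero padding bytes, and a plain byte-per-column length map stands in for the offset bitmap. Fixed-length columns narrower than the threshold stay in place, while compressed candidates are moved next to the variable-length columns.

```python
# Hypothetical threshold: fixed-length columns at least this wide are candidates.
THRESHOLD = 4


def compress_row(schema, values):
    """Pack one row.

    schema: list of int widths for fixed-length columns, None for variable-length.
    values: list of bytes, one entry per column, in schema order.
    Returns: length map + untouched fixed columns + moved (compressed/variable) columns.
    """
    in_place = []  # narrow fixed-length columns, stored unchanged
    moved = []     # compressed candidates and variable-length columns
    for width, value in zip(schema, values):
        if width is not None and width < THRESHOLD:
            in_place.append(value)
        elif width is not None:
            # Assumed compression: strip trailing zero padding bytes.
            moved.append(value.rstrip(b"\x00"))
        else:
            moved.append(value)
    # One byte per moved column (stand-in for the offset bitmap; lengths < 256).
    length_map = bytes(len(v) for v in moved)
    return length_map + b"".join(in_place) + b"".join(moved)


def decompress_row(schema, row):
    """Rebuild the original column values from a packed row and its schema."""
    n_moved = sum(1 for w in schema if w is None or w >= THRESHOLD)
    lengths = row[:n_moved]
    pos = n_moved
    in_place = {}
    for i, w in enumerate(schema):
        if w is not None and w < THRESHOLD:
            in_place[i] = row[pos:pos + w]
            pos += w
    values = []
    moved_index = 0
    for i, w in enumerate(schema):
        if i in in_place:
            values.append(in_place[i])
            continue
        n = lengths[moved_index]
        moved_index += 1
        chunk = row[pos:pos + n]
        pos += n
        # Re-pad compressed fixed-length columns to their declared width.
        values.append(chunk if w is None else chunk.ljust(w, b"\x00"))
    return values


schema = [2, 4, 8, None]
values = [b"\x01\x00", (7).to_bytes(4, "little"), b"ab".ljust(8, b"\x00"), b"hello"]
packed = compress_row(schema, values)
```

In this sketch the packed row is shorter than the 19 bytes of the original values whenever the wide fixed-length columns carry padding, and `decompress_row(schema, packed)` returns the original values.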
Abstract
A system, method, and computer program product for reducing data storage requirements in a database system are described herein. An embodiment includes identifying at least one data candidate of fixed length data type in at least one row of database data for compression based upon a predetermined threshold level and a boundary of compression, providing at least one bit within the at least one row for an identified data candidate according to the boundary of compression, and storing the at least one row as compressed data in the database system. For compression based on a row boundary, the identified data candidates for compression include fixed length columns having lengths that do not fall below the predetermined threshold level in a row of data and the at least one bit comprises a bitmap for a length of the identified data candidates following compression. For compression based on a page boundary, the identified data candidates for compression include redundant byte string data in a page of data, the redundant byte string data including matching data across columns having lengths that do not exceed the predetermined threshold level.
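The page-boundary case in the abstract can be sketched in the same spirit. The following Python is only a rough illustration: `MAX_LEN`, `compress_page`, and `decompress_page` are hypothetical names, matching is simplified to whole column values rather than arbitrary byte strings, and the dictionary layout is an assumption. Values that repeat anywhere on the page, and whose length does not exceed the threshold, are stored once in a page dictionary and replaced by references.

```python
from collections import Counter

# Hypothetical threshold: only byte strings up to this length are deduplicated.
MAX_LEN = 16


def compress_page(rows):
    """Replace repeated short values on a page with references into a dictionary.

    rows: list of rows, each a list of bytes values.
    Returns (dictionary, encoded_rows).
    """
    # Count every short value across all rows and columns of the page.
    counts = Counter(v for row in rows for v in row if len(v) <= MAX_LEN)
    # Keep only values that occur more than once (the redundant byte strings).
    dictionary = [v for v, n in counts.items() if n > 1]
    index = {v: i for i, v in enumerate(dictionary)}
    encoded = [
        [("ref", index[v]) if v in index else ("raw", v) for v in row]
        for row in rows
    ]
    return dictionary, encoded


def decompress_page(dictionary, encoded):
    """Expand dictionary references back into the original values."""
    return [
        [dictionary[x] if kind == "ref" else x for kind, x in row]
        for row in encoded
    ]


page = [
    [b"NY", b"open", b"NY"],
    [b"SF", b"open", b"x" * 20],
    [b"NY", b"closed", b"x" * 20],
]
dictionary, encoded = compress_page(page)
```

Here the repeated 20-byte value is left as raw data because it exceeds the assumed threshold, while the short repeated values are stored once and shared across columns.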
35 Citations
32 Claims
1. A method for reducing data storage requirements in a database system, comprising:
- identifying, by a computing device, a data candidate of column data specified to be of a fixed length data type property for compression in a row of data having an uncompressed row format of column positioning based on:
  - a predetermined threshold configured to identify compressible column data of fixed length data types according to data type properties; and
  - a boundary of compression;
- providing, by the computing device, an offset bitmap for the identified data candidate within the row, according to the boundary of compression, wherein the offset bitmap indicates lengths of compressed columns of the row following compression and variable length columns of the row; and
- storing, by a computing device, the row containing the offset bitmap for the identified data candidate and the identified data candidate as compressed data with a compressed row format of column positioning within a database system, wherein the compressed row format of the row is repositioned from the uncompressed row format based on the identified data candidate.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A system configured to reduce data storage requirements in a database system, comprising:
- one or more processors; and
- a data management module configured to:
  - identify, using the one or more processors, a data candidate of column data specified to be of a fixed length data type property for compression in a row of data having an uncompressed row format of column positioning based on:
    - a predetermined threshold configured to identify compressible column data of fixed length data types according to data type properties; and
    - a boundary of compression;
  - provide, using the one or more processors, an offset bitmap for the identified data candidate within the row, according to the boundary of compression, wherein the offset bitmap indicates lengths of compressed columns of the row following compression and variable length columns of the row; and
  - store, using the one or more processors, the row containing the offset bitmap for the identified data candidate and the identified data candidate as compressed data with a compressed row format of column positioning within a database system, wherein the compressed row format of the row is repositioned from the uncompressed row format based on the identified data candidate.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. A computer-readable storage device having instructions stored thereon that when executed by a processor, causes the processor to perform operations comprising:
- identifying a data candidate of column data specified to be of a fixed length data type property for compression in a row of data having an uncompressed row format of column positioning based on:
  - a predetermined threshold configured to identify compressible column data of fixed length data types according to data type properties, and
  - a boundary of compression;
- providing an offset bitmap for the identified data candidate within the row, according to the boundary of compression, wherein the offset bitmap indicates lengths of compressed columns of the row following compression and variable length columns of the row; and
- storing the row containing the offset bitmap for the identified data candidate and the identified data candidate as compressed data with a compressed row format of column positioning within a database system, wherein the compressed row format of the row is repositioned from the uncompressed row format based on the identified data candidate.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32
Specification