Structure of hierarchical compressed data structure for tabular data
First Claim
1. A method comprising:
- receiving a set of tabular data to populate a database table;
wherein the set of tabular data is divided into a plurality of rows in a particular row order, each of which includes data for a particular set of columns in a particular column order;
storing to a non-volatile storage, data for a first set of rows, of said plurality of rows, in a first compression unit, wherein the first set of rows is organized within the first compression unit in row-major format;
wherein the first set of rows includes a first row, a second row, and a third row;
wherein, within the particular row order, the first row precedes the second row, and the second row precedes the third row;
wherein the row-major format organization is such thatthe entirety of the first row precedes the entirety of the second row,the entirety of the second row precedes the entirety of the third row, andwithin each of the first, second and third rows, each column is stored in the non-volatile storage according to the particular column order;
storing to the non-volatile storage, data for a second set of rows, of said plurality of rows, in a second compression unit, wherein the second set of rows is organized within the second compression unit in column-major format;
wherein the second set of rows include a fourth row, a fifth row, and a sixth row;
wherein the column-major format is such thatvalues for a first column of the fourth, fifth and sixth rows precede values for a second column of the fourth, fifth and sixth rows; and
values for the second column of the fourth, fifth and sixth rows precede values for a third column of the fourth, fifth and sixth rows; and
for each of the first, second and third columns, values for the column are stored in the non-volatile storage according to the particular row order;
wherein the first compression unit and the second compression unit are structures on the non-volatile storage for organizing the tabular data;
storing data that indicates whether, within said first compression unit, data for said first set of rows is stored in column-major format or in row-major format;
wherein the steps of receiving and storing are performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column major-format, the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
-
Citations
52 Claims
-
1. A method comprising:
-
receiving a set of tabular data to populate a database table; wherein the set of tabular data is divided into a plurality of rows in a particular row order, each of which includes data for a particular set of columns in a particular column order; storing to a non-volatile storage, data for a first set of rows, of said plurality of rows, in a first compression unit, wherein the first set of rows is organized within the first compression unit in row-major format; wherein the first set of rows includes a first row, a second row, and a third row; wherein, within the particular row order, the first row precedes the second row, and the second row precedes the third row; wherein the row-major format organization is such that the entirety of the first row precedes the entirety of the second row, the entirety of the second row precedes the entirety of the third row, and within each of the first, second and third rows, each column is stored in the non-volatile storage according to the particular column order; storing to the non-volatile storage, data for a second set of rows, of said plurality of rows, in a second compression unit, wherein the second set of rows is organized within the second compression unit in column-major format; wherein the second set of rows include a fourth row, a fifth row, and a sixth row; wherein the column-major format is such that values for a first column of the fourth, fifth and sixth rows precede values for a second column of the fourth, fifth and sixth rows; and values for the second column of the fourth, fifth and sixth rows precede values for a third column of the fourth, fifth and sixth rows; and for each of the first, second and third columns, values for the column are stored in the non-volatile storage according to the particular row order; wherein the first compression unit and the second compression unit are structures on the non-volatile storage for organizing the tabular data; storing data that indicates whether, within said first compression unit, data for said first set of rows is stored in column-major format or in row-major format; wherein the steps of receiving and storing are performed by one or more computing devices. - View Dependent Claims (2)
-
-
3. A method comprising:
-
receiving a set of tabular data to populate a database table; wherein the set of tabular data is divided into a plurality of rows, each of which includes data for a particular set of columns; storing to a non-volatile storage, data for said particular set of columns in a first compression unit; wherein the first compression unit includes a plurality of child compression units, each of which stores data for a different one or more columns of the particular set of columns than other child compression units of the plurality of child compression units; wherein the first compression unit includes a first set of columns of the particular set of columns and a second set of different columns of the particular set of columns; and wherein the first set of columns is stored in a first child compression unit of the plurality of child compression units; and wherein the second set of different columns is stored in a second child compression unit of the plurality of child compression units; wherein the first child compression unit does not store any column of the second set of different columns and the second child compression unit does not store any column of the first set of columns; wherein the first child compression unit is different than the second child compression unit; storing metadata indicating that the first compression unit has children; wherein the first compression unit and each child compression unit of the plurality of child compression units are structures on the non-volatile storage for organizing the tabular data; and wherein the steps of receiving and storing are performed by one or more computing devices. - View Dependent Claims (4, 5, 6, 10)
-
-
7. A method comprising:
-
receiving a set of tabular data for populating a database table; wherein the set of tabular data is divided into a plurality of rows, each of which includes data for a particular set of columns; storing to a non-volatile storage, data for said plurality of rows in a parent compression unit that includes a plurality of child compression units; wherein the step of storing data in the parent compression unit includes either; (a) dividing the data between the child compression units based on rows, whereby each child compression unit stores a different set of rows of the plurality of rows, wherein the parent compression unit stores a first set of rows of the plurality of rows and a second set of different rows of the plurality of rows, and the first set of rows is stored in a first child compression unit of the plurality of child compression units and the second set of rows is stored in a second child compression unit of the plurality of child compression units, wherein the first child compression unit does not store any row of the second set of different rows and the second child compression unit does not store any row of the first set of rows, wherein the first child compression unit is different than the second child compression unit;
or(b) dividing the data between the child compression units based on columns, whereby each child compression unit stores a different set of columns of the particular set of columns, wherein the parent compression unit stores a first set of columns of the plurality of columns and a second set of different columns of the plurality of columns, and the first set of columns is stored in a third child compression unit of the plurality of child compression units and the second set of different columns is stored in a fourth child compression unit of the plurality of child compression units, wherein the third child compression unit does not store any column of the second set of different columns and the fourth child compression unit does not store any column of the first set of columns, wherein the third child compression unit is different than the fourth child compression unit; storing metadata indicating whether or not each compression unit of the parent compression unit and the plurality of child compression units has children; wherein the parent compression unit and each child compression unit of the plurality of child compression units are structures on the non-volatile storage for organizing the tabular data; and wherein steps of receiving and storing are performed by one or more computing devices. - View Dependent Claims (8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. A method comprising:
-
storing to a non-volatile storage, within a compression unit, data that logically belongs to a row of a table; wherein at least a portion of the data is compressed; and in response to a request to delete the row from the table, storing to the non-volatile storage metadata that indicates the row is deleted without deleting the data for the row from the compression unit; in response to a request to perform an operation while the data for the row is still in the compression unit, determining that the metadata indicates that the row is deleted and performing the operation as though the row were deleted; in response to a determination that the number of deleted rows of the compression unit exceeds a threshold, repackaging data from the compression unit into one or more new compression units; wherein the compression unit and each of the one or more new compression units are data structures on the non-volatile storage for organizing the data that logically belongs to the row of the table; wherein the method is performed by one or more computing devices. - View Dependent Claims (23, 24, 25, 26)
-
-
27. A non-transitory computer-readable storage medium storing instructions which, when executed by one or more processors, cause performance of:
-
receiving a set of tabular data to populate a database table; wherein the set of tabular data is divided into a plurality of rows in a particular row order, each of which includes data for a particular set of columns in a particular column order; storing to a non-volatile storage, data for a first set of rows, of said plurality of rows, in a first compression unit, wherein the first set of rows is organized within the first compression unit in row-major format; wherein the first set of rows includes a first row, a second row, and a third row; wherein, within the particular row order, the first row precedes the second row, and the second row precedes the third row; wherein the row-major format organization is such that the entirety of the first row precedes the entirety of the second row, the entirety of the second row precedes the entirety of the third row, and within each of the first, second and third rows, each column is stored in the non-volatile storage according to the particular column order; storing to the non-volatile storage, data for a second set of rows, of said plurality of rows, in a second compression unit, wherein the second set of rows is organized within the second compression unit in column-major format; wherein the second set of rows include a fourth row, a fifth row, and a sixth row; wherein the column-major format is such that values for a first column of the fourth, fifth and sixth rows precede values for a second column of the fourth, fifth and sixth rows; and values for the second column of the fourth, fifth and sixth rows precede values for a third column of the fourth, fifth and sixth rows; and for each of the first, second and third columns, values for the column are stored in the non-volatile storage according to the particular row order; wherein the first compression unit and the second compression unit are structures on the non-volatile storage for organizing the tabular data; storing data that indicates whether, within said first compression unit, data for said first set of rows is stored in column-major format or in row-major format. - View Dependent Claims (28)
-
-
29. A non-transitory computer-readable storage medium storing instructions which, when executed by one or more processors, cause performance of:
-
receiving a set of tabular data to populate a database table; wherein the set of tabular data is divided into a plurality of rows, each of which includes data for a particular set of columns; storing to a non-volatile storage, data for said particular set of columns in a first compression unit; wherein the first compression unit includes a plurality of child compression units, each of which stores data for a different one or more columns of the particular set of columns than other child compression units of the plurality of child compression units; wherein the first compression unit includes a first set of columns of the particular set of columns and a second set of different columns of the particular set of columns and the first set of columns is stored in a first child compression unit of the plurality of child compression units and the second set of different columns is stored in a second child compression unit of the plurality of child compression units; wherein the first child compression unit does not store any column of the second set of different columns and the second child compression unit does not store any column of the first set of columns; wherein the first child compression unit is different than the second child compression unit; storing metadata indicating that the first compression unit has children; wherein the first compression unit and each child compression unit of the plurality of child compression units are structures on the non-volatile storage for organizing the tabular data. - View Dependent Claims (30, 31, 32)
-
-
33. A non-transitory computer-readable storage medium storing instructions which, when executed by one or more processors, cause performance of:
-
receiving a set of tabular data for populating a database table; wherein the set of tabular data is divided into a plurality of rows, each of which includes data for a particular set of columns; storing to a non-volatile storage, data for said plurality of rows in a parent compression unit that includes a plurality of child compression units; wherein the step of storing data in the parent compression unit includes either; (a) dividing the data between the child compression units based on rows, whereby each child compression unit stores a different set of rows of the plurality of rows, wherein the parent compression unit stores a first set of rows of the plurality of rows and a second set of different rows of the plurality of rows, and the first set of rows is stored in a first child compression unit of the plurality of child compression units and the second set of rows is stored in a second child compression unit of the plurality of child compression units, wherein the first child compression unit does not store any row of the second set of different rows and the second child compression unit does not store any row of the first set of rows, wherein the first child compression unit is different than the second child compression unit;
or(b) dividing the data between the child compression units based on columns, whereby each child compression unit stores a different set of columns of the particular set of columns, wherein the parent compression unit stores a first set of columns of the plurality of columns and a second set of different columns of the plurality of columns, and the first set of columns is stored in a third child compression unit of the plurality of child compression units and the second set of different columns is stored in a fourth child compression unit of the plurality of child compression units, wherein the third child compression unit does not store any column of the second set of different columns and the fourth child compression unit does not store any column of the first set of columns, wherein the third child compression unit is different than the fourth child compression unit; storing metadata indicating whether or not each compression unit of the parent compression unit and the plurality of child compression units has children; wherein the parent compression unit and each child compression unit of the plurality of child compression units are structures on the non-volatile storage for organizing the tabular data. - View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 52)
-
-
48. A non-transitory computer-readable storage medium storing instructions which, when executed by one or more processors, cause performance of:
-
storing to a non-volatile storage, within a compression unit, data that logically belongs to a row of a table; wherein at least a portion of the data is compressed; and in response to a request to delete the row from the table, storing to the non-volatile storage metadata that indicates the row is deleted without deleting the data for the row from the compression unit; in response to a request to perform an operation while the data for the row is still in the compression unit, determining that the metadata indicates that the row is deleted and performing the operation as though the row were deleted; in response to a determination that the number of deleted rows of the compression unit exceeds a threshold, repackaging data from the compression unit into one or more new compression units; wherein the compression unit and each of the one or more new compression units are data structures on the non-volatile storage for organizing the data that logically belongs to the row of the table. - View Dependent Claims (49, 50, 51)
-
Specification