Generating a multi-column index for relational databases by interleaving data bits for selectivity
Abstract
A multi-column index is generated based on an interleaving of data bits for selectivity for efficient processing of data in a relational database system. Two or more columns may be identified for inclusion in the multi-column index for a relational database table. Based, at least in part, on the interleaving of data bits for selectivity from the identified columns, a multi-column index is generated for the relational database table that provides a respective index value for each entry in the relational database table. The entries of the relational database table may then be stored according to the index values of the multi-column index.
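The interleaving described in the abstract can be illustrated with a minimal sketch. It assumes non-negative integer column values and, as a stand-in for the "respective portions" of each data value, simply takes the low-order `bits_per_column` bits of each value; how bits are chosen for selectivity is not detailed in this section, so that choice is an assumption. The function name `interleave_bits` and the sample rows are hypothetical.

```python
def interleave_bits(values, bits_per_column):
    """Build one index value by interleaving bits from each column value.

    Assumption: each value is a non-negative integer and only its low
    `bits_per_column` bits are used. Bits are taken most significant
    first, one bit from each column in turn.
    """
    key = 0
    for bit in range(bits_per_column - 1, -1, -1):
        for value in values:
            key = (key << 1) | ((value >> bit) & 1)
    return key


# Hypothetical two-column entries, three bits per column.
rows = [(3, 5), (1, 2), (7, 0), (2, 6)]
index_values = [interleave_bits(row, 3) for row in rows]

# Order the entries by their index value.
rows_sorted = [row for _, row in sorted(zip(index_values, rows))]
```

Because the bits alternate between columns, neither column alone determines the high-order bits of the index value, so the sorted order does not favor one column the way a conventional compound sort key would.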
22 Claims
-
1. A distributed data warehouse system, comprising:
a plurality of compute nodes, each comprising one or more hardware processors, implementing:
one or more persistent storage devices providing storage for a columnar relational database table, wherein the one or more persistent storage devices comprise a plurality of data blocks;
a multi-column key generator, configured to:
identify at least two columns of a plurality of columns of the columnar relational database table; and
generate a multi-column index for the columnar relational database table based, at least in part, on an interleaving of respective data bits for selectivity from respective portions of respective data values from the identified at least two columns, wherein said multi-column index provides a respective index value for each entry of a plurality of entries of the columnar relational database table;
a write module, configured to:
direct the one or more persistent storage devices to store the plurality of entries of the columnar relational database table, wherein the plurality of entries of the columnar relational database table are directed to be stored in one or more of the plurality of data blocks of the one or more persistent storage devices in sorted order according to the respective index value for each of the plurality of entries; and
direct the one or more persistent storage devices to store metadata indicating multi-column index value ranges corresponding to the index values of the respective entries stored in each of the one or more data blocks.
Dependent claims: 2, 3, 4
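The write path in claim 1 (store entries in sorted index order across data blocks, then record the index value range held by each block) might look roughly like the following sketch. It is not the patented implementation: in-memory lists stand in for persistent data blocks, the per-block metadata is modeled as a `(min, max)` tuple, and the names `interleave_bits`, `write_blocks`, and `block_size` are hypothetical.

```python
def interleave_bits(values, bits_per_column):
    """Interleave the low `bits_per_column` bits of each column value,
    most significant bit first (assumes non-negative integers)."""
    key = 0
    for bit in range(bits_per_column - 1, -1, -1):
        for value in values:
            key = (key << 1) | ((value >> bit) & 1)
    return key


def write_blocks(rows, bits_per_column, block_size):
    """Sort entries by index value, split them into fixed-size blocks,
    and record the (min, max) index value range of each block."""
    keyed = sorted((interleave_bits(row, bits_per_column), row) for row in rows)
    blocks, metadata = [], []
    for start in range(0, len(keyed), block_size):
        chunk = keyed[start:start + block_size]
        blocks.append([row for _, row in chunk])
        # chunk is sorted, so its first and last keys bound the block.
        metadata.append((chunk[0][0], chunk[-1][0]))
    return blocks, metadata


# Hypothetical two-column entries, three bits per column, two entries per block.
blocks, metadata = write_blocks([(3, 5), (1, 2), (7, 0), (2, 6)], 3, 2)
```

Since the entries are written in sorted order, each block's range is contiguous and the ranges of successive blocks do not overlap in this sketch.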
-
5. A method, comprising:
performing, by one or more computing devices:
identifying at least two columns of a plurality of columns of a relational database table;
generating a multi-column index for the relational database table based, at least in part, on an interleaving of respective data bits for selectivity from respective portions of respective data values from the identified at least two columns, wherein said multi-column index provides a respective index value for each of a plurality of entries of the relational database table;
storing the plurality of entries of the relational database table to persistent storage in sorted order according to the respective index value for each entry; and
storing metadata indicating multi-column index value ranges corresponding to the index values of the respective entries stored in the persistent storage.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13
-
14. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices cause the one or more computing devices to implement a relational database system that implements:
identifying at least two columns of a plurality of columns of a relational database table;
generating a multi-column index for the relational database table based, at least in part, on an interleaving of respective data bits for selectivity from respective portions of respective data values from the identified at least two columns, wherein said multi-column index provides a respective index value for each of a plurality of entries of the relational database table; and
directing storage of:
the plurality of entries of the relational database table to persistent storage in sorted order according to the respective index value for each entry; and
metadata indicating multi-column index value ranges corresponding to the index values of the respective entries stored in the persistent storage.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22
-
Specification