OPTIMIZING AN ORDER OF EXECUTION OF MULTIPLE JOIN OPERATIONS
Abstract
A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of a first data column splits the first data column into first subsets of rows. A second partitioning of a second data column splits the second data column into second subsets of rows. Cardinalities of sub-tables derived by a respective joining of the subsets of rows of the first and second data columns are estimated, based on first and second value frequency information. An order of execution of multiple join operations is then optimized based on the estimated cardinalities of the sub-tables.
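The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the estimation formula referenced in the claims is not reproduced in this listing, so the sketch substitutes a textbook estimate (the sum over shared values of the product of their frequencies). The function names `partition`, `estimate_subtable_cardinalities`, and `order_joins`, the modulo-based assignment of rows to processing units, and the smallest-first ordering heuristic are all assumptions made for illustration.

```python
from collections import Counter


def partition(column, num_units):
    """Split a column's integer values into one subset of rows per unit.

    Assumption: a row goes to unit (value % num_units), so equal values
    from both columns always land on the same processing unit.
    """
    subsets = [[] for _ in range(num_units)]
    for value in column:
        subsets[value % num_units].append(value)
    return subsets


def estimate_subtable_cardinalities(first_column, second_column, num_units):
    """Return one estimated sub-table row count per processing unit.

    Assumption: each unit holds value-frequency counts for its subsets,
    and the join size is the sum over shared values v of f1(v) * f2(v).
    """
    first_parts = partition(first_column, num_units)
    second_parts = partition(second_column, num_units)
    cardinalities = []
    for first_subset, second_subset in zip(first_parts, second_parts):
        f1 = Counter(first_subset)
        f2 = Counter(second_subset)
        shared = f1.keys() & f2.keys()
        cardinalities.append(sum(f1[v] * f2[v] for v in shared))
    return cardinalities


def order_joins(candidate_joins, num_units):
    """Order (name, first_column, second_column) joins, smallest first.

    Assumption: a simple greedy heuristic that runs the join with the
    smallest total estimated cardinality before the larger ones.
    """
    def total_estimate(join):
        _, first_column, second_column = join
        return sum(
            estimate_subtable_cardinalities(first_column, second_column, num_units)
        )

    return sorted(candidate_joins, key=total_estimate)


if __name__ == "__main__":
    per_unit = estimate_subtable_cardinalities([1, 1, 2, 3, 4], [1, 2, 2, 5], 2)
    print(per_unit, sum(per_unit))
```

Because each processing unit only joins the subsets it holds, the per-unit estimates can be computed independently and summed to size the whole join.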
18 Claims
1. A method for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the method comprising:
providing, by one or more processors, at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the first subsets of rows are handled by processing units that differ from one another;
providing, by one or more processors, at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
estimating, by one or more processors, cardinalities of sub-tables derived by a respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18
14. A database system having multiple processing units for parallel processing of join-operations based on at least a first and a second data column, the database system comprising:
hardware means for providing at least a first partitioning of the first data column splitting the first data column in a plurality of subsets of rows, each subset of rows being correlated with a distinct processing unit from multiple processing units, wherein each of the first subset of rows are handled by processing units that differ from one another;
hardware means for providing at least a second partitioning of the second data column splitting the second data column into a plurality of subsets of rows, each subset of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
hardware means for estimating the cardinalities of sub-tables derived by the respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
15. A computer program product for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code readable and executable by a processor to perform a method comprising:
providing at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the first subsets of rows are handled by processing units that differ from one another;
providing at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
estimating cardinalities of sub-tables derived by a respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
Dependent claims: 16
Specification