OPTIMIZING AN ORDER OF EXECUTION OF MULTIPLE JOIN OPERATIONS

US 20180046674A1
Filed: 10/30/2017
Published: 02/15/2018
Est. Priority Date: 12/04/2012
Status: Active Grant



Abstract

A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of a first data column splits that column into first subsets of rows, and a second partitioning of a second data column splits that column into second subsets of rows. Cardinalities of the sub-tables obtained by joining the subsets of rows of the first and second data columns that are processed by the same processing unit are estimated based on first and second value frequency information, which indicates how often attribute values occur within each subset. An order of execution of multiple join operations is then optimized based on the estimated cardinalities of the sub-tables.
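The abstract leaves the estimator implicit, and the formula in the claims is cut off in this record. The following is a minimal sketch, assuming the common frequency-product estimate for an equi-join: on processing unit i, a value v that occurs f1_i(v) times in that unit's rows of the first column and f2_i(v) times in its rows of the second column contributes f1_i(v) * f2_i(v) rows to the local sub-table. The function `estimate_subtable_cardinalities` and the per-unit dictionary layout are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Sequence


def estimate_subtable_cardinalities(
    first_freqs: Sequence[Mapping[Hashable, float]],
    second_freqs: Sequence[Mapping[Hashable, float]],
) -> List[float]:
    """Estimate the row count of the local join on each processing unit.

    first_freqs[i] and second_freqs[i] map an attribute value to its frequency
    in the subset of rows processed by unit i (hypothetical layout for the
    "value frequency information").
    """
    # Assumption: frequency-product estimate for an equi-join; the claimed
    # formula is truncated in this record.
    if len(first_freqs) != len(second_freqs):
        raise ValueError("both columns must be partitioned over the same units")
    estimates = []
    for f1, f2 in zip(first_freqs, second_freqs):
        # Every pair of rows sharing value v on the same unit yields one output row.
        estimates.append(sum(count * f2.get(value, 0) for value, count in f1.items()))
    return estimates


# Two processing units; the values are join keys.
unit_first = [Counter({"a": 3, "b": 1}), Counter({"c": 2})]
unit_second = [Counter({"a": 2, "c": 5}), Counter({"c": 4, "d": 1})]
print(estimate_subtable_cardinalities(unit_first, unit_second))
```

Only rows held by the same unit are paired, so the five "c" rows of the second column on unit 0 contribute nothing there. The per-unit estimates, rather than only their sum, are what the order of execution is optimized against.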


18 Claims

1. A method for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the method comprising:
- providing, by one or more processors, at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the first subsets of rows are handled by processing units that differ from one another;
  
  providing, by one or more processors, at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
  
  estimating, by one or more processors, cardinalities of sub-tables derived by a respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18):
  - 2. The method according to claim 1, wherein a spread of estimated cardinalities of the sub-tables is evaluated for optimizing the order of execution of multiple join operations.
  - 3. The method according to claim 1, wherein the first value frequency information is provided as a set of first value frequency information, said set of first value frequency information comprising different first value frequency information, each first value frequency information being correlated with a certain partitioning of the first data column.
  - 4. The method according to claim 1, further comprising:
    - providing, by one or more processors, at least a first value frequency information for each processing unit from the multiple processing units, the first value frequency information indicating a frequency of attribute values within the subset of rows of the first data column processed by a respective processing unit from the multiple processing units; and
      
      providing, by one or more processors, at least a second value frequency information for each processing unit from the multiple processing units, the second value frequency information indicating a frequency of attribute values within the subset of rows of the second data column processed by the respective processing unit from the multiple processing units;
      
      wherein the second value frequency information is provided as a set of second value frequency information, the set of second value frequency information comprising different second value frequency information, each second value frequency information being correlated with a certain partitioning of the second data column.
  - 5. The method according to claim 4, wherein different first and second value frequency information is provided for multiple, frequently used partitionings.
  - 6. The method according to claim 1, wherein further statistical information regarding a quantity of unique values of data is provided in a column-separated, partitioning-separated and processing unit-separated manner.
  - 7. The method according to claim 4, wherein the first value frequency information and the second value frequency information are provided as a density distribution function describing a frequency of the attribute values within the subset of rows of the first or second data column processed by the respective processing units from the multiple processing units.
  - 8. The method according to claim 7, wherein the density distribution function is provided as an integrable function.
  - 9. The method according to claim 4, wherein the first value frequency information and the second value frequency information are derived based on feedback of previously performed queries to the first and second data columns.
  - 10. The method according to claim 4, wherein data for generating the first value frequency information and the second value frequency information are collected by the respective processing units from the multiple processing units.
  - 11. The method according to claim 4, further comprising:
    - transmitting data for generating the first value frequency information and the second value frequency information to a central processing unit; and
      
      generating, by the central processing unit, the first value frequency information and the second value frequency information.
  - 12. The method according to claim 4, further comprising:
    - storing the first value frequency information and the second value frequency information for later reuse.
  - 13. The method according to claim 1, wherein the cardinalities of sub-tables are estimated before starting the multiple join operations.
  - 17. The method according to claim 1, further comprising:
    - identifying, by one or more processors, the frequency of attribute values for the subset of rows of the first data column and the subset of rows in the second data column using sampling of data from the first data column and the second data column.
  - 18. The method according to claim 1, further comprising:
    - identifying, by one or more processors, all rows in a column in the database system that store a same attribute value; and
      
      merging, by one or more processors, all rows in the column in the database system that store the same attribute value in a same processing unit.
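Claim 17 derives the frequency of attribute values from a sample of the columns instead of a full scan. The following is a minimal sketch, assuming a Bernoulli sample at a fixed rate whose counts are scaled by 1/rate; the claim does not specify the sampling scheme, and the function `sampled_value_frequencies` and its per-unit list layout are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def sampled_value_frequencies(
    unit_rows: Sequence[Sequence[Hashable]], rate: float, seed: int = 0
) -> List[Dict[Hashable, float]]:
    """Estimate per-unit value frequencies from a Bernoulli sample.

    unit_rows[i] holds the attribute values of the rows processed by unit i.
    Assumption: each row is kept independently with probability `rate`, and
    counts are scaled by 1/rate so the expected estimate equals the true count.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the statistics are reproducible
    result = []
    for rows in unit_rows:
        sample = Counter(v for v in rows if rng.random() < rate)
        result.append({v: c / rate for v, c in sample.items()})
    return result


units = [["a", "a", "b", "a"], ["c", "c", "d"]]
print(sampled_value_frequencies(units, rate=1.0))
```

With rate=1.0 every row is kept and the sketch reduces to an exact count of each unit's values; smaller rates trade accuracy for a cheaper statistics pass.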

14. A database system having multiple processing units for parallel processing of join-operations based on at least a first and a second data column, the database system comprising:
- hardware means for providing at least a first partitioning of the first data column splitting the first data column in a plurality of subsets of rows, each subset of rows being correlated with a distinct processing unit from multiple processing units, wherein each of the first subset of rows are handled by processing units that differ from one another;
  
  hardware means for providing at least a second partitioning of the second data column splitting the second data column into a plurality of subsets of rows, each subset of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
  
  hardware means for estimating the cardinalities of sub-tables derived by the respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
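Claims 14 and 18 rely on each column being split into per-unit subsets of rows, with claim 18 placing all rows that store the same attribute value on the same processing unit. The following is a minimal sketch, assuming hash partitioning on the attribute value (the record does not name a partitioning function); `unit_for_value` and `hash_partition` are hypothetical helpers.

```python
import zlib
from typing import Hashable, List, Sequence


def unit_for_value(value: Hashable, num_units: int) -> int:
    """Map an attribute value to a processing unit deterministically.

    zlib.crc32 is used instead of hash() because Python randomizes string
    hashes per process. Assumption: values are keyed by str(value), so values
    of different types with the same string form share a unit.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_units


def hash_partition(column: Sequence[Hashable], num_units: int) -> List[List[Hashable]]:
    """Split a column into one subset of rows per processing unit.

    All rows with the same value land in the same subset, so two columns
    partitioned this way over the same units can be equi-joined unit by unit.
    """
    subsets: List[List[Hashable]] = [[] for _ in range(num_units)]
    for value in column:
        subsets[unit_for_value(value, num_units)].append(value)
    return subsets


first = hash_partition(["a", "b", "a", "c"], num_units=2)
second = hash_partition(["a", "c", "c", "d"], num_units=2)
print(first, second)
```

Because both columns use the same value-to-unit mapping, matching join keys are always co-located, which is the precondition for estimating each unit's sub-table from local statistics alone.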

15. A computer program product for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code readable and executable by a processor to perform a method comprising:
- providing at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the first subsets of rows are handled by processing units that differ from one another;
  
  providing at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another;
  
  estimating cardinalities of sub-tables derived by a respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to;
- Dependent claims (16):
  - 16. The computer program product according to claim 15, wherein a spread of estimated cardinalities of the sub-tables is evaluated for optimizing the order of execution of multiple join operations.
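Claims 2 and 16 evaluate the spread of the estimated per-unit cardinalities when optimizing the join order, but the record does not define the spread measure or how it is weighed. The following is a minimal sketch, assuming the largest per-unit estimate as the primary criterion (the busiest unit bounds a parallel join step) and max minus min as a tie-breaker, applied only to the first join step of each candidate order. The function `choose_join_order`, the table names R, S and T, and the estimates are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def spread(estimates: Sequence[float]) -> float:
    """Spread of per-unit cardinality estimates (max minus min)."""
    return max(estimates) - min(estimates)


def choose_join_order(candidates: Dict[str, Sequence[float]]) -> Tuple[str, float, float]:
    """Pick the candidate whose busiest processing unit has the least work.

    candidates maps a label for a join order to the estimated per-unit
    cardinalities of its first join step. Assumption: minimize the maximum,
    then break ties by the smaller spread.
    """
    best = min(candidates, key=lambda name: (max(candidates[name]), spread(candidates[name])))
    est = candidates[best]
    return best, max(est), spread(est)


# Hypothetical per-unit estimates for two orders of joining R, S and T on 3 units.
orders = {
    "(R join S) join T": [900, 100, 100],
    "(R join T) join S": [400, 350, 380],
}
print(choose_join_order(orders))
```

Here the skewed order produces fewer rows in total (1,100 vs. 1,130) but leaves one unit with 900 of them, so under this assumed criterion the balanced order is chosen. A sum-only estimate would have picked the opposite order.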


Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
GROCHOWSKI, MAREK; GRUSZECKI, ARTUR M.; KAZALSKI, TOMASZ; MILKA, GRZEGORZ S.; SKIBSKI, KONRAD K.; STRADOMSKI, TOMASZ

Granted Patent

US 10,061,804 B2
CPC Class Codes

G06F 16/2453 Query optimisation

G06F 16/24544 Join order optimisation
