EFFICIENT JOIN WITH ONE OR MORE LARGE DIMENSION TABLES
First Claim
1. A method for efficient hash joins with a large dimension table, comprising:
- gathering table characteristics to support a query, the gathering including identifying a first set of at least one dimension table having a hash table size estimated to exceed a defined threshold, and identifying a second set of remaining tables having at least one local predicate;
executing a query responsive to the gathered table characteristics, the query execution including;
reducing each identified large dimension table in the first set, including;
building a join filter from each dimension table of the second set;
scanning a fact table and reducing the fact table by applying predicates local to the fact table, and applying each join predicate to each table in the second set by probing its corresponding join filter; and
for each large dimension table, creating a pre-join output filter from columns of the reduced fact table that join with each large dimension table;
joining all the dimension tables and the fact table in a star join while exploiting each pre-join output filter, including;
scanning each large dimension table of the first set, applying the created pre-join output filter as a local predicate during the scan, and for each qualifying row of the dimension table, creating an entry in a hash table;
scanning each dimension table in the second set, including applying predicates local to each remaining dimension table, and for each qualifying row of each remaining dimension table, creating an entry in an associated hash table; and
joining the fact table with all dimension tables, including scanning the fact table and probing each corresponding hash table for each reduced dimension table.
1 Assignment
0 Petitions
Accused Products
Abstract
Embodiments of the invention relate to processing queries that utilize fact and/or dimension tables. In one aspect, a pre-join filtering phase precedes a star join. The necessary conditions for the pre-join filtering are considered for a given SQL query, including an estimated size of the hash table exceeding a threshold and presence of a local predicate either on the fact table or one or more dimension tables that is not a large dimension table. Once the necessary conditions are satisfied, the execution of the query exploits the pre-join filtering to build a pre-join output filter from columns of a reduced fact table that joins with each large dimension table. Thereafter, all the dimension tables and the fact table are joined in a star join while exploiting each pre-join filter. Accordingly, the order of when joins occur is changed in order to reduce the size of the fact table and to work from the fact table to reduce the size of large dimension tables.
-
Citations
23 Claims
-
1. A method for efficient hash joins with a large dimension table, comprising:
-
gathering table characteristics to support a query, the gathering including identifying a first set of at least one dimension table having a hash table size estimated to exceed a defined threshold, and identifying a second set of remaining tables having at least one local predicate; executing a query responsive to the gathered table characteristics, the query execution including; reducing each identified large dimension table in the first set, including; building a join filter from each dimension table of the second set; scanning a fact table and reducing the fact table by applying predicates local to the fact table, and applying each join predicate to each table in the second set by probing its corresponding join filter; and for each large dimension table, creating a pre-join output filter from columns of the reduced fact table that join with each large dimension table; joining all the dimension tables and the fact table in a star join while exploiting each pre-join output filter, including; scanning each large dimension table of the first set, applying the created pre-join output filter as a local predicate during the scan, and for each qualifying row of the dimension table, creating an entry in a hash table; scanning each dimension table in the second set, including applying predicates local to each remaining dimension table, and for each qualifying row of each remaining dimension table, creating an entry in an associated hash table; and joining the fact table with all dimension tables, including scanning the fact table and probing each corresponding hash table for each reduced dimension table. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A computer program product for efficient hash joins in a query with a large dimension table, the computer program product comprising a computer-readable storage medium having program code embodied therewith, the program code being executable by a processor to:
-
gather table characteristics to support the query, the gathering including identification of a first set of at least one dimension table having a hash table size estimated to exceed a defined threshold, and identification of a second set of remaining tables having at least one local predicate; execute a query responsive to the gathered table characteristics, the query execution including; reduce each identified large dimension table in the first set, including; building a join filter from each dimension table of the second set; scanning a fact table and reducing the fact table by applying predicates local to the fact table, and applying each join predicate to each table in the second set by probing its corresponding join filter; and for each large dimension table, creating a pre-join output filter from columns of the reduced fact table that join with each large dimension table; and join all the dimension tables and the fact table in a star join while exploiting each pre-join output filter, including; scanning each large dimension table of the first set, applying the created pre-join output filter as a local predicate during the scan and for each qualifying row of the dimension table creating an entry in a hash table; scanning each dimension table in the second set, including applying predicates local to each remaining dimension table and for each qualifying row of each remaining dimension table creating an entry in an associated hash table; and joining the fact table with all dimension tables, including scanning the fact table and probing each corresponding hash table for each reduced dimension table. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A system comprising:
-
a processing unit in communication with data storage, the data storage having one or more tables of a database; a functional unit in communication with memory and the processing unit, the functional unit having tools to support query processing, the tools comprising; an identification manager to identify a first set of at least one dimension table having a hash table size estimated to exceed a defined threshold, and to identify a second set of remaining tables having at least one local predicate; a query manager in communication with the identification manager, the query manager to execute a query, the query execution including a first phase manager and a second phase manager; the first phase manager to reduce each identified large dimension table in the first set, including; build a join filter from each dimension table of the second set; scan a fact table and reduce the fact table by applying predicates local to the fact table, and apply each join predicate to each table in the second set by probing its corresponding join filter; and for each large dimension table, create a pre-join output filter from columns of the reduced fact table that join with each large dimension table; the second phase manager to join all the dimension tables and the fact table in a star join while exploiting each pre-join output filter from the first phase, including; scan each large dimension table of the first set, apply the created pre-join output filter as a local predicate during the scan and for each qualifying row of the dimension table create an entry in a hash table; and scan each dimension table in the second set, including applying predicates local to each remaining dimension table and for each qualifying row of each remaining dimension table create an entry in an associated hash table; and a join manager in communication with query manager, the join manager to join the fact table with all dimension tables, including scanning the fact table and probing each corresponding hash table for each reduced dimension table. - View Dependent Claims (18, 19, 20, 21, 22, 23)
-
Specification