Fast algorithms for computing semijoin reduction sequences
First Claim
1. A method for executing an outer join query on data arranged in a plurality of databases in which all joins are evaluated by semijoins, the method comprising:
- generating, by one or more programmable processors executing one or more computer programs, a query graph of nodes maintaining the data arranged in the plurality of databases and target nodes for an outer join query, comprising;
computing, by the one or more programmable processors executing one or more computer programs, for each node the set of nodes influencing it;
computing, by the one or more programmable processors executing one or more computer programs, for each target node the set of its needed reducers;
predetermining, by the one or more programmable processors executing one or more computer programs, the effects of admissible move sequences;
forming, by the one or more programmable processors executing one or more computer programs, a directed graph for the admissible move sequences; and
performing by the one or more programmable processors executing one or more computer programs, an optimization process using a search algorithm on the directed graph to determine a least-cost execution plan for the query, wherein the optimization process further comprises determining a start state, from a plurality of states in the formed directed graph, from which the determined execution plan is to start by performing one or more start heuristics procedures, the one or more start heuristics procedures comprising one or more first and second order heuristics, the least-cost execution plan for the query reducing total communication costs with regard to the query;
wherein a snake heuristic (i) determines costs for a next move continuing in a same direction along the query graph as compared to costs for the next move turning and moving behind a last turning point; and
(ii) selects the next move with lesser costs.
2 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for using optimization techniques to construct a nearly optimal execution plan for an outer join query are disclosed. A query graph of the outer join query is constructed, by computing for each node the set of nodes influencing it, for each target node the set of its needed reducers, and predetermining the effects of all admissible moves in all possible sequences. The directed graph of all admissible move sequences is formed. An optimization process includes dynamically generating good estimations for the target distance of a search state. Some heuristics are disclosed for providing start solutions for the optimization process.
-
Citations
17 Claims
-
1. A method for executing an outer join query on data arranged in a plurality of databases in which all joins are evaluated by semijoins, the method comprising:
-
generating, by one or more programmable processors executing one or more computer programs, a query graph of nodes maintaining the data arranged in the plurality of databases and target nodes for an outer join query, comprising; computing, by the one or more programmable processors executing one or more computer programs, for each node the set of nodes influencing it; computing, by the one or more programmable processors executing one or more computer programs, for each target node the set of its needed reducers; predetermining, by the one or more programmable processors executing one or more computer programs, the effects of admissible move sequences; forming, by the one or more programmable processors executing one or more computer programs, a directed graph for the admissible move sequences; and performing by the one or more programmable processors executing one or more computer programs, an optimization process using a search algorithm on the directed graph to determine a least-cost execution plan for the query, wherein the optimization process further comprises determining a start state, from a plurality of states in the formed directed graph, from which the determined execution plan is to start by performing one or more start heuristics procedures, the one or more start heuristics procedures comprising one or more first and second order heuristics, the least-cost execution plan for the query reducing total communication costs with regard to the query; wherein a snake heuristic (i) determines costs for a next move continuing in a same direction along the query graph as compared to costs for the next move turning and moving behind a last turning point; and
(ii) selects the next move with lesser costs. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for executing an outer join query on data arranged in a plurality of databases in which all joins are evaluated by semijoins, the method comprising:
-
generating by one or more programmable processors executing one or more computer programs, a query graph of nodes maintaining the data arranged in the plurality of databases and target nodes for the outer join query; and performing, by the one or more programmable processors executing one or more computer programs, an optimization process, using a search algorithm on a directed graph to determine a least-cost execution plan for the query, wherein the optimization process further comprises determining a start state, from a plurality of states in the directed graph, from which the determined execution plan is to start by performing one or more start heuristics procedures, the one or more start heuristics procedures comprising one or more first and second order heuristics; wherein a snake heuristic (i) determines costs for a next move continuing in a same direction along the query graph as compared to costs for the next move turning and moving behind a last turning point; and
(ii) selects the next move with lesser costs. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
-
-
16. A computer program product embodied on a non-transitory storage media comprising, executable code to cause a data processing apparatus to perform operations comprising:
-
executing an outer join query on data arranged in a plurality of databases in which all joins are evaluated by semijoins in order to reduce a tuple list, the outer join query comprising; generating a query graph of nodes maintaining the data arranged in the plurality of databases and target nodes for a query, comprising; computing for each node the set of nodes influencing it;
computing for each target node the set of its needed reducers;predetermining the effects of admissible move sequences; forming a directed graph for the admissible move sequences; and performing an optimization process, using a search algorithm on the directed graph to determine a least-cost execution plan for the query, wherein the optimization process further comprises determining a start state, from a plurality of states in the formed directed graph, from which the determined execution plan is to start by performing one or more start heuristics procedures, the one or more start heuristics procedures comprising one or more first and second order heuristic; wherein a snake heuristic (i) determines costs for a next move continuing in a same direction along the query graph as compared to costs for the next move turning and moving behind a last turning point; and
(ii) selects the next move with lesser costs. - View Dependent Claims (17)
-
Specification