Progressive refinement of a federated query plan during query execution
First Claim
Patent Images
1. A computer implemented method for progressively refining a federated query execution plan in a federated data system, the computer implemented method comprising:
- executing a federated query according to a federated query execution plan for the federated data system, wherein the federated data system has a federated server, a plurality of data sources including a federated database and a plurality of remote databases, and further wherein the federated data system sends distributed requests to local databases and remote databases for processing, and wherein the remote databases do not store statistical data so that the federated query execution plan is initially compiled without knowledge about the remote databases; and
refining the federated query execution plan during execution of the federated query, wherein refining the federated query execution plan during execution of the federated query comprises;
placing at least one re-optimization constraint, in the federated query execution plan, at a remote database where remote data, including a statistical data, is received and further processed by the federated data system during compilation of the federated query;
responsive to the at least one re-optimization constraint being violated during execution of the federated query execution plan, optimizing the federated query execution plan by refining a model of the federated query execution plan based on the statistical data received from the remote databases during a partial execution of the federated query plan in order to compensate for a lack of data availability, accuracy, or completeness of federated statistics in regard to the remote databases;
recompiling the federated query; and
re-executing the federated query execution plan; and
wherein a plurality of checkpoints are placed in the query execution plan after a first round of re-optimization so that so that all other queries to any remote database are configured to also trigger re-optimizations and all intermediate results for all rounds of re-optimization are mapped back into the query execution plan; and
wherein a CHECK operator identifies whether an actual cardinality is within a validity range and triggers at least one re-optimization when the actual cardinality is not within the validity range, and further comprising;
repeating the placing, optimizing, recompiling, and re-executing steps until a query result is achieved;
wherein in a plurality of multiple re-optimizations of a query with a limited initial knowledge about a plurality of data, each re-optimization adds an actual knowledge about a single table only, and where |σ
(p6)T|>
|σ
(p4,5)S|>
|σ
(p1,2,3)R| and an initial query execution plan chooses a plurality of wrong physical join operators, but chooses a correct join order, and further assuming that |σ
(p1,2,3)R| is larger than a default estimate for a plurality of accesses to S and T, a re-optimization places a partial result from an access to a table R in a last join so that a re-optimized plan using a more efficient join operator for a second join results in a worse overall query performance; and
wherein compensation for the worse overall query performance is made by introducing several rounds of re-optimization, each round adding an additional knowledge about additional parts of the plan until a whole plan is covered with an actual knowledge and a final access plan is developed in accordance with the following sequence;
|σ
(p1,2,3)R|<
|σ
(p4,5)S|<
|σ
(p6)T|=>
((R×
S)×
T)
1)
|σ
(p4,5)S|<
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|=>
((S×
T)×
R)
2)
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|=>
((T×
R)×
S)
3)
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|<
|Tempσ
(p6)T|=>
((R×
S)×
T)
4),so that the join order in
4) is identical to the join order in
1).
1 Assignment
0 Petitions
Accused Products
Abstract
A way for progressively refining a query execution plan during query execution in a federated data system is provided. Re-optimization constraints are placed in the query execution plan during query compilation. When a re-optimization constraint is violated during query execution, a model of the query execution plan is refined using a partially executed query to form a new query execution plan. The new query execution plan is compiled. The compiled new query execution plan is executed.
112 Citations
22 Claims
-
1. A computer implemented method for progressively refining a federated query execution plan in a federated data system, the computer implemented method comprising:
-
executing a federated query according to a federated query execution plan for the federated data system, wherein the federated data system has a federated server, a plurality of data sources including a federated database and a plurality of remote databases, and further wherein the federated data system sends distributed requests to local databases and remote databases for processing, and wherein the remote databases do not store statistical data so that the federated query execution plan is initially compiled without knowledge about the remote databases; and refining the federated query execution plan during execution of the federated query, wherein refining the federated query execution plan during execution of the federated query comprises; placing at least one re-optimization constraint, in the federated query execution plan, at a remote database where remote data, including a statistical data, is received and further processed by the federated data system during compilation of the federated query; responsive to the at least one re-optimization constraint being violated during execution of the federated query execution plan, optimizing the federated query execution plan by refining a model of the federated query execution plan based on the statistical data received from the remote databases during a partial execution of the federated query plan in order to compensate for a lack of data availability, accuracy, or completeness of federated statistics in regard to the remote databases; recompiling the federated query; and re-executing the federated query execution plan; and wherein a plurality of checkpoints are placed in the query execution plan after a first round of re-optimization so that so that all other queries to any remote database are configured to also trigger re-optimizations and all intermediate results for all rounds of re-optimization are mapped back into the query execution plan; and wherein a CHECK operator identifies whether an actual cardinality is within a validity range and triggers at least one re-optimization when the actual cardinality is not within the validity range, and further comprising; repeating the placing, optimizing, recompiling, and re-executing steps until a query result is achieved; wherein in a plurality of multiple re-optimizations of a query with a limited initial knowledge about a plurality of data, each re-optimization adds an actual knowledge about a single table only, and where |σ
(p6)T|>
|σ
(p4,5)S|>
|σ
(p1,2,3)R| and an initial query execution plan chooses a plurality of wrong physical join operators, but chooses a correct join order, and further assuming that |σ
(p1,2,3)R| is larger than a default estimate for a plurality of accesses to S and T, a re-optimization places a partial result from an access to a table R in a last join so that a re-optimized plan using a more efficient join operator for a second join results in a worse overall query performance; andwherein compensation for the worse overall query performance is made by introducing several rounds of re-optimization, each round adding an additional knowledge about additional parts of the plan until a whole plan is covered with an actual knowledge and a final access plan is developed in accordance with the following sequence;
|σ
(p1,2,3)R|<
|σ
(p4,5)S|<
|σ
(p6)T|=>
((R×
S)×
T)
1)
|σ
(p4,5)S|<
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|=>
((S×
T)×
R)
2)
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|=>
((T×
R)×
S)
3)
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|<
|Tempσ
(p6)T|=>
((R×
S)×
T)
4),so that the join order in
4) is identical to the join order in
1).- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A computer program product comprising a computer usable storage medium including computer usable program code for progressively refining a federated query execution plan in a federated data system stored thereon, the computer program product comprising:
-
computer usable program code for executing a federated query according to a federated query execution plan for the federated data system, wherein the federated data system has a federated server, a plurality of data sources including a federated database and a plurality of remote databases, and further wherein the federated data system sends distributed requests to local databases and remote databases for processing, and wherein the remote databases do not store statistical data so that the federated query execution plan is initially compiled without knowledge about the remote databases; and computer usable program code for refining the federated query execution plan during execution of the federated query, wherein the computer usable program code for refining the federated query execution plan during execution of the federated query comprises; computer usable program code for placing at least one re-optimization constraint, in the remote portion of the federated query execution plan, at a location where remote data, including a statistical data, is received and further processed by the federated data system during compilation of the federated query; computer usable program code for, responsive to the at least one re-optimization constraint being violated during execution of the federated query execution plan, optimizing the federated query execution plan by refining a model of the federated query execution plan based on the statistical data received from the remote databases during a partial execution of the federated query plan in order to compensate for a lack of data availability, accuracy, or completeness of federated statistics in regard to the remote databases; computer usable program code for recompiling the federated query; computer usable program code for re-executing the federated query execution plan; and wherein a plurality of checkpoints are placed in the query execution plan after a first round of re-optimization so that so that all other queries to any remote database are configured to also trigger re-optimizations and all intermediate results for all rounds of re-optimization are mapped back into the query execution plan; and wherein a CHECK operator identifies whether an actual cardinality is within a validity range and triggers at least one re-optimization when the actual cardinality is not within the validity range, and wherein costs of materialization are considered when optimizing the federated query execution plan; wherein in a plurality of multiple re-optimizations of a query with a limited initial knowledge about a plurality of data, each re-optimization adds an actual knowledge about a single table only, and where |σ
(p6)T|>
|σ
(p4,5)S|>
|σ
(p1,2,3)R| and an initial query execution plan chooses a plurality of wrong physical join operators, but chooses a correct join order, and further assuming that |σ
(p1,2,3)R| is larger than a default estimate for a plurality of accesses to S and T, a re-optimization places a partial result from an access to a table R in a last join so that a re-optimized plan using a more efficient join operator for a second join results in a worse overall query performance;wherein compensation for the worse overall query performance is made by introducing several rounds of re-optimization, each round adding an additional knowledge about additional parts of the plan until a whole plan is covered with an actual knowledge and a final access plan is developed in accordance with the following sequence;
|σ
(p1,2,3)R|<
|σ
(p4,5)S|<
|σ
(p6)T|=>
((R×
S)×
T)
1)
|σ
(p4,5)S|<
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|=>
((S×
T)×
R)
2)
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|=>
((T×
R)×
S)
3)
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|<
|Tempσ
(p6)T|=>
((R×
S)×
T)
4),so that the join order in
4) is identical to the join order in
1).- View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. An apparatus comprising:
-
a bus; a storage device connected to the bus, wherein the storage device contains computer usable code; at least one managed device connected to the bus; a communications unit connected to the bus; and a processing unit connected to the bus, wherein the processing unit executes the computer usable code to execute a federated query according to a federated query execution plan for the federated data system, wherein the federated data system has a federated server, a plurality of data sources including a federated database and a plurality of remote databases, and further wherein the federated data system sends distributed requests to local databases and remote databases for processing, and wherein the remote databases do not store statistical data so that the federated query execution plan is initially compiled without knowledge about the remote databases; and
refine the federated query execution plan during execution of the federated query, wherein refining the federated query execution plan during execution of the federated query comprises;
placing at least one re-optimization constraint in the federated query execution plan at a remote database where remote data, including a statistical data, is received and further processed by the federated data system during compilation of the federated query, and responsive to the at least one re-optimization constraint being violated during execution of the federated query execution plan, optimizing the federated query execution plan by refining a model of the federated query execution plan based on the statistical data received from the remote databases during a partial execution of the federated query plan in order to compensate for a lack of data availability, accuracy, or completeness of federated statistics in regard to the remote databases;
recompile the federated query; and
re-execute the federated query execution plan; andwherein a plurality of checkpoints are placed in the query execution plan after a first round of re-optimization so that all other queries to any remote database are configured to also trigger re-optimizations and all intermediate results for all rounds of re-optimization are mapped back into the query execution plan; and wherein a CHECK operator identifies whether an actual cardinality is within a validity range and triggers the at least one re-optimization when the actual cardinality is not within the validity range, and further comprising; repeating the placing, optimizing, recompiling, and re-executing steps until a query result is achieved; wherein in a plurality of multiple re-optimizations of a query with a limited initial knowledge about a plurality of data, each re-optimization adds an actual knowledge about a single table only, and where |σ
(p6)T|>
|σ
(p4,5)S|>
|σ
(p1,2,3)R| and an initial query execution plan chooses a plurality of wrong physical join operators, but chooses a correct join order, and further assuming that |σ
(p1,2,3)R| is larger than a default estimate for a plurality of accesses to S and T, a re-optimization places a partial result from an access to a table R in a last join so that a re-optimized plan using a more efficient join operator for a second join results in a worse overall query performance; andwherein compensation for the worse overall query performance is made by introducing several rounds of re-optimization, each round adding an additional knowledge about additional parts of the plan until a whole plan is covered with an actual knowledge and a final access plan is developed in accordance with the following sequence;
|σ
(p1,2,3)R|<
|σ
(p4,5)S|<
|σ
(p6)T|=>
((R×
S)×
T)
5)
|σ
(p4,5)S|<
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|=>
((S×
T)×
R)
6)
|σ
(p6)T|<
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|=>
((T×
R)×
S)
7)
|Tempσ
(p1,2,3)R|<
|Tempσ
(p4,5)S|<
|Tempσ
(p6)T|=>
((R×
S)×
T)
8),so that the join order in
4) is identical to the join order in
1).
-
Specification