Hybrid push-down/pull-up of unions with expensive operations in a federated query processor
First Claim
1. A method for executing a query in a database management system having a plurality of data sources, said method comprising:
- receiving said query, wherein said query requires performing a process on multiple first partitions of a first dataset and on multiple second partitions of a second dataset; and
developing a query execution plan such that;
if a first partition is collocated with a second partition on a same source that has a query processor adapted to perform said process, then said process is performed by said query processor on said first partition and on said second partition prior to performing unions of said multiple first partitions and a union of said multiple second partitions; and
if an additional first partition is located on a different source from an additional second partition, then said process is performed on said additional first partition and on said additional second partition after performing a union of at least one of said additional first partition with at least one other first partition and said additional second partition with at least one other second partition.
1 Assignment
0 Petitions
Accused Products
Abstract
Disclosed are a method and a system for executing a query that requires an expensive process, such as a join, between two or more datasets. If each dataset has multiple partitions that are located at multiple sources, then each of the multiple partitions for each dataset must be unioned prior to completing execution of the query. The method and system develop both a query execution plan and at least one alternative query execution plan to indicate when the process should be pushed down below the unions and when the process should be pulled up above the unions based on collocation of partitions. The query execution plan and the alternative query execution plan(s) are embedded in a composite query execution plan which is evaluated and re-evaluated at run time to determine which of the query execution plan and the alternative query execution plan is currently the most efficient plan and the query is executed, accordingly.
66 Citations
20 Claims
-
1. A method for executing a query in a database management system having a plurality of data sources, said method comprising:
-
receiving said query, wherein said query requires performing a process on multiple first partitions of a first dataset and on multiple second partitions of a second dataset; and
developing a query execution plan such that;
if a first partition is collocated with a second partition on a same source that has a query processor adapted to perform said process, then said process is performed by said query processor on said first partition and on said second partition prior to performing unions of said multiple first partitions and a union of said multiple second partitions; and
if an additional first partition is located on a different source from an additional second partition, then said process is performed on said additional first partition and on said additional second partition after performing a union of at least one of said additional first partition with at least one other first partition and said additional second partition with at least one other second partition. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method of executing a query in a database management system having a plurality of data sources, said method comprising:
-
receiving said query, wherein said query requires performing a join between multiple first partitions of a first dataset and multiple second partitions of a second dataset; and
developing a query execution plan such that;
if a first partition is collocated with a second partition on a same source that has a query processor adapted to perform said join, then said join is performed between said first partition and said second partition by said query processor prior to performing unions of said multiple first partitions and said multiple second partitions; and
if an additional first partition is located on a different source from an additional second partition, then said join is performed between said additional first partition and said additional second partition after performing a union of at least one of said additional first partition with at least one other first partition and said additional second partition with at least one other second partition. - View Dependent Claims (9, 10, 11, 12)
-
-
13. A system for executing a query in a database management system having a plurality of data sources, said system comprising:
-
a primary operator adapted develop a query execution plan for a query that requires performing a process on multiple first partitions of a first dataset and on multiple second partitions of a second dataset such that;
if a first partition is collocated with a second partition on a same source having a query processor adapted to perform said process, then said process is performed between said first partition and said second partition by said query processor prior to performing unions of said multiple first partitions and said multiple second partitions; and
if an additional first partition is located on a different source from an additional second partition, then said process is performed between said additional first partition and said additional second partition after performing a union between at least one of said additional first partition with at least one other first partition and said additional second partition with at least one other second partition;
an additional query processor in communication with said primary operator and adapted to perform said process; and
a secondary operator in communication with said additional query processor and a plurality of sources for said first dataset and adapted to perform a union of said multiple first partitions. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
-
20. A program storage device readable by a computer, tangibly embodying a program of instructions executable by said computer to perform a method of executing a query in a database management system having a plurality of data sources, said method comprising:
-
receiving said query, wherein said query requires performing a process on multiple first partitions of a first dataset and on multiple second partitions of a second dataset; and
developing a query execution plan such that;
if a first partition is collocated with a second partition on a same source that has a query processor adapted to perform said process, then said process is performed on said first partition and on said second partition by said query processor, prior to performing unions of said multiple first partitions and said multiple second partitions; and
if an additional first partition is located on a different source from an additional second partition, then said process is performed on said additional first partition and on said additional second partition after performing a union of at least one of said additional first partition with at least one other first partition and said additional second partition with at least one other second partition.
-
Specification