Dynamic partition selection
First Claim
1. A computer-implemented method comprising:
- receiving a representation of a query plan generated for a query, the query plan comprising a first plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator;
generating a partition selector operator corresponding to the dynamic scan operator, wherein the partition selector operator represents a third computing node that executes the partition selector operator including determining one or more partition identifiers of partitions of the table and transferring the one or more partition identifiers to the dynamic scan operator of the first computing node;
determining a location in the query plan for the partition selector operator, including;
determining, for each operator of a subset of operators in the query plan, whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators; and
determining, using results of the determinations of whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators, a first operator in the query plan that is i) a parent operator of the partition selector operator and the dynamic scan operator or ii) a child operator of the partition selector operator;
determining, using the first operator, the location in the query plan for the partition selector operator; and
generating a modified query plan having the partition selector operator at the determined location, wherein the modified query plan includes a second plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic partition selection. One of the methods includes receiving a representation of a query plan generated for a query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator. A partition selector operator is generated corresponding to the dynamic scan operator. A location in the query plan is determined for the partition selector operator. A modified query plan is generated having the partition selector operator at the determined location.
24 Citations
23 Claims
-
1. A computer-implemented method comprising:
-
receiving a representation of a query plan generated for a query, the query plan comprising a first plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator; generating a partition selector operator corresponding to the dynamic scan operator, wherein the partition selector operator represents a third computing node that executes the partition selector operator including determining one or more partition identifiers of partitions of the table and transferring the one or more partition identifiers to the dynamic scan operator of the first computing node; determining a location in the query plan for the partition selector operator, including; determining, for each operator of a subset of operators in the query plan, whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators; and determining, using results of the determinations of whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators, a first operator in the query plan that is i) a parent operator of the partition selector operator and the dynamic scan operator or ii) a child operator of the partition selector operator; determining, using the first operator, the location in the query plan for the partition selector operator; and generating a modified query plan having the partition selector operator at the determined location, wherein the modified query plan includes a second plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system comprising:
-
one or more first computers and one or more first storage devices storing instructions that are operable, when executed by the one or more first computers, to cause the one or more first computers to implement a select operator node that is operable to request, from a first table, one or more tuples having respective values according to a predicate expression in a query; one or more second computers and one or more second storage devices storing instructions that are operable, when executed by the one or more second computers, to cause the one or more second computers to implement a partition selector node that is operable to determine, from the predicate expression in the query according to a partition selection function, one or more partitions of a table that may include tuples having respective values that satisfy the predicate expression and provides respective identifiers for the one or more partitions to a dynamic scanner node; one or more third computers and one or more third storage devices storing instructions that are operable, when executed by the one or more third computers, to cause the one or more third computers to implement a dynamic scanner node that is operable to receive, from the partition selector node, respective identifiers of the one or more partitions, obtains tuples of the one or more partitions from storage, and provides the one or more obtained tuples to the select operator node; and one or more fourth computers and one or more fourth storage devices storing instructions that are operable, when executed by the one or more fourth computers, to cause the one or more fourth computers to implement a master node that is operable to generate, using a representation of a query plan for the query, a modified query plan for the query that comprises a plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query, wherein the modified query plan includes a select operator that represents the select operator node, a partition selector operator that represents the partition selector node, and a dynamic scan operator that represents the dynamic scanner node; and wherein the master node is operable to generate the modified query plan by determining a location in the query plan for the partition selector operator by performing operations comprising; determining, for each operator of a subset of operators in the query plan, whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators; determining, using results of the determinations of whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators, a first operator in the query plan that is i) a parent operator of the partition selector operator and the dynamic scan operator or ii) a child operator of the partition selector operator; and determining, using the first operator, the location in the query plan for the partition selector operator. - View Dependent Claims (14, 15, 16)
-
-
17. A system comprising:
-
one or more first computers and one or more first storage devices storing instructions that are operable, when executed by the one or more first computers, to cause the one or more first computers to implement a join operator node that is operable to compute, according to a predicate expression in a query, a join result of pairs of first tuples of a first table and second tuples of a second table that have matching values, including matching first values of a first attribute of the first table, and matching second values of a second attribute of the second table, wherein the second table is partitioned by the second attribute of the second table; one or more second computers and one or more second storage devices storing instructions that are operable, when executed by the one or more second computers, to cause the one or more second computers to implement a table scanner node that is operable to obtain the first tuples of the first table from storage and provides the obtained first tuples to a partition selector node; one or more third computers and one or more third storage devices storing instructions that are operable, when executed by the one or more third computers, to cause the one or more third computers to implement a partition selector node that is operable to determine, according to a partition selection function for the second table, one or more partitions of the second table that may include second tuples having respective second values that match first values of the first tuples for the first attribute, and provide respective identifiers for the one or more partitions of the second table to a dynamic scanner node; one or more fourth computers and one or more fourth storage devices storing instructions that are operable, when executed by the one or more fourth computers, to cause the one or more fourth computers to implement a dynamic scanner node that is operable to receive, from the partition selector node, respective identifiers of the one or more partitions of the second table, obtain, using the respective identifiers of the one or more partitions of the second table, the second tuples of the one or more partitions from storage, and provide the obtained second tuples to the join operator node for use in computing the join result; and one or more fifth computers and one or more fifth storage devices storing instructions that are operable, when executed by the one or more fifth computers, to cause the one or more fifth computers to implement a master node that is operable to generate, using a representation of a query plan for the query, a modified query plan for the query that comprises a plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query, wherein the modified query plan includes a join operator that represents the join operator node, a table scanner operator that represents the table scanner node, a partition selector operator that represents the partition selector node, and a dynamic scan operator that represents the dynamic scanner node; and wherein the master node is operable to generate the modified query plan by performing operations comprising; determining that the dynamic scan operator is defined in a subtree on an outer side of the join operator; and in response to determining that the dynamic scan operator is defined a subtree on the outer side of the join operator, pushing the partition selector operator to the subtree on the outer side of the join operator. - View Dependent Claims (18, 19, 20, 21, 22)
-
-
23. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to perform operations comprising; receiving a representation of a query plan generated for a query, the query plan comprising a first plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator; generating a partition selector operator corresponding to the dynamic scan operator, wherein the partition selector operator represents a third computing node that executes the partition selector operator including determining one or more partition identifiers of partitions of the table and transferring the one or more partition identifiers to the dynamic scan operator of the first computing node; determining a location in the query plan for the partition selector operator, including; determining, for each operator of a subset of operators in the query plan, whether the dynamic scan operator occurs in a subtree of the query plan that is rooted at the respective operator of the subset of operators; determining that the dynamic scan operator occurs in a subtree rooted at a particular operator of the subset of operators; and in response to determining that the dynamic scan operator occurs in a subtree rooted at the particular operator of the subset of operators, pushing the partition selector operator to a child operator of the particular operator, wherein the child operator indicates the location for the partition selector operator in the query plan; and generating a modified query plan having the partition selector operator at the determined location, wherein the modified query plan includes a second plurality of operators that, when executed by one or more computing nodes, cause the one or more computing nodes to compute a result for the query.
Specification