Optimizing parallel queries using interesting distributions
First Claim
1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for identifying and propagating interesting properties within a query plan search space, the method comprising:
- accessing a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and
formulating an annotated query plan search space by, for at least one group selected from among the root group and the one or more intermediate groups;
for at least one child group of the at least one group;
identifying a distribution property indicating an interesting type of distribution relevant to the child group, the distribution property identifying a column that data for a parent group of the child group is distributed on; and
annotating the child group with the interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group for use in subsequent query plan pruning based on the annotated query plan search space.
3 Assignments
0 Petitions
Accused Products
Abstract
The present invention extends to methods, systems, and computer program products for optimizing parallel queries using interesting distributions. For each logical operator in an SQL server MEMO, in a top down manner from a root operator to the leaf operators, interesting distributions for the operators can be identified based on the properties of the operators. Identified interesting distributions can be propagated down to lower operators by annotating the lower operators with the interesting distributions. Thus, a SQL server MEMO can be annotated with interesting distributions propagated top down from root to leaf logical operators to generate an annotated SQL server MEMO. Parallel query plans can then be generated from the annotated SQL server MEMO in a bottom up manner from leaf operators to a root operator. Annotated interesting properties can be used to prune operators, thereby facilitating a more tractable search space for a parallel query plan.
-
Citations
20 Claims
-
1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for identifying and propagating interesting properties within a query plan search space, the method comprising:
-
accessing a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and formulating an annotated query plan search space by, for at least one group selected from among the root group and the one or more intermediate groups; for at least one child group of the at least one group; identifying a distribution property indicating an interesting type of distribution relevant to the child group, the distribution property identifying a column that data for a parent group of the child group is distributed on; and annotating the child group with the interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group for use in subsequent query plan pruning based on the annotated query plan search space. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for pruning a search space of query plans, the method comprising:
-
accessing an annotated query plan search space for a query of the distributed database, the annotated query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups, each of one or more groups in the annotated query plan search space annotated with indication of an interesting type of distribution by having at least one attached indication of an identified column a parent group of the group is distributed on, the identified column relevant to the group and propagated down from the parent group to annotate the group; and for each of the plurality of groups, starting at the leaf groups and in a bottom up manner; for each logical operator in the group; examining a plurality of possible input physical operators for implementing the logical operator; for each of the possible physical input operators, inserting a corresponding appropriate data movement operator to make the logical operator distribution compatible; costing each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and pruning the plurality of possible physical operators by; retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost; retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagated down from a parent group; and removing any other physical operators. - View Dependent Claims (12, 13, 14)
-
-
15. A distributed database system, the distributed database system comprising:
-
a distributed database, the distributed database distributed across a plurality of compute nodes; the plurality of compute nodes configured and a control node configured in a shared-nothing architecture, each compute node including; one or more processors; system memory; and one or more storage devices; and each compute node maintaining a portion of the database in a local database instance at the one or more storage devices; the control node including; one or more processors; system memory; one or more computer storage devices having stored thereon computer executable instructions representing a distribution identifier, a query plan pruner, and a plan selector, the distribution identifier configured to generate an annotated query plan search space by being configured to; access a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and for at least one group selected from among the root group and the one or more intermediate groups; for at least one child group of the at least one group; identify at least one distribution property indicating an interesting type of distribution relevant to the child group, each of the at least one distribution properties identifying a column that data for a parent group of the child group is distributed on; and annotate the child group with the identified interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group; wherein the query plan pruner is configured to; access the annotated query plan search space; and for each of the plurality of groups, starting at the leaf groups and in a bottom up manner; for each logical operator in the group; examine a plurality of possible input physical operators for implementing the logical operator; for each of the possible input physical operators, insert a corresponding appropriate data movement operators to make the logical operator distribution compatible; cost each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and prune the plurality of possible physical operators by; retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost; retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagating down from a parent group; and removing any other physical operators. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification