Optimizing parallel queries using interesting distributions

US 9,229,979 B2
Filed: 12/11/2012
Issued: 01/05/2016
Est. Priority Date: 12/11/2012
Status: Active Grant

3 Assignments

0 Petitions

Abstract

The present invention extends to methods, systems, and computer program products for optimizing parallel queries using interesting distributions. For each logical operator in a SQL Server MEMO, in a top-down manner from the root operator to the leaf operators, interesting distributions for the operators can be identified based on the properties of the operators. Identified interesting distributions can be propagated down to lower operators by annotating the lower operators with the interesting distributions. Thus, a SQL Server MEMO can be annotated with interesting distributions propagated top-down from root to leaf logical operators to generate an annotated SQL Server MEMO. Parallel query plans can then be generated from the annotated SQL Server MEMO in a bottom-up manner from the leaf operators to the root operator. Annotated interesting properties can be used to prune operators, thereby facilitating a more tractable search space for a parallel query plan.
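
The Python sketch below illustrates the top-down annotation pass described in the abstract. It is a minimal example that assumes a toy MEMO. In that MEMO, each group records three things: its output columns, its child groups, and the distributions its own operators request from each child. `Group`, `Distribution`, `top_down_order`, `annotate`, and the example tables are hypothetical and are not SQL Server's MEMO API. MEMO groups can be shared by several parents, so groups are visited in reverse post-order. That way every parent annotates a child before the child passes its annotations further down. One rule here is our own simplifying assumption: a hash distribution is propagated only to a child that outputs its columns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Distribution:
    kind: str            # hypothetical labels: "hash", "replicated", "control_node"
    columns: tuple = ()  # distribution columns; empty unless kind == "hash"


@dataclass(eq=False)
class Group:
    name: str
    columns: set                                   # columns this group outputs
    children: list = field(default_factory=list)   # input groups
    requests: dict = field(default_factory=dict)   # child name -> distributions this group's operators want
    interesting: set = field(default_factory=set)  # annotations attached top-down


def top_down_order(root):
    """Reverse post-order: every parent precedes all of its children, even in a DAG."""
    order, seen = [], set()

    def visit(group):
        if group.name in seen:
            return
        seen.add(group.name)
        for child in group.children:
            visit(child)
        order.append(group)

    visit(root)
    return order[::-1]


def annotate(root):
    """Attach interesting distributions to child groups, from the root toward the leaves."""
    for group in top_down_order(root):
        for child in group.children:
            # Distributions generated by this group's operators plus those it inherited.
            wanted = group.requests.get(child.name, set()) | group.interesting
            for dist in wanted:
                # Assumption: propagate only if the child produces the needed columns.
                if set(dist.columns) <= child.columns:
                    child.interesting.add(dist)


# SELECT c.nation, SUM(o.amount) FROM orders o JOIN customer c
#   ON o.custkey = c.custkey GROUP BY c.nation
orders = Group("orders", {"custkey", "amount"})
customer = Group("customer", {"custkey", "nation"})
join = Group("join", {"custkey", "nation", "amount"}, [orders, customer],
             {"orders": {Distribution("hash", ("custkey",))},
              "customer": {Distribution("hash", ("custkey",))}})
agg = Group("agg", {"nation", "amount"}, [join],
            {"join": {Distribution("hash", ("nation",))}})

annotate(agg)
print(sorted(d.columns for d in customer.interesting))  # [('custkey',), ('nation',)]
```

In this example the customer group ends up with two annotations:

- hash on custkey, requested directly by the join;
- hash on nation, inherited from the GROUP BY group higher up (compare claims 7, 8, and 10).

The orders group receives only hash on custkey because it does not output nation.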

20 Claims

1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for identifying and propagating interesting properties within a query plan search space, the method comprising:
- accessing a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchical structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and
- formulating an annotated query plan search space by, for at least one group selected from among the root group and the one or more intermediate groups:
  - for at least one child group of the at least one group:
    - identifying a distribution property indicating an interesting type of distribution relevant to the child group, the distribution property identifying a column that data for a parent group of the child group is distributed on; and
    - annotating the child group with the interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group for use in subsequent query plan pruning based on the annotated query plan search space.

2. The method of claim 1, wherein identifying an interesting type of distribution relevant to the child group comprises identifying one or more of: a hash distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database.

3. The method of claim 1, wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for a top operator.

4. The method of claim 3, wherein identifying a distribution property for a top operator comprises identifying a distribution property that includes a combination of a replicated distribution and an indication that a table is located on a control node of the distributed database.

5. The method of claim 1, wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for an insert operator, the insert operator inserting rows from a source select statement into a table, the table being hash distributed on the identified column.

6. The method of claim 5, wherein identifying a distribution property for an insert operator comprises identifying the hash distribution of the identified column as an interesting distribution for the insert operator.

7. The method of claim 1, wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying an inherited distribution property that originated at and was previously propagated down to the group from a parent group of the group.

8. The method of claim 7, wherein annotating the child group comprises attaching the indication of the identified column to the child group within the hierarchical structure to further propagate the inherited interesting distribution.

9. The method of claim 1, wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property generated by the one or more logical operators in the child group.

10. The method of claim 9, wherein annotating the interesting distribution to the child group comprises attaching a plurality of indications to the child group, each of the plurality of indications identifying a column at a different parent group of the child group.
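
Claims 2 through 6 list the kinds of distribution worth remembering for an operator's inputs. These can be read as a mapping from a logical operator to its interesting input distributions. The Python sketch below is one such mapping, under the following assumptions:

- The operator kinds, the input names (`left`, `right`, `input`), and `interesting_distributions` are hypothetical.
- A join is described by its equi-join column pairs.
- The "top operator" of claims 3 and 4 is read as a row-limiting TOP operator, which is our interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    kind: str            # hypothetical labels: "hash", "replicated", "control_node"
    columns: tuple = ()  # distribution columns; empty unless kind == "hash"


REPLICATED = Distribution("replicated")
CONTROL_NODE = Distribution("control_node")


def interesting_distributions(kind, columns=()):
    """Return {input name: interesting distributions} for one logical operator.

    The operator kinds and input names used here are illustrative labels only.
    """
    if kind == "join":
        # columns: (left_col, right_col) pairs taken from the equi-join predicates
        left = tuple(lc for lc, _ in columns)
        right = tuple(rc for _, rc in columns)
        return {"left": {Distribution("hash", left), REPLICATED},
                "right": {Distribution("hash", right), REPLICATED}}
    if kind in ("group_by", "partition"):
        # columns: the GROUP BY or PARTITION BY columns
        return {"input": {Distribution("hash", tuple(columns)), REPLICATED}}
    if kind == "top":
        # Claim 4: a replicated distribution combined with control-node placement
        return {"input": {REPLICATED, CONTROL_NODE}}
    if kind == "insert":
        # Claims 5 and 6: columns is the target table's hash-distribution column
        return {"input": {Distribution("hash", tuple(columns))}}
    return {}  # e.g. a scan has no inputs to ask anything of


needs = interesting_distributions("join", [("o_custkey", "c_custkey")])
# needs["left"] holds hash on o_custkey plus REPLICATED;
# needs["right"] holds hash on c_custkey plus REPLICATED.
```

Here a replicated alternative is offered for both join inputs. Claim 2 says only "a replicated distribution for a join operator", so the sketch leaves the choice of which side to replicate to costing.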

11. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for pruning a search space of query plans, the method comprising:
- accessing an annotated query plan search space for a query of the distributed database, the annotated query plan search space including a plurality of groups of logical operators arranged in a hierarchical structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups, each of one or more groups in the annotated query plan search space annotated with an indication of an interesting type of distribution by having at least one attached indication of an identified column that a parent group of the group is distributed on, the identified column relevant to the group and propagated down from the parent group to annotate the group; and
- for each of the plurality of groups, starting at the leaf groups and in a bottom up manner:
  - for each logical operator in the group:
    - examining a plurality of possible input physical operators for implementing the logical operator;
    - for each of the possible input physical operators, inserting a corresponding appropriate data movement operator to make the logical operator distribution compatible;
    - costing each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and
    - pruning the plurality of possible physical operators by:
      - retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost;
      - retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagated down from a parent group; and
      - removing any other physical operators.

12. The method of claim 11, wherein at least one group having an attached indication of an interesting distribution has an attached indication of an interesting distribution selected from among: a hash distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database.

13. The method of claim 11, further comprising generating a pruned query plan search space from the pruned pluralities of possible physical operators.

14. The method of claim 11, wherein each of one or more groups in the annotated query plan search space annotated with an indication of an interesting type of distribution by having at least one attached indication of an identified column that a parent group of the group is distributed on comprises at least one of the one or more groups having a plurality of attached indications of identified columns, each of the plurality of attached indications identifying a column that data for a different parent group of the group is distributed on.
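
The pruning rule of claim 11 keeps two kinds of plan: the cheapest plan overall, and the cheapest plan for each interesting distribution. The Python sketch below applies that rule to a single group. A pruner following the claim would run it for every group from the leaves upward, feeding each group's survivors to its parent as input alternatives.

The following parts of the sketch are illustrative assumptions, not the patented cost model:

- the cost model (`movement_for`, `NODES`, per-row shuffle and broadcast costs);
- the operator and movement names;
- the rule that an operator's output keeps the distribution of its moved input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    kind: str            # hypothetical labels: "hash", "replicated", "control_node"
    columns: tuple = ()


@dataclass(frozen=True)
class Candidate:
    plan: str            # input physical operator plus inserted movement operator
    output: Distribution # distribution this candidate produces
    cost: float


NODES = 8  # hypothetical number of compute nodes


def movement_for(have, need, rows):
    """Choose a data movement operator turning `have` into `need` (toy cost model)."""
    if have == need:
        return "none", 0.0
    if need.kind == "hash":
        return "shuffle", float(rows)            # redistribute every row once
    if need.kind == "replicated":
        return "broadcast", float(rows * NODES)  # copy every row to every node
    return "gather", float(rows)                 # pull every row to the control node


def cost_alternatives(inputs, requirements, op_cost):
    """Cost every (input physical operator, required distribution) pair."""
    candidates = []
    for name, have, in_cost, rows in inputs:
        for need in requirements:
            move, move_cost = movement_for(have, need, rows)
            # Simplification: the operator's output keeps its (moved) input distribution.
            candidates.append(Candidate(f"{name}+{move}", need, in_cost + move_cost + op_cost))
    return candidates


def prune(candidates, interesting):
    """Keep the cheapest candidate overall and the cheapest per interesting distribution."""
    keep = {min(candidates, key=lambda c: c.cost)}
    for dist in interesting:
        matching = [c for c in candidates if c.output == dist]
        if matching:
            keep.add(min(matching, key=lambda c: c.cost))
    return keep  # every other candidate is removed


# GROUP BY nation over a join group whose surviving alternatives are hashed on
# custkey or on nation; the parent group annotated this group with REPLICATED.
NATION = Distribution("hash", ("nation",))
REPLICATED = Distribution("replicated")
join_inputs = [("join@hash(custkey)", Distribution("hash", ("custkey",)), 100.0, 1000),
               ("join@hash(nation)", NATION, 150.0, 1000)]
kept = prune(cost_alternatives(join_inputs, [NATION, REPLICATED], op_cost=10.0),
             interesting={REPLICATED})
for c in sorted(kept, key=lambda c: c.cost):
    print(c.plan, c.cost)
# join@hash(nation)+none 160.0
# join@hash(custkey)+broadcast 8110.0
```

The shuffle candidate (cost 1110) is discarded even though it is cheaper than the broadcast survivor. It is neither the cheapest plan overall nor a match for an interesting distribution annotated on the group.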

15. A distributed database system, the distributed database system comprising:
- a distributed database, the distributed database distributed across a plurality of compute nodes;
- the plurality of compute nodes configured and a control node configured in a shared-nothing architecture, each compute node including:
  - one or more processors;
  - system memory; and
  - one or more storage devices; and
- each compute node maintaining a portion of the database in a local database instance at the one or more storage devices;
- the control node including:
  - one or more processors;
  - system memory;
  - one or more computer storage devices having stored thereon computer executable instructions representing a distribution identifier, a query plan pruner, and a plan selector, the distribution identifier configured to generate an annotated query plan search space by being configured to:
    - access a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchical structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and
    - for at least one group selected from among the root group and the one or more intermediate groups:
      - for at least one child group of the at least one group:
        - identify at least one distribution property indicating an interesting type of distribution relevant to the child group, each of the at least one distribution properties identifying a column that data for a parent group of the child group is distributed on; and
        - annotate the child group with the identified interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group;
- wherein the query plan pruner is configured to:
  - access the annotated query plan search space; and
  - for each of the plurality of groups, starting at the leaf groups and in a bottom up manner:
    - for each logical operator in the group:
      - examine a plurality of possible input physical operators for implementing the logical operator;
      - for each of the possible input physical operators, insert a corresponding appropriate data movement operator to make the logical operator distribution compatible;
      - cost each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and
      - prune the plurality of possible physical operators by:
        - retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost;
        - retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagated down from a parent group; and
        - removing any other physical operators.

16. The distributed database system of claim 15, wherein the distribution identifier being configured to identify an interesting type of distribution relevant to the child group comprises the distribution identifier being configured to identify an interesting type of distribution that originated at and was inherited from a parent group of the child group.

17. The distributed database system of claim 16, wherein the distribution identifier being configured to annotate the child group comprises the distribution identifier being configured to propagate the inherited interesting distribution down to further lower child groups.

18. The distributed database system of claim 15, wherein the distribution identifier being configured to identify an interesting type of distribution relevant to the child group comprises the distribution identifier being configured to identify an interesting type of distribution generated by the one or more logical operators in the group.

19. The distributed database system of claim 18, wherein the distribution identifier being configured to identify at least one distribution property indicating an interesting type of distribution relevant to the child group comprises the distribution identifier being configured to identify a plurality of distribution properties, each of the plurality of attached indications identifying a column that data for a different parent group of the group is distributed on.

20. The distributed database system of claim 15, wherein the distribution identifier being configured to identify an interesting type of distribution comprises the distribution identifier being configured to identify an interesting type of distribution selected from among: a hash distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Shankar, Srinath; Nehme, Rimma V.
Primary Examiner(s): Perveen, Rehana
Assistant Examiner(s): Tran, Loc
Application Number: US13/710,470
Publication Number: US 20140164353A1
Time in Patent Office: 1,120 Days
Field of Search: 707/714, 707/718, 707/719
US Class Current: 1/1
CPC Class Codes:
- G06F 16/24532: of parallel queries
- G06F 16/24544: Join order optimisation
- G06F 16/24554: Unary operations; Data part...
- G06F 16/24573: using data annotations, e.g...
