OPTIMIZING QUERIES OF PARALLEL DATABASES
First Claim
1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for optimizing a query of the database, the method comprising:
accessing the query, the query expressing a logical intent to retrieve specified data from within the database;
sending the query to an optimizer that lacks awareness of the database being distributed;
receiving a data structure from the optimizer, the data structure encapsulating a serial query plan search space, the serial query plan search space including one or more query plans for implementing the expressed logical intent of the query;
parallelizing the serial query plan search space into a parallel query plan search space for use with the distributed database, the parallel query plan search space including one or more parallel query plans for implementing the expressed logical intent of the query, parallelizing the serial query plan search space including:
augmenting the data structure to account for data-parallelism in the database;
generating cost estimates for operations contained in the augmented data structure;
identifying a parallel query plan within the parallel query plan search space having the lowest cost based on the generated cost estimates; and
selecting the identified parallel query plan for implementing the expressed logical intent of the query.
Abstract
The present invention extends to methods, systems, and computer program products for optimizing queries of parallel databases. Queries can be partially optimized at an optimizer that is unaware of its use to optimize queries for parallel processing. The optimizer can produce a data structure (e.g., a SQL Server MEMO) that encapsulates a logical serial plan search space. The logical serial plan search space may not incorporate any notion of parallelism into the plan space itself. A parallel-aware optimizer can parallelize the logical serial plan search space by augmenting the data structure (e.g., transforming the SQL Server MEMO into a parallel MEMO). Augmentation can be with data movement operations that move data associated with one or more compute nodes in a distributed architecture. Cost estimates can be calculated for the operations contained in the parallelized data structure. The parallel plan with the lowest estimated cost can be selected for the query.
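The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SerialPlan` and `ParallelPlan` classes, the `shuffle` and `broadcast` movement names, the node count, and the cost formulas are all assumptions chosen only to show the shape of the idea (take a serial plan space, augment each plan with data movement operations, cost the results, pick the cheapest).

```python
from dataclasses import dataclass

NODES = 4  # hypothetical number of compute nodes


@dataclass(frozen=True)
class SerialPlan:
    """One alternative from a (hypothetical) serial plan search space."""
    name: str
    serial_cost: float  # cost as seen by the parallel-unaware optimizer
    rows_to_move: int   # rows that are not yet where the plan needs them


@dataclass(frozen=True)
class ParallelPlan:
    """A serial plan augmented with one data movement operation."""
    serial: SerialPlan
    movement: str
    cost: float


def movement_cost(movement: str, rows: int, nodes: int) -> float:
    """Toy cost of moving rows; both formulas are illustrative assumptions."""
    if movement == "shuffle":    # each row is sent to one node
        return float(rows)
    if movement == "broadcast":  # each row is copied to every node
        return float(rows * nodes)
    raise ValueError(f"unknown movement: {movement}")


def parallelize(space, nodes=NODES):
    """Augment every serial plan with each candidate data movement operation."""
    parallel_space = []
    for plan in space:
        for movement in ("shuffle", "broadcast"):
            # Assumed model: serial work divides evenly across the nodes.
            cost = plan.serial_cost / nodes + movement_cost(
                movement, plan.rows_to_move, nodes
            )
            parallel_space.append(ParallelPlan(plan, movement, cost))
    return parallel_space


def select_plan(parallel_space):
    """Pick the parallel plan with the lowest estimated cost."""
    return min(parallel_space, key=lambda p: p.cost)


serial_space = [
    SerialPlan("hash_join", serial_cost=4000.0, rows_to_move=1000),
    SerialPlan("merge_join", serial_cost=3200.0, rows_to_move=5000),
]
best = select_plan(parallelize(serial_space))
print(best.serial.name, best.movement, best.cost)  # hash_join shuffle 2000.0
```

In this toy data the serially cheaper plan (`merge_join`) loses once data movement is costed, which is the reason the parallel search space is costed as a whole rather than parallelizing only the best serial plan.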
20 Claims
1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for optimizing a query of the database, the method comprising:
accessing the query, the query expressing a logical intent to retrieve specified data from within the database;
sending the query to an optimizer that lacks awareness of the database being distributed;
receiving a data structure from the optimizer, the data structure encapsulating a serial query plan search space, the serial query plan search space including one or more query plans for implementing the expressed logical intent of the query;
parallelizing the serial query plan search space into a parallel query plan search space for use with the distributed database, the parallel query plan search space including one or more parallel query plans for implementing the expressed logical intent of the query, parallelizing the serial query plan search space including:
augmenting the data structure to account for data-parallelism in the database;
generating cost estimates for operations contained in the augmented data structure;
identifying a parallel query plan within the parallel query plan search space having the lowest cost based on the generated cost estimates; and
selecting the identified parallel query plan for implementing the expressed logical intent of the query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. At a computer system, the computer system including one or more processors and system memory, the computer system also including a parallel-aware query optimizer, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of a database in a local database instance, a method for optimizing a query of the database, the method comprising:
accessing a SQL Server MEMO, the SQL Server MEMO encapsulating one or more serial query plans for implementing an express logical intent of the query, the express logical intent to retrieve specified data from within the database;
transforming the SQL Server MEMO into a parallel MEMO by augmenting the one or more serial query plans into one or more parallel query plans that, when executed, implement the express logical intent of the query and that account for data-parallelism in the database, augmenting including adding at least one data movement operation to each of the one or more serial query plans, added data movement operations configured to move database data associated with at least one compute node;
for each of the one or more parallel query plans, generating an estimated execution cost for the parallel query plan, the estimated execution cost based on the type of data movement operation added to the query plan and on statistics for the associated database data;
identifying a parallel query plan with the lowest estimated cost; and
selecting the identified parallel query plan to implement the expressed logical intent of the query.
Dependent claims: 12, 13, 14, 15, 16
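Claim 11 bases the estimated cost on the type of data movement operation and on statistics for the data being moved. The sketch below shows one way such an estimate could be computed. It is an illustration under stated assumptions, not the claimed method: the movement names (`none`, `shuffle`, `broadcast`, `gather`), the `TableStats` fields, the even-distribution assumption, and the bytes-moved formulas are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics (not the patent's actual schema)."""
    row_count: int
    avg_row_bytes: int


def bytes_moved(movement: str, stats: TableStats, nodes: int) -> int:
    """Estimate network bytes for one data movement operation.

    Assumes rows are spread evenly across `nodes` compute nodes.
    """
    total = stats.row_count * stats.avg_row_bytes
    if movement == "none":       # data is already placed where it is needed
        return 0
    if movement == "shuffle":    # rehash: a row stays local with chance 1/nodes
        return total * (nodes - 1) // nodes
    if movement == "broadcast":  # every node receives the rows it does not hold
        return total * (nodes - 1)
    if movement == "gather":     # all rows are sent to a single node
        return total * (nodes - 1) // nodes
    raise ValueError(f"unknown movement: {movement}")


# Joining a small table to a large one on 4 nodes: either replicate the
# small table everywhere, or redistribute the large table on the join key.
small = TableStats(row_count=1_000, avg_row_bytes=100)
big = TableStats(row_count=1_000_000, avg_row_bytes=100)

costs = {
    "broadcast_small": bytes_moved("broadcast", small, nodes=4),
    "shuffle_big": bytes_moved("shuffle", big, nodes=4),
}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With these made-up statistics, broadcasting the small table moves far fewer bytes than shuffling the large one, so the same movement type can be cheap or expensive depending on the statistics of the data it is applied to.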
17. A distributed database system, the distributed database system comprising:
a distributed database, the distributed database distributed across a plurality of compute nodes;
the plurality of compute nodes configured and a control node configured in a shared-nothing architecture, each compute node including:
  one or more processors;
  system memory; and
  one or more storage devices; and
  each compute node maintaining a portion of the database in a local database instance at the one or more storage devices;
the control node including:
  one or more processors;
  system memory;
  one or more computer storage devices having stored thereon computer executable instructions representing a parallel-aware optimizer and a plan selector, the control node configured to:
    access a query, the query expressing a logical intent to retrieve specified data from within the database; and
    send the query to a SQL server optimizer; and
  wherein the parallel-aware optimizer is configured to:
    receive a data structure from the SQL server optimizer, the data structure containing a serial query plan search space, the serial query plan search space including one or more query plans for implementing the expressed logical intent of the query; and
    parallelize the serial query plan search space into a parallel query plan search space for use with the distributed database, the parallel query plan search space including one or more parallel query plans for implementing the expressed logical intent of the query, parallelizing the serial query plan search space including:
      augment the data structure to account for data-parallelism in the database;
      generate cost estimates for operations contained in the augmented data structure;
  wherein the plan selector is configured to:
    identify a parallel query plan within the parallel query plan search space having the lowest cost based on the generated cost estimates; and
    select the identified parallel query plan for implementing the expressed logical intent of the query; and
the SQL server optimizer including:
  a shell database;
  the SQL server optimizer configured to:
    receive the query from the control node;
    generate the data structure containing the serial query plan search space for the query by referring to statistics in the shell database; and
    return the data structure containing the serial query plan search space to the control node.
Dependent claims: 18, 19, 20
Specification