Shared nothing parallel execution of procedural constructs in SQL
First Claim
1. Data processor implemented method for specifying complex queries to a relational database management system implemented on a parallel network, the network including a plurality of nodes coordinated by a network protocol, each of the nodes including at least one of data processor, data storage device, and memory, wherein the data contained in the base tables of the relational database management system is retrievable by means of a query language query to the database management system, the query containing at least one of query elements which require local computation or local coordination of data computation performed across the nodes of the distributed system and query elements which are computable on the several nodes of the network, the method comprising the steps of:
- identifying and marking those subgraphs of the query which must be executed on one of a given node and a nodegroup as "No TQ Zones", or NTQZs;
recognizing the marks written on those subgraphs of the query which must be executed on a given node or nodegroup by the identifying and marking step;
responsive to the recognizing step, generating a query plan which forces the computation of the marked subgraphs to be in the same section of the query plan;
responsive to the step of generating a query plan, partitioning that query plan into subplans including the NTQZs marked in the identifying and marking step, whereby the partitions formed by the partitioning step force the operation of subplans including a NTQZ to at least one of a single-node, the coordinator node, the catalog node, and to a particular partition class on multiple nodes; and
executing the query plan.
1 Assignment
0 Petitions
Accused Products
Abstract
An automated methodology, and an apparatus for practicing the methodology, which enables the power and flexibility inherent in shared nothing parallel database systems (MPP) to be utilized on complex queries which have, heretofore, contained query elements requiring local computation or local coordination of data computation performed across the nodes of the distributed system. The present invention provides these features and advantages by identifying and marking the subgraphs containing these types of query elements as "no TQ zones" in the preparation phase prior to optimization. When the optimizer sees the markings, it builds a plan that will force the computation of the marked subgraphs to be in the same section. This preparation phase also provides the partitioning information for all inputs to the "no TQ zones". This allows the bottom-up optimizer to correctly plan the partitioning for the "no TQ zones". These partitionings can force the operation to a single-node, the coordinator node, the catalog node, or to a particular partition class on multiple nodes, or nodegroups.
260 Citations
8 Claims
-
1. Data processor implemented method for specifying complex queries to a relational database management system implemented on a parallel network, the network including a plurality of nodes coordinated by a network protocol, each of the nodes including at least one of data processor, data storage device, and memory, wherein the data contained in the base tables of the relational database management system is retrievable by means of a query language query to the database management system, the query containing at least one of query elements which require local computation or local coordination of data computation performed across the nodes of the distributed system and query elements which are computable on the several nodes of the network, the method comprising the steps of:
-
identifying and marking those subgraphs of the query which must be executed on one of a given node and a nodegroup as "No TQ Zones", or NTQZs; recognizing the marks written on those subgraphs of the query which must be executed on a given node or nodegroup by the identifying and marking step; responsive to the recognizing step, generating a query plan which forces the computation of the marked subgraphs to be in the same section of the query plan; responsive to the step of generating a query plan, partitioning that query plan into subplans including the NTQZs marked in the identifying and marking step, whereby the partitions formed by the partitioning step force the operation of subplans including a NTQZ to at least one of a single-node, the coordinator node, the catalog node, and to a particular partition class on multiple nodes; and executing the query plan. - View Dependent Claims (2)
-
-
3. A system for processing complex queries, comprising:
-
a parallel network including a plurality of nodes coordinated by a network protocol, each of the nodes including at least one of data processor, data storage device, and memory; a relational database management system having data contained in base tables, wherein the data is retrievable by means of a query language query to the database management system, and the query contains at least one of query elements which require local computation or local coordination of data computation performed across the nodes of the distributed system and query elements which are computable on the several nodes of the parallel network; a data processor for specifying the query to the relational database management system, the data processor operative for; identifying and marking those subgraphs of the query which must be executed on one of a given node and a nodegroup as "No TQ Zones", or NTQZs; recognizing the marks written on those subgraphs of the query which must be executed on a given node or nodegroup by the identifying and marking step; responsive to the recognizing step, generating a query plan which forces the computation of the marked subgraphs to be in the same section of the query plan; responsive to the step of generating a query plan, partitioning that query plan into subplans including the NTQZs marked in the identifying and marking step, whereby the partitions formed by the partitioning step force the operation of subplans including a NTQZ to at least one of a single-node, the coordinator node, the catalog node, and to a particular partition class on multiple nodes; and executing the query plan. - View Dependent Claims (4)
-
-
5. A database language compiler for specifying complex queries to a relational database management system implemented on a parallel network, the network including a plurality of nodes coordinated by a network protocol, each of the nodes including at least one of data processor, data storage device, and memory, wherein the data contained in the base tables of the relational database management system is retrievable by means of a query language query to the database management system, the query containing at least one of query elements which require local computation or local coordination of data computation performed across the nodes of the distributed system and query elements which are computable on the several nodes of the network, the database language compiler operative for:
-
identifying and marking those subgraphs of the query which must be executed on one of a given node and a nodegroup as "No TQ Zones", or NTQZs; recognizing the marks written on those subgraphs of the query which must be executed on a given node or nodegroup by the identifying and marking step; responsive to the recognizing step, generating a query plan which forces the computation of the marked subgraphs to be in the same section of the query plan; responsive to the step of generating a query plan, partitioning that query plan into subplans including the NTQZs marked in the identifying and marking step, whereby the partitions formed by the partitioning step force the operation of subplans including a NTQZ to at least one of a single-node, the coordinator node, the catalog node, and to a particular partition class on multiple nodes; and executing the query plan. - View Dependent Claims (6)
-
-
7. A computer program product for specifying complex queries to a relational database management system implemented on a parallel network, the network including a plurality of nodes coordinated by a network protocol, each of the nodes including at least one of data processor, data storage device, and memory, wherein the data contained in the base tables of the relational database management system is retrievable by means of a query language query to the database management system, the query containing at least one of query elements which require local computation or local coordination of data computation performed across the nodes of the distributed system and query elements which are computable on the several nodes of the network, the computer program product comprising:
-
a storage medium; computer software stored on the storage medium and executable on a data processor for; identifying and marking those subgraphs of the query which must be executed on one of a given node and a nodegroup as "No TQ Zones", or NTQZs; recognizing the marks written on those subgraphs of the query which must be executed on a given node or nodegroup by the identifying and marking step; responsive to the recognizing step, generating a query plan which forces the computation of the marked subgraphs to be in the same section of the query plan; responsive to the step of generating a query plan, partitioning that query plan into subplans including the NTQZs marked in the identifying and marking step, whereby the partitions formed by the partitioning step force the operation of subplans including a NTQZ to at least one of a single-node, the coordinator node, the catalog node, and to a particular partition class on multiple nodes; and executing the query plan. - View Dependent Claims (8)
-
Specification