Method and apparatus for implementing parallel operations in a database management system
First Claim
1. A computer-implemented method of implementing database management system (DBMS) operations in parallel, independent of physical storage locations, said computer-implemented method comprising the steps of:
- generating a serial execution plan for operations in said DBMS;
generating a parallelized execution plan for said serial execution plan, said parallelized execution plan including first and second operations, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions;
executing said parallelized execution plan when a plurality of parallel resources of said computer system are available, said first and second operations executing in parallel; and
executing said serial execution plan when said plurality of resources are not available.
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention implements parallel processing in a Database Management System. The present invention provides the ability to locate transaction and recovery information at one location and eliminates the need for read locks and two-phased commits. The present invention provides the ability to dynamically partition row sources for parallel processing. Parallelism is based on the ability to parallelize a row source, the partitioning requirements of consecutive row sources and the entire row source tree, and any specification in the SQL statement. A Query Coordinator assumes control of the processing of a entire query and can execute serial row sources. Additional threads of control, Query Server, execute a parallel operators. Parallel operators are called data flow operators (DFOs). A DFO is represented as structured query language (SQL) statements and can be executed concurrently by multiple processes, or query slaves. A central scheduling mechanism, a data flow scheduler, controls a parallelized portion of an execution plan, and can become invisible for serial execution. Table queues are used to partition and transport rows between sets of processes. Node linkages provide the ability to divide the plan into independent lists that can each be executed by a set of query slaves. The present invention maintains a bit vector that is used by a subsequent producer to determine whether any rows need to be produced to its consumers. The present uses states and a count of the slaves that have reached these states to perform its scheduling tasks.
337 Citations
19 Claims
-
1. A computer-implemented method of implementing database management system (DBMS) operations in parallel, independent of physical storage locations, said computer-implemented method comprising the steps of:
-
generating a serial execution plan for operations in said DBMS; generating a parallelized execution plan for said serial execution plan, said parallelized execution plan including first and second operations, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; executing said parallelized execution plan when a plurality of parallel resources of said computer system are available, said first and second operations executing in parallel; and executing said serial execution plan when said plurality of resources are not available. - View Dependent Claims (2, 3)
-
-
4. A method of generating an execution plan to process a database management system (DBMS) operation in parallel including the steps of:
-
generating an execution plan for said operation; examining said execution plan from bottom up; identifying a parallelized portion of said execution plan, said parallelized portion can be processed in parallel, said parallelized portion including first and second operations, said first and second operations being executable in parallel, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; identifying some serial portion of said execution plan, said serial portion can be processed in serial; allocating a central scheduler between said parallelized portion and said serial portion. - View Dependent Claims (5)
-
-
6. A computer-implemented method of executing database management system (DBMS) operations in parallel in a computer system, said method comprising the steps of:
-
generating an execution plan to execute said operations in parallel, said execution plan including first and second operations; initiating an operation coordinator in said computer system to coordinate execution of said execution plan; initiating, by said operation coordinator, a first set of slaves operating on a plurality of data partitions to produce data, the quantity of said data partitions being greater than the quantity of said first set of slave processes, each of said slave processes of said first set of slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; initiating, by said operation coordinator, a second set of slaves to consume data; and directing said second set of slaves to produce data and said first set of slaves to consume data when said first set of slaves finishes producing data. - View Dependent Claims (7)
-
-
8. A computer-implemented method of executing database management system (DBMS) operations in parallel in a computer system, said method comprising the steps of:
-
generating an execution plan to execute said operations in parallel, said execution plan including first and second operations; initiating a data flow scheduler in said computer system to coordinate data flow; initiating, by said data flow scheduler, producer slaves operating on a plurality of data partitions to produce a first data production the quantity of said data partitions being greater than the quantity of said producer slaves, each of said producer slaves operating on a different one of said data partitions, at least one of said producer slaves operating on more than one of said data partitions; initiating, by said data flow scheduler, consumer slaves to consume said first data production; transmitting a ready message to said data flow scheduler when said producer slaves become ready to produce data; transmitting a completion message to said data flow scheduler when said first data production is completed; generating, by said data flow scheduler, in response to said completion message, an identification of a plurality of said consumer slaves that did not receive data in said first data production, said generating step using information derived from said ready message; examining, by said producer slaves, said identification during a subsequent data production; and reducing said subsequent data production such that said subsequent data production does not produce data for said plurality of said consumer slaves.
-
-
9. In a computer system, a database management apparatus for implementing database operations in parallel, independent of physical storage locations, said database management apparatus comprising;
-
means for generating a serial execution plan for operations in said database management apparatus; means for generating a parallelized execution plan for said serial execution plan, said parallelized execution plan including first and second operations, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; means for executing said parallelized execution plan when a plurality of parallel resources of said computer system are available, said first and second operations executing in parallel; and means for executing said serial execution plan when said plurality of resources are not available. - View Dependent Claims (10, 11)
-
-
12. In a computer system, a database management apparatus for generating an execution plan to process database operations in parallel, said database management apparatus comprising;
-
means for generating an execution plan for said operations; means for examining said execution plan from bottom up; means for identifying a parallelized portion of said execution plan, said parallelized portion including first and second operations, said parallelized portion being processed in parallel, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; means for identifying some serial portion of said execution plan, said serial portion being processed in serial; and means for allocating a central scheduler between said parallelized portion and said serial portion. - View Dependent Claims (13)
-
-
14. In a computer system, a database management apparatus for executing database operations in parallel, said database management apparatus comprising:
-
means for generating an execution plan to execute said operations in parallel, said execution plan including first and second operations, said first and second operations being executed in parallel; means for initiating an operation coordinator in said computer system to coordinate execution of said execution plan; means for initiating, by said operation coordinator, a first set of slaves operating on a plurality of data partitions to produce data, the quantity of said data partitions being greater than the quantity of said first set of slave processes, each of said slave processes of said first set of slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; means for initiating, by said operation coordinator, a second set of slaves to consume data; and means for directing said second set of slaves to produce data and said first set of slaves to consume data when said first set of slaves finishes producing data. - View Dependent Claims (15)
-
-
16. In a computer system, a database management apparatus for executing database operations in parallel, said database management apparatus comprising:
-
means for generating an execution plan to execute said operations in parallel, said execution plan including first and second operations, said first and second operations being executed in parallel; means for initiating a data flow scheduler in said computer system to coordinate data flow; means for initiating, by said data flow scheduler, producer slaves operating on a plurality of data partitions to produce a first data production, the quantity of said data partitions being greater than the quantity of said producer slaves, each of said producer slaves operating on a different one of said data partitions, at least one of said producer slaves operating on more than one of said data partitions; means for initiating, by said data flow scheduler, consumer slaves to consume said first data production; means for transmitting a ready message to said data flow scheduler when said producer slaves become ready to produce data; means for transmitting a completion message to said data flow scheduler when said first data production is completed; means for generating, by said data flow scheduler, in response to said completion message, an identification of a plurality of said consumer slaves that did not receive data in said first data production, said generating step using information derived from said ready message; means for examining, by said producer slaves, said identification during a subsequent data production; and means for reducing said subsequent data production such that said subsequent data production does not produce data for said plurality of said consumer slaves.
-
-
17. An article of manufacture comprising a computer usable mass storage medium having computer readable program code embodied therein for causing a processing means to execute computer-implemented database management operations in parallel, independent of physical storage locations, said computer readable program code in said article of manufacture comprising:
-
computer readable program code for causing said processing means to generate a serial execution plan for said database management operations; computer readable program code for causing said processing means to generate a parallelized execution plan for said serial execution plan, said parallelized execution plan including first and second operations, said second operation including one or more slave processes operating on a plurality of data partitions, the quantity of said data partitions being greater than the quantity of said slave processes, each of said slave processes operating on a different one of said data partitions, at least one of said slave processes operating on more than one of said data partitions; computer readable program code for causing said processing means to execute said parallelized execution plan when a plurality of parallel resources of said computer system are available, said first and second operations executing in parallel; and computer readable program code for causing said processing means to execute said serial execution plan when said plurality of resources are not available. - View Dependent Claims (18, 19)
-
Specification