System and methodology for parallel query optimization using semantic-based partitioning
First Claim
1. In a database system comprising a database storing data in database tables, a method for improving query performance by dynamically partitioning said data, the method comprising:
- generating, by at least one processor, a plurality of subplans for obtaining data requested by the query, each subplan including one or more operators for performing relational operations;
determining if partitioning of data is advantageous for performing a given relational operation, wherein said data includes unpartitioned data, and wherein determining if partitioning of data is advantageous for performing the given relational operation comprises determining whether an amount of data involved for the given relational operation satisfies a threshold;
performing the given relational operation serially with at least some of said plurality of subplans if partitioning of data is determined to be not advantageous;
adding operators for partitioning data and performing the given relational operation in parallel to at least some of said plurality of subplans if partitioning of data is determined to be advantageous;
building a plan for execution of the query based, at least in part, upon selecting subplans having favorable execution costs; and
executing the plan and returning results in response to the query.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and methodology for parallel query optimization using semantic-based partitioning is described. In one embodiment, for example, in a database system comprising a database storing data in database tables, a method is described for improving query performance by dynamically partitioning the data, the method comprises steps of: receiving a query requesting data from the database; generating a plurality of subplans for executing the query, each subplan including one or more operators for performing relational operations; adding operators for partitioning data and performing a given relational operation in parallel to at least some of the plurality of subplans; and building a plan for execution of the query based, at least in part, upon selecting subplans having favorable execution costs.
121 Citations
46 Claims
-
1. In a database system comprising a database storing data in database tables, a method for improving query performance by dynamically partitioning said data, the method comprising:
-
generating, by at least one processor, a plurality of subplans for obtaining data requested by the query, each subplan including one or more operators for performing relational operations; determining if partitioning of data is advantageous for performing a given relational operation, wherein said data includes unpartitioned data, and wherein determining if partitioning of data is advantageous for performing the given relational operation comprises determining whether an amount of data involved for the given relational operation satisfies a threshold; performing the given relational operation serially with at least some of said plurality of subplans if partitioning of data is determined to be not advantageous; adding operators for partitioning data and performing the given relational operation in parallel to at least some of said plurality of subplans if partitioning of data is determined to be advantageous; building a plan for execution of the query based, at least in part, upon selecting subplans having favorable execution costs; and executing the plan and returning results in response to the query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. In a database system, a method for optimization of a query requesting data from a database, the method comprising:
-
constructing, by at least one processor, a tree of relational operators based on the query, each relational operator for performing a given relational operation; determining whether partitioning of data processed by a particular relational operator in said tree is advantageous for executing a relational operation in parallel, wherein said data includes unpartitioned data, and wherein determining whether partitioning of data processed by a particular relational operator in said tree is advantageous for executing the relational operation in parallel comprises determining whether an amount of data involved for the given relational operation satisfies a threshold; if partitioning of data processed by a particular relational operation in said tree is determined to be not advantageous, executing the particular relational operation serially; if partitioning of data processed by a particular relational operator in said tree is determined to be advantageous, creating a revised tree by adding operators to said tree for dividing data processed by the particular relational operator and executing the particular relational operator in parallel over the divided data; generating a plan for execution of the query based on said revised tree; and
executing the plan and returning results in response to the query. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
-
-
25. A database system for dynamically partitioning data during query processing, the system comprising:
-
a computer having memory and at least one processor for running the database system; a module operable to generate a plurality of plan fragments for obtaining data requested by a query from database tables of the database system;
wherein at least some of the data is unpartitioned;a partitioning module operable to create additional plan fragments by adding operators for dynamically partitioning data and processing the partitioned data in parallel to at least some of said plan fragments; a module operable to determine whether dynamically partitioning data is advantageous, based on whether an amount of data involved for a plan fragment satisfies a threshold; and a module operable to construct a final plan for execution of the query based, at least in part, upon selecting plan fragments having operators for dynamically partitioning data and processing the partitioned data in parallel when dynamically partitioning data is determined to be advantageous and selecting plan fragments for processing the data serially when dynamically partitioning data is determined to be not advantageous. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34)
-
-
35. In a database system comprising a database storing data in database tables, a method for improving query performance comprising:
-
receiving a query specifying a join of two or more database tables; as data is retrieved from the database during processing of the query, determining, by at least one processor, whether partitioning is advantageous for a given relational operation to be performed for returning data requested by the query based, at least in part, on determining whether a quantity of data to be processed during a given relational operation satisfies a threshold; and if partitioning is determined to be not advantageous, processing said query serially; if partitioning is determined to be advantageous, partitioning data to be processed during a given relational operation into separate memory;
wherein at least some of the data is unpartitioned when retrieved from the database and before being partitioned into separate memory buffers;processing said query in parallel by concurrently processing said data in said memory buffers if partitioning is determined to be advantageous; and returning results of processing said query in response to said query. - View Dependent Claims (36, 37, 38, 39, 40, 41, 42, 43)
-
-
44. A computer readable medium encoded with a computer program, the program comprising instructions that when executed by one or more processors cause the one or more processors to perform operations comprising:
-
generating a plurality of subplans for obtaining data requested by a query, each subplan including one or more operators for performing relational operations; determining if partitioning of data is advantageous for performing a given relational operation, wherein said data includes unpartitioned data, and wherein determining if partitioning of data is advantageous for performing the given relational operation comprises determining whether an amount of data involved for the given relational operation satisfies a threshold; performing the given relational operation serially with at least some of said plurality of subplans if partitioning of data is determined to be not advantageous; adding operators for partitioning data and performing the given relational operation in parallel to at least some of said plurality of subplans if partitioning of data is determined to be advantageous; building a plan for execution of the query based, at least in part, upon selecting subplans having favorable execution costs; and executing the plan and returning results in response to the query.
-
-
45. A computer readable medium encoded with a computer program, the program comprising instructions that when executed by one or more processors cause the one or more processors to perform operations comprising:
-
constructing a tree of relational operators based on a query, each relational operator for performing a given relational operation; determining whether partitioning of data processed by a particular relational operator in said tree is advantageous for executing a relational operation in parallel, wherein said data includes unpartitioned data, and wherein determining whether partitioning of data processed by a particular relational operator in said tree is advantageous for executing the relational operation in parallel comprises determining whether an amount of data involved for the given relational operation satisfies a threshold; if partitioning of data processed by a particular relational operation in said tree is determined to be not advantageous, executing the particular relational operation serially; if partitioning of data processed by a particular relational operator in said tree is determined to be advantageous, creating a revised tree by adding operators to said tree for dividing data processed by the particular relational operator and executing the particular relational operator in parallel over the divided data; generating a plan for execution of the query based on said revised tree; and
executing the plan and returning results in response to the query.
-
-
46. A computer readable medium encoded with a computer program, the program comprising instructions that when executed by one or more processors cause the one or more processors to perform operations comprising:
-
receiving a query specifying a join of two or more database tables; as data is retrieved from the database during processing of the query, determining whether partitioning is advantageous for a given relational operation to be performed for returning data requested by the query based, at least in part, on determining whether a quantity of data to be processed during a given relational operation satisfies a threshold; and if partitioning is determined to be not advantageous, processing said query serially; if partitioning is determined to be advantageous, partitioning data to be processed during a given relational operation into separate memory buffers;
wherein at least some of the data is unpartitioned when retrieved from the database and before being partitioned into separate memory buffers;processing said query in parallel by concurrently processing said data in said memory buffers if partitioning is determined to be advantageous; and returning results of processing said query in response to said query.
-
Specification