Automatically and adaptively determining execution plans for queries with parameter markers
First Claim
1. A computer-based method of automatically and adaptively determining query execution plans for queries having parameter markers, said method comprising:
- generating, by a computing system, a first classifier trained by an initial set of training points;
dynamically updating, by a computing system at a first runtime thereof, at least one of a workload of queries processed by a database of said computing system and database statistics collected by said database for computing a plurality of selectivities;
collecting, by a computing system in an off-line phase thereof, said off-line phase being subsequent to said first runtime, a new set of training points, said collecting responsive to a detection of said dynamically updating;
modifying, by said computing system in said off-line phase, said first classifier into a second classifier, said modifying including utilizing said new set of training points;
receiving, by said computing system at a second runtime thereof, said second runtime being subsequent to said off-line phase, a query for said database, said query including one or more predicates, each predicate including one or more parameter markers bound to one or more actual values, and said one or more predicates associated with one or more selectivities of said plurality of selectivities in a one-to-one correspondence; and
automatically determining a query execution plan by said computing system, said automatically determining including mapping, by said second classifier, said one or more selectivities into said query execution plan, wherein said query execution plan is included in an augmented set of training points, said augmented set including said initial set and said new set,wherein said generating said first classifier comprises utilizing a machine learning technique, wherein said modifying said first classifier into said second classifier includes maintaining said first classifier incrementally, wherein said machine learning technique is a boosting technique, and wherein said method further comprises;
determining a subset of training points of said initial set of training points, said subset of training points belonging to a plurality of classes, each class having less than a predetermined threshold coverage of said initial set of training points;
assigning training points of said subset of training points to a single unclassified class;
setting a number of classes to k, wherein k is one plus a number of classes having greater than said predetermined threshold coverage;
generating an error-correcting output code (ECOC) table of length 2*k;
training a binary classifier as said first classifier, said training including utilizing AdaBoost with confidence-rated predictions for each column in said ECOC table, said training said binary classifier including;
initializing said augmented set of training points with equal weights; and
performing a training procedure for T rounds, each round of said training procedure comprising;
training a plurality of weak learners on said augmented set of training points,choosing a weak learner of said plurality of weak learners, said weak learner having a lowest training error of a plurality of training errors associated with said plurality of weak learners in a one-to-one correspondence,assigning a weight to said weak learner, said weight being a function of said lowest training error,assigning exponentially higher weight to any misclassified training points of said augmented set of training points, andassigning exponentially lower weight to any correctly classified training points of said augmented set of training points; and
said training said binary classifier further including;
outputting a model including T weak learners and T weights, said T weights associated with said T weak learners in a one-to-one correspondence, each weak learner of said T weak learners chosen by said choosing said weak learner in each round of said T rounds, and each weight of said T weights assigned by said assigning said weight in each round of said T rounds.
0 Assignments
0 Petitions
Accused Products
Abstract
A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.
-
Citations
11 Claims
-
1. A computer-based method of automatically and adaptively determining query execution plans for queries having parameter markers, said method comprising:
-
generating, by a computing system, a first classifier trained by an initial set of training points; dynamically updating, by a computing system at a first runtime thereof, at least one of a workload of queries processed by a database of said computing system and database statistics collected by said database for computing a plurality of selectivities; collecting, by a computing system in an off-line phase thereof, said off-line phase being subsequent to said first runtime, a new set of training points, said collecting responsive to a detection of said dynamically updating; modifying, by said computing system in said off-line phase, said first classifier into a second classifier, said modifying including utilizing said new set of training points; receiving, by said computing system at a second runtime thereof, said second runtime being subsequent to said off-line phase, a query for said database, said query including one or more predicates, each predicate including one or more parameter markers bound to one or more actual values, and said one or more predicates associated with one or more selectivities of said plurality of selectivities in a one-to-one correspondence; and automatically determining a query execution plan by said computing system, said automatically determining including mapping, by said second classifier, said one or more selectivities into said query execution plan, wherein said query execution plan is included in an augmented set of training points, said augmented set including said initial set and said new set, wherein said generating said first classifier comprises utilizing a machine learning technique, wherein said modifying said first classifier into said second classifier includes maintaining said first classifier incrementally, wherein said machine learning technique is a boosting technique, and wherein said method further comprises; determining a subset of training points of said initial set of training points, said subset of training points belonging to a plurality of classes, each class having less than a predetermined threshold coverage of said initial set of training points; assigning training points of said subset of training points to a single unclassified class; setting a number of classes to k, wherein k is one plus a number of classes having greater than said predetermined threshold coverage; generating an error-correcting output code (ECOC) table of length 2*k; training a binary classifier as said first classifier, said training including utilizing AdaBoost with confidence-rated predictions for each column in said ECOC table, said training said binary classifier including; initializing said augmented set of training points with equal weights; and performing a training procedure for T rounds, each round of said training procedure comprising; training a plurality of weak learners on said augmented set of training points, choosing a weak learner of said plurality of weak learners, said weak learner having a lowest training error of a plurality of training errors associated with said plurality of weak learners in a one-to-one correspondence, assigning a weight to said weak learner, said weight being a function of said lowest training error, assigning exponentially higher weight to any misclassified training points of said augmented set of training points, and assigning exponentially lower weight to any correctly classified training points of said augmented set of training points; and
said training said binary classifier further including;outputting a model including T weak learners and T weights, said T weights associated with said T weak learners in a one-to-one correspondence, each weak learner of said T weak learners chosen by said choosing said weak learner in each round of said T rounds, and each weight of said T weights assigned by said assigning said weight in each round of said T rounds. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computing system comprising:
-
a processor; and a computer-readable memory unit coupled to said processor, said memory unit comprising a software application and instructions that when executed by said processor implement a method of automatically and adaptively determining query execution plans for queries having parameter markers, said method comprising; generating, by a computing system, a first classifier trained by an initial set of training points; dynamically updating, by a computing system at a first runtime thereof, at least one of a workload of queries processed by a database of said computing system and database statistics collected by said database for computing a plurality of selectivities; collecting, by a computing system in an off-line phase thereof, said off-line phase being subsequent to said first runtime, a new set of training points, said collecting responsive to a detection of said dynamically updating; modifying, by said computing system in said off-line phase, said first classifier into a second classifier, said modifying including utilizing said new set of training points; receiving, by said computing system at a second runtime thereof, said second runtime being subsequent to said off-line phase, a query for said database, said query including one or more predicates, each predicate including one or more parameter markers bound to one or more actual values, and said one or more predicates associated with one or more selectivities of said plurality of selectivities in a one-to-one correspondence; and automatically determining a query execution plan by said computing system, said automatically determining including mapping, by said second classifier, said one or more selectivities into said query execution plan, wherein said query execution plan is included in an augmented set of training points, said augmented set including said initial set and said new set, wherein said generating said first classifier comprises utilizing a machine learning technique, wherein said modifying said first classifier into said second classifier includes maintaining said first classifier incrementally, wherein said machine learning technique is a boosting technique, and wherein said method further comprises; determining a subset of training points of said initial set of training points, said subset of training points belonging to a plurality of classes, each class having less than a predetermined threshold coverage of said initial set of training points; assigning training points of said subset of training points to a single unclassified class; setting a number of classes to k, wherein k is one plus a number of classes having greater than said predetermined threshold coverage; generating an error-correcting output code (ECOC) table of length 2*k; training a binary classifier as said first classifier, said training including utilizing AdaBoost with confidence-rated predictions for each column in said ECOC table, said training said binary classifier including; initializing said augmented set of training points with equal weights; and performing a training procedure for T rounds, each round of said training procedure comprising; training a plurality of weak learners on said augmented set of training points, choosing a weak learner of said plurality of weak learners, said weak learner having a lowest training error of a plurality of training errors associated with said plurality of weak learners in a one-to-one correspondence, assigning a weight to said weak learner, said weight being a function of said lowest training error, assigning exponentially higher weight to any misclassified training points of said augmented set of training points, and assigning exponentially lower weight to any correctly classified training points of said augmented set of training points; and
said training said binary classifier further including;outputting a model including T weak learners and T weights, said T weights associated with said T weak learners in a one-to-one correspondence, each weak learner of said T weak learners chosen by said choosing said each weak learner in each round of said T rounds, and each weight of said T weights assigned by said assigning said weight in each round of said T rounds. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A computer-based method of automatically and adaptively determining query execution plans for queries having parameter markers, said method comprising:
-
generating, by a computing system, a first classifier trained by an initial set of training points, wherein the initial set of training points consists of selectivities; generating a set of random decision trees (RDTs), said set of RDTs having a predetermined number of RDTs, wherein said generating said set of RDTs includes defining a generation procedure for each RDT of said set of RDTs, wherein said defining said generation procedure includes; choosing a selectivity of a plurality of selectivities for a first node of an RDT of said set of RDTs, said chosen selectivity not used in a higher level node of said RDT'"'"'s hierarchy; selecting a decision threshold value for said chosen selectivity, said decision threshold value separating a set of query execution plans in said first node into two disjoint subsets of said set of query execution plans; and recursively using said generation procedure to expand said RDT for each subset of said two disjoint subsets until a number of query execution plans in a subset of said two disjoint subsets is fewer than a predefined minimum query execution plan threshold, a depth of said RDT reaches a depth threshold based on predefined criteria, or all query execution plans of said subset of said two disjoint subsets belong to a single type; dynamically updating, by a computing system at a first runtime thereof, at least one of a workload of queries processed by a database of said computing system and database statistics collected by said database for computing said plurality of selectivities; collecting, by a computing system in an off-line phase thereof, said off-line phase being subsequent to said first runtime, a new set of training points, said collecting responsive to a detection of said dynamically updating; modifying, by said computing system in said off-line phase, said first classifier into a second classifier, said modifying including utilizing said new set of training points; receiving, by said computing system at a second runtime thereof, said second runtime being subsequent to said off-line phase, a query for said database, said query including one or more predicates, each predicate including one or more parameter markers bound to one or more actual values; and automatically determining a query execution plan by a processor of said computing system, wherein said automatically determining includes; mapping, by said second classifier, one or more selectivities of said one or more predicates into said query execution plan, wherein said query execution plan is included in an augmented set of training points, said augmented set including said initial set of training points and said new set of training points; traversing a plurality of decision paths in said set of RDTs, wherein a decision path of said plurality of decision paths starts at a root of said RDT and ends at a leaf node of said RDT, wherein said traversing includes obtaining, across said set of RDTs, a set of posterior probabilities that are based on said augmented set of training points and said query execution plan, and obtaining, across said set of RDTs, one or more other sets of posterior probabilities that are based on one or more other query execution plans; computing a first average of said posterior probabilities included in said set of posterior probabilities; comparing said first average of said posterior probabilities to one or more other averages of said one or more other sets of posterior probabilities, wherein said comparing includes identifying said first average of said posterior probabilities as an optimal average selected from the group consisting of said first average of said posterior probabilities and said one or more other averages of said one or more other sets of posterior probabilities, and wherein said identifying said first average of said posterior probabilities as said optimal average includes utilizing a loss function; identifying said query execution plan as having the optimum average posterior probability, wherein said identifying said query execution plan is based on selecting said optimal average in response to said identifying said first average as said optimal average; and providing said query execution plan as a prediction of an output of a query optimizer of said database without utilizing said query optimizer to provide said output.
-
Specification