Method and apparatus for predicting selectivity of database query join conditions using hypothetical query predicates having skewed value constants
First Claim
1. A method for predicting the selectivity of a set of at least one logical query condition for querying a database, said at least one logical query condition comprising at least one join condition for joining multiple tables of said database, comprising the steps of:
- automatically identifying at least one skewed value for a first field of said database specified in said at least one join condition;
for each skewed value identified by said step of automatically identifying at least one skewed value, automatically constructing a corresponding set of one or more hypothetical query predicates, each hypothetical query predicate of said set of hypothetical query predicates corresponding to a respective query condition of said set of at least one logical query condition, wherein each respective said hypothetical query predicate is constructed by replacing each occurrence of said first field in the corresponding query condition with a constant equal to the skewed value corresponding to the set of hypothetical query predicates containing the hypothetical query predicate;
automatically predicting a respective selectivity for each said set of hypothetical query predicates; and
automatically predicting a composite selectivity for said set of at least one logical query condition using the respective selectivity for each said set of hypothetical query predicates;
automatically identifying at least one skewed value for a first field comprises automatically accessing a frequent value list containing sampled values from said first field, and automatically identifying any sampled values exceeding at least one pre-determined threshold;
automatically identifying any sampled values exceeding at least one pre-determined threshold comprises automatically identifying any sampled values exceeding at least two pre-determined thresholds, a first of said pre-determined thresholds representing a number of occurrences of said sampled value, and a second of said pre-determined thresholds representing a proportion of records containing said sampled value among all records in a database table containing said first field.
2 Assignments
0 Petitions
Accused Products
Abstract
A database management system predicts a selectivity for database query conditions requiring a join of records from different tables. The system identifies at least one skewed value in a field specified in the join condition, and constructs, for each skewed value, a set of hypothetical query predicates in which the field specified in the join condition is replaced with a constant equal to the skewed value. The system then predicts the selectivity for the hypothetical predicates, using any appropriate prediction technique. The selectivities of the hypothetical predicates are used to predict a selectivity for the original query.
-
Citations
9 Claims
-
1. A method for predicting the selectivity of a set of at least one logical query condition for querying a database, said at least one logical query condition comprising at least one join condition for joining multiple tables of said database, comprising the steps of:
-
automatically identifying at least one skewed value for a first field of said database specified in said at least one join condition; for each skewed value identified by said step of automatically identifying at least one skewed value, automatically constructing a corresponding set of one or more hypothetical query predicates, each hypothetical query predicate of said set of hypothetical query predicates corresponding to a respective query condition of said set of at least one logical query condition, wherein each respective said hypothetical query predicate is constructed by replacing each occurrence of said first field in the corresponding query condition with a constant equal to the skewed value corresponding to the set of hypothetical query predicates containing the hypothetical query predicate; automatically predicting a respective selectivity for each said set of hypothetical query predicates; and automatically predicting a composite selectivity for said set of at least one logical query condition using the respective selectivity for each said set of hypothetical query predicates; automatically identifying at least one skewed value for a first field comprises automatically accessing a frequent value list containing sampled values from said first field, and automatically identifying any sampled values exceeding at least one pre-determined threshold; automatically identifying any sampled values exceeding at least one pre-determined threshold comprises automatically identifying any sampled values exceeding at least two pre-determined thresholds, a first of said pre-determined thresholds representing a number of occurrences of said sampled value, and a second of said pre-determined thresholds representing a proportion of records containing said sampled value among all records in a database table containing said first field. - View Dependent Claims (2, 3)
-
-
4. A computer storage media for predicting the selectivity of a set of at least one logical query condition for querying a database, said at least one logical query condition comprising at least one join condition for joining multiple tables of said database, said computer storage media comprising:
-
a plurality of computer-executable instructions stored on said computer storage media, wherein said instructions, when executed by at least one computer system, cause the at least one computer system to perform the steps of; (a) identifying at least one skewed value for a first field of said database specified in said at least one join condition; (b) for each skewed value identified by said step of identifying at least one skewed value, constructing a corresponding set of one or more hypothetical query predicates, each hypothetical query predicate of said set of hypothetical query predicates corresponding to a respective query condition of said set of at least one logical query condition, wherein each respective said hypothetical query predicate is constructed by replacing each occurrence of said first field in the corresponding query condition with a constant equal to the skewed value corresponding to the set of hypothetical query predicates containing the hypothetical query predicate; (c) predicting a respective selectivity for each said set of hypothetical query predicates; and (d) predicting a composite selectivity for said set of at least one logical query condition using the respective selectivity for each said set of hypothetical query predicates; identifying at least one skewed value for a first field comprises accessing a frequent value list containing sampled values from said first field, and identifying any sampled values exceeding at least one pre-determined threshold; identifying any sampled values exceeding at least one pre-determined threshold comprises identifying any sampled values exceeding at least two pre-determined thresholds, a first of said pre-determined thresholds representing a number of occurrences of said sampled value, and a second of said pre-determined thresholds representing a proportion of records containing said sampled value among all records in a database table containing said first field. - View Dependent Claims (5, 6, 7)
-
-
8. A computer system, comprising:
-
at least one processor; an addressable memory coupled to said at least one processor; a database containing data storable in said addressable memory, said database comprising a plurality of tables a database management system embodied as a plurality of instructions storable in said addressable memory and executable on said at least one processor, said database management system supporting the execution of logical queries against data in said database, at least some of said logical queries including at least one respective join condition for joining multiple tables of said database;
wherein said database management system estimates selectivity of a set of at least one logical query condition including at least one join condition by(a) identifying any skewed values in a field specified in said at least one join condition, (b) constructing a respective set of one or more hypothetical query predicates corresponding to each identified skewed value, each hypothetical query predicate of a set of hypothetical query predicates corresponding to a respective query condition of said set of at least one logical query condition, wherein each respective said hypothetical query predicate is constructed by replacing each occurrence of the field containing the skewed value in the corresponding query condition with a constant equal to the skewed value corresponding to the set of hypothetical query predicates containing the hypothetical query predicate, (c) predicting a respective selectivity for each said set of hypothetical query predicates, and (d) predicting a composite selectivity for said set of at least one logical query condition using the respective selectivity for each said set of hypothetical query predicates; identifying at least one skewed value for a first field comprises accessing a frequent value list containing sampled values from said first field, and identifying any sampled values exceeding at least one pre-determined threshold; identifying any sampled values exceeding at least one pre-determined threshold comprises identifying any sampled values exceeding at least two pre-determined thresholds, a first of said pre-determined thresholds representing a number of occurrences of said sampled value, and a second of said pre-determined thresholds representing a proportion of records containing said sampled value among all records in a database table containing said first field. - View Dependent Claims (9)
-
Specification