Query optimization through the use of multi-column statistics to avoid the problems of column correlation
First Claim
1. A method, for use in a database management system for optimizing a query, the method comprising:
- collecting at least one type of multi-column statistic to reflect a relationship among multiple selected columns of a table; and
storing the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query;
wherein the step of collecting at least one type of multi-column statistic further comprises collecting a first type of multi-column quantile statistics for indicating a number of rows between two given values by dividing the data into a plurality of sub-ranges, each sub-range having an even distribution of data; and
determining a frequency and cardinality of each sub-range.
3 Assignments
0 Petitions
Accused Products
Abstract
The system, method, and program of this invention collects multi-column statistics, by a database management system, to reflect a relationship among multiple columns of a table in a relational database. These statistics are stored in the system catalog, and are used during query optimization to obtain an estimate of the number of qualifying rows when a query has predicates on multiple columns of a table.
A multi-column linear quantile statistic is collected by dividing the data of multiple columns into sub-ranges where each sub-range has approximately an even distribution of data, and determining a frequency and cardinality of each sub-range. A multi-column polygonal quantile statistic is collected by dividing the data of multiple columns into sub-spaces where each sub-space contains approximately the same number of tuples, and determining a frequency and cardinality of each sub-space.
The system catalog is accessed for the stored multi-column linear quantile statistic for a query having a single range predicate and at least one equal predicate to determine the selectivity value for the predicates of the query. The system catalog is accessed for the stored multi-column polygonal quantile statistic for a query having more than one range predicate. These statistics are used in various ways to determine the selectivity value for the predicates of the query.
-
Citations
23 Claims
-
1. A method, for use in a database management system for optimizing a query, the method comprising:
-
collecting at least one type of multi-column statistic to reflect a relationship among multiple selected columns of a table; and storing the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the step of collecting at least one type of multi-column statistic further comprises collecting a first type of multi-column quantile statistics for indicating a number of rows between two given values by dividing the data into a plurality of sub-ranges, each sub-range having an even distribution of data; and
determining a frequency and cardinality of each sub-range. - View Dependent Claims (2, 3, 4)
-
-
5. A method, for use in a database management system for optimizing a query, the method comprising:
-
collecting at least one type of multi-column statistic to reflect a relationship among multiple selected columns of a table; and storing the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the step of collecting at least one type of multi-column statistic further comprises collecting a second type of multi-column quantile statistics for indicating a number of rows between two given sets of values by dividing the data into a plurality of sub-spaces, each sub-space containing approximately a same number of tuples; and
determining a frequency and cardinality of each sub-space. - View Dependent Claims (6, 7, 8)
-
-
9. A database management system comprising:
-
means for collecting at least one type of multi-column statistic to reflect a relationship among multiple columns of a table; and means for storing the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the means for collecting at least one type of multi-column statistic further comprises means for collecting a first type of multi-column quantile statistics for indicating a number of rows between two given values by dividing the data into a plurality of sub-ranges, each sub-range having an even distribution of data; and
means for determining a frequency and cardinality of each sub-range. - View Dependent Claims (10, 11, 12)
-
-
13. A database management system comprising:
-
means for collecting at least one type of multi-column statistic to reflect a relationship among multiple columns of a table; and means for storing the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the means for collecting at least one type of multi-column statistic further comprises means for collecting a second type of multi-column quantile statistics for indicating a number of rows between two given sets of values by dividing the data into a plurality of sub-spaces, each sub-space containing approximately a same number of tuples; and
means for determining a frequency and cardinality of each sub-space. - View Dependent Claims (14, 15, 16)
-
-
17. Computer programming code residing on at least one computer usable medium (i.e., a program product) for use in a database management system, the program product comprising:
-
means for causing a collection of at least one type of multi-column statistic to reflect a relationship among multiple columns of a table; and means for causing a storing of the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the means for causing a collection of at least one type of multi-column statistic further comprises means for causing a collection of a first type of multi-column quantile statistics for indicating a number of rows between two given values by dividing the data into a plurality of sub-ranges, each sub-range having an even distribution of data; and
means for causing a determination of a frequency and cardinality of each sub-range. - View Dependent Claims (18, 19, 20)
-
-
21. Computer programming code residing on at least one computer usable medium (i.e. a program product) for use in a database management system, the program product comprising:
-
means for causing a collection of at least one type of multi-column statistic to reflect a relationship among multiple columns of a table; and means for causing a storing of the at least one type of multi-column statistic in a table for subsequent use in determining a selectivity value (a number of qualified rows) for predicates in the query, wherein the selectivity value is used in optimizing execution of the query; wherein the means for causing a collection of at least one type of multi-column statistic further comprises means for causing a collection of a second type of multi-column quantile statistic for indicating a number of rows between two given sets of values by dividing the data into a plurality of sub-spaces, each sub-space containing approximately a same number of tuples; and
means for causing a determination of a frequency and cardinality of each sub-space. - View Dependent Claims (22, 23)
-
Specification