Selectivity prediction with compressed histograms in a parallel processing database system
First Claim
1. A method for generating statics for records stored in a subject table in a computer system, comprising:
- (a) generating a global aggregate spool for each of a plurality of partitions of a subject table stored on the computer system, wherein the partitions are stored across a plurality of processing units of the computer system;
(b) constructing one or more summary records from each of the global aggregate spools;
(c) generating one or more interval records from the summary records;
(d) constructing a compressed histogram from the interval records, wherein the compressed histogram includes both equal-height intervals and high-biased intervals; and
(e) analyzing the compressed histogram to estimate cardinality associated with one or more search conditions.
2 Assignments
0 Petitions
Accused Products
Abstract
A method, apparatus, and article of manufacture for generating statistics for use by a relational database management system. A global aggregate spool is generated for each of a plurality of partitions of a subject table that are spread across a plurality of processing units of a computer system. Each of the global aggregate spools is scanned to generate summary records. The summary records are then merged to generate interval records for a compressed histogram of the subject table, wherein the compressed histogram includes both equal-height intervals and high-biased intervals. The compressed histogram can then be analyzed to estimate the cardinality associated with one or more search conditions of a user query or other SQL statement. Compared to a strictly equal-height histogram, the compressed histogram allows the relational database management system to more accurately estimate the cardinality associated with various search conditions. As a result, the relational database management system can better optimize the execution of the user query.
67 Citations
75 Claims
-
1. A method for generating statics for records stored in a subject table in a computer system, comprising:
-
(a) generating a global aggregate spool for each of a plurality of partitions of a subject table stored on the computer system, wherein the partitions are stored across a plurality of processing units of the computer system;
(b) constructing one or more summary records from each of the global aggregate spools;
(c) generating one or more interval records from the summary records;
(d) constructing a compressed histogram from the interval records, wherein the compressed histogram includes both equal-height intervals and high-biased intervals; and
(e) analyzing the compressed histogram to estimate cardinality associated with one or more search conditions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
where f=frequency of the loner, T is a total number of rows in the subject table, and L is a maximum number of loners.
-
-
26. An apparatus for generating statistics for records stored in a subject table in a computer system, comprising:
-
(a) a computer system;
(b) logic, performed by the computer system, for;
(1) generating a global aggregate spool for each of a plurality of partitions of a subject table stored on the computer system, wherein the partitions are stored across a plurality of processing units of the computer system;
(2) constructing one or more summary records from each of the global aggregate spools;
(3) generating one or more interval records from the summary records;
(4) constructing a compressed histogram from the interval records, wherein the compressed histogram includes both equal-height intervals and high-biased intervals; and
(5) analyzing the compressed histogram to estimate cardinality associated with one or more search conditions. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
where f=frequency of the loner, T is a total number of rows in the subject table, and L is a maximum number of loners.
-
-
51. An article of manufacture embodying logic for generating statistics for records stored in a subject table in a computer system, comprising:
-
(a) generating a global aggregate spool for each of a plurality of partitions of a subject table stored on the computer system, wherein the partitions are stored across a plurality of processing units of the computer system;
(b) constructing one or more summary records from each of the global aggregate spools;
(c) generating one or more interval records from the summary records;
(d) constructing a compressed histogram from the interval records, wherein the compressed histogram includes both equal-height intervals and high-biased intervals; and
(e) analyzing the compressed histogram to estimate cardinality associated with one or more search conditions. - View Dependent Claims (52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75)
where f=frequency of the loner, T is a total number of rows in the subject table, and L is a maximum number of loners.
-
Specification