Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases
First Claim
Patent Images
1. A computer implemented method for providing an approximate answer to a query in a stored relational database, the method comprising the steps of:
- receiving said query containing input data to said stored relational database;
precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube;
summarizing said at least one sub-cube using a plurality of histogram techniques;
computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;
determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error;
calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and
outputting said approximate answer.
3 Assignments
0 Petitions
Accused Products
Abstract
A novel and unique method of approximating the data cube and summarizing database data in order to provide quick and approximate answers to aggregate queries by precomputing a summary of the data cube using histograms and answering queries using the substantially smaller summary. A unique method according to the present invention provides for identifying accurate histogram classes and distributing the space among the histograms on various sub-cubes such that the errors are minimized, while at the same time computer resources are maximized.
170 Citations
30 Claims
-
1. A computer implemented method for providing an approximate answer to a query in a stored relational database, the method comprising the steps of:
-
receiving said query containing input data to said stored relational database; precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube; summarizing said at least one sub-cube using a plurality of histogram techniques; computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique; determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error; calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and outputting said approximate answer. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer program recorded on a computer-readable medium, said computer program designed to operate a computer to provide an approximate answer to a query in a stored relational database by executing the steps of:
-
receiving said query containing input data to said stored relational database; precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube; summarizing said at least one of sub-cube using a plurality of histogram techniques; computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique; determining which histogram corresponds to a maximum of the error/space benefits, wherein said maximum of said error/space benefit corresponds to a summary with a minimum error; calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and outputting said approximate answer. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer program recorded on a computer-readable medium, said computer program designed to operate a computer to provide an approximate answer to an aggregate query in a relational database, by executing the steps of:
-
receiving a query; precomputing a summary of a data cube corresponding to a stored relational database by defining at least one sub-cube; summarizing said at least one of sub-cube using a plurality of histogram techniques; computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique; calculating a maximum of the error/space benefits, wherein said maximum of said error/space benefit corresponds to a summary with a minimum error; and providing an approximate answer to said query using the histogram technique corresponding to said maximum error/space benefit. - View Dependent Claims (16, 17)
-
-
18. A system comprising a programmed computer and a stored relational database, wherein said computer is programmed to maximize a benefit for a data sub-cube in said stored relational database, by executing the steps of:
-
allocating a predetermined number of data subsets to a histogram on said subcube having a first data configuration; generating a second data configuration on said sub-cube using said predetermined number of data subsets; calculating an error value for each data configuration; subtracting said error values; and updating said first data configuration based on said subtracted error values, wherein said benefit is a minimum error. - View Dependent Claims (19, 20, 21, 22, 23)
-
-
24. A computer system for providing an approximate answer to a query in a stored relational database, the system comprising:
-
means for receiving said query containing input data to said stored relational database; means for precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube; means for summarizing said at least one sub-cube using a plurality of histogram techniques; means for computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique; means for determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error; means for calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and means for outputting said approximate answer. - View Dependent Claims (25, 26, 27, 28, 29)
-
-
30. A computer implemented method for data subset allocation within a histogram comprising the steps of:
-
counting a number of data subsets corresponding to a histogram which has a maximum error/space benefit, wherein said maximum of said error/space benefit corresponds to a histogram with a minimum error; allocating a new data subset to said histogram corresponding to said maximum error/space benefit when said number of data subsets is greater than zero; decreasing said number of data subsets by one; repeating said counting, allocating and decreasing steps until said number of data subsets equals zero; and using said allocation of data subsets to achieve an optimum allocation of data subsets in said histogram.
-
Specification