Database aggregation query result estimator
First Claim
1. A method of providing an estimated aggregation result for a set of values having first values and second values, the values of the set comprising values in a database, the method comprising:
obtaining a variance of the first values in different groups of first values, and selecting from among the groups a group of first values having a lower variance than one or more other different of the groups of first values, along with corresponding second values for that group, to use for obtaining the first result of aggregating the first values;
obtaining a first result of aggregating the first values and a second result of aggregating the second values, wherein the first values comprise non-outlier values with respect to the set of values, wherein the second values comprise identified outlier values with respect to the set of values;
sampling the non-outlier first values and aggregating the sampled first values;
aggregating the outlier second values;
obtaining a variance of the first values in different groups of first values; and
for a query to the database, estimating the aggregation result for the set of values based on the first and second results, and storing the estimation for further use in conjunction with the database.
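The group-selection step of the claim can be illustrated with a minimal sketch. It assumes the "groups" are fixed-size sliding windows over the sorted values, as the abstract suggests. The function name `lowest_variance_window` and the `window_size` parameter are hypothetical and do not come from the patent text.

```python
from statistics import pvariance


def lowest_variance_window(values, window_size):
    """Split values into (non_outliers, outliers).

    Assumption: candidate groups are contiguous windows of `window_size`
    elements over the sorted values. The window with the lowest variance
    is kept as the non-outlier group; everything outside it is treated
    as an outlier.
    """
    if not 0 < window_size <= len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    ordered = sorted(values)
    # Evaluate the variance of every candidate window and keep the smallest.
    best_start = min(
        range(len(ordered) - window_size + 1),
        key=lambda i: pvariance(ordered[i:i + window_size]),
    )
    non_outliers = ordered[best_start:best_start + window_size]
    outliers = ordered[:best_start] + ordered[best_start + window_size:]
    return non_outliers, outliers


if __name__ == "__main__":
    print(lowest_variance_window([1000, 10, 11, 12, 13, -500, 12], 5))
```

Recomputing the variance for each window keeps the sketch short; a production version would more likely update the variance incrementally as the window slides.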
Abstract
Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
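The combination step described in the abstract can be sketched as follows, under two assumptions: the aggregate is a SUM, and the sample is a uniform random sample without replacement. The function `estimate_sum` and its parameters are hypothetical names chosen for illustration; the split into outliers and non-outliers is taken as given.

```python
import random


def estimate_sum(non_outliers, outliers, sample_fraction, rng=None):
    """Estimate SUM over all values from exact outliers plus a scaled sample.

    Assumption: uniform sampling without replacement, extrapolated by the
    ratio of population size to sample size.
    """
    rng = rng or random.Random()
    # Outliers are aggregated exactly, so they add no sampling error.
    outlier_sum = sum(outliers)
    if not non_outliers:
        return outlier_sum
    sample_size = max(1, round(sample_fraction * len(non_outliers)))
    sample = rng.sample(non_outliers, sample_size)
    # Extrapolate the sampled aggregate to the full non-outlier population.
    scaled_sample_sum = sum(sample) * len(non_outliers) / sample_size
    return outlier_sum + scaled_sample_sum


if __name__ == "__main__":
    print(estimate_sum([10, 11, 12, 12, 13], [-500, 1000], 0.4, random.Random(0)))
```

Because the extreme values are summed exactly, the sampling error comes only from the low-variance remainder, which is why pruning outliers first tends to tighten the estimate.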
27 Citations
6 Claims
1. Independent method claim, reproduced in full under First Claim above. Dependent claims: 2, 3, 4, 5.
6. One or more volatile and/or nonvolatile computer-readable medium storing information to enable a computer to perform a process of providing an estimated aggregation result for a set of values having first values and second values, the values of the set comprising values in a database, the process comprising:
obtaining a variance of the first values in different groups of first values, and selecting from among the groups a group of first values having a lower variance than one or more other different of the groups of first values, along with corresponding second values for that group, to use for obtaining the first result of aggregating the first values;
obtaining a first result of aggregating the first values and a second result of aggregating the second values, wherein the first values comprise non-outlier values with respect to the set of values, wherein the second values comprise identified outlier values with respect to the set of values;
sampling the non-outlier first values and aggregating the sampled first values;
aggregating the outlier second values;
obtaining a variance of the first values in different groups of first values; and
for a query to the database, estimating the aggregation result for the set of values based on the first and second results, and storing the estimation for further use in conjunction with the database.
Specification