Multi-dimensional database and data cube compression for aggregate query support on numeric dimensions
First Claim
1. A method of querying a database containing data records stored in the database comprising the steps of:
- a) providing a functional representation of multiple data clusters from data records stored on a database;
said functional representation identifying a distribution probability of said data records;
b) selecting ranges over dimensions of the data for determining a sum or a count of data records from the database falling within the ranges; and
c) determining a sum or a count of data records from the database that fall within the selected ranges by integrating the functional representation over the ranges for clusters having a functional representation that includes said ranges.
2 Assignments
0 Petitions
Accused Products
Abstract
An apparatus and method for efficiently compressing contents of a database system to support ad hoc querying and OLAP type aggregation queries. This invention consists of a new compressed representation of the data cube that (a) drastically reduces storage requirements, (b) does not require the discretization hierarchy along each query dimension to be fixed beforehand and (c) treats each dimension as a potential target measure and supports multiple aggregation functions without additional storage costs. The tradeoff is approximate, yet relatively accurate, answers to queries. We outline mechanisms to reduce the error in the approximation. Our performance evaluation indicates that our compression technique effectively addresses the limitation of existing approaches. The basic method relies on representing the contents of the database by a probability distribution consisting of a mixture of Gaussians. Aggregation queries, be they multi-dimensional, conjunctive, or disjunctive, can be answered by performing integration over the probability distribution.
124 Citations
38 Claims
-
1. A method of querying a database containing data records stored in the database comprising the steps of:
-
a) providing a functional representation of multiple data clusters from data records stored on a database;
said functional representation identifying a distribution probability of said data records;
b) selecting ranges over dimensions of the data for determining a sum or a count of data records from the database falling within the ranges; and
c) determining a sum or a count of data records from the database that fall within the selected ranges by integrating the functional representation over the ranges for clusters having a functional representation that includes said ranges. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. Apparatus for querying a database containing data records comprising:
-
a) means for providing a functional representation of data clusters representing multiple number of data records stored on a database;
b) means for selecting ranges over dimensions of the data for determining a sum or a count of data records from the database falling within the ranges; and
c) means for determining the sum or the count of data records from the database that fall within the selected ranges by integrating the functional representation from each cluster over the ranges. - View Dependent Claims (15)
-
-
16. Data mining apparatus for querying a database containing data records comprising:
-
a) a memory device for storing a database comprising multiple data records organized into data fields having a dimension n for storing data record items;
b) a computer having one or more processing units for executing a stored computer program, said computer including a rapid access memory store; and
c) an interface for coupling the memory device for storing the database to the computer to allow records to be retrieved from the database;
whereind) said computer executing a stored program having software components including i) a component for providing a functional representation of multiple data clusters for representing multiple number of data records stored on a database;
ii) a component for selecting ranges over dimensions of the data for determining a sum or a count of data records from the database falling within the ranges; and
iii) a component for integrating the functional representation from each cluster over the ranges to determine the sum or the count of data records from the database that fall within the selected ranges.- View Dependent Claims (17, 18)
-
-
19. A method of counting data records containing continuous attributes that are stored in the database;
- said method comprising the steps of;
a) defining a data cube over which to count by selecting ranges over one or more of the continuous attributes of the data records in a database;
a) providing a functional representation of data clusters from data records stored in the database;
said functional representation identifying a distribution probability of said data records; and
c) determining the count of data records from the database that fall within the data cube by integrating the functional representation of the data clustering over the selected ranges for each of the clusters having a functional representation that includes the selected ranges. - View Dependent Claims (20, 21, 22, 23, 24, 25)
having mixture weights Wl which represent a fraction of the database represented by a data cluster and wherein the number k is the number of such database clusters.
- said method comprising the steps of;
-
25. The method of claim 19 wherein the step of providing the function representation is performed by a server computer and the steps of determining the count and defining the data cube are performed on one or more client computers which may or may not be in communication with the server computer when the determining step is conducted.
-
26. A computer-readable medium having computer executable instructions for performing steps comprising:
-
a) providing a functional representation of data clusters representing data records stored on a database;
said functional representation identifying a distribution probability of said data records;
b) selecting ranges over dimensions of the data for determining a sum or a count of data records from the database falling within the ranges; and
c) determining a sum or a count of data records from the database that fall within the selected ranges by integrating the functional representation of the data clusters over the ranges. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34)
-
-
35. A database management system for querying a database containing data records comprising:
-
a) a memory device for storing a database comprising multiple data records organized into data fields;
b) a computer having one or more processing units for executing a stored computer program, said computer including a rapid access memory store; and
c) an interface for coupling the memory device for storing the database to the computer to allow records to be retrieved from the database;
whereind) said computer executing a stored program having software components including i) a clustering component for providing a cluster model that includes a functional representation of data clusters representing multiple number of data records stored on a database; and
ii) a query execution component which performs an aggregation query comprising a sum, a count or an average computed from either the actual data stored in the database or from an integration of the available functional representation of the clusters from the clustering model of the data.- View Dependent Claims (36, 37, 38)
-
Specification