Vertical implementation of expectation-maximization algorithm in SQL for performing clustering in very large databases
First Claim
Patent Images
1. A method for performing clustering within a relational database management system to group a set of n data points into a set of k clusters, each data point having a dimensionality p, the method comprising the steps of:
- establishing a first table, C, having 1 column and p*k rows, for the storage of means values;
establishing a second table, R, having 1 column and p rows, for the storage of covariance values;
establishing a third table, W, having w columns and k rows, for the storage of w weight, values;
establishing a fourth table, Y, having 1 column and p*n rows, for the storage of values; and
executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables;
said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm includes the step of calculating a Mahalanobis distance for each of said n data points by using SQL aggregate functions to join tables Y, C and R.
2 Assignments
0 Petitions
Accused Products
Abstract
A method for performing cluster analysis inside a relational database management system. The method defines a plurality of tables for the storage of data points and Gaussian mixture parameters and executes a series of SQL statements implementing an Expectation-Maximization clustering algorithm to iteratively update the Gaussian mixture parameters stored within the tables.
31 Citations
6 Claims
-
1. A method for performing clustering within a relational database management system to group a set of n data points into a set of k clusters, each data point having a dimensionality p, the method comprising the steps of:
-
establishing a first table, C, having 1 column and p*k rows, for the storage of means values;
establishing a second table, R, having 1 column and p rows, for the storage of covariance values;
establishing a third table, W, having w columns and k rows, for the storage of w weight, values;
establishing a fourth table, Y, having 1 column and p*n rows, for the storage of values; and
executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables;
said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm includes the step of calculating a Mahalanobis distance for each of said n data points by using SQL aggregate functions to join tables Y, C and R. - View Dependent Claims (2, 3, 4, 5, 6)
k≦
p; and
p<
<
n.
-
-
6. The method for performing clustering within a relational database management system in accordance with claim 5, wherein:
-
p≦
100; and
k≦
100.
-
Specification