Vertical implementation of expectation-maximization algorithm in SQL for performing clustering in very large databases

US 6,519,591 B1
Filed: 12/22/2000
Issued: 02/11/2003
Est. Priority Date: 12/22/2000
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method for performing cluster analysis inside a relational database management system. The method defines a plurality of tables for the storage of data points and Gaussian mixture parameters and executes a series of SQL statements implementing an Expectation-Maximization clustering algorithm to iteratively update the Gaussian mixture parameters stored within the tables.
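
The abstract's iterative update can be sketched in plain Python (deliberately not SQL) to show what each pass computes. This is a minimal sketch under assumptions that are ours, not the patent's. R's p rows are read as the diagonal of one covariance shared by all k clusters, which is consistent with the table sizes in claim 1 and the diagonal matrix of claim 4. The loop runs a fixed number of iterations, as in claim 2. The function name `em_shared_diagonal` is hypothetical.

```python
import math

def em_shared_diagonal(points, means, variances, weights, iterations):
    """Fixed-iteration EM for a k-component Gaussian mixture.

    Assumption (our reading of table R): all clusters share ONE diagonal
    covariance, stored as p variances. Returns (means, variances, weights).
    """
    n, p, k = len(points), len(points[0]), len(means)
    for _ in range(iterations):
        # E step: log responsibilities up to a constant. With a shared
        # covariance the Gaussian normalizing constant cancels out.
        resp = []
        for y in points:
            logs = [math.log(weights[j])
                    - 0.5 * sum((y[v] - means[j][v]) ** 2 / variances[v]
                                for v in range(p))
                    for j in range(k)]
            top = max(logs)  # subtract the max so exp() cannot underflow to all zeros
            e = [math.exp(a - top) for a in logs]
            s = sum(e)
            resp.append([x / s for x in e])
        # M step: weighted means, shared diagonal variances, mixture weights.
        nj = [sum(r[j] for r in resp) for j in range(k)]  # assumes no cluster empties
        means = [[sum(resp[i][j] * points[i][v] for i in range(n)) / nj[j]
                  for v in range(p)] for j in range(k)]
        variances = [sum(resp[i][j] * (points[i][v] - means[j][v]) ** 2
                         for i in range(n) for j in range(k)) / n
                     for v in range(p)]
        weights = [nj[j] / n for j in range(k)]
    return means, variances, weights
```

Because the covariance is shared, the E step needs only the weights and the squared Mahalanobis distances, which is the quantity claim 1 computes with SQL aggregates.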

31 Citations

6 Claims

1. A method for performing clustering within a relational database management system to group a set of n data points into a set of k clusters, each data point having a dimensionality p, the method comprising the steps of:
- establishing a first table, C, having 1 column and p*k rows, for the storage of means values;
- establishing a second table, R, having 1 column and p rows, for the storage of covariance values;
- establishing a third table, W, having w columns and k rows, for the storage of w weight values;
- establishing a fourth table, Y, having 1 column and p*n rows, for the storage of values; and
- executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables;
- said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm includes the step of calculating a Mahalanobis distance for each of said n data points by using SQL aggregate functions to join tables Y, C and R.

2. The method for performing clustering within a relational database management system in accordance with claim 1, wherein said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables continues until a specified number of iterations has been performed.

3. The method for performing clustering within a relational database management system in accordance with claim 1, wherein said first, second, third and fourth tables represent matrices.

4. The method for performing clustering within a relational database management system in accordance with claim 3, wherein said second table, R, represents a diagonal matrix.

5. The method for performing clustering within a relational database management system in accordance with claim 1, wherein:

6. The method for performing clustering within a relational database management system in accordance with claim 5, wherein:
p ≤ 100; and
k ≤ 100.
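
Claim 1's distance step can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not the patented implementation. The index columns `i`, `v` and `j` are hypothetical (the claim counts only the single value column), R is read as the p diagonal entries of one covariance shared by all clusters, and the function name `mahalanobis_sql` is illustrative. It returns squared distances, the form the Gaussian density uses.

```python
import sqlite3

def mahalanobis_sql(points, means, variances):
    """Squared Mahalanobis distance of every point to every cluster mean,
    assuming a diagonal covariance shared by all clusters.

    points:    n lists of length p  -> vertical table Y (p*n rows)
    means:     k lists of length p  -> vertical table C (p*k rows)
    variances: p positive floats    -> vertical table R (p rows)
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical vertical schemas: one value column plus index columns.
    con.execute("CREATE TABLE Y (i INTEGER, v INTEGER, val REAL)")
    con.execute("CREATE TABLE C (j INTEGER, v INTEGER, val REAL)")
    con.execute("CREATE TABLE R (v INTEGER, val REAL)")
    con.executemany("INSERT INTO Y VALUES (?, ?, ?)",
                    [(i, v, x) for i, row in enumerate(points)
                     for v, x in enumerate(row)])
    con.executemany("INSERT INTO C VALUES (?, ?, ?)",
                    [(j, v, m) for j, row in enumerate(means)
                     for v, m in enumerate(row)])
    con.executemany("INSERT INTO R VALUES (?, ?)", list(enumerate(variances)))
    # Join on the dimension index, then let SUM() fold the p per-dimension
    # terms into one distance per (point, cluster) pair.
    rows = con.execute("""
        SELECT Y.i, C.j, SUM((Y.val - C.val) * (Y.val - C.val) / R.val) AS d2
        FROM Y
        JOIN C ON C.v = Y.v
        JOIN R ON R.v = Y.v
        GROUP BY Y.i, C.j
        ORDER BY Y.i, C.j
    """).fetchall()
    con.close()
    return {(i, j): d2 for i, j, d2 in rows}
```

For example, `mahalanobis_sql([[0, 0], [2, 2]], [[0, 0], [1, 2]], [1.0, 4.0])` returns `{(0, 0): 0.0, (0, 1): 2.0, (1, 0): 5.0, (1, 1): 1.0}`. The join produces n·k·p intermediate rows that GROUP BY reduces to n·k, and the query text stays the same for any dimensionality p, which is the practical appeal of a vertical layout.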

Current Assignee: Teradata US, Inc. (Teradata Corporation)
Original Assignee: NCR Corporation
Inventors: Ordonez, Carlos; Cereghini, Paul M.
Primary Examiner(s): Mizrahi, Diane D.
Application Number: US09/747,857
Time in Patent Office: 781 Days
Field of Search: 707/2, 707/3, 707/4, 707/5, 707/6, 707/7, 707/101, 706/52, 375/342
US Class Current: 707/737
CPC Class Codes:
- G06F 16/285   Clustering or classification
- Y10S 707/954   Relational
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99936   Pattern matching access
