High dimensional data mining and visualization via gaussianization

US 6,591,235 B1
Filed: 05/05/2000
Issued: 07/08/2003
Est. Priority Date: 02/04/2000
Status: Expired due to Term

Abstract

A method is provided for mining high dimensional data. The high dimensional data is linearly transformed into less dependent coordinates, by applying a linear transform of n rows by n columns to the high dimensional data. Each of the coordinates is marginally Gaussianized, the Gaussianization being characterized by univariate Gaussian means, priors, and variances. The transforming and Gaussianizing steps are iteratively repeated until the coordinates converge to a standard Gaussian distribution. The coordinates of all iterations are arranged hierarchically to facilitate data mining. The arranged coordinates are then mined. According to an embodiment of the invention, the transform step includes applying an iterative maximum likelihood expectation maximization (EM) method to the high dimensional data.
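The marginal Gaussianizing step described above can be illustrated with a minimal sketch. It assumes, as one plausible reading that this record does not confirm, that a coordinate is modeled by a univariate Gaussian mixture (priors, means, variances) and is mapped through the mixture CDF followed by the inverse standard normal CDF. The function names `mixture_cdf` and `gaussianize` and the two-component parameters are hypothetical, chosen only for illustration; the sketch does not cover the linear transform, the EM updates, or the hierarchical arrangement.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def mixture_cdf(x, priors, means, variances):
    """CDF of a univariate Gaussian mixture evaluated at x."""
    return sum(
        p * NormalDist(m, v ** 0.5).cdf(x)
        for p, m, v in zip(priors, means, variances)
    )

def gaussianize(x, priors, means, variances, eps=1e-12):
    """Map x to inv_Phi(F(x)), where F is the mixture CDF (assumed form)."""
    u = mixture_cdf(x, priors, means, variances)
    # Clamp so the inverse CDF argument stays strictly inside (0, 1).
    u = min(max(u, eps), 1.0 - eps)
    return STD_NORMAL.inv_cdf(u)

# Hypothetical two-component mixture for a single coordinate.
priors = [0.5, 0.5]
means = [-2.0, 2.0]
variances = [1.0, 1.0]

samples = [-3.0, -2.0, 0.0, 2.0, 3.0]
transformed = [gaussianize(x, priors, means, variances) for x in samples]
print(transformed)
```

If the coordinate really follows the assumed mixture, the transformed values follow a standard Gaussian distribution, because the mixture CDF maps the data to a uniform variable and the inverse normal CDF maps that to a standard normal. With a single component the map reduces to ordinary standardization, (x - mean) / sqrt(variance).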

36 Citations

26 Claims

1. A method for mining high dimensional data, comprising the steps of:

  linearly transforming the high dimensional data into less dependent coordinates, by applying a linear transform of n rows by n columns to the high dimensional data;

  marginally Gaussianizing each of the coordinates, said Gaussianizing being characterized by univariate Gaussian means, priors, and variances;

  iteratively repeating said transforming and Gaussianizing steps until the coordinates converge to a standard Gaussian distribution;

  arranging the coordinates of all iterations hierarchically to facilitate data mining; and

  mining the arranged coordinates.

2. The method according to claim 1, wherein said transforming step further comprises the step of applying an iterative maximum likelihood expectation maximization (EM) method to the high dimensional data.
3. The method according to claim 2, further comprising the step of computing a log likelihood of the high dimensional data, prior to said transforming step.
4. The method according to claim 3, wherein said EM method comprises the steps of:
5. The method according to claim 4, wherein the linear transform is fixed, when the univariate Gaussian variances are updated.
6. The method according to claim 4, wherein the univariate Gaussian variances are fixed, when the linear transform is updated.
7. The method according to claim 4, wherein the linear transform is fixed, when the univariate Gaussian means are updated.
8. The method according to claim 1, wherein said arranging step hierarchically arranges the coordinates of all the iterations in a tree structure.

9. A method for visualizing high dimensional data, comprising the steps of:

  linearly transforming the high dimensional data into less dependent coordinates, by applying a linear transform of n rows by n columns to the high dimensional data;

  marginally Gaussianizing each of the coordinates, said Gaussianizing being characterized by univariate Gaussian means, priors, and variances;

  iteratively repeating said transforming and Gaussianizing steps until the coordinates converge to a standard Gaussian distribution;

  arranging the coordinates of all iterations hierarchically into high dimensional data sets to facilitate data visualization; and

  visualizing the high dimensional data sets.

10. The method according to claim 9, wherein said transforming step further comprises the step of applying an iterative expectation maximization (EM) method to the high dimensional data.
11. The method according to claim 10, further comprising the step of computing a log likelihood of the high dimensional data, prior to said transforming step.
12. The method according to claim 11, wherein said EM method comprises an expectation step and a maximization step,
13. The method according to claim 12, wherein the linear transform is fixed, when the univariate Gaussian variances are updated.
14. The method according to claim 13, wherein the univariate Gaussian variances are fixed, when the linear transform is updated.
15. The method according to claim 13, wherein the linear transform is fixed, when the univariate Gaussian means are updated.
16. The method according to claim 9, wherein said arranging step hierarchically arranges the coordinates of all the iterations in a tree structure.

17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for arranging high dimensional data for data mining, said method steps comprising:

  linearly transforming the high dimensional data into less dependent coordinates, by applying a linear transform of n rows by n columns to the high dimensional data;

  marginally Gaussianizing each of the coordinates, said Gaussianizing being characterized by univariate Gaussian means, priors, and variances;

  iteratively repeating said transforming and Gaussianizing steps until the coordinates converge to a standard Gaussian distribution; and

  arranging the coordinates of all iterations hierarchically to facilitate data mining.

18. The program storage device according to claim 17, wherein said transforming step further comprises the step of applying an iterative maximum likelihood expectation maximization (EM) method to the high dimensional data.
19. The program storage device according to claim 18, further comprising the step of computing a log likelihood of the high dimensional data, prior to said transforming step.
20. The program storage device according to claim 19, wherein said EM method comprises the steps of:
21. The program storage device according to claim 17, wherein said arranging step hierarchically arranges the coordinates of all the iterations in a tree structure.

22. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for arranging high dimensional data for visualization, said method steps comprising:

  linearly transforming the high dimensional data into less dependent coordinates, by applying a linear transform of n rows by n columns to the high dimensional data;

  marginally Gaussianizing each of the coordinates, said Gaussianizing being characterized by univariate Gaussian means, priors, and variances;

  iteratively repeating said transforming and Gaussianizing steps until the coordinates converge to a standard Gaussian distribution; and

  arranging the coordinates of all iterations hierarchically into high dimensional data sets to facilitate data visualization.

23. The program storage device according to claim 22 wherein said transforming step further comprises the step of applying an iterative expectation maximization (EM) method to the high dimensional data.
24. The program storage device according to claim 23, further comprising the step of computing a log likelihood of the high dimensional data, prior to said transforming step.
25. The program storage device according to claim 24, wherein said EM method comprises an expectation step and a maximization step,
26. The program storage device according to claim 22, wherein said arranging step hierarchically arranges the coordinates of all the iterations in a tree structure.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Gopinath, Ramesh Ambat; Chen, Scott Shaobing
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/565,365
Time in Patent Office: 1,159 Days
Field of Search: 704/231, 704/236, 704/243, 704/244, 704/245
US Class Current: 704/236
CPC Class Codes:
G06F 18/213 Feature extraction, e.g. by...
G06F 18/2321 using statistics or functio...