System and method for quantifying an extent to which a data mining algorithm captures useful information in input data

US 6,684,208 B2
Filed: 05/15/2001
Issued: 01/27/2004
Est. Priority Date: 03/07/2001
Status: Active Grant

Abstract

A system and method for estimating the point of diminishing returns for additional information in data mining processing applications. The present invention provides a convenient method of estimating the extent to which a data mining algorithm captures useful information in raw feature data. First, the input data is processed using a forward transform. A region of overlap Y_o in the forward transformed data is identified and quantified. The region of overlap Y_o is processed with a reverse transform to create an overlap region Z in an original feature space. The degree of overlap in region Z is quantified and compared to a level of overlap in the Y_o region, such that the comparison quantifies the extent to which a data mining algorithm captures useful information in the input data.
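The sequence of steps in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the forward transform is a hypothetical linear projection `w` onto a one-dimensional decision axis, Y_o is taken to be the interval of that axis reached by both of two classes, the reverse transform simply collects the original feature vectors whose projections fall in Y_o (region Z), and the level of overlap in each space is measured as a nearest-neighbour confusion rate.

```python
import math

def project(x, w):
    """Hypothetical forward transform: linear projection onto one decision axis."""
    return sum(xi * wi for xi, wi in zip(x, w))

def overlap_interval(scores, labels):
    """Interval of the decision axis reached by both classes (0 and 1), or None."""
    a = [s for s, y in zip(scores, labels) if y == 0]
    b = [s for s, y in zip(scores, labels) if y == 1]
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    return (lo, hi) if lo <= hi else None

def nn_confusion(points, labels):
    """Fraction of points whose nearest other point carries a different label."""
    wrong = 0
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        wrong += labels[j] != labels[i]
    return wrong / len(points)

def compare_overlap(X, labels, w):
    """Return (confusion in Y_o, confusion in Z) for feature vectors X."""
    scores = [project(x, w) for x in X]
    interval = overlap_interval(scores, labels)
    if interval is None:
        return 0.0, 0.0  # the classes do not overlap on the decision axis
    lo, hi = interval
    # Assumed reverse transform: the samples whose scores land inside Y_o.
    idx = [i for i, s in enumerate(scores) if lo <= s <= hi]
    if len(idx) < 2:
        return 0.0, 0.0  # too few samples to measure confusion
    z_labels = [labels[i] for i in idx]
    conf_y = nn_confusion([(scores[i],) for i in idx], z_labels)
    conf_z = nn_confusion([X[i] for i in idx], z_labels)
    return conf_y, conf_z
```

Under these assumptions, a confusion rate in Z that is much lower than the rate in Y_o suggests that the original features still separate samples the transform has mixed together, so the algorithm has not captured all of the useful information.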

12 Claims

1. A method for quantifying an extent to which a data mining algorithm captures useful information in input data, the method comprising:
- performing a forward transform on input data;
  
  identifying and quantifying a region of overlap Y_o in the forward transformed data;
  
  performing a reverse transform on the overlap region Y_o to create an overlap region Z in an original feature space;
  
  quantifying a degree of overlap in region Z;
  
  comparing a level of overlap in the Y_o region with a level of overlap in the Z region; and
  
  quantifying the extent to which a data mining algorithm captures useful information in the input data, based upon a result of the comparison in the levels of overlap between the Y_o region and the Z region.
2. The method of claim 1, further comprising:
3. The method of claim 2, wherein the performing the forward transform comprises transforming a feature set that contains all the features up to the inflection point into a one-dimensional decision space.
4. The method of claim 3, wherein quantifying a region of overlap Y_o in the forward transformed data comprises calculating a degree of overlap between different output classes in the decision space.
5. The method of claim 4, wherein calculating a degree of overlap comprises using one of Kullback-Leibler divergence, Bhattacharyya distance, and multi-modal overlap measures.
6. The method of claim 4, wherein if a dimension N of the feature set is above a threshold value, the input features are orthogonalized with a level of overlap in region Z to convert to N one-dimensional vectors.
7. The method of claim 6, wherein the level of overlap in region Z is computed as a sum of each probability density function (PDF) for each of the N one-dimensional vectors.
8. The method of claim 4, wherein if a dimension N of the feature set is less than a threshold value, a level of overlap in region Z is computed using a Parzen window to estimate a multi-dimensional class-conditional feature probability density function (PDF).
9. The method of claim 4, wherein comparing a level of overlap in the Y_o region with a level of overlap in the Z region comprises performing a linear regression between the two regions, such that a magnitude of a slope is proportional to the extent to which a data mining algorithm captures useful information in the input data.
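
The dependent claims name the Bhattacharyya distance as one possible overlap measure and a Parzen window as one way to estimate a class-conditional PDF. A minimal sketch of how the two could be combined for two classes in a one-dimensional decision space follows. The Gaussian kernel, the bandwidth `h`, the integration grid, and the function names are all assumptions for illustration; the claims apply the Parzen window to the multi-dimensional feature space, whereas this sketch borrows it for the one-dimensional case to keep the example short.

```python
import math

def parzen_pdf(samples, h):
    """Gaussian-kernel Parzen window density estimate with bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return pdf

def bhattacharyya_distance(scores_a, scores_b, h=0.5, steps=2000):
    """Bhattacharyya distance between two classes of 1-D decision scores."""
    p, q = parzen_pdf(scores_a, h), parzen_pdf(scores_b, h)
    # Integrate over a range that covers both classes plus kernel tails.
    lo = min(min(scores_a), min(scores_b)) - 5.0 * h
    hi = max(max(scores_a), max(scores_b)) + 5.0 * h
    dx = (hi - lo) / steps
    # Midpoint rule for the Bhattacharyya coefficient: integral of sqrt(p * q).
    bc = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        bc += math.sqrt(p(x) * q(x)) * dx
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return -math.log(bc) if bc > 0.0 else math.inf
```

A distance near zero means the two class densities almost coincide (heavy overlap), and the distance grows as the classes separate. The Kullback-Leibler divergence named in the same claim could be computed on the same grid, but it is asymmetric, so the order of the two classes would matter.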

10. A method to quantify how close a data mining algorithm is to optimal performance for a given set of input data, the method comprising:
- performing a forward transform on input data;
  
  calculating a degree of confusion in a region of overlap Y_o in the forward transformed data;
  
  performing a reverse transform on the overlap region Y_o to create an overlap region Z in an original feature space;
  
  calculating a degree of confusion in overlap region Z; and
  
  performing a linear regression between the level of confusion in the Y_o region and the level of confusion in the Z region, such that a magnitude of a slope is proportional to the extent to which a data mining algorithm captures useful information in the input data.
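
The regression step in claim 10 can be sketched with an ordinary least-squares fit. The claim does not say where the paired confusion measurements come from or which region goes on which axis, so this sketch assumes one (Z, Y_o) pair per data split or feature subset, with the Z confusion on the horizontal axis; the function name and the sample values are hypothetical.

```python
def regression_slope(confusion_z, confusion_y):
    """Least-squares slope and intercept of Y_o confusion against Z confusion."""
    n = len(confusion_z)
    if n != len(confusion_y) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_z = sum(confusion_z) / n
    mean_y = sum(confusion_y) / n
    sxx = sum((z - mean_z) ** 2 for z in confusion_z)
    if sxx == 0.0:
        raise ValueError("Z confusion values are all identical")
    sxy = sum((z - mean_z) * (y - mean_y)
              for z, y in zip(confusion_z, confusion_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_z
    return slope, intercept

# Hypothetical paired measurements, one pair per data split.
slope, intercept = regression_slope([0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8])
```

The magnitude of `slope` is the quantity the claim relates to how much useful information the algorithm captures; how to read a particular value depends on the axis convention assumed above.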

11. A computer readable medium including computer code for quantifying an extent to which a data mining algorithm captures useful information in input data, the computer readable medium comprising:
- computer code for performing a forward transform on input data;
  
  computer code for identifying and quantifying a region of overlap Y_o in the forward transformed data;
  
  computer code for performing a reverse transform on the overlap region Y_o to create an overlap region Z in an original feature space;
  
  computer code for quantifying a degree of overlap in region Z;
  
  computer code for comparing a level of overlap in the Y_o region with a level of overlap in the Z region; and
  
  computer code for quantifying the extent to which a data mining algorithm captures useful information in the input data based upon a result of the comparison in the levels of overlap between the Y_o region and the Z region.

12. A computer system for quantifying how close a data mining algorithm is to optimal performance for a given set of input data, the computer system comprising:
- a processor; and
  
  computer program code that executes on the processor, the computer program code comprising:
  
  computer code for performing a forward transform on input data;
  
  computer code for identifying and quantifying a region of overlap Y_o in the forward transformed data;
  
  computer code for performing a reverse transform on the overlap region Y_o to create an overlap region Z in an original feature space;
  
  computer code for quantifying a degree of overlap in region Z;
  
  computer code for comparing a level of overlap in the Y_o region with a level of overlap in the Z region; and
  
  computer code for quantifying how close a data mining algorithm is to optimal performance for a given set of input data based upon a result of the comparison in the levels of overlap between the Y_o region and the Z region.

Current Assignee: Loyola Marymount University
Original Assignee: Rockwell Technologies, LLC
Inventors: Kil, David; Fertig, Ken
Primary Examiner(s): Breene, John
Assistant Examiner(s): Wassum, Luke S
Application Number: US09/858,768
Publication Number: US 20020128997A1
Time in Patent Office: 987 Days
Field of Search: 707/1-5, 707/10
US Class Current: 707/723
CPC Class Codes:
G06F 16/2465 Query processing support fo...
Y10S 707/99935 Query augmenting and refini...
