METHODS AND SYSTEMS FOR TRANSDUCTIVE DATA CLASSIFICATION

US 20080097936A1
Filed: 05/23/2007
Published: 04/24/2008
Est. Priority Date: 07/12/2006
Status: Active Grant

Abstract

A system, method, data processing apparatus, and article of manufacture are provided for classifying data. Labeled data points are received, each having at least one label indicating whether the data point is a training example for data points to be included in a designated category or a training example for data points to be excluded from a designated category. Unlabeled data points are received, along with at least one predetermined cost factor of the labeled data points and unlabeled data points. A transductive classifier is trained using Maximum Entropy Discrimination (MED) through iterative calculation, using the at least one cost factor and the labeled and unlabeled data points as training examples. The trained classifier is applied to classify at least one of the unlabeled data points, the labeled data points, and input data points, and a classification of the classified data points, or a derivative thereof, is output.

46 Claims

1. In a computer-based system, a method for classification of data comprising:
- receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
  
  receiving unlabeled data points;
  
  receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
  
  training a transductive classifier using Maximum Entropy Discrimination (MED) through iterative calculation using said at least one cost factor and the labeled data points and the unlabeled data points as training examples, wherein for each iteration of the calculations the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point label prior probability is adjusted according to an estimate of a data point class membership probability;
  
  applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
  
  outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1 wherein said function is the absolute value of the expected label of a data point.
  - 3. The method of claim 1 further comprising the step of receiving prior probability information of labeled and unlabeled data points.
  - 4. The method of claim 3 wherein said transductive classifier learns using prior probability information of the labeled and unlabeled data.
  - 5. The method of claim 1 comprising the further step of determining the decision function with minimal KL divergence using a Gaussian prior for the decision function parameters given the included and excluded training examples utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.
  - 6. The method of claim 1 comprising the further step of determining the decision function with minimal KL divergence using a multinomial prior distribution for the decision function parameters.
  - 7. The method of claim 1 wherein the iterative step of training a transductive classifier is repeated until the convergence of data values is reached.
  - 8. The method of claim 7 wherein convergence is reached when the change of the decision function of the transductive classifier falls below a predetermined threshold value.
  - 9. The method of claim 7 wherein convergence is reached when the change of the determined expected label value falls below a predetermined threshold value.
  - 10. The method of claim 1 wherein the label of the included training example has a value of +1 and the label of the excluded training example has a value of −1.
  - 11. The method of claim 1 wherein the label of the included example is mapped to a first numeric value and the label of the excluded example to a second numeric value.
  - 12. The method of claim 1 further comprising:
    - storing the labeled data points in a memory of a computer;
      
      storing the unlabeled data points in a memory of a computer;
      
      storing the input data points in a memory of a computer; and
      
      storing the at least one predetermined cost factor of the labeled data points and unlabeled data points in a memory of a computer.
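
The adjustment in claim 2 and the stopping test in claim 9 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation; the names `scaled_costs`, `has_converged`, `base_cost`, and `threshold` are hypothetical and do not appear in the claims.

```python
from typing import List, Sequence


def scaled_costs(base_cost: float, expected_labels: Sequence[float]) -> List[float]:
    """Scale the unlabeled-point cost factor by |expected label| (cf. claim 2).

    base_cost is an assumed name for the predetermined cost factor.
    """
    return [base_cost * abs(y) for y in expected_labels]


def has_converged(previous: Sequence[float],
                  current: Sequence[float],
                  threshold: float) -> bool:
    """Return True when the largest change in expected label value
    between two iterations is below the threshold (cf. claim 9)."""
    largest_change = max(
        (abs(c - p) for p, c in zip(previous, current)), default=0.0
    )
    return largest_change < threshold
```

Under this sketch, an unlabeled point whose expected label is 0 (maximally uncertain) receives a cost of 0, while a point whose expected label is +1 or −1 receives the full cost factor.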

13. A method for classification of data comprising:
- providing computer executable program code to be deployed to and executed on a computer system, the program code comprising instructions for:
  
  accessing stored labeled data points in a memory of a computer, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
  
  accessing unlabeled data points from a memory of a computer;
  
  accessing at least one predetermined cost factor of the labeled data points and unlabeled data points from a memory of a computer;
  
  training a Maximum Entropy Discrimination (MED) transductive classifier through iterative calculation using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein for each iteration of the calculation the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point prior probability is adjusted according to an estimate of a data point class membership probability;
  
  applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
  
  outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
  - 14. The method of claim 13 wherein said function is the absolute value of the expected label of a data point.
  - 15. The method of claim 13 further comprising the step of accessing prior probability information of labeled and unlabeled data points stored in a memory of a computer.
  - 16. The method of claim 15 wherein for each iteration, the prior probability information is adjusted according to an estimate of a data point class membership probability.
  - 17. The method of claim 13 further comprising instructions for determining the decision function with minimal KL divergence to the prior distribution of the decision function parameters given the included and excluded training examples utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.
  - 18. The method of claim 13 wherein the iterative step of training a transductive classifier is repeated until convergence of data values is reached.
  - 19. The method of claim 18 wherein convergence is reached when the change of the decision function of the transductive classification falls below a predetermined threshold value.
  - 20. The method of claim 18 wherein convergence is reached when the change of the determined expected label value falls below a predetermined threshold value.
  - 21. The method of claim 13 wherein the label of the included training example has a value of +1 and the label of the excluded training example has a value of −1.
  - 22. The method of claim 13 wherein the label of the included example is mapped to a first numeric value and the label of the excluded example to a second numeric value.
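
Claims 13 and 16 adjust the label prior probability according to an estimated class membership probability but do not name a calibration method. The sketch below assumes a logistic (sigmoid) calibration with hypothetical `slope` and `offset` parameters, and uses the ±1 label values of claim 21, under which a prior probability p of inclusion gives an expected label of 2p − 1.

```python
import math
from typing import List, Sequence


def calibrate(score: float, slope: float = 1.0, offset: float = 0.0) -> float:
    """Map a raw classification score to an estimated P(included).

    The logistic form and its parameters are assumptions made for
    illustration; the claims only require a class membership estimate.
    """
    z = slope * score + offset
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically safe branch for negative z
    return e / (1.0 + e)


def updated_priors(scores: Sequence[float]) -> List[float]:
    """Replace each label prior with the calibrated membership estimate."""
    return [calibrate(s) for s in scores]


def expected_label(p_included: float) -> float:
    """Expected label for labels +1 (included) and -1 (excluded):
    p * (+1) + (1 - p) * (-1) = 2p - 1."""
    return 2.0 * p_included - 1.0
```

A score of 0 maps to a prior of 0.5 and therefore to an expected label of 0, which is the point at which an absolute-value cost function gives the data point no weight.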

23. A data processing apparatus comprising:
- at least one memory for storing:
  
  (i) labeled data points, wherein each of said labeled data points has at least one label indicating whether the data point is a training example for data points being included in a designated category or a training example for data points being excluded from a designated category;
  
  (ii) unlabeled data points; and
  
  (iii) at least one predetermined cost factor of the labeled data points and unlabeled data points; and
  
  a transductive classifier trainer to iteratively teach the transductive classifier using transductive Maximum Entropy Discrimination (MED) using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein at each iteration of the MED calculation the cost factor of the unlabeled data point is adjusted as a function of an expected label value and a data point label prior probability is adjusted according to an estimate of a data point class membership probability;
  
  wherein a classifier trained by the transductive classifier trainer is used to classify at least one of the unlabeled data points, the labeled data points, and input data points;
  
  wherein a classification of the classified data points, or derivative thereof, is output to at least one of a user, another system, and another process.
- Dependent claims (24, 25, 26, 27, 28, 29, 30, 31, 32)
  - 24. The apparatus of claim 23 wherein said function is the absolute value of the expected label of a data point.
  - 25. The apparatus of claim 23 wherein said memory also stores prior probability information of labeled and unlabeled data points.
  - 26. The apparatus of claim 25 wherein at each iteration of the MED calculation, the prior probability information is adjusted according to an estimate of a data point class membership probability.
  - 27. The apparatus of claim 23 further comprising a processor for determining the decision function with minimal KL divergence to the prior distribution of the decision function parameters given the included and excluded training examples utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.
  - 28. The apparatus of claim 23 further comprising a means for determining the convergence of data values, and terminating calculations upon determination of convergence.
  - 29. The apparatus of claim 28 wherein convergence is reached when the change of the decision function of the transductive classifier calculation falls below a predetermined threshold value.
  - 30. The apparatus of claim 28 wherein convergence is reached when the change of the determined expected label values falls below a predetermined threshold value.
  - 31. The apparatus of claim 23 wherein the label of the included training example has a value of +1 and the label of the excluded training example has a value of −1.
  - 32. The apparatus of claim 23 wherein the label of the included example is mapped to a first numeric value and the label of the excluded example to a second numeric value.
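
One possible in-memory layout for items (i) to (iii) of claim 23, together with the label mapping of claim 32, is sketched below. The class name `TrainingStore`, its field names, and the default cost values are all hypothetical; the claims do not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TrainingStore:
    """Illustrative container for the stored training inputs."""
    labeled_points: List[Sequence[float]]    # (i) labeled data points
    labels: List[str]                        # "included" or "excluded"
    unlabeled_points: List[Sequence[float]]  # (ii) unlabeled data points
    labeled_cost: float = 1.0                # (iii) cost factors (assumed defaults)
    unlabeled_cost: float = 1.0
    # Claim 32: map each label to a numeric value; +1 / -1 is one choice.
    label_values: Dict[str, float] = field(
        default_factory=lambda: {"included": 1.0, "excluded": -1.0}
    )

    def numeric_labels(self) -> List[float]:
        """Return the numeric value assigned to each stored label."""
        return [self.label_values[name] for name in self.labels]
```

Keeping the mapping as data rather than hard-coding +1 and −1 lets the same container cover both the specific values and the more general first and second numeric values.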

33. An article of manufacture comprising a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by a computer to perform a method of data classification comprising:
- receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
  
  receiving unlabeled data points;
  
  receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
  
  training a transductive classifier with iterative Maximum Entropy Discrimination (MED) calculation using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein at each iteration of the MED calculation the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point prior probability is adjusted according to an estimate of a data point class membership probability;
  
  applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
  
  outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
- Dependent claims (34, 35, 36, 37, 38, 39, 40, 41, 42)
  - 34. The article of manufacture of claim 33 wherein said function is the absolute value of the expected label of a data point.
  - 35. The article of manufacture of claim 33 further comprising the step of storing prior probability information of labeled and unlabeled data points in a memory of a computer.
  - 36. The article of manufacture of claim 35 wherein at each iteration of the MED calculation, the prior probability information is adjusted according to an estimate of a data point class membership probability.
  - 37. The article of manufacture of claim 33 comprising the further step of determining the decision function with minimal KL divergence to the prior distribution of the decision function parameters given the included and excluded training examples utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.
  - 38. The article of manufacture of claim 33 wherein the iterative step of training a transductive classifier is repeated until the convergence of data values is reached.
  - 39. The article of manufacture of claim 38 wherein convergence is reached when the change of the decision function of the transductive classification falls below a predetermined threshold value.
  - 40. The article of manufacture of claim 38 wherein convergence is reached when the change of the determined expected label value falls below a predetermined threshold value.
  - 41. The article of manufacture of claim 33 wherein the label of the included training example has a value of +1 and the label of the excluded training example has a value of −1.
  - 42. The article of manufacture of claim 33 wherein the label of the included example is mapped to a first numeric value and the label of the excluded example to a second numeric value.
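
The KL divergence to the prior referred to in claim 37 can be written out for one simple case. The claims do not give a formula, so the following is a textbook illustration under stated assumptions: the symbols $\theta$ (decision function parameters), $p_0$ (prior), $p$ (learned distribution), $\mu$, and $\sigma$ are introduced here and are not taken from the patent.

```latex
% General definition of the KL divergence of p from the prior p_0
\mathrm{KL}(p \,\|\, p_0) = \int p(\theta)\, \ln \frac{p(\theta)}{p_0(\theta)}\, d\theta

% Special case (assumed): Gaussian prior and Gaussian solution with equal covariance
p_0(\theta) = \mathcal{N}(\theta \mid 0, \sigma^2 I), \qquad
p(\theta) = \mathcal{N}(\theta \mid \mu, \sigma^2 I)
\;\;\Longrightarrow\;\;
\mathrm{KL}(p \,\|\, p_0) = \frac{\lVert \mu \rVert^2}{2\sigma^2}
```

In this special case, minimizing the KL divergence to a zero-mean Gaussian prior amounts to minimizing the squared norm of the mean parameter vector, subject to whatever classification constraints the training examples impose.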

43. In a computer-based system, a method for classification of unlabeled data comprising:
- receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
  
  receiving labeled and unlabeled data points;
  
  receiving prior label probability information of labeled data points and unlabeled data points;
  
  receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
  
  determining the expected labels for each labeled and unlabeled data point according to the label prior probability of the data point;
  
  repeating the following substeps until substantial convergence of data values:
  
  generating a scaled cost value for each unlabeled data point proportional to the absolute value of the data point's expected label;
  
  training a classifier by determining the decision function that minimizes the KL divergence to the prior probability distribution of the decision function parameters given the included training and excluded training examples utilizing the labeled as well as the unlabeled data as training examples according to their expected label;
  
  determining the classification scores of the labeled and unlabeled data points using the trained classifier;
  
  calibrating the output of the trained classifier to class membership probability;
  
  updating the label prior probabilities of the unlabeled data points according to the determined class membership probabilities;
  
  determining the label and margin probability distributions using Maximum Entropy Discrimination (MED) using the updated label prior probabilities and the previously determined classification scores;
  
  computing new expected labels using the previously determined label probability distribution; and
  
  updating expected labels for each data point by interpolating the new expected labels with the expected label of previous iteration; and
  
  outputting a classification of the input data points, or derivative thereof, to at least one of a user, another system, and another process.
- Dependent claims (44, 45, 46)
  - 44. The method of claim 43 wherein convergence is reached when the change of the decision function falls below a predetermined threshold value.
  - 45. The method of claim 43 wherein convergence is reached when the change of the determined expected label value falls below a predetermined threshold value.
  - 46. The method of claim 43 wherein the label of the included training example has a value of +1 and the label of the excluded training example has a value of −1.
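
The order of the substeps in claim 43 can be sketched as a loop. This is not an implementation of MED: a cost-weighted nearest-centroid scorer stands in for the KL-minimizing training step, a logistic curve stands in for calibration, the new expected label is taken directly as 2p − 1 instead of being derived from MED label and margin distributions, and the labeled points keep fixed labels. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _centroid(points: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def train_stand_in(points, expected, costs) -> Tuple[List[float], float]:
    """Stand-in for the MED training step: cost-weighted class centroids."""
    pos = [c if y > 0 else 0.0 for y, c in zip(expected, costs)]
    neg = [c if y < 0 else 0.0 for y, c in zip(expected, costs)]
    mu_pos, mu_neg = _centroid(points, pos), _centroid(points, neg)
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    b = -_dot(w, [(p + n) / 2.0 for p, n in zip(mu_pos, mu_neg)])
    return w, b


def transductive_loop(labeled, labels, unlabeled, priors,
                      labeled_cost=1.0, unlabeled_cost=0.5,
                      step=0.5, threshold=1e-4, max_iter=100) -> List[float]:
    """Return the final expected labels of the unlabeled points."""
    points = list(labeled) + list(unlabeled)
    n_lab = len(labeled)
    # Initial expected labels from the label prior probabilities.
    expected = [float(y) for y in labels] + [2.0 * p - 1.0 for p in priors]
    for _ in range(max_iter):
        # Scaled cost proportional to |expected label| for unlabeled points.
        costs = [labeled_cost] * n_lab + [
            unlabeled_cost * abs(y) for y in expected[n_lab:]]
        w, b = train_stand_in(points, expected, costs)
        # Classification scores, then calibrated membership probabilities,
        # which also serve as the updated label priors.
        probs = [_sigmoid(_dot(w, x) + b) for x in unlabeled]
        new_expected = [2.0 * p - 1.0 for p in probs]
        # Interpolate new expected labels with those of the previous iteration.
        updated = [(1.0 - step) * old + step * new
                   for old, new in zip(expected[n_lab:], new_expected)]
        change = max((abs(u - o) for u, o in zip(updated, expected[n_lab:])),
                     default=0.0)
        expected[n_lab:] = updated
        if change < threshold:
            break
    return expected[n_lab:]
```

A small usage example with one-dimensional points: given labeled points at −2 (excluded) and 2 (included) and unlabeled points at −1.5, 1.0, and 3.0 with priors of 0.5, `transductive_loop([[-2.0], [2.0]], [-1, 1], [[-1.5], [1.0], [3.0]], [0.5, 0.5, 0.5])` returns a negative expected label for the first unlabeled point and positive ones for the other two. The interpolation step damps oscillation between iterations, at the price of slower convergence when `step` is small.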

Current Assignee: Tungsten Automation Corp.
Original Assignee: Kofax Incorporated
Inventors: Schmidtler, Mauritius A. R.; Harris, Christopher K.
Granted Patent: US 7,761,391 B2
US Class Current: 706/12
CPC Class Codes

G06N 20/00 Machine learning

G06N 20/10 using kernel methods, e.g. ...
