METHODS AND SYSTEMS FOR TRANSDUCTIVE DATA CLASSIFICATION
First Claim
1. In a computer-based system, a method for classification of data comprising:
- receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
receiving unlabeled data points;
receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
training a transductive classifier using Maximum Entropy Discrimination (MED) through iterative calculation using said at least one cost factor and the labeled data points and the unlabeled data points as training examples, wherein for each iteration of the calculations the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point label prior probability is adjusted according to an estimate of a data point class membership probability;
applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
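The per-iteration adjustments recited in the training step can be illustrated with a minimal Python sketch. It assumes labels are encoded as +1 (included) and -1 (excluded), so that the expected label of a point with prior probability p of inclusion is 2p - 1. Scaling the cost in proportion to the absolute value of the expected label follows the wording of claim 43; replacing the prior outright with the estimated class membership probability is an assumption made here for illustration. All function names are hypothetical and do not come from the patent.

```python
def expected_label(prior_pos):
    """Expected value of a label y in {+1, -1} when P(y = +1) = prior_pos."""
    return 2.0 * prior_pos - 1.0


def scaled_cost(base_cost, exp_label):
    """Cost factor of an unlabeled point, scaled by |expected label|.

    A point whose expected label is near 0 (class unknown) gets almost no
    weight; a point with a confident expected label gets nearly base_cost.
    """
    return base_cost * abs(exp_label)


def updated_prior(membership_prob):
    """ASSUMPTION: the new label prior is the estimated class membership
    probability itself, clamped to the valid range [0, 1]."""
    return min(1.0, max(0.0, membership_prob))
```

Under these assumptions an unlabeled point with an uninformative prior of 0.5 has expected label 0 and therefore zero cost, so it only begins to influence training once the classifier has assigned it a class membership probability away from 0.5.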
Abstract
A system, method, data processing apparatus, and article of manufacture are provided for classifying data. Labeled data points are received, each of the labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category; receiving unlabeled data points; receiving at least one predetermined cost factor of the labeled data points and unlabeled data points; training a transductive classifier using MED through iterative calculation using the at least one cost factor and the labeled data points and the unlabeled data points as training examples; applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and outputting a classification of the classified data points, or derivative thereof.
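One way to picture the inputs named in the abstract is sketched below in Python. The class and field names are hypothetical; the encoding of an "included" training example as +1, an "excluded" one as -1, and an unlabeled point as None is an assumption made for illustration, as is the use of one cost factor per group.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DataPoint:
    features: Sequence[float]
    # +1: example to include in the category, -1: example to exclude,
    # None: unlabeled point (assumed encoding, not taken from the patent).
    label: Optional[int] = None

    def __post_init__(self):
        if self.label not in (1, -1, None):
            raise ValueError("label must be +1, -1, or None")


@dataclass
class TrainingInputs:
    points: List[DataPoint]
    cost_labeled: float    # predetermined cost factor for labeled points
    cost_unlabeled: float  # predetermined cost factor for unlabeled points

    def labeled(self) -> List[DataPoint]:
        return [p for p in self.points if p.label is not None]

    def unlabeled(self) -> List[DataPoint]:
        return [p for p in self.points if p.label is None]
```

For example, `TrainingInputs([DataPoint([1.0], 1), DataPoint([0.2])], 1.0, 0.5)` holds one labeled and one unlabeled point.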
46 Claims
1. In a computer-based system, a method for classification of data comprising:
receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
receiving unlabeled data points;
receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
training a transductive classifier using Maximum Entropy Discrimination (MED) through iterative calculation using said at least one cost factor and the labeled data points and the unlabeled data points as training examples, wherein for each iteration of the calculations the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point label prior probability is adjusted according to an estimate of a data point class membership probability;
applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for classification of data comprising:
providing computer executable program code to be deployed to and executed on a computer system, the program code comprising instructions for:
accessing stored labeled data points in a memory of a computer, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
accessing unlabeled data points from a memory of a computer;
accessing at least one predetermined cost factor of the labeled data points and unlabeled data points from a memory of a computer;
training a Maximum Entropy Discrimination (MED) transductive classifier through iterative calculation using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein for each iteration of the calculation the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point prior probability is adjusted according to an estimate of a data point class membership probability;
applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A data processing apparatus comprising:
at least one memory for storing:
(i) labeled data points wherein each of said labeled data points having at least one label indicating whether the data point is a training example for data points being included in a designated category or a training example for data points being excluded from a designated category;
(ii) unlabeled data points; and
(iii) at least one predetermined cost factor of the labeled data points and unlabeled data points; and
a transductive classifier trainer to iteratively teach the transductive classifier using transductive Maximum Entropy Discrimination (MED) using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein at each iteration of the MED calculation the cost factor of the unlabeled data point is adjusted as a function of an expected label value and a data point label prior probability is adjusted according to an estimate of a data point class membership probability;
wherein a classifier trained by the transductive classifier trainer is used to classify at least one of the unlabeled data points, the labeled data points, and input data points;
wherein a classification of the classified data points, or derivative thereof, is output to at least one of a user, another system, and another process.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32
33. An article of manufacture comprising a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by a computer to perform a method of data classification comprising:
receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
receiving unlabeled data points;
receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
training a transductive classifier with iterative Maximum Entropy Discrimination (MED) calculation using said at least one stored cost factor and stored labeled data points and stored unlabeled data points as training examples wherein at each iteration of the MED calculation the unlabeled data point cost factor is adjusted as a function of an expected label value and a data point prior probability is adjusted according to an estimate of a data point class membership probability;
applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and
outputting a classification of the classified data points, or derivative thereof, to at least one of a user, another system, and another process.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41, 42
43. In a computer-based system, a method for classification of unlabeled data comprising:
receiving labeled data points, each of said labeled data points having at least one label indicating whether the data point is a training example for data points for being included in a designated category or a training example for data points being excluded from a designated category;
receiving labeled and unlabeled data points;
receiving prior label probability information of labeled data points and unlabeled data points;
receiving at least one predetermined cost factor of the labeled data points and unlabeled data points;
determining the expected labels for each labeled and unlabeled data point according to the label prior probability of the data point;
repeating the following substeps until substantial convergence of data values;
generating a scaled cost value for each unlabeled data point proportional to the absolute value of the data point's expected label;
training a classifier by determining the decision function that minimizes the KL divergence to the prior probability distribution of the decision function parameters given the included training and excluded training examples utilizing the labeled as well as the unlabeled data as training examples according to their expected label;
determining the classification scores of the labeled and unlabeled data points using the trained classifier;
calibrating the output of the trained classifier to class membership probability;
updating the label prior probabilities of the unlabeled data points according to the determined class membership probabilities;
determining the label and margin probability distributions using Maximum Entropy Discrimination (MED) using the updated label prior probabilities and the previously determined classification scores;
computing new expected labels using the previously determined label probability distribution; and
updating expected labels for each data point by interpolating the new expected labels with the expected label of previous iteration; and
outputting a classification of the input data points, or derivative thereof, to at least one of a user, another system, and another process.
Dependent claims: 44, 45, 46
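The control flow of the substeps in claim 43 can be sketched in Python as follows. This is not an implementation of MED. As loudly labeled assumptions, a cost-weighted logistic regression stands in for the KL-divergence-minimizing training step, a logistic sigmoid stands in for the calibration step, and the MED determination of label and margin distributions is skipped, with new expected labels computed directly as 2p - 1 from the calibrated probability p. All function names, parameter names, and default values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (stand-in calibration)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Linear decision function value for feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_weighted(xs, targets, costs, epochs=200, lr=0.1):
    """STAND-IN for MED training: cost-weighted logistic regression (SGD)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, t, c in zip(xs, targets, costs):
            if c == 0.0:
                continue  # zero cost: the point does not influence training
            y = 1.0 if t > 0 else 0.0  # train toward the sign of the target
            g = c * (sigmoid(score(w, b, x)) - y)  # cost-scaled gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def transductive_sketch(labeled, unlabeled, cost_labeled=1.0,
                        cost_unlabeled=0.5, prior_pos=0.5, mix=0.5,
                        tol=1e-3, max_iter=50):
    """labeled: list of (features, +1 or -1); unlabeled: list of features."""
    xs_l = [x for x, _ in labeled]
    ys_l = [float(y) for _, y in labeled]
    priors = [prior_pos] * len(unlabeled)
    # Expected label of y in {+1, -1} under its prior: 2p - 1.
    expected = [2.0 * p - 1.0 for p in priors]
    w, b = [0.0] * len(xs_l[0]), 0.0
    for _ in range(max_iter):
        # Scaled cost proportional to |expected label| for unlabeled points.
        costs_u = [cost_unlabeled * abs(e) for e in expected]
        w, b = train_weighted(xs_l + unlabeled, ys_l + expected,
                              [cost_labeled] * len(xs_l) + costs_u)
        # Score the unlabeled points and calibrate to probabilities.
        probs = [sigmoid(score(w, b, x)) for x in unlabeled]
        priors = probs  # update label priors from membership estimates
        new_expected = [2.0 * p - 1.0 for p in priors]
        # Interpolate new expected labels with those of the last iteration.
        updated = [(1.0 - mix) * e + mix * n
                   for e, n in zip(expected, new_expected)]
        change = max((abs(u - e) for u, e in zip(updated, expected)),
                     default=0.0)
        expected = updated
        if change < tol:  # "substantial convergence" (assumed criterion)
            break
    return w, b, expected
```

With the default prior of 0.5 every unlabeled point starts with expected label 0 and zero cost, so the first pass trains on the labeled points alone; unlabeled points gain weight only as their expected labels move away from 0. The sign of `score(w, b, x)` can then serve as the output classification for a new input `x`.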
Specification