Clustering-based text classification
Assignments: 2; Petitions: 0
Abstract
Systems and methods for clustering-based text classification are described. In one aspect, text is clustered as a function of labeled data to generate one or more clusters. The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded labeled data includes the labeled data and at least a portion of the unlabeled data. One or more discriminative classifiers are then trained based on the expanded labeled data and the remaining unlabeled data.
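The abstract does not say how text is clustered "as a function of labeled data". One well-known way to do this is seeded k-means, sketched below under stated assumptions: each document is already a numeric feature vector (for example TF-IDF), there is one cluster per class label, and labeled documents stay pinned to their own class. The function names and parameters (`seeded_kmeans`, `iterations`) are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def seeded_kmeans(
    labeled: List[Tuple[Vector, str]],
    unlabeled: List[Vector],
    iterations: int = 10,
) -> Tuple[Dict[str, Vector], List[str]]:
    """Cluster labeled and unlabeled vectors with one cluster per class label.

    Hypothetical sketch: centroids start at the mean of each class's labeled
    examples, labeled points always stay in their own class's cluster, and
    unlabeled points move to the nearest centroid until assignments stop
    changing. Returns (centroids by label, cluster label of each unlabeled point).
    """
    classes = sorted({y for _, y in labeled})
    centroids = {c: _mean([x for x, y in labeled if y == c]) for c in classes}
    assignment: List[str] = []
    for _ in range(iterations):
        # Assignment step: each unlabeled point joins its nearest centroid.
        new_assignment = [
            min(classes, key=lambda c: _dist2(x, centroids[c])) for x in unlabeled
        ]
        if new_assignment == assignment:
            break  # converged: no unlabeled point changed cluster
        assignment = new_assignment
        # Update step: recompute each centroid from its labeled seeds plus
        # the unlabeled points currently assigned to it.
        for c in classes:
            members = [x for x, y in labeled if y == c]
            members += [x for x, a in zip(unlabeled, assignment) if a == c]
            centroids[c] = _mean(members)
    return centroids, assignment
```

For example, with labeled seeds `([0.0, 0.0], "sports")` and `([10.0, 10.0], "finance")`, unlabeled vectors near the origin join the "sports" cluster and those near `(10, 10)` join "finance". Seeding the centroids from labeled data is what makes the clustering depend on the labels rather than on a random initialization.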
35 Claims
1. A method for text classification, the method comprising:
clustering text comprising labeled data and unlabeled data in view of the labeled data to generate one or more clusters;
generating expanded labeled data as a function of the one or more clusters, the expanded labeled data comprising the labeled data and at least a portion of the unlabeled data;
training one or more discriminative classifiers based on the expanded labeled data and remaining ones of the unlabeled data; and
generating, using the one or more discriminative classifiers, classified text for information retrieval.
(Dependent claims: 2-10)
11. A computer-readable medium having stored thereon computer-program instructions for text classification, the computer-program instructions being executable by a processor, the computer-program instructions comprising instructions for:
clustering text comprising labeled data and unlabeled data in view of the labeled data to generate one or more clusters;
generating expanded labeled data as a function of the one or more clusters, the expanded labeled data comprising the labeled data and at least a portion of the unlabeled data;
training one or more discriminative classifiers based on the expanded labeled data and remaining ones of the unlabeled data; and
generating, using the one or more discriminative classifiers, classified text for information retrieval;
wherein a size of the labeled data is small as compared to a size of the unlabeled data.
(Dependent claims: 12-19)
20. A computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for text classification, the computer-program instructions comprising instructions for:
clustering text comprising labeled data and unlabeled data in view of the labeled data to generate one or more clusters;
generating expanded labeled data as a function of the one or more clusters, the expanded labeled data comprising the labeled data and at least a portion of the unlabeled data;
training one or more discriminative classifiers based on the expanded labeled data and remaining ones of the unlabeled data; and
generating, using the one or more discriminative classifiers, classified text for information retrieval.
(Dependent claims: 21-29)
30. A computing device comprising:
clustering means to cluster text comprising labeled data and unlabeled data in view of the labeled data to generate one or more clusters;
generating means to generate expanded labeled data as a function of the one or more clusters, the expanded labeled data comprising the labeled data and at least a portion of the unlabeled data;
training means to train one or more discriminative classifiers based on the expanded labeled data and remaining ones of the unlabeled data; and
generating means to classify text based on the one or more discriminative classifiers to create classified text for information retrieval.
(Dependent claims: 31-35)
Specification