System and method for a partially self-training learning system
First Claim
10. In an electronic device, a method, comprising the steps of:
training a learning system on a set of documents, each of said documents being a collection of data, said labels identifying document categories;
performing an analysis on a selected document without labels with said learning system;
assigning a specified label to the selected document based on said analysis, said analysis comparing word occurrence in said selected document with word occurrence in said set of documents; and
determining the accuracy of the specified label assigned to said selected document.
Abstract
A method for a partially self-training learning system is disclosed. The learning systems, such as document classifiers, are initially trained on a small amount of hand-sorted data. The learning systems process unlabeled data by assigning classifications to the data. A confidence level in the classification is verified for each newly classified document. If the classification is made with a sufficiently high confidence level, the learning system trains on the word vector of the newly classified document. If the classification of the newly classified document is not made with a sufficiently high confidence level, the learning system does not use the word vector in the newly classified document for training purposes.
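The loop described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not name a model, so the multinomial Naive Bayes classifier with add-one smoothing, the whitespace tokenizer, the names `NaiveBayes` and `self_train`, and the `threshold` default are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (an assumed model)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def train(self, text, label):
        # The "word vector" here is simply the bag of lowercase tokens.
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return (best_label, posterior probability of best_label)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space so long documents do not underflow.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        best = max(log_scores, key=log_scores.get)
        return best, 1.0 / norm


def self_train(clf, unlabeled, threshold=0.9):
    """Classify each unlabeled text; retrain only on confident predictions."""
    accepted, rejected = [], []
    for text in unlabeled:
        label, confidence = clf.classify(text)
        if confidence >= threshold:
            clf.train(text, label)   # confident: use the document for training
            accepted.append((text, label))
        else:
            rejected.append(text)    # not confident: do not train on it
    return accepted, rejected


# Hand-sorted seed data, then two unlabeled documents.
clf = NaiveBayes()
clf.train("goal match team score", "sports")
clf.train("election vote senate bill", "politics")
accepted, rejected = self_train(clf, ["team score goal", "weather today"], threshold=0.8)
```

In this toy run the first unlabeled document is labeled "sports" with enough confidence to be trained on, while the second shares no words with either category and is left out of training.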
23 Claims
19-1. The method of claim 18, comprising the further step of:
comparing a calculated probability said selected document was generated by the category referenced by said specified label with a pre-defined parameter in order to determine a confidence level for the accuracy of the specified label assigned to said selected document.
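One way to read the comparison in this claim is sketched below, assuming the classifier yields one log score per category (log prior plus log likelihood). The function names `label_confidence` and `is_confident`, and the threshold standing in for the "pre-defined parameter", are hypothetical; the claim does not prescribe this particular computation.

```python
import math


def label_confidence(log_scores):
    """Turn per-category log scores into (best_label, posterior of best_label)."""
    top = max(log_scores.values())
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    norm = sum(math.exp(s - top) for s in log_scores.values())
    best = max(log_scores, key=log_scores.get)
    return best, 1.0 / norm


def is_confident(posterior, threshold):
    """Compare the calculated probability with the pre-defined parameter."""
    return posterior >= threshold


# Scores this small would underflow to 0.0 if exponentiated directly.
label, posterior = label_confidence({"sports": -1000.0, "politics": -1002.0})
confident = is_confident(posterior, threshold=0.95)
```

Working in log space matters because the product of many per-word probabilities for a long document is too small to represent as a float, while the differences between log scores remain well behaved.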
21. In an electronic device, a medium holding computer-executable steps for a method, said method comprising the steps of:
training a document classifier on a set of documents having labels, said labels identifying document categories, said documents accessible over said network;
analyzing a selected document with said document classifier;
assigning a specified label to said selected document based on said analyzing, said analyzing comparing word occurrence in said selected document with word occurrence in said set of documents having labels;
determining the accuracy of the specified label assigned to said selected document; and
using a word vector in said selected document to further train said document classifier.
Specification