DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES
First Claim
1. A method for adapting to a shift in document content, comprising:
receiving at least one labeled seed document;
receiving unlabeled documents;
receiving at least one predetermined cost factor;
training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents;
classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier;
reclassifying at least some of the categorized documents into the categories using the classifier; and
outputting identifiers of the categorized documents to at least one of a user, another system, and another process.
Abstract
A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents and for document searching are also presented.
14 Claims
1. A method for adapting to a shift in document content, comprising:
receiving at least one labeled seed document;
receiving unlabeled documents;
receiving at least one predetermined cost factor;
training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents;
classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier;
reclassifying at least some of the categorized documents into the categories using the classifier; and
outputting identifiers of the categorized documents to at least one of a user, another system, and another process.
Dependent claims: 2, 3, 4, 5, 6
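The steps of claim 1 can be illustrated with a minimal sketch. It is not the classifier described in the patent: a simple self-training nearest-centroid model stands in for the transductive classifier, the cost factor is assumed to act as the weight given to unlabeled documents during training, and the confidence level is assumed to be the winning category's share of total cosine similarity. All function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(centroids, vec):
    """Return (label, confidence); (None, 0.0) if the document matches nothing."""
    scores = {label: cosine(c, vec) for label, c in centroids.items()}
    total = sum(scores.values())
    if total == 0.0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def train(seeds, unlabeled, cost_factor, rounds=3):
    """Self-training stand-in for transductive training.

    Each round rebuilds the centroids from the seeds plus the unlabeled
    documents under their current predicted labels, weighted by cost_factor.
    """
    seed_centroids = {}
    for text, label in seeds.values():
        seed_centroids.setdefault(label, Counter()).update(vectorize(text))
    vectors = [vectorize(text) for text in unlabeled.values()]
    centroids = seed_centroids
    for _ in range(rounds):
        updated = {label: Counter(c) for label, c in seed_centroids.items()}
        for vec in vectors:
            label, _ = predict(centroids, vec)
            if label is None:
                continue  # no overlap with any category yet
            for term, count in vec.items():
                updated[label][term] += cost_factor * count
        centroids = updated
    return centroids


def classify(centroids, docs, threshold):
    """Keep only documents whose confidence exceeds the threshold."""
    result = {}
    for doc_id, text in docs.items():
        label, confidence = predict(centroids, vectorize(text))
        if label is not None and confidence > threshold:
            result[doc_id] = label
    return result


def adapt_to_shift(seeds, unlabeled, cost_factor, threshold):
    centroids = train(seeds, unlabeled, cost_factor)
    categorized = classify(centroids, unlabeled, threshold)
    # Reclassification pass: retrain on the confidently categorized
    # documents only, then classify them again.
    confident = {doc_id: unlabeled[doc_id] for doc_id in categorized}
    centroids = train(seeds, confident, cost_factor)
    return classify(centroids, confident, threshold)


# Hypothetical sample data.
seeds = {
    "s1": ("invoice total amount due payment", "invoice"),
    "s2": ("dear sir regards letter sincerely", "letter"),
}
unlabeled = {
    "u1": "invoice amount due remittance",
    "u2": "dear madam sincerely yours",
    "u3": "remittance payment advice",
    "u4": "weather forecast",
}

# Output the identifiers of the categorized documents.
for doc_id, label in adapt_to_shift(seeds, unlabeled, 0.5, 0.6).items():
    print(doc_id, label)
```

In this sketch the term "remittance" appears in no seed document, yet it enters the invoice centroid through the unlabeled documents, which is the sense in which the model adapts to a shift in content. Documents that never exceed the confidence threshold are left uncategorized.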
7. A method for separating documents, comprising:
receiving labeled data;
receiving a sequence of unlabeled documents;
adapting probabilistic classification rules using transduction based on the labeled data and the unlabeled documents;
updating weights used for document separation according to the probabilistic classification rules;
determining locations of separations in the sequence of documents;
outputting indicators of the determined locations of the separations in the sequence to at least one of a user, another system, and another process; and
flagging the documents with codes, the codes correlating to the indicators.
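A rough sketch of the separation steps in claim 7 follows. It makes several assumptions that the claim does not state: each item in the sequence is a page, the probabilistic classification rules are a two-class naive Bayes model ("first page" versus "continuation page"), transduction is approximated by refitting the model with soft labels predicted for the unlabeled pages, and the separation weights are per-term log-odds. The codes "B" and "C" and all names are hypothetical.

```python
import math
from collections import defaultdict


def tokens(text):
    return text.lower().split()


def fit_counts(labeled, soft_pages):
    """Accumulate (possibly fractional) term and class counts.

    labeled: list of (text, is_first_page) pairs
    soft_pages: list of (text, p_first) pairs from the previous round
    """
    term = {True: defaultdict(float), False: defaultdict(float)}
    prior = {True: 0.0, False: 0.0}
    weighted = [(t, 1.0 if y else 0.0) for t, y in labeled] + soft_pages
    for text, p_first in weighted:
        for cls, w in ((True, p_first), (False, 1.0 - p_first)):
            prior[cls] += w
            for tok in tokens(text):
                term[cls][tok] += w
    return term, prior


def separation_weights(term, prior, vocab):
    """Laplace-smoothed per-term log-odds weights plus a prior-based bias."""
    total = {c: sum(term[c].values()) + len(vocab) for c in (True, False)}
    weights = {
        t: math.log((term[True][t] + 1) / total[True])
        - math.log((term[False][t] + 1) / total[False])
        for t in vocab
    }
    bias = math.log((prior[True] + 1) / (prior[False] + 1))
    return weights, bias


def p_first_page(text, weights, bias):
    """Probability that a page starts a new document."""
    score = bias + sum(weights.get(tok, 0.0) for tok in tokens(text))
    return 1.0 / (1.0 + math.exp(-score))


def separate(labeled, pages, rounds=1):
    vocab = {tok for text, _ in labeled for tok in tokens(text)}
    vocab |= {tok for page in pages for tok in tokens(page)}
    soft = []  # the first fit uses the labeled data only
    for _ in range(rounds + 1):
        term, prior = fit_counts(labeled, soft)
        weights, bias = separation_weights(term, prior, vocab)  # update weights
        soft = [(page, p_first_page(page, weights, bias)) for page in pages]
    # Separation locations: indices of pages judged to start a document.
    boundaries = [i for i, (_, p) in enumerate(soft) if p > 0.5]
    # Flag every page with a code that mirrors the boundary indicators.
    codes = ["B" if i in boundaries else "C" for i in range(len(pages))]
    return boundaries, codes


# Hypothetical sample data.
labeled = [("invoice number date", True), ("page continued total", False)]
pages = [
    "invoice number 17",
    "continued items",
    "page total",
    "invoice date 18",
    "continued total",
]

boundaries, codes = separate(labeled, pages)
print(boundaries, codes)
```

Here the boundary indices play the role of the output indicators, and the per-page codes carry the same information in a form that can travel with each page.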
8. A method for document searching, comprising:
receiving a search query;
retrieving documents based on the search query;
outputting the documents;
receiving user-entered labels for at least some of the documents, the labels being indicative of a relevance of the document to the search query;
training a classifier based on the search query and the user-entered labels;
performing a document classification technique on the documents using the classifier for reclassifying the documents; and
outputting identifiers of at least some of the documents based on the classification thereof.
Dependent claims: 9, 10, 11, 12, 13, 14
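The search flow of claim 8 resembles relevance feedback, and one way to sketch it is with Rocchio-style feedback. This is a substituted, well-known technique used for illustration, not necessarily the classifier the patent describes. The coefficients, the 0.5 relevance threshold, the corpus, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for one document or query."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus):
    """Initial retrieval: every document sharing a term with the query."""
    q = vectorize(query)
    return [d for d, text in corpus.items() if cosine(q, vectorize(text)) > 0]


def train_rocchio(query, corpus, labels, alpha=1.0, beta=0.75, gamma=0.15):
    """Build a term-weight profile from the query and the user's labels.

    labels maps doc_id -> True (relevant) or False (not relevant).
    """
    profile = defaultdict(float)
    for term, w in vectorize(query).items():
        profile[term] += alpha * w
    relevant = [d for d, is_rel in labels.items() if is_rel]
    irrelevant = [d for d, is_rel in labels.items() if not is_rel]
    for docs, coef in ((relevant, beta), (irrelevant, -gamma)):
        for d in docs:
            for term, w in vectorize(corpus[d]).items():
                profile[term] += coef * w / len(docs)
    return dict(profile)


def reclassify(profile, corpus, doc_ids, threshold=0.5):
    """Return identifiers classified as relevant, best match first."""
    scores = {d: cosine(profile, vectorize(corpus[d])) for d in doc_ids}
    relevant = [d for d in doc_ids if scores[d] > threshold]
    return sorted(relevant, key=lambda d: -scores[d])


# Hypothetical sample data.
corpus = {
    "d1": "jaguar car engine speed",
    "d2": "jaguar cat jungle habitat",
    "d3": "jaguar car dealer price",
    "d4": "jungle cat prey",
}
query = "jaguar"
retrieved = retrieve(query, corpus)       # shown to the user
user_labels = {"d1": True, "d2": False}   # user marks two of the results
profile = train_rocchio(query, corpus, user_labels)
print(reclassify(profile, corpus, retrieved))
```

In this sketch the unlabeled document d3 is classified as relevant because it shares vocabulary with the document the user marked relevant, while d2 drops out after being marked not relevant.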
Specification