DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
Abstract
A system and article of manufacture enabling adapting to a shift in document content according to one embodiment of the present invention includes instructions for: receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Systems and articles of manufacture for separating documents are also presented. Systems and articles of manufacture for document searching are also presented.
106 Citations
28 Claims
-
1. An article of manufacture comprising:
- a program storage medium readable by a computer, where the medium tangibly embodies one or more programs of instructions executable by a computer to perform a method of data classification, the one or more programs of instructions comprising:
instructions for receiving at least one labeled seed document;
instructions for receiving unlabeled documents;
instructions for receiving at least one predetermined cost factor;
instructions for training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents;
instructions for classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier;
instructions for reclassifying at least some of the categorized documents previously categorized by a different classifier into the categories using the classifier; and
instructions for outputting identifiers of the categorized documents to at least one of a user, another system, and another process.
-
5. The article of manufacture of claim 1, wherein the unlabeled documents are customer complaints, and further comprising linking product changes with customer complaints.
-
6. The article of manufacture of claim 1, wherein the unlabeled documents are invoices.
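The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: a self-training nearest-centroid model stands in for the transductive classifier, the cost factor is assumed to scale how much a guessed label on an unlabeled document contributes to training, and confidence is assumed to be the winning category's share of total similarity. All function names and sample documents are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score(vec, centroids):
    """Return (best_category, confidence), or (None, 0.0) if nothing matches."""
    sims = {cat: cosine(vec, c) for cat, c in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return None, 0.0
    best = max(sims, key=sims.get)
    return best, sims[best] / total


def train_transductive(seeds, unlabeled, cost_factor, rounds=5):
    """seeds: {doc_id: (text, category)}; unlabeled: {doc_id: text}.

    Unlabeled documents join the centroid of their current guess, scaled by
    cost_factor (an assumed reading of the claim's cost factor).
    """
    categories = sorted({cat for _, cat in seeds.values()})
    guesses = {}
    for _ in range(rounds):
        centroids = {cat: Counter() for cat in categories}
        for text, cat in seeds.values():
            for tok, n in vectorize(text).items():
                centroids[cat][tok] += n
        for doc_id, cat in guesses.items():
            for tok, n in vectorize(unlabeled[doc_id]).items():
                centroids[cat][tok] += cost_factor * n
        new_guesses = {}
        for doc_id, text in unlabeled.items():
            cat, _ = score(vectorize(text), centroids)
            if cat is not None:
                new_guesses[doc_id] = cat
        if new_guesses == guesses:  # guesses stopped changing
            break
        guesses = new_guesses
    return centroids


def classify(centroids, docs, threshold):
    """Return {doc_id: category} for documents whose confidence exceeds threshold."""
    result = {}
    for doc_id, text in docs.items():
        cat, conf = score(vectorize(text), centroids)
        if cat is not None and conf > threshold:
            result[doc_id] = cat
    return result


seeds = {
    "s1": ("invoice total amount due payment", "invoice"),
    "s2": ("complaint broken product refund angry", "complaint"),
}
unlabeled = {
    "u1": "payment due invoice number amount",
    "u2": "product arrived broken want refund",
    "u3": "hello world",
}
model = train_transductive(seeds, unlabeled, cost_factor=0.5)
categorized = classify(model, unlabeled, threshold=0.7)
# A document labeled earlier by some other classifier is simply re-scored.
legacy = {"d9": "refund for broken product"}
recategorized = classify(model, legacy, threshold=0.7)
print(sorted(categorized.items()), sorted(recategorized.items()))
```

In this sketch the document "u3" shares no terms with either category, so it never clears the confidence threshold and its identifier is not output.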
-
7. An article of manufacture comprising:
- a program storage medium readable by a computer, where the medium tangibly embodies one or more programs of instructions executable by a computer to perform a method for separating documents, the one or more programs of instructions comprising:
instructions for receiving labeled data;
instructions for receiving a sequence of unlabeled documents;
instructions for adapting probabilistic classification rules using transduction based on the labeled data and the unlabeled documents;
instructions for updating weights used for document separation according to the probabilistic classification rules;
instructions for determining locations of separations in the sequence of documents;
instructions for outputting indicators of the determined locations of the separations in the sequence to at least one of a user, another system, and another process; and
instructions for flagging the documents with codes, the codes correlating to the indicators.
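The separation method of claim 7 can be sketched along the following lines. This is only an illustration under stated assumptions: the "probabilistic classification rules" are modeled as a two-class naive Bayes model (page starts a document vs. page continues one), the transductive adaptation is approximated by a single self-training pass over the unlabeled pages, and the flagging codes are hypothetical document numbers such as "D1". None of these choices come from the claim itself.

```python
from collections import Counter
import math

CLASSES = ("start", "cont")  # first page of a document vs. continuation page


def fit(examples):
    """examples: iterable of (text, label, weight). Returns (token weights, class priors)."""
    counts = {c: Counter() for c in CLASSES}
    prior = Counter()
    for text, label, weight in examples:
        prior[label] += weight
        for tok in text.lower().split():
            counts[label][tok] += weight
    return counts, prior


def p_start(text, model):
    """Naive Bayes posterior that a page starts a new document (add-one smoothing)."""
    counts, prior = model
    vocab = set(counts["start"]) | set(counts["cont"])
    logp = {}
    for c in CLASSES:
        denom = sum(counts[c].values()) + len(vocab)
        logp[c] = math.log(prior[c] + 1)
        for tok in text.lower().split():
            logp[c] += math.log((counts[c][tok] + 1) / denom)
    top = max(logp.values())
    e_start = math.exp(logp["start"] - top)
    e_cont = math.exp(logp["cont"] - top)
    return e_start / (e_start + e_cont)


def separate(labeled, pages, unlabeled_weight=0.5):
    """Return (separation indices, per-page codes) for a sequence of pages."""
    base = [(text, label, 1.0) for text, label in labeled]
    model = fit(base)
    # Adapt the rules: pseudo-label the unlabeled pages, then refit with
    # those pages down-weighted relative to the labeled data.
    pseudo = [
        (page, "start" if p_start(page, model) > 0.5 else "cont", unlabeled_weight)
        for page in pages
    ]
    model = fit(base + pseudo)  # updated weights used for separation
    starts = [i for i, page in enumerate(pages) if p_start(page, model) > 0.5]
    codes, doc_no = [], 0
    for i in range(len(pages)):
        if i in starts:
            doc_no += 1
        codes.append(f"D{doc_no}")  # code correlates with the separation indicator
    return starts, codes


labeled = [("invoice number date", "start"), ("item quantity subtotal", "cont")]
pages = [
    "invoice number 17 date",
    "item quantity",
    "item subtotal",
    "invoice date",
    "quantity subtotal",
]
starts, codes = separate(labeled, pages)
print(starts, codes)
```

Here the indices in `starts` play the role of the output indicators, and each page's code identifies which separated document it was assigned to.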
-
8. An article of manufacture comprising:
- a program storage medium readable by a computer, where the medium tangibly embodies one or more programs of instructions executable by a computer to perform a method of document searching, the one or more programs of instructions comprising:
instructions for receiving a search query;
instructions for retrieving documents based on the search query;
instructions for outputting the documents;
instructions for receiving user-entered labels for at least some of the documents, the labels being indicative of a relevance of the document to the search query;
instructions for training a classifier based on the search query and the user-entered labels;
instructions for performing a document classification technique on the documents using the classifier for reclassifying the documents; and
instructions for outputting identifiers of at least some of the documents based on the classification thereof.
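The search loop of claim 8 can be sketched as a simple relevance-feedback cycle. The claim does not name a classifier, so this example assumes a nearest-centroid model in the spirit of Rocchio relevance feedback: the query and the documents the user marked relevant form one centroid, the documents marked not relevant form another, and each retrieved document is reclassified by which centroid it is closer to. The corpus, labels, and function names are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, corpus):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    return sorted(i for i, text in corpus.items() if terms & set(text.lower().split()))


def train(query, corpus, labels):
    """labels: {doc_id: True/False} as entered by the user. Returns two centroids."""
    relevant = vectorize(query)  # the query itself counts as a relevant example
    nonrelevant = Counter()
    for doc_id, is_relevant in labels.items():
        target = relevant if is_relevant else nonrelevant
        target.update(vectorize(corpus[doc_id]))
    return relevant, nonrelevant


def reclassify(doc_ids, corpus, model):
    """Return ids judged relevant, best first."""
    relevant, nonrelevant = model
    margins = {}
    for doc_id in doc_ids:
        vec = vectorize(corpus[doc_id])
        margins[doc_id] = cosine(vec, relevant) - cosine(vec, nonrelevant)
    keep = [doc_id for doc_id in doc_ids if margins[doc_id] > 0]
    return sorted(keep, key=lambda doc_id: -margins[doc_id])


corpus = {
    "a": "python snake care guide",
    "b": "python programming tutorial",
    "c": "python programming language reference",
    "d": "cooking pasta recipe",
}
retrieved = retrieve("python", corpus)      # shown to the user
user_labels = {"a": False, "b": True}       # user-entered relevance labels
model = train("python", corpus, user_labels)
reranked = reclassify(retrieved, corpus, model)
print(retrieved, reranked)
```

Document "c" was never labeled by the user, yet it is classified as relevant because it resembles the query and the document marked relevant more than the document marked not relevant.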
-
15. A system comprising:
- a processor; and
- a program storage medium and/or memory storing one or more programs of instructions executable by the processor to perform a method of data classification, the one or more programs of instructions comprising:
instructions for receiving at least one labeled seed document;
instructions for receiving unlabeled documents;
instructions for receiving at least one predetermined cost factor;
instructions for training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents;
instructions for classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier;
instructions for reclassifying at least some of the categorized documents previously categorized by a different classifier into the categories using the classifier; and
instructions for outputting identifiers of the categorized documents to at least one of a user, another system, and another process.
-
21. A system comprising:
- a processor; and
- a program storage medium and/or memory storing one or more programs of instructions executable by the processor to perform a method for separating documents, the one or more programs of instructions comprising:
instructions for receiving labeled data;
instructions for receiving a sequence of unlabeled documents;
instructions for adapting probabilistic classification rules using transduction based on the labeled data and the unlabeled documents;
instructions for updating weights used for document separation according to the probabilistic classification rules;
instructions for determining locations of separations in the sequence of documents;
instructions for outputting indicators of the determined locations of the separations in the sequence to at least one of a user, another system, and another process; and
instructions for flagging the documents with codes, the codes correlating to the indicators.
-
22. A system comprising:
- a processor; and
- a program storage medium and/or memory storing one or more programs of instructions executable by the processor to perform a method of document searching, the one or more programs of instructions comprising:
instructions for receiving a search query;
instructions for retrieving documents based on the search query;
instructions for outputting the documents;
instructions for receiving user-entered labels for at least some of the documents, the labels being indicative of a relevance of the document to the search query;
instructions for training a classifier based on the search query and the user-entered labels;
instructions for performing a document classification technique on at least some of the documents using the classifier for reclassifying the at least some of the documents; and
instructions for outputting identifiers of the at least some of the documents based on the classification thereof.
-
Specification