Document classifiers and methods for document classification
Abstract
A method of classifying a document includes providing a plurality of classifier engines and classifying the document using output from one or more of the classifier engines based on a comparison of one or more metrics for each classifier engine. In another embodiment, a method of classifying a document comprises providing a plurality of classifier engines and determining one or more metrics for each classifier engine. These metrics are used to determine how to use the classifier engines to classify the document, and the document is classified accordingly. A further embodiment includes a document classifier utilizing a plurality of classifier engines. In yet another embodiment, a computer-readable medium contains instructions for controlling a computer system to perform a method of using a plurality of classifier engines to classify a document.
28 Claims
1. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
classifying said document using output from one or more of said classifier engines based on a comparison of one or more metrics for each classifier engine;
using a set of training documents to determine a precision value for each classifier engine;
identifying a highest precision value and a second highest precision value;
determining if said highest precision value is greater than said second highest precision value by a predetermined amount;
if so, using output from the classifier engine having said highest precision value to classify said document; and
if not, generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 2, 3, 4, 5, 6.
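The decision logic recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration only, not the patented implementation. The function name, the list-of-dicts representation of each engine's class probabilities, and the reading of "greater by a predetermined amount" as an additive margin are all assumptions; the precision values are taken as already measured on a training set.

```python
from collections import defaultdict
from typing import Dict, List


def classify_by_precision_margin(
    precisions: List[float],
    engine_probs: List[Dict[str, float]],
    margin: float,
) -> str:
    """Pick a class from several engines (hypothetical helper).

    precisions[i]   -- precision of engine i, measured on training documents
    engine_probs[i] -- engine i's probability for each possible class
    margin          -- assumed additive "predetermined amount"
    """
    if len(precisions) < 2 or len(precisions) != len(engine_probs):
        raise ValueError("need at least two engines with matching inputs")

    # Identify the highest and second highest precision values.
    ranked = sorted(range(len(precisions)),
                    key=lambda i: precisions[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # If the best engine is clearly more precise, trust its output alone.
    # Its "output" is assumed here to be its most probable class.
    if precisions[best] - precisions[runner_up] >= margin:
        probs = engine_probs[best]
        return max(probs, key=probs.get)

    # Otherwise sum the probabilities per class across all engines
    # and choose the class with the largest sum.
    totals: Dict[str, float] = defaultdict(float)
    for probs in engine_probs:
        for label, p in probs.items():
            totals[label] += p
    return max(totals, key=totals.get)
```

With two engines of precision 0.95 and 0.70, a margin of 0.1 selects the first engine's output, while a margin of 0.5 falls through to the summed probabilities.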
7. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
classifying said document using output from one or more of said classifier engines based on a comparison of one or more metrics for each classifier engine;
using a set of training documents to rank said classifier engines from best to worst;
generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes;
determining if the best classifier engine has a highest probability that is greater than the highest probability of each other classifier engine by a predetermined amount;
if so, using output from said best classifier engine to classify said document; and
if not, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 8, 9.
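Claim 7 compares per-document probabilities rather than training-set precision. The following Python sketch shows one way to read it, assuming the engines have already been ranked from best to worst so that index 0 is the best engine. The function name, the data layout, and the additive interpretation of the "predetermined amount" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def classify_by_confidence_margin(
    ranked_probs: List[Dict[str, float]],
    margin: float,
) -> str:
    """ranked_probs[0] belongs to the best-ranked engine (hypothetical helper)."""
    if not ranked_probs:
        raise ValueError("need at least one engine")

    best = ranked_probs[0]
    best_top = max(best.values())

    # The best engine must beat the top probability of every other engine
    # by at least the margin (assumed additive).
    others_top = [max(p.values()) for p in ranked_probs[1:]]
    if all(best_top - t >= margin for t in others_top):
        return max(best, key=best.get)

    # Fallback: sum probabilities per class over all engines.
    totals: Dict[str, float] = defaultdict(float)
    for probs in ranked_probs:
        for label, p in probs.items():
            totals[label] += p
    return max(totals, key=totals.get)
```

Note that here the choice between a single engine and the summed probabilities depends on the document being classified, whereas in the precision-based variant it depends only on the training results.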
10. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
classifying said document using output from one or more of said classifier engines based on a comparison of one or more metrics for each classifier engine;
using a set of training documents to rank said classifier engines from best to worst;
generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes;
determining if the best classifier engine has a highest probability that exceeds a predetermined threshold;
if so, using output from said best classifier engine to classify said document; and
if not, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 11, 12.
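Claim 10 replaces the engine-to-engine comparison with a fixed threshold on the best engine's top probability. A minimal Python sketch under the same assumptions as before (engines pre-ranked with the best at index 0, hypothetical function name and data layout):

```python
from collections import defaultdict
from typing import Dict, List


def classify_by_threshold(
    ranked_probs: List[Dict[str, float]],
    threshold: float,
) -> str:
    """ranked_probs[0] belongs to the best-ranked engine (hypothetical helper)."""
    if not ranked_probs:
        raise ValueError("need at least one engine")

    best = ranked_probs[0]
    top_class = max(best, key=best.get)

    # Use the best engine alone when it is confident enough.
    if best[top_class] > threshold:
        return top_class

    # Otherwise sum probabilities per class over all engines.
    totals: Dict[str, float] = defaultdict(float)
    for probs in ranked_probs:
        for label, p in probs.items():
            totals[label] += p
    return max(totals, key=totals.get)
```

The threshold test looks only at the best engine, so the other engines are consulted only when the best engine is unsure.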
13. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
determining one or more metrics for each classifier engine;
using said metrics to determine how to use said classifier engines to classify said document; and
classifying said document accordingly,
wherein determining one or more metrics includes using a set of training documents to determine a precision value for each classifier engine and identifying a highest precision value and a second highest precision value, and
wherein classifying said document includes:
determining if said highest precision value is greater than said second highest precision value by a predetermined amount;
if so, using output from the classifier engine having said highest precision value to classify said document; and
if not, generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 14, 15, 16, 17, 18.
19. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
determining one or more metrics for each classifier engine;
using said metrics to determine how to use said classifier engines to classify said document;
classifying said document accordingly;
using said metrics to rank said classifier engines from best to worst, and
wherein classifying said document includes:
generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes;
determining if the best classifier engine has a highest probability that is greater than the highest probability of each other classifier engine by a predetermined amount;
if so, using output from said best classifier engine to classify said document; and
if not, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 20, 21.
22. A method of classifying a document comprising:
using a processor (or computer) to perform the steps of:
determining one or more metrics for each classifier engine;
using said metrics to determine how to use said classifier engines to classify said document;
classifying said document accordingly;
using said metrics to rank said classifier engines from best to worst, and
wherein classifying said document includes:
generating, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes;
determining if the best classifier engine has a highest probability that exceeds a predetermined threshold;
if so, using output from said best classifier engine to classify said document; and
if not, summing said probabilities for each class, and classifying said document into the class with the largest sum of probabilities.
Dependent claims: 23, 24.
25. A document classifier comprising:
a plurality of classifier engines;
means for generating a metric for each classifier engine;
means for comparing said metrics; and
means for classifying a document using output from one or more of said classifier engines in response to said means for comparing;
wherein said means for generating a metric generates, for each classifier engine, a list of probabilities of said document being classified by each classifier engine into each one of a group of possible classes, and
wherein said means for comparing sums said probabilities for each class and identifies the class with the largest sum of probabilities,
wherein said means for generating a metric further generates a precision value for each classifier engine, and
wherein said means for comparing identifies a highest precision value and a second highest precision value and determines if said highest precision value is greater than said second highest precision value by a predetermined amount, and
wherein said means for classifying classifies said document using output from the classifier engine having said highest precision value to classify said document if said highest precision value is greater than said second highest precision value by a predetermined amount and classifies said document into the class with the largest sum of probabilities if said highest precision value is not greater than said second highest precision value by a predetermined amount.
Dependent claims: 26, 27, 28.
Specification