Method and apparatus for training a text classifier
Abstract
A method and apparatus for training a text classifier is disclosed. A supervised learning system and an annotation system are operated cooperatively to produce a classification vector which can be used to classify documents with respect to a defined class. The annotation system automatically annotates documents with a degree of relevance annotation to produce machine annotated data. The degree of relevance annotation represents the degree to which the document belongs to the defined class. This machine annotated data is used as input to the supervised learning system. In addition to the machine annotated data, the supervised learning system can also receive manually annotated data and/or a user request. The machine annotated data, along with the manually annotated data and/or the user request, are used by the supervised learning system to produce a classification vector. In one embodiment, the supervised learning system comprises a relevance feedback mechanism. The relevance feedback mechanism is operated cooperatively with the annotation system for multiple iterations until a classification vector of acceptable accuracy is produced. The classification vector produced by the invention is the result of a combination of supervised and unsupervised learning.
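The cooperative loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not fix how the parameter value is generated or what form the degree-of-relevance function takes. Here the parameters are assumed to be the mean and standard deviation of the retrieval status values (RSVs), the degree of relevance is assumed to be a logistic function of the standardized RSV, and the relevance feedback step is assumed to be a Rocchio-style weighted update. The fixed iteration count stands in for the "acceptable accuracy" stopping test, manually annotated data is left out, and all function names are hypothetical.

```python
import math


def fit_parameters(rsvs):
    """Operation on data including the RSVs, yielding parameter values.

    Assumption: the parameters are the mean and standard deviation of the RSVs.
    """
    n = len(rsvs)
    mean = sum(rsvs) / n
    var = sum((s - mean) ** 2 for s in rsvs) / n
    std = math.sqrt(var) or 1.0  # fall back to 1.0 when all scores are equal
    return mean, std


def degree_of_relevance(rsv, params):
    """Degree of relevance as a function of the RSV and the parameter values.

    Assumption: a logistic curve over the standardized RSV.
    """
    mean, std = params
    return 1.0 / (1.0 + math.exp(-(rsv - mean) / std))


def dot(u, v):
    """Inner product of two sparse term-weight vectors stored as dicts."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def relevance_feedback(request, docs, degrees):
    """Assumed Rocchio-style update: the user request plus the
    degree-weighted average of the machine annotated document vectors."""
    vec = dict(request)
    total = sum(degrees) or 1.0
    for doc, r in zip(docs, degrees):
        for term, w in doc.items():
            vec[term] = vec.get(term, 0.0) + r * w / total
    return vec


def train(request, docs, iterations=3):
    """Alternate machine annotation and relevance feedback."""
    vec = dict(request)
    for _ in range(iterations):
        # Score each document against the current classification vector.
        rsvs = [dot(vec, d) for d in docs]
        # Annotation system: parameters, then a degree of relevance per document.
        params = fit_parameters(rsvs)
        degrees = [degree_of_relevance(s, params) for s in rsvs]
        # Supervised learning system: produce a new classification vector.
        vec = relevance_feedback(request, docs, degrees)
    return vec


if __name__ == "__main__":
    docs = [{"cat": 1.0, "pet": 1.0}, {"cat": 1.0}, {"stock": 1.0}]
    print(train({"cat": 1.0}, docs))
```

In this sketch no document is manually annotated. Each document contributes to the classification vector in proportion to its machine-assigned degree of relevance, which is one way to read the abstract's combination of supervised and unsupervised learning.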
29 Claims
1. A method for training a classifier to classify at least one document which has not been manually annotated with respect to a defined class;
performing an operation on data including a retrieval status value associated with the document to generate at least one parameter value;
calculating a degree of relevance representing the degree to which said document belongs to said defined class, said degree of relevance being a function of at least the retrieval status value and the parameter value; and
training said classifier using said degree of relevance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A method for producing a classification vector for use in classifying at least one non-manually annotated document with respect to a defined class, said method comprising the steps of:
performing an operation on data including a retrieval status value associated with the document to generate at least one parameter value;
calculating a degree of relevance representing the degree to which said non-manually annotated document belongs to the defined class, said degree of relevance being a function of at least the retrieval status value and the parameter value;
automatically annotating said non-manually annotated document with said degree of relevance to produce a machine annotated document; and
performing a relevance feedback function using said machine annotated document.
Dependent claims: 12, 13, 14, 15, 16, 29

17. An apparatus for training a classifier to classify at least one document which has not been manually annotated with respect to a defined class, said apparatus comprising:
an operating processor for performing an operation on data including a retrieval status value associated with the document to generate at least one parameter value;
an annotation processor for automatically annotating the document to produce at least one automatically annotated document, said annotation including a degree of relevance representing the degree to which said at least one automatically annotated document belongs to said defined class, said degree of relevance being a function of at least the retrieval status value and the parameter value; and
a supervised learning processor for training the classifier using said at least one automatically annotated document.
Dependent claims: 18, 19, 20, 21, 22, 23

24. An apparatus for training a classifier to classify at least one document which has not been manually annotated with respect to a defined class, said apparatus comprising:
means for performing an operation on data including a retrieval status value associated with the document to generate at least one parameter value;
means for calculating a degree of relevance representing the degree to which the document belongs to said defined class, said degree of relevance being a function of at least the retrieval status value and the parameter value; and
means for training the classifier using said degree of relevance.
Dependent claims: 25, 26, 27, 28

Specification