Method for transforming data elements within a classification system based in part on input from a human annotator/expert
First Claim
1. A method for transforming computer stored records comprised of at least one data item in a document in a data collection or an element instance within a document into a classification system, the method comprising:
retrieving documents stored as computer records;
associating with any element instance within the data item in the document a set of distinctive features indicative of accurate identification and classification of the element instance;
providing a training set comprising a subset of elements within the document labeled with class labels;
providing a learning method comprised of a basic learning algorithm and also trained with annotated elements to produce a result of predicting class labels to be assigned to unlabeled elements;
identifying an element instance within the unlabeled elements including predicting a class of selected element instances;
computing a confidence factor that a selected element instance is accurately identified by the predicted class;
for a selected element instance having a confidence factor less than zero, querying a concept evolution command emitting human annotator/expert for a true class label of the instance;
the querying comprising the annotator/expert generating a concept evolution command for redefining the unlabeled element wherein the generating the concept evolution command comprises adjusting at least one associated feature of the training set in an incremental manner;
the adjusting at least one associated feature of the training set comprising a local approach concept evolution comprising associating the local model for each evolution event including a concept evolution command wherein the associating local model comprises the maintenance of the concept evolution directed acyclic graph (DAG) and corresponding an event model to an internal node of the concept evolution DAG; and
extending the training set to include an as yet not included true labeled instance and iterating the identifying and computing for another element instance with a low confidence factor;
wherein the learning model comprises a global approach to reshape a list of the classes and extends the set of features, or wherein the learning method comprises a local approach that creates a local model of one or few events, the definition set of classes remains unchanged, and the training set can be extended with new examples.
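The query-and-retrain loop recited in claim 1 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the nearest-centroid learner stands in for the "basic learning algorithm", the confidence factor is taken to be the top pseudo-probability minus a hypothetical THRESHOLD (so that low confidence shows up as a value below zero), and ask_annotator is a hypothetical callback standing in for the human annotator/expert.

```python
import math
from collections import defaultdict

THRESHOLD = 0.6  # assumed cut-off; the claims do not give a value


def train(training_set):
    """Basic learner (assumption): one centroid per class label."""
    sums, counts = {}, defaultdict(int)
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, features):
    """Return (predicted class, confidence factor) for one element instance."""
    # Pseudo-probabilities from distances to each class centroid.
    weights = {c: math.exp(-math.dist(features, centroid))
               for c, centroid in model.items()}
    best = max(weights, key=weights.get)
    # Assumed confidence factor: negative when the top class is not clear-cut.
    return best, weights[best] / sum(weights.values()) - THRESHOLD


def active_learning(training_set, unlabeled, ask_annotator, max_queries=10):
    """Query the annotator for low-confidence instances and retrain."""
    training_set, unlabeled = list(training_set), list(unlabeled)
    queries = 0
    while unlabeled and queries < max_queries:
        model = train(training_set)
        # Pick the unlabeled instance with the lowest confidence factor.
        confidence, index = min(
            (predict(model, x)[1], i) for i, x in enumerate(unlabeled))
        if confidence >= 0:
            break  # nothing left below the query threshold
        instance = unlabeled.pop(index)
        # Extend the training set with the annotator's true label.
        training_set.append((instance, ask_annotator(instance)))
        queries += 1
    return train(training_set), training_set
```

In this sketch the loop stops either when no remaining instance has a negative confidence factor or when the query budget is spent; the claim itself does not state a stopping rule.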
Abstract
A method and system are provided for classifying data items such as a document based upon identification of element instances within the data item. A training set of classes is provided where each class is associated with one or more features indicative of accurate identification of an element instance within the data item. Upon the identification of the data item with the training set, a confidence factor is computed that the selected element instance is accurately identified. When a selected element instance has a low confidence factor, the associated features for the predicted class are changed by an annotator/expert so that the changed class definition of the new associated feature provides a higher confidence factor of accurate identification of element instances within the data item.
12 Claims
1. Independent claim, recited in full under First Claim. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 12.
9. A method for transforming a data item based upon identification of element instances within the data item into a classification, the method comprising:
associating with any element instance in a document a set of features indicative of accurate identification of the element instance;
providing a training set comprising a subset of elements within the document labeled with class labels;
providing a learning method comprised of an algorithm and trained with annotated elements to produce a result of predicting class labels to be assigned to unlabeled elements;
identifying an element instance within the unlabeled elements including predicting a class of selected element instances;
computing a confidence factor that a selected element instance is accurately identified by the predicted class the confidence factor computed using the formula;
Dependent claims: 10
11. A method for transforming computer stored records comprising at least one data item in a document in a data collection or an element instance within a document into a classification, the method comprising:
retrieving documents stored in a computer system;
associating with any element instance comprised of an author, title, or abstract in the document a set of features indicative of accurate identification of the element instance;
providing a training set comprising a subset of elements within the document labeled with class labels;
providing a learning method including trained with annotated elements for predicting class labels for unlabeled elements;
identifying an element instance within the unlabeled elements including predicting a class of selected element instances;
computing a confidence factor that a selected element instance is accurately identified by the predicted class;
for a selected element instance having a confidence factor of a value less than or equal to zero, querying a human annotator/expert for a true class label of the instance;
the querying comprising the annotator/expert generating a concept evolution command for redefining the unlabeled element wherein the generating the concept evolution command comprises adjusting at least one associated feature of the training set in an incremental manner;
the adjusting at least one associated feature of the training set comprising a local approach concept evolution comprising associating the local model for each evolution event including a concept evolution command wherein the associating local model comprises the maintenance of the concept evolution directed acyclic graph (DAG) and corresponding an event model to an internal node of the concept evolution DAG; and
extending the training set with the true labeled instance and iterating the identifying and computing for another element instance with a low confidence factor;
wherein the learning model comprises a global approach to reshape a list of the classes and extends the set of features, or wherein the learning method comprises a local approach that creates a local model of one or few events, the definition set of classes remains unchanged, and the training set can be extended with new examples.
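The local approach recited above, in which each concept evolution event gets its own local model attached to an internal node of a concept evolution DAG, can be sketched in Python as follows. This is one possible reading, not the patented implementation: the split and merge commands, the ConceptDAG and Node names, and the representation of a local model as a plain callable are all assumptions introduced for illustration. The claims state only that a DAG is maintained and that an event model corresponds to an internal node.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.children = []       # empty for leaves, i.e. current classes
        self.local_model = None  # callable: features -> child label


class ConceptDAG:
    """Concept evolution DAG; the root model is never retrained."""

    def __init__(self, initial_classes, root_model):
        self.nodes = {c: Node(c) for c in initial_classes}
        root = Node("ROOT")
        root.children = list(initial_classes)
        root.local_model = root_model
        self.nodes["ROOT"] = root

    def split(self, label, new_labels, local_model):
        """Hypothetical command: refine one class into several new ones."""
        node = self.nodes[label]
        if node.children:
            raise ValueError(f"{label!r} has already evolved")
        for new_label in new_labels:
            self.nodes[new_label] = Node(new_label)
        # The event's local model turns the old leaf into an internal node.
        node.children = list(new_labels)
        node.local_model = local_model

    def merge(self, labels, new_label):
        """Hypothetical command: fold several classes into one new class."""
        self.nodes[new_label] = Node(new_label)
        for label in labels:
            node = self.nodes[label]
            node.children = [new_label]
            # Trivial local model: everything goes to the merged class.
            node.local_model = lambda features, target=new_label: target

    def classify(self, features):
        """Descend from the root, applying each event's local model."""
        node = self.nodes["ROOT"]
        while node.children:
            node = self.nodes[node.local_model(features)]
        return node.label

    def classes(self):
        """Current class list: the leaves of the DAG."""
        return sorted(l for l, n in self.nodes.items() if not n.children)
```

A merge gives the new class several parents, which is why the structure in this sketch is a DAG rather than a tree. The original classifier at the root keeps its class definitions, and only the small per-event models are added.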
Specification