Interactive machine learning system for automated annotation of information in text
Abstract
An interactive machine learning based system that incrementally learns, on the basis of text data, how to annotate new text data. The system and method starts with partially annotated training data or alternatively unannotated training data and a set of examples of what is to be learned. Through iterative interactive training sessions with a user the system trains annotators, and these are in turn used to discover more annotations in the text data. Once all of the text data or a sufficient amount of the text data is annotated, at the user's discretion, the system learns a final annotator or annotators, which are exported and available to annotate new textual data. As the iterative training process occurs the user is selectively presented for review and appropriate action, system-determined representations of the annotation instances and provided a convenient and efficient interface so that context of use can be verified if necessary in order to evaluate the annotations and correct them, where required. At the user's discretion, annotations that receive a high confidence level can be automatically accepted and those with low confidence levels can be automatically rejected.
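The confidence-based handling described at the end of the abstract can be illustrated with a minimal Python sketch. The abstract gives no concrete thresholds or data structures, so the `Annotation` class, the `triage` function, and the cutoff values `accept_at` and `reject_below` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotation instance."""
    text: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def triage(
    annotations: Iterable[Annotation],
    accept_at: float = 0.9,     # assumed cutoff, not taken from the patent
    reject_below: float = 0.2,  # assumed cutoff, not taken from the patent
) -> Tuple[List[Annotation], List[Annotation], List[Annotation]]:
    """Split annotations into auto-accepted, auto-rejected, and to-review lists."""
    accepted, rejected, to_review = [], [], []
    for ann in annotations:
        if ann.confidence >= accept_at:
            accepted.append(ann)    # high confidence: accept without review
        elif ann.confidence < reject_below:
            rejected.append(ann)    # low confidence: reject without review
        else:
            to_review.append(ann)   # in between: present to the user
    return accepted, rejected, to_review
```

Under these assumptions, only the annotations in the middle band reach the user, which is one way to read "selectively presented for review".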
39 Claims
1. A method of learning annotators for use in an interactive machine learning system, the method comprising the steps of:
providing at least partially annotated text data or unannotated text data with seeds or seed models of instances of at least one named entity or class to be learned;
iteratively learning annotators for the at least one named entity or class using a machine learning algorithm;
applying the learned annotators to text data resulting in the annotation of at least one named entity or class annotation instance; and
selectively presenting for review and correction, if determined, representations of the at least one named entity or class annotation instance identified by the applying of the learned annotators.
Dependent claims: 2-24.
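The four steps of claim 1 (provide seeds, learn, apply, present for review) can be read as a bootstrapping loop. The following Python sketch is one possible reading, not the patented implementation: the claim does not name a machine learning algorithm, so a toy learner that memorizes the word preceding each known entity stands in for it. The functions `learn_contexts`, `apply_contexts`, and `bootstrap`, and the `review` callback that models the user, are hypothetical names.

```python
from typing import Callable, Iterable, List, Set


def learn_contexts(sentences: List[List[str]], entities: Set[str]) -> Set[str]:
    """Toy learning step: collect words that immediately precede a known entity."""
    contexts = set()
    for tokens in sentences:
        for prev, tok in zip(tokens, tokens[1:]):
            if tok in entities:
                contexts.add(prev)
    return contexts


def apply_contexts(
    sentences: List[List[str]], contexts: Set[str], entities: Set[str]
) -> Set[str]:
    """Toy annotator: propose new tokens that follow a learned context word."""
    candidates = set()
    for tokens in sentences:
        for prev, tok in zip(tokens, tokens[1:]):
            if prev in contexts and tok not in entities:
                candidates.add(tok)
    return candidates


def bootstrap(
    sentences: List[List[str]],
    seeds: Iterable[str],
    review: Callable[[str], bool],  # stands in for the user's review decision
    max_rounds: int = 5,
) -> Set[str]:
    """Iterate learn -> apply -> review until no new candidates appear."""
    entities = set(seeds)
    rejected: Set[str] = set()
    for _ in range(max_rounds):
        contexts = learn_contexts(sentences, entities)
        candidates = apply_contexts(sentences, contexts, entities) - rejected
        if not candidates:
            break
        for cand in sorted(candidates):
            if review(cand):
                entities.add(cand)   # accepted instances act as new seeds
            else:
                rejected.add(cand)   # never propose a rejected instance again
    return entities


corpus = [
    "flights to Paris depart daily".split(),
    "flights to Berlin depart daily".split(),
    "visit Berlin soon".split(),
    "visit Oslo soon".split(),
    "talk to strangers".split(),
]
# The "user" here is simulated: accept only capitalized candidates.
found = bootstrap(corpus, {"Paris"}, review=lambda c: c[0].isupper())
```

In this toy run, the seed "Paris" teaches the context word "to", which surfaces "Berlin"; "Berlin" then teaches "visit", which surfaces "Oslo". The point of the sketch is that each accepted correction feeds the next learning round.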
25. A method of learning annotators for use in an interactive machine learning system for processing electronic text, the method comprising the steps of:
providing examples of a type of a named entity and unannotated textual data; and
iteratively learning annotators based on at least one of the examples of a named entity and unannotated textual data, where at the end of each iteration, any annotation, generated from the learned annotators, having a confidence level within a confidence level range is presented for review and, if required, corrected based on feedback.
26. A method of learning annotators for use in an interactive machine learning system, the method comprising the steps of:
a user sequentially labeling annotation instances in a current document from a document set;
a machine learning algorithm concurrently training on the documents in the document set to learn at least one annotator for at least one named entity or class; and
assigning a confidence level to each of the annotation instances by the learned at least one annotator such that any annotation instance which has a confidence level that is equal to or above a predetermined confidence level threshold and that occurs in a current document being labeled will be presented to the user for review and possible action.
Dependent claims: 27-34.
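Claim 26 pairs sequential user labeling with a learner that trains alongside it and surfaces instances at or above a confidence threshold. The Python sketch below illustrates that interaction under simplifying assumptions: training is interleaved with labeling rather than truly concurrent, the "annotator" is a per-token frequency count rather than a real model, and the class `IncrementalAnnotator`, the function `suggestions`, and the threshold value are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


class IncrementalAnnotator:
    """Toy annotator updated after every user label (assumed design)."""

    def __init__(self) -> None:
        self.pos: Counter = Counter()  # times a token was labeled as an entity
        self.neg: Counter = Counter()  # times a token was labeled as not an entity

    def update(self, token: str, is_entity: bool) -> None:
        """Record one user label; this stands in for a training step."""
        if is_entity:
            self.pos[token] += 1
        else:
            self.neg[token] += 1

    def confidence(self, token: str) -> float:
        """Fraction of positive labels seen for this token; 0.0 if unseen."""
        total = self.pos[token] + self.neg[token]
        return self.pos[token] / total if total else 0.0


def suggestions(
    model: IncrementalAnnotator,
    document: List[str],
    threshold: float = 0.75,  # assumed value for the predetermined threshold
) -> List[Tuple[int, str]]:
    """Return (position, token) pairs in the current document at or above threshold."""
    return [
        (i, tok)
        for i, tok in enumerate(document)
        if model.confidence(tok) >= threshold
    ]


model = IncrementalAnnotator()
# Labels the user has given so far while working through earlier text.
for token, is_entity in [
    ("Paris", True), ("Paris", True), ("daily", False),
    ("Turkey", True), ("Turkey", False),
]:
    model.update(token, is_entity)

current_document = ["Paris", "flights", "Turkey", "daily"]
to_present = suggestions(model, current_document)
```

Here "Paris" is presented because every label for it was positive, while "Turkey" (labeled both ways) falls below the assumed threshold and is left for the user to label by hand.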
35. An apparatus for learning annotators for use in an interactive machine learning system for processing electronic text, comprising:
a means for providing at least partially annotated text data or unannotated text data with seeds or seed models of instances of at least one named entity or class to be learned;
a means for iteratively learning annotators for the at least one named entity or class using a machine learning algorithm from the at least one named entity or class;
a means for applying the learned annotators to text data resulting in the annotation of at least one named entity or class annotation instance; and
a means for selectively presenting for review and correction, if determined, representations of annotation instances identified by the learned annotators.
Dependent claims: 36, 37.
38. An apparatus for learning annotators for use in an interactive machine learning system for processing electronic text, comprising:
means for providing examples of a type of a named entity and unannotated textual data; and
means for iteratively learning annotators based on at least one of the examples of a named entity and unannotated textual data, where at the end of each iteration, any annotation, generated from the learned annotators, having a confidence level within a confidence level range is corrected based on feedback.
39. A computer program product comprising a computer usable medium having a computer readable program code embodied in the medium, the computer program product includes:
a first computer component to provide at least partially annotated text data or unannotated text data with seeds or seed models of instances of at least one named entity or class to be learned;
a second computer component to iteratively learn annotators for the at least one named entity or class using a machine learning algorithm from the at least one named entity or class;
a third computer component to apply the learned annotators to text data resulting in the annotation of at least one named entity or class annotation instance; and
a fourth computer program component to selectively present for review and correction, if determined, representations of annotation instances identified by the learned annotators.
Specification