System for automatically annotating training data for a natural language understanding system
First Claim
1. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation;
training the NLU system with the user-confirmed annotation; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system;
wherein displaying the proposed annotations for user verification or correction comprises:
receiving a user input indicative of a user-identified portion of the proposed annotation; and
displaying a plurality of alternative proposed annotations for the user-identified portion;
wherein the one or more models impose model constraints and wherein displaying the plurality of alternative proposed annotations comprises displaying an alternative proposed annotation for the user-identified portion of data only if the alternative proposed annotation can lead to an overall annotation for the unit that is consistent with the model constraints;
wherein the proposed annotation includes parent and child nodes and wherein displaying a plurality of alternative proposed annotations includes displaying a user actuable delete node input which, when actuated, deletes a child node, and a user actuable add node input which, when actuated, adds a child node, and displaying the plurality of alternative proposed annotations in response to a user deleting a child node associated with the user-identified portion of data;
wherein displaying a plurality of alternative proposed annotations comprises displaying a portion of the unit of data not covered by the proposed annotation, and displaying a plurality of alternative proposed annotations for the portion of data not covered by the proposed annotation;
wherein the user is enabled to select a segment of the portion of data not covered by the proposed annotation and wherein displaying alternative proposed annotations comprises displaying a plurality of alternative proposed annotations for the user-selected segment; and
wherein the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data.
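As an illustration (not part of the claims), the constraint check recited above can be sketched as follows. The label names and the parent-to-child rule table are invented stand-ins; the patent does not specify a concrete constraint representation.

```python
# Hypothetical sketch: filter alternative annotations by model constraints.
# Allowed parent -> child annotation types stand in for "model constraints".
ALLOWED_CHILDREN = {
    "ShowFlight": {"Origin", "Destination", "DepartTime"},
    "BookHotel": {"City", "CheckIn", "Nights"},
}

def consistent_alternatives(parent_type, candidate_labels):
    """Keep only candidates that can yield an overall annotation
    consistent with the model constraints for this parent node."""
    allowed = ALLOWED_CHILDREN.get(parent_type, set())
    return [lab for lab in candidate_labels if lab in allowed]

# "City" is not a valid child of ShowFlight, so it is not displayed.
alts = consistent_alternatives("ShowFlight", ["Destination", "City", "Origin"])
```

Only the surviving alternatives would be displayed to the annotator, so the user can never select a correction that the models cannot represent.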
Abstract
The present invention uses a natural language understanding system that is currently being trained to assist in annotating training data for training that natural language understanding system. Unannotated training data is provided to the system and the system proposes annotations to the training data. The user is offered an opportunity to confirm or correct the proposed annotations, and the system is trained with the corrected or verified annotations.
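The propose-confirm-retrain loop of the abstract can be sketched as below. The "model" here is a toy word-to-label dictionary chosen for brevity; the actual NLU models are not specified at this level.

```python
# Minimal sketch of the annotate-verify-train loop (illustrative only).

def propose(model, sentence):
    """Propose an annotation: label each word the model has seen, else None."""
    return [(w, model.get(w.lower())) for w in sentence.split()]

def train(model, confirmed):
    """Fold the user-confirmed annotation back into the model."""
    for word, label in confirmed:
        if label is not None:
            model[word.lower()] = label

model = {"boston": "City"}
proposal = propose(model, "fly to Boston from Seattle")
# Simulated user correction: the user supplies the label the model missed.
confirmed = [(w, lab if w != "Seattle" else "City") for w, lab in proposal]
train(model, confirmed)
```

After one round, the retrained model covers the corrected word, so the next proposal for similar data needs less manual correction.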
64 Citations
36 Claims
1. (Independent claim; text set forth above as the First Claim.) - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
19. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation, comprising:
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted;
wherein the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data;
training the NLU system with the user-confirmed annotation; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system, wherein displaying an indication of a volume of training data comprises:
displaying a representation of the one or more models; and
visually contrasting portions of the one or more models that have been trained with a threshold volume of training data.
- View Dependent Claims (20)
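The training-volume display of claim 19 can be illustrated as follows. Treating each annotation label as a "portion" of the model, and using an ASCII marker for the visual contrast, are simplifications invented for this sketch.

```python
from collections import Counter

# Hypothetical sketch: track how much confirmed training data each model
# portion (here, each annotation label) has received, and visually contrast
# the portions trained with at least a threshold volume of data.
counts = Counter()
batches = [[("Boston", "City")], [("Seattle", "City")], [("9am", "Time")]]
for confirmed in batches:
    for _, label in confirmed:
        counts[label] += 1

THRESHOLD = 2  # illustrative threshold volume

def render(label):
    mark = "*" if counts[label] >= THRESHOLD else " "  # '*' = well trained
    return f"[{mark}] {label}: {counts[label]}"

lines = [render(lab) for lab in sorted(counts)]
```

In a real interface the contrast would be graphical (e.g. shading nodes of the model display), but the bookkeeping is the same: a per-portion count compared against a threshold.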
21. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation;
training the NLU system with the user-confirmed annotation;
identifying inconsistencies between the user-confirmed annotation and prior annotations;
displaying a user actuable delete node input which, when actuated, deletes a child node;
displaying a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system.
- View Dependent Claims (22)
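One simple way to realize the "identifying inconsistencies" step of claim 21 is a direct comparison of the new user-confirmed annotation against prior annotations of the same text. The word-level representation below is an invented simplification.

```python
# Hypothetical sketch: flag a new user-confirmed annotation that conflicts
# with how the same text was annotated previously.
prior = {"boston": "City", "9am": "Time"}  # prior text -> confirmed label

def inconsistencies(confirmed, prior):
    """Return (text, old_label, new_label) for each conflicting annotation."""
    return [(t, prior[t.lower()], lab)
            for t, lab in confirmed
            if t.lower() in prior and prior[t.lower()] != lab]

conflicts = inconsistencies([("Boston", "Airport"), ("9am", "Time")], prior)
```

Each conflict could then be surfaced to the user for resolution before the conflicting example is used for training.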
23. A computing environment comprising a processor, the computing environment being configured to execute a user interface for training a natural language understanding (NLU) system that has one or more models, the user interface comprising:
a first portion displaying a model display representative of the one or more models;
a second portion displaying unannotated training inputs;
one or more user-actuable inputs configured to be actuable by a user to indicate a user-selected one of the unannotated training inputs, the computing environment comprising a processor that is configured to receive the user-selected unannotated training inputs and provide an output comprising a plurality of proposed annotations for the user-selected unannotated training inputs;
a third portion displaying the proposed annotations for a selected one of the unannotated training inputs;
a fourth portion displaying a sample of the unannotated training input not covered by the proposed annotations;
one or more user-actuable inputs configured to be actuable by a user to indicate a user-selected segment of the sample not covered, such that the input indicating the user-selected segment is received by the processor;
a fifth portion displaying a plurality of alternative proposed annotations for the user-selected segment, provided by the processor in response to the input indicating the user-selected segment; and
a sixth portion displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system;
such that the fifth portion displaying the plurality of alternative proposed annotations further includes:
displaying one or more user actuable alternative annotation node inputs;
displaying a user actuable delete node input which, when actuated, deletes a child node;
displaying a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted; and
enabling the user to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and using the user-selected alternative proposed annotation for training the natural language understanding (NLU) system.
- View Dependent Claims (24, 25, 26, 27, 28)
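The delete-node interaction recited in claims 23 and elsewhere can be sketched with an annotation tree held as nested dictionaries. The tree shape, labels, and the fixed alternatives table are all invented for illustration.

```python
# Hypothetical sketch: deleting a child node frees its data portion, and
# alternative proposed annotations are then displayed for that portion.
tree = {"type": "ShowFlight",
        "children": [{"type": "Origin", "text": "Seattle"},
                     {"type": "Destination", "text": "Boston"}]}

# Stand-in for the NLU system's proposals for an uncovered data portion.
ALTERNATIVES = {"Seattle": ["Origin", "Destination", "City"]}

def delete_child(tree, index):
    """Delete a child node; return alternative annotations for its text."""
    freed = tree["children"].pop(index)
    return ALTERNATIVES.get(freed["text"], [])

alts = delete_child(tree, 0)  # user actuates the delete-node input
```

The user would then pick one of `alts` (or actuate the add-node input) to re-cover the freed text, and the selection would be folded into the annotated training data.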
29. A method of generating annotated training data for training a natural language understanding (NLU) system having at least one model, comprising:
generating a proposed annotation for a unit of unannotated training data;
calculating a confidence measure for a plurality of different portions of the proposed annotation;
displaying the proposed annotation by visually contrasting portions that have a corresponding confidence measure that falls below a threshold level;
displaying user actuable inputs for user correction or verification of the proposed annotation, the user actuable inputs comprising:
one or more user actuable node inputs for annotation alternatives;
a user actuable delete node input which, when actuated, deletes a child node; and
a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with the child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the at least one model of the natural language understanding system.
- View Dependent Claims (30, 31, 32)
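The confidence-contrast display of claim 29 can be illustrated as follows. The per-portion scores and the `?...?` text markup are invented; a real implementation would use model-derived scores and graphical highlighting.

```python
# Hypothetical sketch: visually contrast portions of a proposed annotation
# whose confidence measure falls below a threshold level.
proposal = [("Boston", "City", 0.95), ("leave", "Command", 0.40)]
THRESHOLD = 0.6  # illustrative threshold level

def render(portion):
    text, label, conf = portion
    tagged = f"{text}/{label}"
    # '?...?' marks a low-confidence portion needing user attention.
    return f"?{tagged}?" if conf < THRESHOLD else tagged

display = " ".join(render(p) for p in proposal)
```

Drawing the annotator's eye to the low-confidence portions first is what makes the verification pass fast: high-confidence portions can usually be accepted at a glance.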
33. A method of generating annotated training data for training a natural language understanding (NLU) system having at least one model, comprising:
generating, with the NLU system, a proposed annotation for a unit of unannotated training data;
displaying the proposed annotation with user actuable inputs for user correction or verification of the proposed annotation to obtain a user-confirmed annotation;
training the model with the user-confirmed annotation; and
checking for inconsistencies among the user-confirmed annotation and data already used to train the model by determining whether the model accurately predicts the prior user-confirmed annotations;
the user actuable inputs comprising:
one or more user actuable node inputs for annotation alternatives;
a user actuable delete node input which, when actuated, deletes a child node; and
a user actuable add node input which, when actuated, adds a child node;
the method further comprising:
displaying a plurality of alternative proposed annotations to data portions associated with the child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the at least one model of the natural language understanding system.
- View Dependent Claims (34, 35, 36)
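The consistency check of claim 33, which re-predicts prior user-confirmed annotations with the current model, can be sketched as follows. The toy word-to-label model is an invented stand-in.

```python
# Hypothetical sketch: after training, verify that the model still predicts
# the previously confirmed annotations; any it misses are flagged.
model = {"boston": "City", "seattle": "Airport"}  # toy word -> label model
prior_confirmed = [("Boston", "City"), ("Seattle", "City")]

def inconsistent_priors(model, prior_confirmed):
    """Prior confirmed annotations the current model no longer predicts."""
    return [(w, lab) for w, lab in prior_confirmed
            if model.get(w.lower()) != lab]

stale = inconsistent_priors(model, prior_confirmed)
```

A flagged prior annotation may indicate either an earlier labeling mistake or a genuine ambiguity worth resolving before further training.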
Specification