Reducing time for annotating speech data to develop a dialog application
Abstract
Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
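The selection step described in the abstract can be pictured with a short Python sketch. It assumes that each utterance carries a speech recognition confidence and a spoken language understanding confidence, that the criterion is a pair of fixed thresholds, that each selected utterance is tagged with one of two hypothetical annotation types, and that the list is ordered by ascending confidence. None of these names or threshold values come from the patent. They are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    utt_id: str
    asr_confidence: float  # hypothetical speech recognition confidence in [0, 1]
    slu_confidence: float  # hypothetical language understanding confidence in [0, 1]


@dataclass
class AnnotationItem:
    utt_id: str
    annotation_type: str  # hypothetical types: "transcription" or "slu_label"
    priority: float       # lower value means annotate sooner


def build_annotation_list(utterances: List[Utterance],
                          asr_threshold: float = 0.7,
                          slu_threshold: float = 0.6) -> List[AnnotationItem]:
    """Select low-confidence utterances, record the annotation type needed,
    and order the list so the least confident items come first.
    The thresholds are illustrative assumptions, not values from the patent."""
    items: List[AnnotationItem] = []
    for u in utterances:
        if u.asr_confidence < asr_threshold:
            # Recognizer is unsure of the words: request a transcription.
            items.append(AnnotationItem(u.utt_id, "transcription", u.asr_confidence))
        if u.slu_confidence < slu_threshold:
            # Understanding model is unsure of the meaning: request a label.
            items.append(AnnotationItem(u.utt_id, "slu_label", u.slu_confidence))
    items.sort(key=lambda item: item.priority)
    return items


utts = [
    Utterance("u1", asr_confidence=0.95, slu_confidence=0.90),
    Utterance("u2", asr_confidence=0.40, slu_confidence=0.80),
    Utterance("u3", asr_confidence=0.85, slu_confidence=0.30),
]
annotation_list = build_annotation_list(utts)
print([(i.utt_id, i.annotation_type) for i in annotation_list])
# prints [('u3', 'slu_label'), ('u2', 'transcription')]
```

Here an utterance can appear once per annotation type, so a single utterance that is uncertain for both models would receive both a transcription request and a labeling request.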
10 Claims
1. In a system that uses annotated speech data, a method for annotating speech data by processing a portion of unannotated speech data with at least one model, the processing comprising:
evaluating a performance of the at least one model with respect to each utterance in the portion of unannotated speech data using a criterion;
creating an annotation list that includes utterances that do not satisfy the criterion by:
using system deficiencies in combination with the criterion to identify utterances to be included in the annotation list; and
using previously annotated speech data in combination with the criterion or the system deficiencies to identify utterances to be included in the annotation list;
identifying an order in which the utterances on the annotation list are to be annotated;
generating via a processor a label for a particular utterance; and
including the particular utterance in the annotation list if the label does not match an existing label of the particular utterance.
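The steps of claim 1 can be sketched in Python as follows. This is an illustration under several assumptions the claim does not state: the criterion is a confidence threshold; "system deficiencies" are taken to be predicted labels that were often wrong on previously annotated data; the criterion test, the deficiency test and the label-mismatch test are combined with a logical OR; and the order is ascending confidence. All function names, fields and threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Utterance:
    utt_id: str
    confidence: float                     # hypothetical model confidence in [0, 1]
    predicted_label: str                  # label generated by the model
    existing_label: Optional[str] = None  # label from an earlier pass, if any


def find_deficient_labels(annotated: Iterable[Tuple[str, str]],
                          min_accuracy: float = 0.8) -> Set[str]:
    """Assumed notion of 'system deficiency': predicted labels that were
    correct less than min_accuracy of the time on previously annotated data.
    `annotated` holds (predicted_label, true_label) pairs."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for predicted, true in annotated:
        totals[predicted] = totals.get(predicted, 0) + 1
        if predicted == true:
            correct[predicted] = correct.get(predicted, 0) + 1
    return {label for label, n in totals.items()
            if correct.get(label, 0) / n < min_accuracy}


def select_for_annotation(utterances: List[Utterance],
                          annotated: Iterable[Tuple[str, str]],
                          threshold: float = 0.5,
                          min_accuracy: float = 0.8) -> List[str]:
    deficient = find_deficient_labels(annotated, min_accuracy)
    selected = []
    for u in utterances:
        fails_criterion = u.confidence < threshold        # criterion check
        hits_deficiency = u.predicted_label in deficient  # known weak spot
        label_mismatch = (u.existing_label is not None
                          and u.existing_label != u.predicted_label)
        # Assumption: any one of the three tests is enough to select.
        if fails_criterion or hits_deficiency or label_mismatch:
            selected.append(u)
    # Assumed ordering: least confident utterances are annotated first.
    selected.sort(key=lambda u: u.confidence)
    return [u.utt_id for u in selected]


annotated = [("billing", "billing"), ("billing", "billing"),
             ("refund", "billing"), ("refund", "refund")]
utts = [
    Utterance("u1", 0.90, "billing"),                           # passes all tests
    Utterance("u2", 0.30, "billing"),                           # low confidence
    Utterance("u3", 0.95, "refund"),                            # deficient label
    Utterance("u4", 0.85, "billing", existing_label="cancel"),  # label mismatch
]
print(select_for_annotation(utts, annotated))  # prints ['u2', 'u4', 'u3']
```

In this toy data, "refund" predictions were right only half the time on the annotated set, so even a high-confidence "refund" prediction is sent for annotation. Whether the claimed method combines the three tests with OR, AND or some weighting is not specified in the claim text.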
4. A system for annotating speech data by processing a portion of unannotated speech data with at least one model, the system comprising:
a first module controlling a processor to evaluate a performance of the at least one model with respect to each utterance in the portion of unannotated speech data using a criterion;
a second module controlling the processor to create an annotation list that includes utterances that do not satisfy the criterion by:
using system deficiencies in combination with the criterion to identify utterances to be included in the annotation list; and
using previously annotated speech data in combination with the criterion or the system deficiencies to identify utterances to be included in the annotation list;
a third module controlling the processor to identify an order in which the utterances on the annotation list are to be annotated;
a fourth module controlling the processor to generate a label for a particular utterance; and
a fifth module controlling the processor to include the particular utterance in the annotation list if the label does not match an existing label of the particular utterance.
8. A non-transitory computer-readable medium storing instructions for controlling a computing device to collect and annotate speech data by processing a portion of unannotated speech data with at least one model, the instructions comprising:
evaluating a performance of the at least one model with respect to each utterance in the portion of unannotated speech data using a criterion;
creating an annotation list that includes utterances that do not satisfy the criterion by:
using system deficiencies in combination with the criterion to identify utterances to be included in the annotation list; and
using previously annotated speech data in combination with the criterion or the system deficiencies to identify utterances to be included in the annotation list;
identifying an order in which the utterances on the annotation list are to be annotated;
generating via a processor a label for a particular utterance; and
including the particular utterance in the annotation list if the label does not match an existing label of the particular utterance.
Specification