SYSTEMS AND METHODS FOR REDUCING ANNOTATION TIME
Abstract
Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
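The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each utterance already has a confidence score per model type; the names `AnnotationItem` and `build_annotation_list`, the single shared threshold, and the "least confident first" ordering are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotationItem:
    """One entry in the annotation list (hypothetical structure)."""
    utterance_id: str
    annotation_type: str  # e.g. "transcription", "understanding", "labeling"
    confidence: float


def build_annotation_list(
    scores: Dict[str, Dict[str, float]],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> List[AnnotationItem]:
    """Select low-confidence utterances and order them for annotation.

    `scores` maps an utterance id to {annotation_type: confidence}.
    """
    items = [
        AnnotationItem(utt_id, ann_type, conf)
        for utt_id, per_model in scores.items()
        for ann_type, conf in per_model.items()
        if conf < threshold
    ]
    # Assumed ordering: annotate the least confident utterances first.
    return sorted(items, key=lambda item: item.confidence)


example_scores = {
    "u1": {"transcription": 0.9, "understanding": 0.3},
    "u2": {"transcription": 0.2},
}
for item in build_annotation_list(example_scores):
    print(item.utterance_id, item.annotation_type, item.confidence)
```

Each entry carries the type of annotation to perform, and the position in the returned list gives the order in which annotation would proceed.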
16 Claims
1. In a system that uses annotated speech data, a method for annotating speech data by processing a portion of unannotated speech data with one or more models, the processing comprising:
- generating a label for a particular utterance; and
- including the particular utterance in an annotation list if the label does not match an existing label of the particular utterance.
Dependent claims: 2, 3, 4, 5, 6
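As a rough illustration of the label comparison in claim 1, the sketch below regenerates a label for each utterance and queues the utterance for annotation when the new label disagrees with the existing one. The function name `find_label_mismatches`, the `predict_label` callable, and the toy keyword classifier are hypothetical stand-ins for whatever models a real system would use.

```python
from typing import Callable, Dict, List


def find_label_mismatches(
    existing_labels: Dict[str, str],
    predict_label: Callable[[str], str],
) -> List[str]:
    """Return utterances whose generated label differs from the existing label."""
    annotation_list = []
    for utterance, existing in existing_labels.items():
        generated = predict_label(utterance)
        if generated != existing:
            # Disagreement suggests the existing label may need human review.
            annotation_list.append(utterance)
    return annotation_list


def toy_classifier(utterance: str) -> str:
    """Stand-in model: a keyword rule, used only to make the sketch runnable."""
    return "billing" if "bill" in utterance else "other"


labels = {
    "i have a question about my bill": "billing",
    "my bill is wrong": "other",
    "what are your hours": "other",
}
print(find_label_mismatches(labels, toy_classifier))
```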
7. A system that collects speech data for use in developing a dialog application, the system for annotating the speech data for the dialog application, the system comprising:
- a module configured to analyze unannotated speech data with one or more speech recognition models, wherein each utterance in the speech data receives a recognition confidence score;
- a module configured to analyze the speech data that is not annotated with one or more spoken language understanding models, wherein each utterance in the speech data receives an understanding confidence score; and
- a module configured to create an annotation list that includes at least a portion of the utterances having a recognition confidence score below a confidence threshold score and that includes at least a portion of the utterances having an understanding confidence score below an understanding threshold score.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
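The two-threshold selection in claim 7 can be sketched as follows, assuming the recognition and understanding confidence scores have already been computed and are keyed by utterance id. The function name `select_low_confidence` and the default threshold values are assumptions made for illustration; the claim only requires that at least a portion of the low-scoring utterances be included, whereas this sketch includes all of them.

```python
from typing import Dict, List


def select_low_confidence(
    recognition_scores: Dict[str, float],
    understanding_scores: Dict[str, float],
    recognition_threshold: float = 0.6,    # assumed value
    understanding_threshold: float = 0.6,  # assumed value
) -> List[str]:
    """Collect utterances that fall below either confidence threshold."""
    low_recognition = [
        utt for utt, score in recognition_scores.items()
        if score < recognition_threshold
    ]
    low_understanding = [
        utt for utt, score in understanding_scores.items()
        if score < understanding_threshold
    ]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(low_recognition + low_understanding))


recognition = {"a": 0.9, "b": 0.4, "c": 0.5}
understanding = {"a": 0.3, "b": 0.2, "c": 0.95}
print(select_low_confidence(recognition, understanding))
```

Keeping separate thresholds lets the recognition and understanding criteria be tuned independently, which matches the claim's distinction between a confidence threshold score and an understanding threshold score.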
16. A system that collects speech data for developing a dialog application, wherein the dialog application includes speech recognition models, spoken language understanding models, and labeling models, the system reducing the time required to annotate the speech data, the system comprising:
- a module configured to select one or more utterances from speech data for annotation based on confidence scores of the one or more utterances, wherein the confidence scores are generated by at least one of: speech recognition models, spoken language understanding models, and labeling models;
- a module configured to select one or more utterances from the speech data for annotation based on deficiencies of a dialog application; and
- a module configured to create an annotation list that includes the selected one or more utterances, wherein the annotation list identifies a type of annotation to be performed for each of the one or more utterances in the annotation list.
Specification