Method of active learning for automatic speech recognition
Abstract
State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones with respect to a given cost function for a human to label. The method comprises automatically estimating a confidence score for each word of the utterance and exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. An utterance confidence score is computed based on these word confidence scores; then the utterances are selectively sampled to be transcribed using the utterance confidence scores.
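The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the recognizer has already produced per-word confidence scores for each untranscribed utterance. The arithmetic mean used to combine them, and the names `utterance_confidence`, `select_for_transcription`, `active_learning_round`, and the `transcribe` callback, are illustrative assumptions rather than details taken from the patent.

```python
from statistics import mean
from typing import Callable, Dict, List


def utterance_confidence(word_confidences: List[float]) -> float:
    """Combine word-level scores into one utterance-level score.

    Assumption: the arithmetic mean is used here for illustration only;
    the source does not fix the combining function in this section.
    """
    if not word_confidences:
        # No recognized words: treat as least confident.
        return 0.0
    return mean(word_confidences)


def select_for_transcription(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k utterances with the lowest confidence."""
    ranked = sorted(pool, key=lambda utt_id: utterance_confidence(pool[utt_id]))
    return ranked[:k]


def active_learning_round(
    pool: Dict[str, List[float]],
    k: int,
    transcribe: Callable[[str], str],
) -> Dict[str, str]:
    """Transcribe the k least-confident utterances and remove them from the pool.

    `transcribe` stands in for the human transcriber (hypothetical callback).
    """
    transcripts: Dict[str, str] = {}
    for utt_id in select_for_transcription(pool, k):
        transcripts[utt_id] = transcribe(utt_id)
        del pool[utt_id]  # the utterance is no longer untranscribed
    return transcripts


if __name__ == "__main__":
    # Hypothetical word confidence scores keyed by utterance id.
    untranscribed = {
        "utt_a": [0.9, 0.8],
        "utt_b": [0.2, 0.4],
        "utt_c": [0.6, 0.6],
    }
    done = active_learning_round(untranscribed, 1, lambda u: f"<transcript of {u}>")
    print(done)
    print(sorted(untranscribed))
```

In an iterative setting, the newly transcribed utterances would be added to the training set, the models retrained, and the remaining pool re-scored before the next round.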
17 Claims
1. A method comprising:
- recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
- computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
- transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
- removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
  - computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
  - transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
  - removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.
Dependent claims: 8, 9, 10, 11, 12
13. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
- computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
- transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
- removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.
Dependent claims: 14, 15, 16, 17
Specification