Method of active learning for automatic speech recognition
Abstract
State-of-the-art speech recognition systems are trained on transcribed utterances, which are labor-intensive and time-consuming to prepare. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims to reduce the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones, with respect to a given cost function, for a human to label. The method comprises automatically estimating a confidence score for each word of the utterance by exploiting the lattice output of a speech recognizer that was trained on a small set of transcribed data. An utterance confidence score is computed from these word confidence scores, and the utterances to be transcribed are then selectively sampled using the utterance confidence scores.
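As an illustration of the last two sentences of the abstract, the following is a minimal Python sketch of turning word confidence scores into an utterance score and sampling the least confident utterances. The abstract does not state how word scores are combined, so the arithmetic mean used here is an assumption, and the names `utterance_confidence` and `select_for_transcription` are hypothetical.

```python
from statistics import mean


def utterance_confidence(word_scores):
    """Combine per-word confidence scores into a single utterance score.

    Assumption: arithmetic mean. The abstract only says the utterance
    score is "based on" the word scores.
    """
    if not word_scores:
        return 0.0  # assumption: an empty hypothesis counts as least confident
    return mean(word_scores)


def select_for_transcription(word_scores_by_utt, k):
    """Return the ids of the k utterances with the lowest utterance confidence.

    word_scores_by_utt maps an utterance id to its list of word scores.
    """
    ranked = sorted(
        word_scores_by_utt,
        key=lambda utt: utterance_confidence(word_scores_by_utt[utt]),
    )
    return ranked[:k]
```

Any other combining function (for example the minimum word score) could be swapped into `utterance_confidence` without changing the selection step.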
12 Claims
1. A method for reducing the transcription effort for training an automatic speech recognition module, the method comprising:
(1) training acoustic and language models using a small set of transcribed data St;
(2) recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models;
(3) computing confidence scores of the utterances;
(4) selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si;
(5) redefining Si as the union of St and Si;
(6) redefining Su as Su minus Si; and
(7) returning to step (1) if word accuracy has not converged.
Dependent claims: 2, 3, 4, 5, 6
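The seven steps of claim 1 can be sketched as a loop. This is a minimal Python illustration, not the patented implementation: the callables `train`, `score`, `transcribe`, and `accuracy`, and the parameters `tol` and `max_rounds`, are hypothetical stand-ins for a real trainer, recognizer, human transcriber, and held-out evaluation. As worded, step (5) redefines Si; the sketch assumes the intent is to grow the transcribed pool St with Si, so that retraining in step (1) uses the new transcriptions. The convergence test is placed right after retraining, which is one reading of step (7).

```python
def active_learning(s_t, s_u, k, train, score, transcribe, accuracy,
                    tol=1e-3, max_rounds=50):
    """Iteratively transcribe the k least confident utterances.

    s_t: dict mapping utterance id -> transcript (small transcribed seed set)
    s_u: iterable of untranscribed utterance ids (candidates)
    train(s_t) -> models; score(models, utt) -> confidence;
    transcribe(utt) -> transcript; accuracy(models) -> word accuracy.
    All four callables are hypothetical placeholders.
    """
    s_t = dict(s_t)
    s_u = set(s_u)
    prev_acc = None
    models = None
    for _ in range(max_rounds):
        models = train(s_t)                                # step (1)
        acc = accuracy(models)
        converged = prev_acc is not None and abs(acc - prev_acc) < tol
        if converged or not s_u:                           # step (7)
            break
        prev_acc = acc
        conf = {utt: score(models, utt) for utt in s_u}    # steps (2)-(3)
        chosen = sorted(s_u, key=conf.get)[:k]             # step (4)
        s_i = {utt: transcribe(utt) for utt in chosen}
        s_t.update(s_i)                                    # step (5), assumed: St grows
        s_u.difference_update(s_i)                         # step (6)
    return models, s_t, s_u
```

Because the loop stops only after a retraining pass, the returned models are always trained on the final transcribed set.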
7. A method for reducing the transcription effort for training an automatic speech recognition module, the method comprising:
(1) pre-processing unlabeled examples of transcribed utterances using a computer device;
(2) using a lattice output from a speech recognizer, automatically estimating a confidence score for each word associated with a set of selected examples;
(3) computing utterance confidence scores based on the estimated confidence score for each word; and
(4) selecting utterances to be transcribed using the utterance confidence scores.
Dependent claims: 8
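Step (2) of claim 7 estimates word confidence from a recognizer lattice but does not say how. One common choice, assumed here purely for illustration, is the posterior probability of each lattice edge computed with a forward-backward pass. The lattice encoding (integer nodes in topological order, edges as `(src, dst, word, weight)` tuples) and the function name `edge_posteriors` are hypothetical.

```python
from collections import defaultdict


def edge_posteriors(edges, start, final):
    """Posterior probability of each lattice edge via forward-backward.

    edges: list of (src, dst, word, weight) tuples. Nodes are integers
    numbered in topological order (src < dst); weight is a non-negative
    score for taking that edge.
    Returns a list of (word, posterior) pairs in the same order as edges.
    """
    # Forward pass: total weight of all partial paths from start to each node.
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for src, dst, _, w in sorted(edges, key=lambda e: e[0]):
        alpha[dst] += alpha[src] * w

    # Backward pass: total weight of all partial paths from each node to final.
    beta = defaultdict(float)
    beta[final] = 1.0
    for src, dst, _, w in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[src] += w * beta[dst]

    total = alpha[final]  # weight of all complete paths
    if total == 0.0:
        raise ValueError("lattice has no complete path from start to final")
    return [(word, alpha[src] * w * beta[dst] / total)
            for src, dst, word, w in edges]


lattice = [
    (0, 1, "call", 0.6),
    (0, 1, "tall", 0.2),
    (1, 2, "home", 1.0),
    (0, 2, "calzone", 0.2),
]
for word, posterior in edge_posteriors(lattice, 0, 2):
    print(f"{word}: {posterior:.2f}")
```

In this toy lattice, "home" lies on every path except the single-edge "calzone" path, so its posterior is the combined share of the "call" and "tall" paths.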
9. A computer-readable medium that stores a program for controlling a computer device to perform the following method to reduce the transcription effort for training an automatic speech recognition module, the method comprising:
(1) training acoustic and language models using a small set of transcribed data St;
(2) recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models;
(3) computing confidence scores of the utterances;
(4) selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si;
(5) redefining Si as the union of St and Si;
(6) redefining Su as Su minus Si; and
(7) returning to step (1) if word accuracy has not converged.
10. A computer-readable medium that stores a program for controlling a computer device to perform the following method to reduce the transcription effort for training an automatic speech recognition module, the method comprising:
(1) pre-processing unlabeled examples of transcribed utterances using a computer device;
(2) using a lattice output from a speech recognizer, automatically estimating a confidence score for each word associated with a set of selected examples;
(3) computing utterance confidence scores based on the estimated confidence score for each word; and
(4) selecting utterances to be transcribed using the utterance confidence scores.
11. An automatic speech recognition module trained using a method of reducing the transcription effort for training an automatic speech recognition module, the method comprising:
(1) training acoustic and language models using a small set of transcribed data St;
(2) recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models;
(3) computing confidence scores of the utterances;
(4) selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si;
(5) redefining Si as the union of St and Si;
(6) redefining Su as Su minus Si; and
(7) returning to step (1) if word accuracy has not converged.
12. An automatic speech recognition module trained using a method of reducing the transcription effort for training an automatic speech recognition module, the method comprising:
(1) pre-processing unlabeled examples of transcribed utterances using a computer device;
(2) using a lattice output from a speech recognizer, automatically estimating a confidence score for each word associated with a set of selected examples;
(3) computing utterance confidence scores based on the estimated confidence score for each word; and
(4) selecting utterances to be transcribed using the utterance confidence scores.
Specification