Method of active learning for automatic speech recognition

US 8,990,084 B2
Filed: 02/10/2014
Issued: 03/24/2015
Est. Priority Date: 07/29/2002
Status: Expired due to Term

Abstract

State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones with respect to a given cost function for a human to label. The method comprises automatically estimating a confidence score for each word of the utterance and exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. An utterance confidence score is computed based on these word confidence scores; then the utterances are selectively sampled to be transcribed using the utterance confidence scores.
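The scoring and sampling steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract only says the utterance score is "based on" the word confidence scores, so the arithmetic mean used here is an assumption, and the names `utterance_confidence`, `select_for_transcription`, and the sample data are hypothetical.

```python
from statistics import mean


def utterance_confidence(word_confidences):
    """Combine per-word confidence scores into one utterance-level score.

    Assumption: the arithmetic mean is used as the combining function.
    """
    if not word_confidences:
        # Assumption: an empty hypothesis is treated as least confident.
        return 0.0
    return mean(word_confidences)


def select_for_transcription(word_scores_by_utterance, k):
    """Return the ids of the k utterances with the lowest utterance confidence."""
    scored = {
        uid: utterance_confidence(scores)
        for uid, scores in word_scores_by_utterance.items()
    }
    return sorted(scored, key=scored.get)[:k]


# Hypothetical word confidence scores for three untranscribed utterances.
pool = {
    "utt-a": [0.95, 0.90, 0.99],
    "utt-b": [0.40, 0.55, 0.30],
    "utt-c": [0.80, 0.20, 0.85],
}
print(select_for_transcription(pool, k=2))  # prints ['utt-b', 'utt-c']
```

The two utterances with the lowest mean word confidence are the ones handed to a human transcriber; the confidently recognized `utt-a` is not selected.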

17 Claims

1. A method comprising:
- recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
- computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
- transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
- removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.

2. The method of claim 1, further comprising iteratively repeating the transcribing with additional audio utterances from the untranscribed audio utterances until a word accuracy converges.
3. The method of claim 1, wherein the audio utterance comprises a plurality of audio works.
4. The method of claim 1, further comprising leaving out from consideration for transcription utterances with confidence scores indicating that the untranscribed audio utterances were correctly recognized.
5. The method of claim 1, wherein word posterior probability estimates are used for the confidence scores associated with the untranscribed audio utterances.
6. The method of claim 5, wherein a word is considered to be correctly recognized when an associated posterior probability is higher than a second threshold value.
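The loop recited in claims 1 and 2 (score, transcribe the lowest-confidence utterance, remove it from the pool, repeat until word accuracy converges) might be sketched like this. Everything about the interfaces is an assumption: `score`, `transcribe`, `retrain`, and `evaluate` are hypothetical callbacks, the retraining step between rounds is inferred from the abstract rather than stated in the claims, and the convergence tolerance `tol` is an illustrative choice.

```python
def active_learning_loop(pool, score, transcribe, retrain, evaluate,
                         batch_size=1, tol=1e-3, max_rounds=100):
    """Run selective sampling over `pool`, a set of utterance ids (mutated in place).

    Returns a dict mapping each transcribed utterance id to its transcript.
    """
    labeled = {}
    prev_acc = None
    for _ in range(max_rounds):
        if not pool:
            break
        # Recognize and score every utterance still awaiting transcription.
        confidences = {uid: score(uid) for uid in pool}
        # Select the utterance(s) with the lowest confidence score.
        batch = sorted(pool, key=confidences.get)[:batch_size]
        for uid in batch:
            labeled[uid] = transcribe(uid)  # stand-in for human transcription
            pool.remove(uid)                # remove from the untranscribed database
        retrain(labeled)                    # assumption: models are retrained each round
        acc = evaluate()
        # Stop once word accuracy changes by less than `tol` between rounds.
        if prev_acc is not None and abs(acc - prev_acc) < tol:
            break
        prev_acc = acc
    return labeled


# Hypothetical stand-ins so the sketch runs end to end.
scores = {"u1": 0.9, "u2": 0.2, "u3": 0.5, "u4": 0.95}
accuracies = iter([0.60, 0.70, 0.7005])
pool = set(scores)
labeled = active_learning_loop(
    pool,
    score=scores.get,
    transcribe=lambda uid: f"transcript of {uid}",
    retrain=lambda data: None,
    evaluate=lambda: next(accuracies),
)
print(list(labeled))  # prints ['u2', 'u3', 'u1']
print(pool)           # prints {'u4'}
```

With these stub values the loop stops after the third round, because accuracy moved by less than `tol`, and the most confidently recognized utterance is never sent for transcription.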

7. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
  - computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
  - transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
  - removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.

8. The system of claim 7, the computer-readable storage medium having additional instructions stored which result in operations comprising iteratively repeating the transcribing with additional audio utterances from the untranscribed audio utterances until a word accuracy converges.
9. The system of claim 7, wherein the audio utterance comprises a plurality of audio works.
10. The system of claim 7, the computer-readable storage medium having additional instructions stored which result in operations comprising leaving out from consideration for transcription utterances with confidence scores indicating that the untranscribed audio utterances were correctly recognized.
11. The system of claim 7, wherein word posterior probability estimates are used for the confidence scores associated with the untranscribed audio utterances.
12. The system of claim 11, wherein a word is considered to be correctly recognized when an associated posterior probability is higher than a second threshold value.
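The filtering described in claims 10 through 12 could look roughly like the sketch below. The claims do not say how word-level decisions roll up to an utterance, so this assumes an utterance counts as correctly recognized only when every word posterior exceeds the word threshold. The threshold value 0.6 and the names `is_word_correct` and `transcription_candidates` are hypothetical.

```python
def is_word_correct(posterior, word_threshold):
    """A word is considered correctly recognized when its posterior exceeds the threshold."""
    return posterior > word_threshold


def transcription_candidates(word_posteriors_by_utterance, word_threshold):
    """Return ids of utterances that stay under consideration for transcription.

    Assumption: an utterance is left out only if it has at least one word
    and every word is considered correctly recognized.
    """
    candidates = []
    for uid, posteriors in word_posteriors_by_utterance.items():
        all_correct = bool(posteriors) and all(
            is_word_correct(p, word_threshold) for p in posteriors
        )
        if not all_correct:
            candidates.append(uid)
    return candidates


# Hypothetical word posterior probabilities for three utterances.
pool = {
    "utt-a": [0.95, 0.90],
    "utt-b": [0.40, 0.97],
    "utt-c": [0.51, 0.88],
}
print(transcription_candidates(pool, word_threshold=0.6))  # prints ['utt-b', 'utt-c']
```

Here `utt-a` is left out because both of its words clear the threshold, while the other two each contain a word that does not.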

13. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- recognizing untranscribed audio utterances that are candidates for transcription using a trained acoustic model and a trained language model;
- computing confidence scores associated with an accuracy of speech recognition of the untranscribed audio utterances;
- transcribing an audio utterance from the untranscribed audio utterances, the audio utterance having a lowest confidence score in the confidence scores; and
- removing the audio utterance which was transcribed from a database comprising the untranscribed audio utterances.

14. The computer-readable storage device of claim 13, having additional instructions stored which result in operations comprising iteratively repeating the transcribing with additional audio utterances from the untranscribed audio utterances until a word accuracy converges.
15. The computer-readable storage device of claim 13, wherein the audio utterance comprises a plurality of audio works.
16. The computer-readable storage device of claim 13, having additional instructions stored which result in operations comprising leaving out from consideration for transcription utterances with confidence scores indicating that the untranscribed audio utterances were correctly recognized.
17. The computer-readable storage device of claim 16, wherein word posterior probability estimates are used for the confidence scores associated with the untranscribed audio utterances.

Current Assignee: Interactions, LLC
Original Assignee: Interactions, LLC
Inventors: Gorin, Allen Louis; Hakkani-Tur, Dilek Z.; Riccardi, Guiseppe
Primary Examiner(s): Armstrong, Angela A
Application Number: US14/176,439
Publication Number: US 20140156275A1
Time in Patent Office: 407 Days
Field of Search: 704/231, 704/235, 704/240, 704/243, 704/244, 704/251, 704/255, 704/257
US Class Current: 704/243
CPC Class Codes: G10L 15/063 Training
