Adaptation of speech models in speech recognition

US 20040243412A1
Filed: 05/29/2003
Published: 12/02/2004
Est. Priority Date: 05/29/2003
Status: Abandoned Application

Abstract

A computer-based automatic speech recognition (ASR) system generates a sequence of text material used to train the ASR system. The system compares the sequence of text material to inputs corresponding to a user's speech utterances of that text material in order to update the speech models (e.g., phoneme templates) used during normal ASR processing. The ASR system is able to generate a user-dependent sequence of text material for adapting the speech models, where at least some of the text material is based on the evaluation of previous user utterances. In this way, the system can be trained more efficiently by concentrating on particular speech models that are more problematic than others for the particular user (or group of users).
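The feedback loop described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the applicants' implementation: the function name `adaptive_training`, the integer 0-100 score scale, the fixed `gain` per practice round, and the prompt-to-phoneme mapping are all hypothetical. Real adaptation would update acoustic models from audio; here a counter stands in for it.

```python
from typing import Dict, List, Tuple


def adaptive_training(
    prompts: Dict[str, List[str]],   # hypothetical: prompt text -> phonemes it contains
    initial_scores: Dict[str, int],  # hypothetical: per-phoneme score on a 0-100 scale
    threshold: int = 80,             # assumed cut-off below which a phoneme is "problematic"
    gain: int = 10,                  # toy stand-in: score gained each time a phoneme is practiced
    max_rounds: int = 10,
) -> Tuple[List[List[str]], Dict[str, int]]:
    """Return the prompts presented in each round and the final scores."""
    scores = dict(initial_scores)
    rounds: List[List[str]] = []
    selected = list(prompts)  # first round: present every prompt
    for _ in range(max_rounds):
        if not selected:
            break  # no prompt covers the remaining problem phonemes
        rounds.append(selected)
        # Toy stand-in for model adaptation: practicing a phoneme raises its score.
        practiced = {ph for text in selected for ph in prompts[text]}
        for ph in practiced:
            scores[ph] = min(100, scores.get(ph, 0) + gain)
        # Evaluate: which phonemes are still below the threshold?
        problem = {ph for ph, s in scores.items() if s < threshold}
        if not problem:
            break  # every model counts as sufficiently adapted
        # Next round: only prompts that exercise a problem phoneme.
        selected = [t for t, phs in prompts.items() if problem & set(phs)]
    return rounds, scores
```

With two prompts where only "TH" starts well below the threshold, the first round presents both prompts and later rounds present only the prompt containing "TH", which is the concentration effect the abstract describes.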

181 Citations

20 Claims

1. A computer system comprising:
- (a) a database of speech models;
  
  (b) a speech recognition (SR) engine adapted to compare user utterances to the database of speech models to recognize the user utterances;
  
  (c) an adaptation module adapted to modify the database of speech models based on a set of user utterances corresponding to a set of known inputs;
  
  (d) a pronunciation evaluation module adapted to characterize user utterances relative to corresponding speech models in the database; and
  
  (e) a sequence generator adapted to generate the set of known inputs used by the adaptation module to modify the database of speech models, wherein the sequence generator automatically selects at least a subset of the known inputs based on the characterization of previous user utterances by the pronunciation evaluation module.
  - 2. The invention of claim 1, wherein the speech models are phoneme templates in a parametric domain.
  - 3. The invention of claim 1, wherein, using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use by the adaptation module and the pronunciation evaluation module.
  - 4. The invention of claim 1, further comprising a score management module adapted to collect results from the pronunciation evaluation module and identify one or more problem phonemes, wherein the sequence generator selects additional known inputs for the set of known inputs based on the one or more problem phonemes.
  - 5. The invention of claim 4, wherein the score management module thresholds phoneme pronunciation scores from the pronunciation evaluation module to identify the one or more problem phonemes.
  - 6. The invention of claim 1, wherein the generation of known inputs for adaptation of speech models in the database automatically terminates when the system determines that all of the speech models are sufficiently adapted.
  - 7. The invention of claim 1, wherein:
    - the speech models are phoneme templates in a parametric domain;
      
      using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use by the adaptation module and the pronunciation evaluation module;
      
      further comprising a score management module adapted to collect results from the pronunciation evaluation module and identify one or more problem phonemes, wherein:
      
      the sequence generator selects additional known inputs for the set of known inputs based on the one or more problem phonemes; and
      
      the score management module thresholds phoneme pronunciation scores from the pronunciation evaluation module to identify the one or more problem phonemes; and
      
      the generation of known inputs for adaptation of speech models in the database automatically terminates when the system determines that all of the speech models are sufficiently adapted.
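The score management module of claims 4 through 6 can be sketched as a small class. The claims say only that phoneme pronunciation scores are thresholded, so several details here are assumptions: the class name `ScoreManager`, the 0.0-1.0 score scale, averaging a phoneme's scores before thresholding, and treating "sufficiently adapted" as "no phoneme below the threshold".

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class ScoreManager:
    """Hypothetical score manager: collects per-phoneme scores and thresholds them."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._scores: Dict[str, List[float]] = defaultdict(list)

    def collect(self, phoneme_scores: Dict[str, float]) -> None:
        # Store the scores reported for one evaluated utterance.
        for phoneme, score in phoneme_scores.items():
            self._scores[phoneme].append(score)

    def problem_phonemes(self) -> List[str]:
        # Assumption: a phoneme is a problem if its mean score is below the threshold.
        return sorted(
            phoneme
            for phoneme, values in self._scores.items()
            if mean(values) < self.threshold
        )

    def all_adapted(self) -> bool:
        # Assumed termination test: nothing left below the threshold.
        return not self.problem_phonemes()
```

A sequence generator could call `problem_phonemes()` after each batch of utterances and stop issuing prompts once `all_adapted()` returns true.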

8. A computer-based method for training a computer application having a speech recognition (SR) engine adapted to compare user utterances to a database of speech models to recognize the user utterances, the method comprising:
- generating a set of known inputs;
  
  modifying the database of speech models based on a set of user utterances corresponding to the set of known inputs; and
  
  characterizing user utterances relative to corresponding speech models in the database, wherein at least a subset of the known inputs are automatically selected based on the characterization of previous user utterances.
  - 9. The invention of claim 8, wherein the speech models are phoneme templates in a parametric domain.
  - 10. The invention of claim 8, wherein, using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use in modifying the database and characterizing the user utterances.
  - 11. The invention of claim 8, further comprising collecting results from the pronunciation evaluation module and identifying one or more problem phonemes, wherein additional known inputs are selected for the set of known inputs based on the one or more problem phonemes.
  - 12. The invention of claim 11, wherein phoneme pronunciation scores are thresholded to identify the one or more problem phonemes.
  - 13. The invention of claim 8, wherein the generation of known inputs for adaptation of speech models in the database automatically terminates when it is determined that all of the speech models are sufficiently adapted.
  - 14. The invention of claim 8, wherein:
    - the speech models are phoneme templates in a parametric domain;
      
      using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use in modifying the database and characterizing the user utterances;
      
      further comprising collecting results from the pronunciation evaluation module and identifying one or more problem phonemes, wherein:
      
      additional known inputs are selected for the set of known inputs based on the one or more problem phonemes; and
      
      phoneme pronunciation scores are thresholded to identify the one or more problem phonemes; and
      
      the generation of known inputs for adaptation of speech models in the database automatically terminates when it is determined that all of the speech models are sufficiently adapted.
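The selection step in claims 8 and 11, choosing additional known inputs based on problem phonemes, could be realized in many ways. The sketch below uses a greedy coverage heuristic, which is a technique substituted here for illustration and is not stated in the claims. The function name `select_prompts`, the `max_prompts` limit, and the candidate-to-phoneme mapping are hypothetical.

```python
from typing import Dict, Iterable, List


def select_prompts(
    candidates: Dict[str, List[str]],  # hypothetical: candidate text -> phonemes it contains
    problem_phonemes: Iterable[str],
    max_prompts: int = 3,              # assumed cap on prompts per round
) -> List[str]:
    """Greedily pick prompts that cover the most still-uncovered problem phonemes."""
    remaining = set(problem_phonemes)
    pool = dict(candidates)
    chosen: List[str] = []
    while remaining and pool and len(chosen) < max_prompts:
        # Pick the candidate covering the most uncovered problem phonemes
        # (ties go to the earliest candidate in insertion order).
        best = max(pool, key=lambda text: len(remaining & set(pool[text])))
        covered = remaining & set(pool[best])
        if not covered:
            break  # no candidate helps with what is left
        chosen.append(best)
        remaining -= covered
        del pool[best]
    return chosen
```

Greedy coverage keeps the prompt list short when one sentence exercises several problem phonemes at once; a production system might instead weight phonemes by how far each score falls below the threshold.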

15. A machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method for training a computer application having a speech recognition (SR) engine adapted to compare user utterances to a database of speech models to recognize the user utterances, the method comprising:
- generating a set of known inputs;
  
  modifying the database of speech models based on a set of user utterances corresponding to the set of known inputs; and
  
  evaluating the user utterances, wherein at least a subset of the known inputs are automatically selected based on the evaluation of previous user utterances.
  - 16. The invention of claim 15, wherein, using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use in modifying the database and characterizing the user utterances.
  - 17. The invention of claim 15, further comprising collecting results from the pronunciation evaluation module and identifying one or more problem phonemes, wherein additional known inputs are selected for the set of known inputs based on the one or more problem phonemes.
  - 18. The invention of claim 17, wherein phoneme pronunciation scores are thresholded to identify the one or more problem phonemes.
  - 19. The invention of claim 15, wherein the generation of known inputs for adaptation of speech models in the database automatically terminates when it is determined that all of the speech models are sufficiently adapted.
  - 20. The invention of claim 15, wherein:
    - the speech models are phoneme templates in a parametric domain;
      
      using the database of speech models, the SR engine generates and compares parametric representations of the set of known inputs to parametric representations of the user utterances to generate segmentation results for use in modifying the database and characterizing the user utterances;
      
      further comprising collecting results from the pronunciation evaluation module and identifying one or more problem phonemes, wherein:
      
      additional known inputs are selected for the set of known inputs based on the one or more problem phonemes; and
      
      phoneme pronunciation scores are thresholded to identify the one or more problem phonemes; and
      
      the generation of known inputs for adaptation of speech models in the database automatically terminates when it is determined that all of the speech models are sufficiently adapted.

Current Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Gupta, Sunil K.; Raghavan, Prabhu
Application Number: US10/447,906
Publication Number: US 20040243412A1
US Class Current: 704/254
CPC Class Codes: G10L 15/07 to the speaker