Method for Adapting a Codebook for Speech Recognition

US 20100138222A1
Filed: 11/20/2009
Published: 06/03/2010
Est. Priority Date: 11/21/2008
Status: Active Grant


Abstract

A method for adapting a codebook for speech recognition, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook is disclosed. A speech input is received and a feature vector based on the received speech input is determined. For each of the Gaussian densities, a first mean vector is estimated using an expectation process and taking into account the determined feature vector. For each of the Gaussian densities, a second mean vector using an Eigenvoice adaptation is determined taking into account the determined feature vector. For each of the Gaussian densities, the mean vector is set to a convex combination of the first and the second mean vector. Thus, this process allows for adaptation during operation and does not require a lengthy training phase.
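
The final step described in the abstract, setting each mean vector to a convex combination of the two estimates, can be illustrated with a short sketch. The function names and the `alpha` argument are illustrative assumptions and do not come from the patent; both input estimates are assumed to have been computed already, and `alpha` is taken here to weight the second (Eigenvoice) estimate.

```python
def convex_combination(first_mean, second_mean, alpha):
    """Blend two mean vectors: (1 - alpha) * first + alpha * second.

    alpha must lie in [0, 1] for the combination to be convex.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(first_mean) != len(second_mean):
        raise ValueError("mean vectors must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b
            for a, b in zip(first_mean, second_mean)]


def adapt_codebook_means(first_means, second_means, alphas):
    """Apply the blend to every Gaussian density of a codebook.

    Each argument holds one entry per Gaussian density (hypothetical layout).
    """
    return [convex_combination(f, s, a)
            for f, s, a in zip(first_means, second_means, alphas)]
```

With `alpha = 0` a density keeps only its first estimate, and with `alpha = 1` only its Eigenvoice estimate; values in between interpolate linearly.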

27 Claims

1. A method for adapting a codebook for speech recognition, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook, each codebook including a set of Gaussian densities, each Gaussian density being parameterized by a mean vector and a covariance matrix, comprising:
- (a) receiving a speech input;
  
  (b) determining a feature vector based on the received speech input;
  
  (c) for each of the Gaussian densities, estimating a first mean vector using an expectation process and taking into account the determined feature vector;
  
  (d) for each of the Gaussian densities, estimating a second mean vector using an Eigenvoice adaptation and taking into account the determined feature vector; and
  
  (e) for each of the Gaussian densities, setting its mean vector to a convex combination of the first and the second mean vector.
- - 2. The method according to claim 1, wherein the expectation process is a maximum likelihood linear regression process or a maximum a posteriori process.
  - 3. The method according to claim 1, wherein the coefficient of the convex combination is a function of the number of feature vectors assigned to the respective Gaussian density.
  - 4. The method according to claim 1, further comprising:
    - performing speech recognition on the feature vector, wherein a confidence measure is determined with respect to the recognized feature vector, and wherein the coefficient of the convex combination is a function of the confidence measure.
  - 5. A method according to claim 1, wherein steps (c) to (e) are preceded by the step of selecting the codebook from a set of codebooks according to a predetermined criterion.
  - 6. A method according to claim 5, wherein selecting further comprises:
    - identifying a speaker corresponding to the speech input and selecting the codebook corresponding to the identified speaker.
  - 7. A method according to claim 5, wherein selecting comprises:
    - creating a new speaker-dependent codebook if the speaker corresponding to the speech input is not identified or if the set of codebooks does not contain a codebook corresponding to the identified speaker.
  - 8. A method according to claim 5, wherein selecting comprises:
    - determining a score for assigning a sequence of feature vectors to each of the codebooks, and selecting the codebook based on the determined scores.
  - 9. A method according to claim 8, wherein determining a score comprises:
    - selecting the classes from the speaker-independent codebook to which a number of feature vectors above a predetermined threshold is assigned;
      
      selecting the classes from the at least one speaker-dependent codebook that correspond to the selected classes from the speaker-independent codebook, and for each of the codebooks, determining a score based on the selected classes only.
  - 10. A method according to claim 5, further comprising:
    - creating a new speaker-dependent codebook if the selected codebook based on the determined scores is the speaker-independent codebook.
  - 11. A method according to claim 5, wherein selecting is performed using a Viterbi process.
  - 12. A method according to claim 1, wherein (b) comprises processing the feature vector to reduce distortions to obtain a corrected feature vector.
  - 13. A method according to claim 12, wherein processing is based on a conditional Gaussian Mixture Model obtained via a minimum mean square error estimate.
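
Claim 3 states only that the coefficient of the convex combination is a function of the number of feature vectors assigned to the respective Gaussian density. One possible choice, shown below as a sketch, is a saturating weight `n / (n + tau)`, a form familiar from MAP-style adaptation. The specific formula, the parameter `tau`, and the function names are assumptions made for illustration and are not stated in the claims.

```python
def count_based_weight(count, tau=10.0):
    """Weight for the data-driven (first) mean estimate.

    Hypothetical form: approaches 1 as more feature vectors are assigned
    to the density, and equals 0 when none have been seen.
    """
    if count < 0 or tau <= 0:
        raise ValueError("count must be >= 0 and tau must be > 0")
    return count / (count + tau)


def blend_means(first_mean, second_mean, count, tau=10.0):
    """Convex combination whose coefficient depends on the count."""
    w = count_based_weight(count, tau)
    return [w * a + (1.0 - w) * b for a, b in zip(first_mean, second_mean)]
```

Under this assumed form, a density with little data leans on the Eigenvoice estimate, while a density with many assigned feature vectors is dominated by the first estimate.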

14. A computer program product including a tangible computer-readable medium having computer code thereon for adapting a codebook for speech recognition, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook, each codebook including a set of Gaussian densities, the computer code comprising:
- (a) computer code for receiving a speech input;
  
  (b) computer code for determining a feature vector based on the received speech input;
  
  (c) computer code, for each of the Gaussian densities, for estimating a first mean vector using an expectation process and taking into account the determined feature vector;
  
  (d) computer code, for each of the Gaussian densities, for estimating a second mean vector using an Eigenvoice adaptation and taking into account the determined feature vector; and
  
  (e) computer code, for each of the Gaussian densities, for setting its mean vector to a convex combination of the first and the second mean vector.
- - 15. The computer program product according to claim 14, wherein the computer code for the expectation process is a maximum likelihood linear regression or a maximum a posteriori process.
  - 16. The computer program product according to claim 14, wherein the coefficient of the convex combination is a function of the number of feature vectors assigned to the respective Gaussian density.
  - 17. The computer program product according to claim 14, further comprising:
    - computer code for performing speech recognition on the feature vector, wherein a confidence measure is determined with respect to the recognized feature vector, and wherein the coefficient of the convex combination is a function of the confidence measure.
  - 18. A computer program product according to claim 14, wherein the computer code for (c) to (e) are preceded by computer code for selecting the codebook from a set of codebooks according to a predetermined criterion.
  - 19. A computer program product according to claim 18, wherein the computer code for selecting comprises identifying a speaker corresponding to the speech input and selecting the codebook corresponding to the identified speaker.
  - 20. A computer program product according to claim 18, wherein the computer code for selecting comprises creating a new speaker-dependent codebook if the speaker corresponding to the speech input is not identified or if the set of codebooks does not contain a codebook corresponding to the identified speaker.
  - 21. A computer program product according to claim 18, wherein the computer code for selecting comprises:
    - computer code for determining a score for assigning a sequence of feature vectors to each of the codebooks, and computer code for selecting the codebook based on the determined scores.
  - 22. A computer program product according to claim 21, wherein the computer code for determining a score comprises:
    - computer code for selecting the classes from the speaker-independent codebook to which a number of feature vectors above a predetermined threshold is assigned;
      
      computer code for selecting the classes from the at least one speaker-dependent codebook that correspond to the selected classes from the speaker-independent codebook, and computer code, for each of the codebooks, for determining a score based on the selected classes only.
  - 23. A computer program product according to claim 18, further comprising:
    - computer code for creating a new speaker-dependent codebook if the selected codebook based on the determined scores is the speaker-independent codebook.
  - 24. A computer program product according to claim 18, wherein the computer code for selecting is performed using a Viterbi process.
  - 25. A computer program product according to claim 14, wherein the computer code in (b) comprises computer code for processing the feature vector to reduce distortions to obtain a corrected feature vector.
  - 26. A computer program product according to claim 25, wherein the computer code for processing is based on a conditional Gaussian Mixture Model obtained via a minimum mean square error estimate.
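
The score-based codebook selection of claims 21 and 22 (and of claims 8 and 9) can be sketched roughly as follows. Several details are assumptions made only for this illustration: diagonal covariances, hard assignment of each feature vector to its most likely class in the speaker-independent codebook, a summed log-likelihood as the score, and all names (`select_codebook`, `si_name`, `threshold`).

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed here)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def select_codebook(features, codebooks, si_name="si", threshold=1):
    """Pick the codebook with the best score on well-populated classes.

    codebooks maps a name to a list of (mean, var) pairs; class k of every
    speaker-dependent codebook corresponds to class k of the
    speaker-independent one.
    """
    si = codebooks[si_name]
    # Assign each feature vector to its most likely speaker-independent class.
    assignments = [max(range(len(si)), key=lambda k: log_gaussian(x, *si[k]))
                   for x in features]
    counts = {}
    for k in assignments:
        counts[k] = counts.get(k, 0) + 1
    # Keep only classes that received more feature vectors than the threshold.
    selected = {k for k, n in counts.items() if n > threshold}
    # Score every codebook on the selected classes only.
    scores = {
        name: sum(log_gaussian(x, *cb[k])
                  for x, k in zip(features, assignments) if k in selected)
        for name, cb in codebooks.items()
    }
    best = max(scores, key=scores.get)
    return best, scores, selected
```

If no class exceeds the threshold, every score in this sketch is zero and the first codebook in the mapping is returned, so a caller would likely want to handle that case separately.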

27. An apparatus for adapting a codebook for speech recognition, wherein the codebook includes a set of Gaussian densities, the apparatus comprising:
- a receiver for receiving a speech input;
  
  a feature vector module for determining a feature vector based on the received speech input;
  
  a first estimation module for estimating, for each of the Gaussian densities, a first mean vector using an expectation maximization process and taking into account the determined feature vector;
  
  a second estimation module for estimating, for each of the Gaussian densities, a second mean vector using an Eigenvoice adaptation and taking into account the determined feature vector; and
  
  an adapter module, for each of the Gaussian densities, setting its mean vector to a convex combination of the first and the second mean vector.
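
As a rough illustration of what the first estimation module might compute, the sketch below performs a single expectation-maximization style mean update: responsibilities are computed for each feature vector, and each mean is re-estimated as the responsibility-weighted average. Equal mixture weights, diagonal covariances, and all names are assumptions made for this example; the claim does not specify these details.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed here)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def em_mean_update(features, means, variances):
    """One EM-style re-estimation of the means; also returns soft counts."""
    n, dim = len(means), len(means[0])
    soft_counts = [0.0] * n
    sums = [[0.0] * dim for _ in range(n)]
    for x in features:
        logs = [log_gaussian(x, m, v) for m, v in zip(means, variances)]
        top = max(logs)  # subtract the maximum for numerical stability
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        for i, w in enumerate(weights):
            gamma = w / total  # responsibility of density i for x
            soft_counts[i] += gamma
            for d in range(dim):
                sums[i][d] += gamma * x[d]
    # Densities that received no data keep their previous mean.
    new_means = [[s / c for s in row] if c > 0.0 else list(old)
                 for row, c, old in zip(sums, soft_counts, means)]
    return new_means, soft_counts
```

The soft counts returned here are one candidate for the "number of feature vectors assigned to the respective Gaussian density" that the count-dependent coefficient of claims 3 and 16 refers to.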

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Herbig, Tobias; Gerl, Franz

Granted Patent

US 8,346,551 B2
Field of Search
US Class Current

704/243
CPC Class Codes

G10L 15/065 Adaptation
