Method and system of adapting speech recognition models to speaker environment

US 6,003,002 A
Filed: 12/29/1997
Issued: 12/14/1999
Est. Priority Date: 01/02/1997
Status: Expired due to Term

Abstract

The method and system of adapting speech recognition models to a speaker environment may comprise receiving a spoken password (52) and getting a set of speaker independent (SI) speech recognition models (54). A mapping sequence may be determined for the spoken password (56). Using the mapping sequence, a speaker ID may be identified (58). A transform may be determined (66) between the SI speech recognition models and the spoken password using the mapping sequence. Speaker adapted (SA) speech recognition models may be generated (68) by applying the transform to SI speech recognition models. A speech input may be recognized (70) by applying the SA speech recognition models.
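
The "mapping sequence" in the abstract pairs each speech input frame with a part of the SI models. A minimal sketch of one way such a mapping could be computed follows. It assumes a left-to-right sequence of states with one mean vector per state and a squared Euclidean mismatch cost. The name `align_frames` and the cost function are illustrative assumptions, not details taken from the patent.

```python
from typing import List, Sequence

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def align_frames(frames: List[Sequence[float]],
                 state_means: List[Sequence[float]]) -> List[int]:
    """Map every frame to a state index with dynamic programming.

    The path starts in state 0, ends in the last state, and at each
    frame either stays in the same state or advances by one state.
    """
    T, S = len(frames), len(state_means)
    if T < S:
        raise ValueError("need at least one frame per state")
    INF = float("inf")
    cost = [[INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    cost[0][0] = sq_dist(frames[0], state_means[0])
    for t in range(1, T):
        for s in range(S):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else INF
            best, prev = (move, s - 1) if move < stay else (stay, s)
            if best < INF:
                cost[t][s] = best + sq_dist(frames[t], state_means[s])
                back[t][s] = prev
    # Trace back from the final state to recover the mapping sequence.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

frames = [[0.1], [0.0], [1.1], [0.9], [2.0]]
state_means = [[0.0], [1.0], [2.0]]
print(align_frames(frames, state_means))  # [0, 0, 1, 1, 2]
```

The returned list has one entry per frame, so it can be used directly to pair frames with model means when a transform is estimated.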

23 Claims

1. A method of recognizing speech comprising the steps of:
- receiving a spoken password utterance for access to a speaker environment;
  
  getting a set of speaker independent (SI) speech recognition models;
  
  determining a mapping sequence between the SI speech recognition models and speech input frames in the spoken password utterance that comprise recognition of the utterance;
  
  determining a transform between the SI speech recognition models and the spoken password utterance using the mapping sequence;
  
  generating speaker adapted (SA) speech recognition models by applying the transform to SI speech recognition models; and
  
  recognizing a nonpassword speech utterance in said speaker environment by applying the SA speech recognition models.
- - 2. The method of claim 1, further comprising the steps of:
    - getting a speaker ID;
      
      getting speaker dependent (SD) speech recognition models using the speaker ID; and
      
      verifying the identity of the speaker by applying the SD speech recognition models to the spoken password utterance.
  - 3. The method of claim 1, wherein the spoken password utterance is a numeric phrase.
  - 4. The method of claim 1, wherein the speech recognition models are Hidden Markov Modeling (HMM) models.
  - 5. The method of claim 1, wherein the mapping sequence is determined by using a check-sum grammar comprising the steps of:
    - converting the spoken password utterance into a set of speech feature vectors; and
      
      determining the mapping sequence by minimizing the difference between the speech feature vectors and the SI speech recognition models, while enforcing the check-sum grammar constraints.
  - 6. The method of claim 1, further comprising the step of generating a SA speech recognition model for each SI speech recognition model.
  - 7. The method of claim 1, wherein the SA speech recognition model is generated when needed to recognize the speech input.
  - 8. The method of claim 1, wherein the step of determining a transform between the SI speech recognition models and the spoken password utterance comprises the step of determining an affine transform for mean vectors of the SI speech recognition models.
  - 9. The method of claim 8, further comprising the step of constraining an affine transformation matrix of the affine transform to be diagonal.
  - 10. The method of claim 8, further comprising the step of constraining an affine transformation matrix of the affine transform to an identity matrix.
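
Claims 8 to 10 describe an affine transform of the SI mean vectors, optionally with the transformation matrix constrained to be diagonal or the identity. The sketch below shows what those two constraints reduce to under an assumed least-squares criterion: a per-dimension scale and bias for the diagonal case, and a bias only for the identity case. The function names and the least-squares criterion are assumptions made for illustration; the claims do not specify how the transform is estimated.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def fit_diagonal_affine(frames: Sequence[Vector], means: Sequence[Vector],
                        mapping: Sequence[int]) -> Tuple[Vector, Vector]:
    """Least-squares scale a[d] and bias b[d] per dimension so that
    a[d] * mean[d] + b[d] approximates the frames mapped to that mean."""
    n, dim = len(frames), len(frames[0])
    scales, biases = [], []
    for d in range(dim):
        xs = [means[s][d] for s in mapping]   # SI mean for each frame
        ys = [f[d] for f in frames]           # observed feature value
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0     # fall back to unit scale
        scales.append(a)
        biases.append(my - a * mx)
    return scales, biases

def fit_bias_only(frames: Sequence[Vector], means: Sequence[Vector],
                  mapping: Sequence[int]) -> Vector:
    """Identity-matrix constraint: only an additive bias is estimated."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] - means[s][d] for f, s in zip(frames, mapping)) / n
            for d in range(dim)]

def adapt_means(means: Sequence[Vector], scales: Vector,
                biases: Vector) -> List[Vector]:
    """Apply the transform to every SI mean to obtain SA means."""
    return [[a * m + b for m, a, b in zip(mean, scales, biases)]
            for mean in means]

si_means = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
mapping = [0, 0, 1, 1, 2]
# Synthetic frames: dimension 0 is 2*mean + 0.5, dimension 1 is mean - 1.
frames = [[2 * si_means[s][0] + 0.5, si_means[s][1] - 1.0] for s in mapping]
scales, biases = fit_diagonal_affine(frames, si_means, mapping)
sa_means = adapt_means(si_means, scales, biases)
```

With the identity constraint, `adapt_means(si_means, [1.0] * dim, bias)` applies the same shift to every mean, which needs far less adaptation data than a full matrix.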

11. A method of recognizing speech, comprising the steps of:
- receiving a spoken password utterance for access to a speaker environment;
  
  getting a set of speaker independent (SI) speech recognition models;
  
  determining a mapping sequence between the SI speech recognition models and speech input frames for the spoken password utterance;
  
  identifying a speaker ID from the mapping sequence between the SI speech recognition models and the spoken password utterance;
  
  determining a transform between the SI speech recognition models and the spoken password utterance using the mapping sequence;
  
  generating speaker adapted (SA) speech recognition models by applying the transform to SI speech recognition models; and
  
  recognizing a nonpassword speech utterance in said speaker environment by applying the SA speech recognition models to the nonpassword speech utterance.
- - 12. The method of claim 11, further comprising the steps of:
    - getting a speaker ID;
      
      getting speaker dependent (SD) speaker recognition models using the speaker ID; and
      
      verifying an identity of the speaker by applying the SD speaker recognition models to the spoken password utterance.
  - 13. The method of claim 11, wherein the spoken password utterance is a numeric phrase.
  - 14. The method of claim 11, wherein the speech recognition models are Hidden Markov Modeling (HMM) models.
  - 15. The method of claim 11, wherein the mapping sequence is determined by using a check-sum grammar, comprising the steps of:
    - converting the spoken password utterance into a set of speech feature vectors; and
      
      determining a mapping sequence by minimizing the difference between the speech feature vectors and the SI speech recognition models, while enforcing the check-sum grammar constraints.
  - 16. The method of claim 11, wherein the step of determining a transform between the SI speech recognition models and the spoken password utterance comprises the step of determining an affine transform for the spoken password utterance.
  - 17. The method of claim 16, further comprising the step of confining an affine transformation matrix of the affine transform to be diagonal.
  - 18. The method of claim 16, further comprising the step of confining an affine transformation matrix of the affine transform to be an identity matrix of the SI speech recognition models.
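
Claims 5 and 15 determine the mapping sequence by minimizing the mismatch between feature vectors and SI models while enforcing check-sum grammar constraints. The sketch below illustrates that idea on a simplified problem. It assumes the utterance is already split into digit positions, that a mismatch cost per digit per position is available, and that the check-sum rule is "the digits sum to a multiple of 10". The rule, the function name `decode_with_checksum`, and the cost table are hypothetical; the claims do not define the grammar.

```python
from typing import List, Optional, Tuple

def decode_with_checksum(costs: List[List[float]],
                         modulus: int = 10) -> Tuple[List[int], float]:
    """Find the digit string with the lowest total cost whose digit sum
    is a multiple of `modulus`.

    costs[i][d] is the mismatch cost of hypothesizing digit d at position i.
    """
    INF = float("inf")
    # best[r]: lowest cost of any prefix whose digit sum is r modulo `modulus`.
    best = [INF] * modulus
    best[0] = 0.0
    back: List[List[Optional[Tuple[int, int]]]] = []
    for pos_costs in costs:
        new = [INF] * modulus
        choice: List[Optional[Tuple[int, int]]] = [None] * modulus
        for r in range(modulus):
            if best[r] == INF:
                continue
            for d, c in enumerate(pos_costs):
                nr = (r + d) % modulus
                if best[r] + c < new[nr]:
                    new[nr] = best[r] + c
                    choice[nr] = (r, d)
        best = new
        back.append(choice)
    if best[0] == INF:
        raise ValueError("no digit string satisfies the check-sum")
    # Trace back from remainder 0 to recover the chosen digits.
    digits: List[int] = []
    r = 0
    for choice in reversed(back):
        prev, d = choice[r]
        digits.append(d)
        r = prev
    digits.reverse()
    return digits, best[0]

# Three positions; unconstrained best guess would be 4, 7, 8 (sum 19).
costs = [[1.0] * 10 for _ in range(3)]
costs[0][4] = 0.1
costs[1][7] = 0.1
costs[2][8] = 0.2
costs[2][9] = 0.3
digits, total = decode_with_checksum(costs)
print(digits)  # [4, 7, 9]
```

In this example the constraint overrides the locally best digit 8 in the last position, because 4, 7, 9 is the cheapest string that satisfies the assumed check-sum.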

19. A speech recognition system, comprising:
- a recognition engine having an identification module and an adaption module;
  
  a database having a set of speaker independent (SI) speech recognition models;
  
  the identification module operable to receive a spoken password utterance, determine a mapping sequence of the spoken password utterance in a speaker environment to SI speech recognition models, and identify the speaker from the mapping sequence;
  
  the adaption module operable to determine a transform between the SI speech recognition models and the spoken password utterance using the mapping sequence and to generate a speaker adapted (SA) speech recognition model by applying the transform to SI speech recognition models; and
  
  the recognition engine operable to recognize a nonpassword speech utterance in said speaker environment by applying the SA speech recognition model.
- - 20. The speech recognition system of claim 19, further comprising:
    - the recognition engine including a verification module; and
      
      the verification module operable to get a speaker ID, get SD speaker recognition models using the speaker ID, and verify an identity of the speaker by applying the SD speech recognition models to the spoken password utterance.

21. A speech recognition system, comprising:
- a recognition engine having an identification module and an adaption module;
  
  a database having a set of speaker independent (SI) speech recognition models;
  
  the identification module operable to receive a spoken password utterance, determine a mapping sequence of the spoken password utterance in a speaker environment to SI speech recognition models, and identify the speaker from the mapping sequence;
  
  the adaption module operable to determine a transform between the SI speech recognition models and the spoken password utterance using the mapping sequence and to generate a speaker adapted (SA) speech recognition model by applying the transform to SI speech recognition models; and
  
  the recognition engine operable to recognize a nonpassword speech utterance in said speaker environment by applying the SA speech recognition model to the nonpassword speech utterance.
- - 22. The speech recognition system of claim 21, further comprising:
    - the recognition engine including a verification module; and
      
      the verification module operable to get a speaker ID, get speaker dependent (SD) speaker recognition models using the speaker ID, and verify an identity of the speaker by applying the SD speech recognition models to the spoken password utterance.
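
The verification module of claims 20 and 22 applies speaker dependent (SD) models to the spoken password utterance. A minimal sketch of one possible decision rule follows: score the password frames against the claimed speaker's SD means and accept when the average mismatch is below a threshold. The scoring function, the threshold value, and the name `verify_speaker` are assumptions for illustration; the claims do not state how verification is scored.

```python
from typing import List, Sequence, Tuple

def verify_speaker(frames: Sequence[Sequence[float]],
                   sd_means: Sequence[Sequence[float]],
                   mapping: Sequence[int],
                   threshold: float) -> Tuple[bool, float]:
    """Return (accepted, score), where score is the average squared
    distance between each frame and the SD mean it is mapped to."""
    total = 0.0
    for frame, s in zip(frames, mapping):
        total += sum((x - m) ** 2 for x, m in zip(frame, sd_means[s]))
    score = total / len(frames)
    return score <= threshold, score

sd_means = [[0.0], [1.0]]          # hypothetical SD models for one speaker ID
mapping = [0, 1]                   # frame-to-state mapping of the password
accepted, _ = verify_speaker([[0.1], [0.9]], sd_means, mapping, threshold=0.05)
rejected, _ = verify_speaker([[0.6], [0.4]], sd_means, mapping, threshold=0.05)
print(accepted, rejected)  # True False
```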

23. A method of recognizing speech comprising the steps of:
- receiving a spoken keyword utterance in a speaker environment;
  
  getting a set of speaker independent (SI) speech recognition models;
  
  determining a mapping sequence between the SI speech recognition models and the speech input frames in the spoken keyword utterance;
  
  determining a transform between the SI speech recognition models and the spoken keyword utterance using the mapping sequence;
  
  generating speaker adapted (SA) speech recognition models by applying the transform to SI speech recognition models; and
  
  recognizing a nonkeyword speech utterance in said speaker environment by applying the SA speech recognition models.
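
The steps of claim 23 can be sketched end to end on toy data. This sketch assumes one mean vector per word, a mapping sequence that is already known for the keyword utterance, a bias-only transform, and nearest-mean recognition. All names and numbers are illustrative assumptions rather than the patented implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]

def estimate_bias(frames: Sequence[Vector], models: Dict[str, Vector],
                  mapping: Sequence[str]) -> Vector:
    """Average offset between keyword frames and the SI means they map to."""
    dim = len(frames[0])
    bias = [0.0] * dim
    for frame, word in zip(frames, mapping):
        for d in range(dim):
            bias[d] += frame[d] - models[word][d]
    return [b / len(frames) for b in bias]

def adapt(models: Dict[str, Vector], bias: Vector) -> Dict[str, Vector]:
    """Generate SA models by shifting every SI mean by the bias."""
    return {w: [m + b for m, b in zip(mean, bias)] for w, mean in models.items()}

def recognize(frame: Vector, models: Dict[str, Vector]) -> str:
    """Pick the word whose mean is closest to the frame."""
    return min(models,
               key=lambda w: sum((x - m) ** 2 for x, m in zip(frame, models[w])))

si_models = {"yes": [0.0, 0.0], "no": [1.0, 1.0],
             "one": [3.0, 0.0], "two": [0.0, 3.0]}

# Keyword utterance "one two" observed in an environment that adds
# roughly 0.6 to every feature dimension.
keyword_frames = [[3.6, 0.6], [0.6, 3.6]]
keyword_mapping = ["one", "two"]

sa_models = adapt(si_models, estimate_bias(keyword_frames, si_models, keyword_mapping))

later_frame = [0.6, 0.6]                    # the word "yes" in the same environment
print(recognize(later_frame, si_models))    # no
print(recognize(later_frame, sa_models))    # yes
```

The SI models misrecognize the later word because the environment shifts every frame, while the SA models, adapted from the keyword utterance alone, recover the intended word.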

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Netsch, Lorin P.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/999,442
Time in Patent Office: 715 Days
Field of Search: 704/246-250, 704/251-258, 704/236
US Class Current: 704/236
CPC Class Codes: G10L 15/20 Speech recognition techniqu...
