Method and apparatus for providing a speaker adapted speech recognition model set

US 20040162728A1
Filed: 02/18/2003
Published: 08/19/2004
Est. Priority Date: 02/18/2003
Status: Active Grant

Abstract

Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).

24 Claims

1. A method, comprising:
- providing a speaker independent speech recognition model set to be used when recognizing speech;
  
  providing a speaker independent speech feature space model that is at least partially different from the speaker independent speech recognition model;
  
  receiving speech from a particular speaker;
  
  using the speech to provide a corresponding speaker dependent speech feature space model;
  
  using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices;
  
  using the at least one resultant set of alignment indices to modify the speaker independent speech recognition model to provide a speaker adapted speech recognition model set.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1 wherein providing a speaker independent speech feature space model includes providing a speaker independent acoustic feature space model.
  - 3. The method of claim 1 wherein providing a corresponding speaker dependent speech feature space model includes providing a corresponding speaker dependent acoustic feature space model.
  - 4. The method of claim 1 wherein using the speech to provide a corresponding speaker dependent speech feature space model includes using the speech to determine speech feature vectors that are used to at least estimate the speaker dependent speech feature space model.
  - 5. The method of claim 1 wherein using the speech to provide a corresponding speaker dependent speech feature space model includes:
    - providing at least one threshold value;
      
      using an initially received speech feature vector as an initial estimate of a mean of a first speech feature class;
      
      comparing a subsequently received speech feature vector with the mean to provide a comparison result;
      
      when the comparison result corresponds to the at least one threshold value in a first way, using the subsequently received speech feature vector to update the mean for the first speech feature class;
      
      when the comparison result corresponds to the at least one threshold value in a second way, which second way is different from the first way, using the subsequently received speech feature vector to characterize a different speech feature class.
  - 6. The method of claim 1 wherein using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices includes determining correspondences between speech feature classes of the speaker independent speech feature space model and the speaker dependent speech feature space model.
  - 7. The method of claim 6 wherein using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices further includes determining an alignment between the speech feature classes of the speaker independent speech feature space model and the speech feature classes of the speaker dependent speech feature space model, which alignment meets at least a first criteria.
  - 8. The method of claim 7 wherein the first criteria includes a measure of a difference between likelihoods of occurrence for at least some of the speech feature classes of the speaker independent speech feature space model and likelihoods of occurrence for at least some of the speech feature classes of the speaker dependent speech feature space model.
  - 9. The method of claim 8 wherein the measure of a difference comprises a measure of a difference between a speech feature class n-gram probability for the speaker independent speech feature space model and a speech feature class n-gram probability for the speaker dependent speech feature space model.
  - 10. The method of claim 1 wherein using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices includes determining at least one mean for the speaker dependent speech feature space model.
  - 11. The method of claim 10 wherein using the at least one resultant set of alignment indices to modify the speaker independent speech recognition model to provide a speaker adapted speech recognition model set includes using the at least one mean to modify at least one corresponding mean for the speaker independent speech recognition model.
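
The class-estimation steps recited in claim 5 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `estimate_classes`, the use of Euclidean distance as the comparison, the nearest-class search, and the running-mean update are all assumptions made for illustration. The claim itself only requires a threshold, an initial mean, a comparison, and two different outcomes.

```python
import math

def estimate_classes(vectors, threshold):
    """Sequentially group feature vectors into classes (illustrative only).

    Returns (means, counts): one running mean and one vector count per class.
    Euclidean distance and the nearest-class rule are assumptions.
    """
    means = []   # running mean of each speech feature class
    counts = []  # number of vectors assigned to each class
    for v in vectors:
        if not means:
            # The first vector is the initial estimate of the first class mean.
            means.append(list(v))
            counts.append(1)
            continue
        # Compare the new vector with the existing class means.
        dists = [math.dist(v, m) for m in means]
        k = min(range(len(means)), key=dists.__getitem__)
        if dists[k] <= threshold:
            # "First way": close enough, so update that class mean.
            counts[k] += 1
            means[k] = [m + (x - m) / counts[k] for m, x in zip(means[k], v)]
        else:
            # "Second way": too far, so the vector characterizes a new class.
            means.append(list(v))
            counts.append(1)
    return means, counts

means, counts = estimate_classes([(0, 0), (1, 0), (10, 10), (11, 10)], threshold=3.0)
```

With these four toy vectors and a threshold of 3.0, the first two vectors fall into one class and the last two into another.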

12. A method comprising:
- providing a speaker independent speech recognition model set to be used when recognizing speech;
  
  providing a speaker independent speech feature space model;
  
  receiving speech from a particular speaker;
  
  using the speech to provide a corresponding speaker dependent speech feature space model;
  
  using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices as a function, at least in part, of a likelihood that a given speaker dependent speech feature space model will occur;
  
  using the at least one resultant set of alignment indices to modify the speaker independent speech recognition model to provide a speaker adapted speech recognition model set.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
  - 13. The method of claim 12 wherein providing a speaker independent speech feature space model includes providing a speaker independent acoustic feature space model.
  - 14. The method of claim 12 wherein providing a corresponding speaker dependent speech feature space model includes providing a corresponding speaker dependent acoustic feature space model.
  - 15. The method of claim 12 wherein using the speech to provide a corresponding speaker dependent speech feature space model includes using the speech to determine speech feature vectors that are used to at least estimate the speaker dependent speech feature space model.
  - 16. The method of claim 12 wherein using the speech to provide a corresponding speaker dependent speech feature space model includes:
    - providing at least one threshold value;
      
      using an initially received speech feature vector as an initial estimate of a mean of a first speech feature class;
      
      comparing a subsequently received speech feature vector with the mean to provide a comparison result;
      
      when the comparison result corresponds to the at least one threshold value in a first way, using the subsequently received speech feature vector to update the mean for the first speech feature class;
      
      when the comparison result corresponds to the at least one threshold value in a second way, which second way is different from the first way, using the subsequently received speech feature vector to characterize a different speech feature class.
  - 17. The method of claim 12 wherein using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices as a function, at least in part, of a likelihood that a given speaker dependent speech feature space model will occur further includes determining an alignment between the speech feature classes of the speaker independent speech feature space model and the speech feature classes of the speaker dependent speech feature space model, which alignment meets at least a first criteria.
  - 18. The method of claim 17 wherein the first criteria includes a measure of a difference between likelihoods of occurrence for at least some of the speech feature classes of the speaker independent speech feature space model and likelihoods of occurrence for at least some of the speech feature classes of the speaker dependent speech feature space model.
  - 19. The method of claim 18 wherein the measure of a difference comprises a measure of a difference between a speech feature class n-gram probability for the speaker independent speech feature space model and a speech feature class n-gram probability for the speaker dependent speech feature space model.
  - 20. The method of claim 12 wherein using the speaker independent speech feature space model and the speaker dependent speech feature space model to provide at least one resultant set of alignment indices includes determining at least one mean for the speaker dependent speech feature space model.
  - 21. The method of claim 20 wherein using the at least one resultant set of alignment indices to modify the speaker independent speech recognition model to provide a speaker adapted speech recognition model set includes using the at least one mean to modify at least one corresponding mean for the speaker independent speech recognition model.
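
One way to read the alignment criterion of claims 17 to 19 is as a matching of classes whose occurrence likelihoods are most similar. The sketch below is a simplified illustration under stated assumptions: it uses unigram probabilities (the n = 1 case of an n-gram probability) derived from hypothetical per-class counts, an absolute-difference cost, and a greedy one-to-one assignment. The names `unigram_probs` and `align_classes` are invented here and do not come from the patent.

```python
def unigram_probs(counts):
    """Relative frequency of each class, used as its likelihood of occurrence."""
    total = sum(counts)
    return [c / total for c in counts]

def align_classes(si_counts, sd_counts):
    """Return alignment indices {sd_class: si_class} (illustrative only).

    Pairs are matched greedily, smallest likelihood difference first,
    and each class is used at most once. Both choices are assumptions.
    """
    p_si = unigram_probs(si_counts)
    p_sd = unigram_probs(sd_counts)
    # Cost of pairing SD class j with SI class i.
    pairs = sorted(
        (abs(p_si[i] - p_sd[j]), j, i)
        for j in range(len(p_sd))
        for i in range(len(p_si))
    )
    alignment = {}
    used_si = set()
    for _, j, i in pairs:
        if j not in alignment and i not in used_si:
            alignment[j] = i
            used_si.add(i)
    return alignment

alignment = align_classes(si_counts=[50, 30, 20], sd_counts=[2, 5, 3])
```

Because the cost here depends only on occurrence likelihoods, the speaker dependent classes need no known link to verbal content, which is consistent with the embodiment mentioned in the abstract. A practical criterion would likely combine this cost with other measures.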

22. An apparatus comprising:
- a speech feature vector input;
  
  a speaker dependent speech feature space model estimation unit that is operably coupled to the speech feature vector input and having an output providing speaker dependent acoustic feature space model information;
  
  speaker independent acoustic feature space model information;
  
  a speech feature model alignment unit responsive to the speaker dependent acoustic feature space model information and the speaker independent acoustic feature space model information and having an output providing model alignment indices that correspond to differences between speaker dependent feature space models and speaker independent feature space models that correspond to one another as a function, at least in part, of a probability of occurrence of each such model;
  
  a transformation estimation unit responsive to the model alignment indices and having an output providing model transformation parameters;
  
  a model transformation unit responsive to the model transformation parameters and to a speaker independent speech recognition model set and having an output providing a speaker adapted speech recognition model set.
- Dependent claims (23, 24)
  - 23. The apparatus of claim 22 wherein the speaker independent acoustic feature space model information differs in at least some respect from the speaker independent speech recognition model set.
  - 24. The apparatus of claim 22 wherein the output of the speech feature space model alignment unit provides model alignment indices that correspond to differences between speaker dependent feature space models and speaker independent feature space models that further correspond to one another as a function, at least in part, of n-gram probability information for classes that comprise each of the speaker dependent and speaker independent feature space models.
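
The transformation estimation unit and model transformation unit of claim 22 can be pictured as two small functions. The sketch below assumes the simplest possible transformation, a single global bias on the means, computed as the average offset between aligned class means. The patent does not state that this is the transformation used; the names `estimate_bias` and `transform_model_set` and the dictionary form of the alignment indices are hypothetical.

```python
def estimate_bias(si_means, sd_means, alignment):
    """Transformation estimation (assumed form): average offset between
    each speaker dependent class mean and its aligned speaker independent mean.

    alignment maps an SD class index to an SI class index.
    """
    dim = len(si_means[0])
    bias = [0.0] * dim
    for j, i in alignment.items():
        for d in range(dim):
            bias[d] += (sd_means[j][d] - si_means[i][d]) / len(alignment)
    return bias

def transform_model_set(recognition_means, bias):
    """Model transformation (assumed form): shift every mean of the speaker
    independent recognition model set by the estimated bias."""
    return [[m + b for m, b in zip(mean, bias)] for mean in recognition_means]

# Toy data: SD class 0 aligns with SI class 1, SD class 1 with SI class 0.
bias = estimate_bias(
    si_means=[[0.0, 0.0], [4.0, 4.0]],
    sd_means=[[5.0, 5.0], [1.0, 1.0]],
    alignment={0: 1, 1: 0},
)
adapted = transform_model_set([[2.0, 3.0]], bias)
```

In this toy case every aligned pair differs by one unit per dimension, so the recognition mean is shifted by the same amount. The recognition model means are kept separate from the feature space model means, mirroring the distinction drawn in claim 23.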

Current Assignee
Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee
Motorola, Inc. (Motorola Solutions, Inc.)
Inventors
Holter, Trym; Epps, Julien; Thomson, Mark

Granted Patent

US 7,340,396 B2
US Class Current

704/255
CPC Class Codes

G10L 15/07 to the speaker