Rapid adaptation of speech models

US 6,151,575 A
Filed: 10/28/1997
Issued: 11/21/2000
Est. Priority Date: 10/28/1996
Status: Expired due to Term


8 Assignments


0 Petitions


Abstract

A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
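
The four steps in the abstract can be illustrated with a short sketch. It is not the patented implementation. It assumes that each model element is represented by a mean vector, that the source speech data has already been aligned to elements and summarized as one average vector per element, and that the linear transform is diagonal (a per-dimension scale `a` and offset `b`) fitted by ordinary least squares over a collection of elements. The function names `fit_diagonal_transform` and `apply_transform` are hypothetical.

```python
from statistics import fmean


def fit_diagonal_transform(initial_means, data_averages):
    """Fit x ~ a * mu + b separately in each feature dimension (assumed form).

    initial_means and data_averages are parallel lists of equal-length
    vectors: one initial-model mean and one average of the assembled
    source speech data per element in the collection.
    Returns (a, b), each a list with one entry per dimension.
    """
    dims = len(initial_means[0])
    a, b = [], []
    for d in range(dims):
        mus = [m[d] for m in initial_means]
        xs = [x[d] for x in data_averages]
        mu_bar, x_bar = fmean(mus), fmean(xs)
        var_mu = sum((m - mu_bar) ** 2 for m in mus)
        cov = sum((m - mu_bar) * (x - x_bar) for m, x in zip(mus, xs))
        # With a single element (or identical means) the slope cannot be
        # estimated, so fall back to a pure offset.
        a_d = cov / var_mu if var_mu > 0 else 1.0
        a.append(a_d)
        b.append(x_bar - a_d * mu_bar)
    return a, b


def apply_transform(mean, a, b):
    """Produce an adapted element mean from an initial element mean."""
    return [a_d * m + b_d for m, a_d, b_d in zip(mean, a, b)]


# Hypothetical usage: three elements in one collection, 2-D features.
initial = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
averages = [[1.0, 2.0], [5.0, 5.0], [9.0, 8.0]]
a, b = fit_diagonal_transform(initial, averages)  # a == [2.0, 1.5], b == [1.0, 0.5]
adapted = apply_transform([1.0, 1.0], a, b)       # [3.0, 2.0]
```

A diagonal transform keeps the example dependency-free; a full matrix transform estimated over the same collection would follow the same pattern, with a matrix solve in place of the per-dimension slope.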

148 Citations


25 Claims

1. A method of generating a source-adapted model for use in speech recognition, the method comprising:
- generating a collection of elements from an initial model;
  
  assembling source speech data that corresponds to elements in the collection of elements from a set of source speech data for a particular source associated with the source-adapted model;
  
  generating statistics from the assembled source speech data;
  
  modifying the statistics using an element of the initial model and a smoothing factor that accounts for the relative importance of the element of the initial model and the assembled source speech data;
  
  using the modified statistics in determining a transform that maps between the assembled source speech data and the collection of elements of the initial model; and
  
  producing elements of the source-adapted model from corresponding elements of the initial model by applying the transform to the elements of the initial model;
  
  wherein determining the transform comprises determining a relationship between each element of the initial model in the collection and a portion of the assembled source speech data that corresponds to that element.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 23)
  - 2. The method of claim 1, wherein generating the collection comprises generating a collection of elements associated with similarly-sounding speech units.
  - 3. The method of claim 1, wherein generating the collection comprises generating a collection of elements having similar values.
  - 4. The method of claim 1, wherein generating the collection comprises generating a collection of elements associated with similarly-sounding speech units and having similar values.
  - 5. The method of claim 4, wherein a human operator identifies classes of elements having similarly-sounding speech units and generating the collection comprises using an automatic procedure that employs the identified classes and similarities between elements.
  - 6. The method of claim 1, wherein the statistics comprise an average value of the assembled source speech data for a particular element.
  - 7. The method of claim 6, wherein the statistics comprise a count of speech frames associated with the assembled source speech data for the particular element.
  - 8. The method of claim 1, wherein the smoothing factor accounts for an amount of source speech data available for a particular element.
  - 9. The method of claim 8, wherein the statistics comprise an average value of the assembled source speech data for a particular element, the average value being modified based on the smoothing factor and a value for the element from the initial model.
  - 10. The method of claim 1, wherein:
    - each element of the initial model includes a mean portion and a variance portion, and
      
      producing an element of the source-adapted model from a corresponding element of the initial model comprises applying the transform to the mean portion of the element of the initial model and leaving the variance portion unchanged.
  - 23. The method of claim 1, wherein the transform comprises a linear transform.
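
Dependent claims 6 through 10 hint at how the smoothing step could work. The sketch below is an assumption rather than the patent's formula: the statistics for each element are a frame count and an average (claims 6 and 7), and the average is interpolated toward the initial-model mean with weight `count / (count + tau)`, where `tau` is a hypothetical smoothing constant, so elements with little source data stay near the initial model (claims 8 and 9). The transform is reduced to a count-weighted shift and is applied to the mean only, leaving the variance unchanged (claim 10).

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A Gaussian model element with a diagonal variance (assumed layout)."""
    mean: list[float]
    variance: list[float]


def smooth_average(count, average, initial_mean, tau):
    """Blend the data average with the initial mean (assumes tau > 0).

    The weight on the data grows with the frame count, so sparsely
    observed elements are pulled toward the initial model.
    """
    w = count / (count + tau)
    return [w * x + (1.0 - w) * m for x, m in zip(average, initial_mean)]


def estimate_shift(elements, stats, tau):
    """Count-weighted offset between smoothed averages and initial means.

    stats maps element index -> (frame count, average vector).
    """
    dims = len(elements[0].mean)
    total = 0.0
    shift = [0.0] * dims
    for i, (count, average) in stats.items():
        smoothed = smooth_average(count, average, elements[i].mean, tau)
        for d in range(dims):
            shift[d] += count * (smoothed[d] - elements[i].mean[d])
        total += count
    return [s / total for s in shift] if total > 0 else shift


def adapt(element, shift):
    """Apply the transform to the mean only; the variance is copied unchanged."""
    new_mean = [m + s for m, s in zip(element.mean, shift)]
    return Element(new_mean, list(element.variance))
```

A shift is the simplest affine transform; a scale term or a full matrix could be estimated from the same smoothed statistics.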

11. A method of generating a source-adapted model for use in speech recognition, the method comprising:
- generating a collection of elements from an initial model;
  
  assembling speech data that corresponds to elements in the collection of elements from a set of speech data for a particular source associated with the source-adapted model;
  
  generating statistics from the assembled source speech data;
  
  modifying the statistics using an element of the initial model and a smoothing factor that accounts for an amount of source speech data available for a particular element;
  
  determining a transform that maps between the assembled speech data and the collection of elements of the initial model using the modified statistics; and
  
  producing elements of the source-adapted model from corresponding elements of the initial model by applying the transform to the elements of the initial model.
- Dependent Claims (12, 24)
  - 12. The method of claim 11, wherein the statistics comprise an average value of the assembled source speech data for a particular element, the average value being modified based on the smoothing factor and a value for the element from the initial model.
  - 24. The method of claim 11, wherein the transform comprises a linear transform.

13. A method of generating a source-adapted model for use in speech recognition, the method comprising:
- generating a collection of elements from the initial model;
  
  assembling speech data that corresponds to elements in the collection of elements from a set of speech data for a particular source associated with the source-adapted model;
  
  generating statistics from the assembled source speech data;
  
  modifying the statistics using an element of the initial model and a smoothing factor that controls the relative importance of the element of the initial model and the assembled source speech data;
  
  using the modified statistics in determining a transform that maps between the assembled speech data and the collection of elements of the initial model; and
  
  producing elements of the source-adapted model from corresponding elements of the initial model by applying the transform to the elements of the initial model;
  
  wherein a human operator identifies classes of related elements and the collection of elements is generated using an automatic procedure that favors including elements from a common class in the collection and penalizes including elements from different classes in the collection.
- Dependent Claims (25)
  - 25. The method of claim 13, wherein the transform comprises a linear transform.
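
Claims 3 to 5 and 13 describe grouping elements by similar values while taking operator-defined classes into account. One way to picture this, assumed here and not taken from the patent, is greedy agglomerative clustering over element means in which the distance between two groups is increased by a fixed `penalty` whenever merging them would mix operator-assigned class labels. The class labels, `penalty`, and the stopping threshold `max_distance` are illustrative.

```python
from itertools import combinations
from math import dist


def centroid(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cluster_elements(means, classes, penalty, max_distance):
    """Greedily merge elements into collections.

    means:   list of mean vectors, one per initial-model element
    classes: operator-assigned class label for each element
    Returns a list of collections, each a sorted list of element indices.
    """
    groups = [[i] for i in range(len(means))]
    while len(groups) > 1:
        best = None
        for g, h in combinations(range(len(groups)), 2):
            d = dist(centroid([means[i] for i in groups[g]]),
                     centroid([means[i] for i in groups[h]]))
            labels = {classes[i] for i in groups[g] + groups[h]}
            if len(labels) > 1:
                d += penalty  # discourage mixing operator-defined classes
            if best is None or d < best[0]:
                best = (d, g, h)
        d, g, h = best
        if d > max_distance:
            break  # closest remaining pair is too far apart
        groups[g] = sorted(groups[g] + groups[h])
        del groups[h]  # g < h, so index g is unaffected
    return groups
```

Raising `penalty` makes cross-class collections rarer without forbidding them outright, which is one reading of "favors" and "penalizes" in claim 13.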

14. A method of training a speech model for use in speech recognition, the method comprising:
- assembling sets of speech data that correspond to elements of an initial model from a set of speech data for one or more sources, the assembled speech data including multiple items;
  
  calculating representative values for each set of speech data;
  
  modifying a representative value for a set of speech data using a corresponding element of the initial model and a smoothing factor that controls the relative importance of the corresponding element of the initial model and the set of speech data;
  
  determining a relationship between the representative values for each set of speech data and values of the corresponding element of the initial model, where a relationship for a first set of speech data differs from a relationship for a second set of speech data;
  
  modifying each item of the sets of speech data using the relationship for the set to which the item belongs; and
  
  generating elements of the speech model using the modified sets of speech data.
- Dependent Claims (15, 16, 17, 18, 19, 20)
  - 15. The method of claim 14, wherein the initial model has a structure that differs from a structure of the speech model.
  - 16. The method of claim 14, wherein the initial model comprises a first number of elements and the speech model comprises a second number of elements, the first number being substantially larger than the second number.
  - 17. The method of claim 14, wherein the source comprises a particular speaker.
  - 18. The method of claim 14, wherein:
    - assembling sets of speech data that correspond to elements of an initial model comprises assembling different sets for speech data from different sources;
      
      a relationship for a first set of speech data for a first source differs from a relationship for a second set of speech data for the first source and from a relationship for a third set of speech data for a second source; and
      
      generating an element of the speech model comprises using modified sets of speech data from different sources.
  - 19. The method of claim 14, wherein the relationship for a first set of speech data comprises a linear transformation.
  - 20. The method of claim 14, wherein:
    - determining a relationship comprises determining a transform that maps between the set of speech data and the element of the initial model, and generating an inverse of the transform; and
      
      modifying each element of a set of speech data comprises applying the inverse of the transform to each element of the set of speech data.
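
Claims 14, 18 and 20 describe a training loop in which each set of speech data receives its own relationship to the initial model, each item is mapped back through the inverse of that relationship, and the model elements are regenerated from the modified data. The sketch below is a simplified reading under stated assumptions: one set per source (closest to claim 18), scalar features, a shift-only relationship, and no smoothing step. The names `source_shift` and `retrain` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def source_shift(frames, initial_means):
    """Relationship for one source: the average offset of its frames from
    the initial-model values of the elements they are aligned to."""
    return fmean([value - initial_means[e] for e, value in frames])


def retrain(sources, initial_means):
    """Re-estimate element values from several sources.

    sources: one list per source of (element_id, value) frames.
    Each source's frames are moved through the inverse of that source's
    shift (x -> x - shift) and then pooled per element.
    """
    pooled = defaultdict(list)
    for frames in sources:
        shift = source_shift(frames, initial_means)
        for e, value in frames:
            pooled[e].append(value - shift)
    return {e: fmean(values) for e, values in pooled.items()}
```

For example, with initial values `{"a": 0.0, "b": 10.0}` and two sources `[("a", 1.0), ("b", 13.0)]` and `[("a", -1.0), ("b", 9.0)]`, the per-source shifts are 2.0 and -1.0, and the re-estimated values are `{"a": -0.5, "b": 10.5}`.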

21. A method of training a speech model for use in speech recognition, the method comprising:
- assembling speech data that corresponds to a first element of the speech model from a set of speech data for a first source;
  
  determining a transform that maps between the assembled speech data and the first element of the speech model;
  
  generating an inverse of the transform;
  
  modifying the assembled speech data using the inverse of the transform; and
  
  updating the first element of the speech model using the modified assembled speech data,
  
  wherein determining the transform comprises modifying the transform using the first element of the speech model and a smoothing factor that controls the relative importance of the first element of the speech model and the assembled speech data.
- Dependent Claims (22)
  - 22. The method of claim 21, further comprising:
    - from a set of speech data for a second source, assembling speech data that corresponds to the first element of the speech model;
      
      determining a second transform that maps between the assembled speech data for the second source and the first element of the speech model; and
      
      generating an inverse of the second transform;
      
      wherein the step of updating comprises updating the first element of the speech model using the assembled speech data for the first and second sources and the inverses of the first and second transforms.
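
Claims 21 and 22 add two details: the transform for a source is modified using the model element and a smoothing factor, and the element is updated from several sources through the inverses of their transforms. The sketch below assumes scalar frames for a single element, a shift-only transform x = mu + b, and a hypothetical smoothing constant `tau` that shrinks the estimated shift toward zero (the identity transform) when a source contributes few frames. The function names are illustrative.

```python
from statistics import fmean


def smoothed_shift(values, element_mean, tau):
    """Shift b in the transform x = mu + b for one source, shrunk toward zero.

    With n frames the raw offset (mean - mu) is scaled by n / (n + tau),
    so a source with few frames gets a transform close to the identity.
    """
    n = len(values)
    return n * (fmean(values) - element_mean) / (n + tau)


def update_element(source_values, element_mean, tau):
    """Update one element from several sources.

    source_values: one list of scalar frames per source, all aligned to
    the element. Each source's frames are mapped back through the inverse
    of that source's transform before pooling.
    """
    pooled = []
    for values in source_values:
        b = smoothed_shift(values, element_mean, tau)
        pooled.extend(x - b for x in values)  # inverse of x = mu + b
    return fmean(pooled)
```

With `tau = 0` each source's offset is removed completely, so the update returns the current element value; a positive `tau` lets a source with few frames pull the element toward its data.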

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Newman, Michael Jack; Nagesha, Venkatesh; Gillick, Laurence S.
Primary Examiner(s): Zele, Krista
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/958,957
Time in Patent Office: 1,120 Days
Field of Search: 704/231, 704/236, 704/239, 704/240, 704/247, 704/254, 704/255
US Class Current: 704/260
CPC Class Codes: G10L 15/07 (Adaptation to the speaker)
