Method for deriving acoustic models for use in speech recognition

US 4,914,703 A
Filed: 12/05/1986
Issued: 04/03/1990
Est. Priority Date: 12/05/1986
Status: Expired due to Fees

Abstract

The invention provides a method of deriving a generally improved statistical acoustic model of a first class of speech sounds, given a limited amount of sampling data from that first class. This is done by combining a first statistic calculated from samples of that class of speech sounds with a corresponding second statistic calculated from samples of a second, broader class of speech sounds. Preferably the second statistic is calculated from many more samples than the first statistic, so that it has less sampling error than the first statistic, and preferably the second class is a superset of the first class, so that the second statistic provides information about the first class. In one embodiment, the invention combines statistics from the models of a plurality of first classes of speech sounds to reduce the sampling error of such statistics and thus improve the accuracy with which such models can be divided into groups of similar models. The first and second statistics can be measurements of spread, of central tendency, or both. They can also relate to different types of parameters, including spectral parameters and parameters representing the duration of speech sounds.
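The weighted combination described in the abstract can be illustrated with a minimal sketch. The function name `combine_statistics`, the `pseudo_count` parameter, and the particular weighting rule `n / (n + pseudo_count)` are assumptions made for illustration only; the abstract does not specify how the weights are chosen, only that the broader-class statistic has less sampling error.

```python
def combine_statistics(first_stat, second_stat, n_first, pseudo_count):
    """Blend a class-specific statistic with a broader-class statistic.

    first_stat   : statistic estimated from the few samples of the first class
    second_stat  : statistic estimated from the larger, broader second class
    n_first      : number of samples behind first_stat
    pseudo_count : hypothetical tuning constant (not from the patent); larger
                   values pull the result more strongly toward second_stat
    """
    if n_first < 0 or pseudo_count <= 0:
        raise ValueError("n_first must be >= 0 and pseudo_count must be > 0")
    # Assumed weighting rule: trust the first statistic more as its
    # sample count grows relative to the pseudo-count.
    weight_first = n_first / (n_first + pseudo_count)
    return weight_first * first_stat + (1.0 - weight_first) * second_stat


# With equal sample count and pseudo-count the result is the midpoint.
print(combine_statistics(2.0, 4.0, n_first=10, pseudo_count=10.0))  # 3.0
# With no samples of the first class, only the broader statistic remains.
print(combine_statistics(2.0, 4.0, n_first=0, pseudo_count=10.0))   # 4.0
```

Under this assumed rule, a class with very few samples inherits most of its statistic from the broader class, which is the behavior the abstract motivates.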

24 Claims

1. A method of deriving an acoustic model of a first class of speech sounds, which acoustic model is to be compared against a portion of speech to be recognized to determine the likelihood that the portion of speech corresponds to that first class of speech sounds, said method including:
- calculating a first statistic of the first class of speech sounds from acoustic data derived from one or more samples of that class of speech sounds;
  
  calculating a second statistic from acoustic data derived from samples of a second class of speech sounds, at least some of which samples correspond to speech sounds which do not belong to the first class of speech sounds for which said acoustic model is being made;
  
  calculating a combined statistic which is a weighted combination of the first and second statistics; and
  
  using said combined statistic as at least a part of said acoustic model of said first class of speech sounds.
- - 2. A method as described in claim 1, wherein said first statistic is calculated from fewer samples of speech sounds than is said second statistic.
  - 3. A method as described in claim 1, wherein said second class of speech sounds includes the first class of speech sounds as a subset.
  - 4. A method as described in claim 1, wherein said first class of speech sounds correspond to one speech unit, and the second class of speech sounds correspond to a plurality of such speech units, such that the average spectral difference between two samples from said first class is less than the average spectral difference between two samples from said second class.
  - 5. A method as described in claim 4, wherein said first and second statistic are measures of spread of the samples from their respective classes.
  - 6. A method as described in claim 5, wherein the speech samples of said first class of speech sounds are samples from the corresponding portion of one word spoken one or more times and the speech samples of said second class of speech sounds are samples from a plurality of words.
  - 7. A method as described in claim 6 which is used to derive acoustic models for representing clusters of speech classes, wherein:
    - there are a plurality of first classes of speech sounds, each corresponding to a speech unit which includes speech sounds associated with a given part of one of a plurality of different words;
      
      a first statistic is calculated for each of the first classes from acoustic data derived from one or more samples of that class of speech sounds;
      
      the second class of speech sounds encompasses a plurality of speech units corresponding to a plurality of said first classes of speech sounds;
      
      a combined statistic is calculated for use in the acoustic model of each of said first classes which is a weighted combination of the second statistic and of the first statistic for that particular first class;
      
      the acoustic models from the plurality of first classes are then clustered together to divide them into a plurality of groups of relatively similar acoustic models; and
      
      acoustic models are calculated for each cluster.
  - 8. A method as described in claim 1, wherein:
    - the first class of speech sounds includes speech sounds spoken by one or more speakers from a smaller group of speakers than a larger group of speakers by whom speech sounds in the second class of speech sounds are spoken.
  - 9. A method as described in claim 8, wherein both the first and second statistic are measurements of central tendency.
  - 10. A method as described in claim 8, wherein both the first and second statistic are measurements of spread.
  - 11. A method as described in claim 8 for deriving acoustic models of a first class of speech sounds, wherein:
    - said first class of speech sounds corresponds to one speech unit, as that speech unit is spoken by said smaller group of speakers; and
      
      said second class of speech sounds corresponds to the same speech unit as spoken by said larger group of speakers.
  - 12. A method as described in claim 1, wherein:
    - the first and second statistic are both measurements of spread;
      
      the first statistic is derived from fewer samples than said second statistic.
  - 13. A method as described in claim 12, wherein:
    - said method further includes calculating an estimate of the spread of the individual measurements of spread of the first statistic; and
      
      the contribution of the first statistic in the calculation of the combined statistic relative to the contribution of the second statistic is a monotonically decreasing function of said estimate of spread.
  - 14. A method as described in claim 13, wherein the estimate of spread is calculated by a method that causes that estimate to vary as an inverse monotonic function of the number of samples from which the first statistic is calculated.
  - 15. A method as described in claim 12, wherein:
    - the second class of speech sounds includes a plurality of sub-classes of speech sounds;
      
      an individual measurement of spread is calculated for each of these sub-classes;
      
      a measurement of spread is calculated for the individual measurements of spread calculated for these sub-classes;
      
      the contribution of the second statistic in the calculation of the combined statistic relative to the contribution of the first statistic is a monotonically decreasing function of the measurement of spread of the individual measurements of spread of the sub-classes of the second class of speech sounds.
  - 16. A method as described in claim 15, wherein the first class is the ith sub-class of the second class, for a given value of i, and wherein the calculation of said combined statistic is calculated according to the formula:
    - ##EQU4## in which:
      
      comb_est_sigma_i is the combined statistic, which is an estimate of the measurement of spread for the acoustic model of the first class of speech sounds;
      
      est_sigma_i is the first statistic, which is an estimate of the measurement of spread derived from the samples of the first class of speech sounds;
      
      est_variance_i is the estimate of variance of est_sigma_i;
      
      est_prior_sigma is the estimated prior sigma, which is an estimate of the measurement of spread for the second class of speech sounds; and
      
      est_gamma is the estimated variance of the spreads of the subclass i.
  - 17. A method as described in claim 16, wherein said est_variance_i is calculated according to the following formula:
    - $$\mathit{est\_variance}_i = K \, [\mathit{comb\_est\_sigma}_i]^2 / n_i$$
      in which K is a constant; and
      
      n_i is the number of samples from the ith sub-class used to calculate comb_est_sigma_i.
  - 18. A method as described in claim 16, wherein said est_prior_sigma is calculated according to the following formula:
    - $$\mathit{est\_prior\_sigma} = \frac{1}{N} \sum_i \left( n_i \cdot \mathit{est\_sigma}_i \right)$$
      in which sum_i means the sum of the expression in the following parentheses over all values of i;
      
      n_i is the number of samples from the ith sub-class used to calculate comb_est_sigma_i; and
      
      N is the sum_i (n_i).
  - 19. A method as described in claim 16, wherein said est_gamma is calculated according to the following formula:
    - ##EQU5## in which:
      
      sum_i means the sum of the expression in the following parentheses over all values of i;
      
      n_i is the number of samples from the ith sub-class used to calculate comb_est_sigma_i;
      
      m is the number of sub-classes in the second class of speech sounds for which est_prior_sigma is calculated;
      
      K is a constant; and
      
      N is the sum_i (n_i).
  - 20. A method as described in claim 1, wherein:
    - the acoustic model has a plurality of dimensions;
      
      a separate first statistic is calculated for each of said dimensions from acoustic data derived from one or more samples of the first class of speech sounds;
      
      a separate second statistic is calculated for each of said dimensions from acoustic data derived from samples of the second class of speech sounds; and
      
      a combined statistic is calculated for each of said dimensions and is used in said acoustic model.
  - 21. A method as described in claim 17, wherein each of a plurality of said dimensions relate to acoustic energy within a given frequency band.
  - 22. A method as described in claim 1, wherein the first and second statistic relate to the duration of speech sounds.
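The formulas of claims 17 and 18 can be sketched in a few lines of code. The formula for the combined statistic in claim 16 (##EQU4##) and the formula for est_gamma in claim 19 (##EQU5##) are not reproduced in this text, so the function `comb_est_sigma` below uses an assumed inverse-variance weighting that is merely consistent with the monotonicity conditions of claims 13 and 15; it should not be read as the patent's actual formula. Likewise, est_gamma and the constant K are treated as caller-supplied inputs, and all function names are hypothetical.

```python
def est_prior_sigma(est_sigmas, counts):
    """Claim 18: sample-count-weighted average of the sub-class spreads."""
    total = sum(counts)  # N = sum_i (n_i)
    if total <= 0:
        raise ValueError("at least one sample is required")
    return sum(n * s for n, s in zip(counts, est_sigmas)) / total


def est_variance(sigma, n, k):
    """Claim 17 form: K * sigma^2 / n, where K is a constant whose value
    is not given in this text and must be supplied by the caller."""
    if n <= 0:
        raise ValueError("n must be positive")
    return k * sigma ** 2 / n


def comb_est_sigma(est_sigma_i, est_variance_i, prior_sigma, est_gamma):
    """ASSUMED combination (not the patent's ##EQU4##): inverse-variance
    weighting. The first statistic's weight falls as est_variance_i grows
    (claim 13), and the prior's weight falls as est_gamma grows (claim 15)."""
    denom = est_gamma + est_variance_i
    if denom <= 0:
        raise ValueError("est_gamma + est_variance_i must be positive")
    return (est_gamma * est_sigma_i + est_variance_i * prior_sigma) / denom


# Two sub-classes with spreads 1.0 and 3.0, from 1 and 3 samples.
prior = est_prior_sigma([1.0, 3.0], [1, 3])   # (1*1.0 + 3*3.0) / 4 = 2.5
var_i = est_variance(2.0, n=4, k=0.5)         # 0.5 * 4.0 / 4 = 0.5
print(comb_est_sigma(1.0, var_i, prior, est_gamma=0.5))  # 1.75
```

Because claim 17 defines est_variance_i in terms of comb_est_sigma_i, a full implementation would need either an initial value or an iterative scheme; the sketch leaves that choice to the caller by accepting any sigma value.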

23. A method of deriving an acoustic model of each of a plurality of speech units as spoken by a first group of speakers, which acoustic models are to be compared against a portion of speech to be recognized to determine the likelihood that the portion of speech corresponds to those speech units, said method including:
- having the one or more speakers from said first group speak one or more utterances of one or more vocabulary words;
  
  generating a sequence of acoustic data representing each of such utterances;
  
  time aligning the sequence of acoustic data from the utterances of a vocabulary word by said first group of speakers against an acoustic model of that vocabulary word which is comprised of a sequence of acoustic models representing the speech units normally contained in the word, each of which speech unit models includes a second statistic calculated from acoustic data derived from samples of its corresponding speech unit as spoken by a larger second group of speakers;
  
  calculating a first statistic from the acoustic data time aligned against each of the acoustic models of a given speech unit;
  
  calculating a combined statistic by taking a weighted combination of the first and second statistic for that speech unit; and
  
  using said combined statistic as at least a part of said acoustic model of said speech unit.
- - 24. A method as in claim 23, wherein:
    - the small group of speakers speak utterances of a first group of vocabulary words and the method is used to calculate a combined statistic for each of a plurality of speech units occurring in the acoustic models of that group of vocabulary words;
      
      the combined statistic calculated for a given speech unit is used in the acoustic representation of vocabulary words in a second group of vocabulary words outside said first group, when the acoustic representations of those words in that second group are compared against portions of speech to be recognized; and
      
      the second statistic used to calculate the combined statistic for a given speech unit is calculated from acoustic data derived from utterances of that speech unit in words outside the first group of vocabulary words.
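The last three steps of claim 23 can be sketched as follows, assuming the time alignment has already been done and each frame is reduced to a single scalar value. The function name `adapt_unit_means`, the `(unit, value)` input format, the choice of the mean as the statistic, and the `pseudo_count` weighting rule are all assumptions for illustration; the claim does not fix any of them.

```python
from collections import defaultdict


def adapt_unit_means(aligned_frames, prior_means, pseudo_count):
    """Combine per-unit means from a small speaker group with prior means.

    aligned_frames : iterable of (speech_unit, value) pairs, i.e. acoustic
                     data already time aligned against speech-unit models
    prior_means    : dict mapping speech_unit -> second statistic, assumed
                     to come from a larger second group of speakers
    pseudo_count   : hypothetical constant controlling how strongly the
                     prior is trusted (not specified by the claim)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for unit, value in aligned_frames:
        sums[unit] += value
        counts[unit] += 1

    adapted = {}
    for unit, prior in prior_means.items():
        n = counts.get(unit, 0)
        if n == 0:
            # No data from the first group: fall back to the second statistic.
            adapted[unit] = prior
            continue
        first_stat = sums[unit] / n        # first statistic for this unit
        w = n / (n + pseudo_count)         # assumed weighting rule
        adapted[unit] = w * first_stat + (1.0 - w) * prior
    return adapted


frames = [("k", 2.0), ("k", 4.0), ("ae", 1.0)]
priors = {"k": 5.0, "ae": 3.0, "t": 7.0}
result = adapt_unit_means(frames, priors, pseudo_count=2.0)
print(result["k"])  # 4.0: midpoint of the observed mean 3.0 and the prior 5.0
print(result["t"])  # 7.0: unit never spoken by the first group
```

Because the adapted value is stored per speech unit rather than per word, it can be reused in any word whose model contains that unit, which is the situation claim 24 describes for vocabulary words outside the first group.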

Current Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Gillick, Laurence
Primary Examiner(s): Shoop, Jr., William M.
Assistant Examiner(s): Young, Brian K.
Application Number: US06/938,545
Time in Patent Office: 1,215 Days
Field of Search: 381/41, 381/42, 381/43, 381/45-53, 364/513.5
US Class Current: 704/245
CPC Class Codes: G10L 15/14 using statistical models, e...
