Speech recognition apparatus

US 4,038,503 A
Filed: 12/29/1975
Issued: 07/26/1977
Est. Priority Date: 12/29/1975
Status: Expired due to Term

Abstract

In the speech recognition apparatus disclosed herein, an audio signal is digitized and a succession of short-term power spectra are generated over a time interval corresponding to a spoken word. The short-term power spectra are frequency band equalized as a function of the peak amplitude occurring in each frequency band over the word interval. The changes in amplitude in each frequency band are accumulated to obtain a measure of subjective time, and then a limited number of frequency band equalized spectra are selected as representing equal intervals of subjective time so as to suppress variations in rate of articulation. The selected spectra are then non-linearly scaled in amplitude and transformed so as to maximize the separation between phonetically different sounds. By means of a maximum-likelihood method, the transformed selected spectra are compared with a data base representing a vocabulary to be recognized.

10 Claims

1. In a speech analysis system in which an audio signal is spectrum analyzed to determine the behavior of formant resonances over an interval of time, a frequency compensation method comprising:
- repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
  
  for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
  
  smoothing the peak spectrum by averaging each maximum value with values from said set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies; and
  
  for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate over said interval a sequence of frequency band equalized spectra corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the spectrum.
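
The frequency compensation steps of claim 1 can be sketched in a few lines. This is an illustrative reading of the claim rather than the patentee's implementation: the function name `equalize_spectra`, the window width `smooth_bins` (expressed in frequency bins), the edge padding used at the ends of the spectrum, and the small floor `eps` are all assumptions introduced here. The claim only requires that the averaging window be about as wide as the typical spacing between formants.

```python
import numpy as np

def equalize_spectra(spectra, smooth_bins, eps=1e-12):
    """Frequency band equalization, sketched from claim 1.

    spectra     : (T, F) array, one short-term power spectrum per row.
    smooth_bins : odd number of adjacent bands averaged together
                  (hypothetical parameter; pick it to span roughly
                  one formant spacing).
    eps         : floor that avoids division by zero (assumption).
    """
    spectra = np.asarray(spectra, dtype=float)

    # Peak spectrum: maximum of each band over the whole interval.
    peak = spectra.max(axis=0)

    # Smooth the peak spectrum with a moving average over adjacent bands.
    # Edge padding keeps the output the same length as the input.
    pad = smooth_bins // 2
    padded = np.pad(peak, pad, mode="edge")
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(padded, kernel, mode="valid")

    # Divide every short-term spectrum by the smoothed peak spectrum.
    return spectra / np.maximum(smoothed, eps)
```

With `smooth_bins=1` no smoothing takes place and every band of the result peaks at exactly 1 over the interval. With a wider window the per-band maxima are only approximately equal, because each band is divided by an average over its neighbours.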

2. In a speech analysis system in which an audio signal is analyzed over an interval corresponding to a spoken word to determine the behavior of formant resonances relative to a sequence of reference vectors representing a preselected word, a method of selecting sample points within said interval comprising:
- repeatedly over said interval, evaluating a set of parameters corresponding to the energy spectrum of said signal at that time, each such set of values being characterizable as a vector having a coordinate corresponding to each parameter;
  
  summing over the said set of parameters the magnitudes of the values of the changes that occur between successive evaluations of each parameter, thereby to obtain a value corresponding to the arc length increment traversed by the multi-coordinate vector during the subinterval between successive evaluations;
  
  accumulating the arc length increments over successive subintervals so as to obtain a sequence of arc lengths throughout the said interval and a total arc length for the said interval;
  
  dividing the total arc length into a sequence of equal length segments corresponding in number to the number of vectors in the sequence of reference vectors;
  
  separating said sequence of arc lengths into groups, the cumulative arc length for each group being substantially equal to said equal length segments; and
  
  for each segment, selecting a set of parameter values defining a representative vector from the vectors associated with the corresponding group of arc lengths and comparing the selected set with the parameter values defining the corresponding recognition vector, the several comparisons so performed being indicative of the match between the audio signal and the speech corresponding to the recognition vectors.

3. A speech analysis system as set forth in claim 2 wherein the magnitudes of the changes that occur between successive evaluations of each parameter are multiplied by a respective predetermined weighting factor prior to summing the set of parameters, thereby to emphasize the importance of changes in certain of the parameters and to de-emphasize changes in other parameters.
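
The arc-length sampling of claims 2 and 3 can be illustrated as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `select_by_arc_length` and its parameters are hypothetical, and the rule used to pick the representative vector (the frame whose cumulative arc length lies closest to the midpoint of each segment) is a simplification chosen here, since the claims do not say how the representative is chosen within a group.

```python
import numpy as np

def select_by_arc_length(vectors, n_segments, weights=None):
    """Pick one frame per equal-arc-length segment (claims 2 and 3).

    vectors    : (T, F) array, one parameter vector per evaluation.
    n_segments : number of reference vectors to match against.
    weights    : optional (F,) per-parameter weighting factors (claim 3).
    """
    vectors = np.asarray(vectors, dtype=float)

    # Magnitude of the change in each parameter between successive frames.
    diffs = np.abs(np.diff(vectors, axis=0))
    if weights is not None:
        diffs = diffs * np.asarray(weights, dtype=float)

    # Arc length increment per subinterval, then the running arc length.
    increments = diffs.sum(axis=1)
    arc = np.concatenate(([0.0], np.cumsum(increments)))

    # Split the total arc length into equal segments.
    segment_length = arc[-1] / n_segments

    # Assumption: the representative frame is the one nearest the
    # midpoint of each segment, measured in arc length.
    midpoints = (np.arange(n_segments) + 0.5) * segment_length
    indices = [int(np.argmin(np.abs(arc - m))) for m in midpoints]
    return indices, vectors[indices]
```

Because frames that repeat the previous spectrum add no arc length, stretching a steady portion of the word changes which frame indices are selected but not the vectors themselves, which is the sense in which this sampling suppresses variations in rate of articulation.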

4. In a speech analysis system in which an audio signal is analyzed over an interval corresponding to a spoken word to determine the behavior of formant resonances relative to a sequence of reference vectors representing a preselected word, a method of obtaining and selecting sample points within said interval comprising:
- repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
  
  for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
  
  smoothing the peak spectrum by averaging each maximum value with values from said set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies;
  
  for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate over said interval a sequence of frequency band equalized spectra corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the spectrum, each such set of equalized parameters being characterizable as a vector having a coordinate corresponding to each parameter;
  
  summing over the said set of equalized parameters the magnitudes of the values of the changes that occur between successive evaluations of each equalized parameter, thereby to obtain a value corresponding to the arc length increment traversed by the multi-coordinate vector during the subinterval between successive evaluations;
  
  accumulating the arc length increments over successive subintervals so as to obtain a sequence of arc lengths throughout the said interval and a total arc length for the said interval;
  
  dividing the total arc length into a sequence of equal length segments corresponding in number to the number of vectors in the sequence of reference vectors;
  
  separating said sequences of arc lengths into groups, the cumulative arc length for each group being substantially equal to said equal length segments; and
  
  for each segment, selecting a set of equalized parameter values defining a representative vector from the vectors associated with the corresponding group of arc lengths and comparing the selected set with the parameter values defining the corresponding reference vector, the several comparisons so performed being indicative of the match between the audio signal and the speech corresponding to the reference vectors.

5. In a speech analysis system, a method of enhancing the information content of the spectrum of an audio signal representing speech, said method comprising:
- generating a set of values S(f) corresponding to the energy spectrum of said signal, each value representing the energy in a corresponding frequency band f;
  
  generating a value A corresponding to the average of said set of N values, where ##EQU8## and F_o represents the width of each frequency band; and
  
  for each value in said set, generating a corresponding nonlinearly scaled value S_s (f), where ##EQU9##

6. In a speech analysis system in which an audio signal is spectrum analyzed to determine the behavior of formant resonances over an interval of time, a frequency compensation and amplitude scaling method comprising:
- repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
  
  for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
  
  smoothing the peak spectrum by averaging each maximum value with values from the set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies;
  
  for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate for each spectrum, a corresponding frequency band equalized spectrum comprising a set of equalized parameters S(f);
  
  generating a value A corresponding to the average of said set of N values, where ##EQU10## and F_o represents the width of each frequency band; and
  
  non-linearly scaling each spectrum by generating, for each value S(f) in each frequency band equalized spectrum, a corresponding value S_s (f), where ##EQU11##

7. In a speech recognition system, a method of comparing the spectrum of an audio signal representing speech with a vector of recognition coefficients (a_i, b_i, c), said method comprising:
- generating a set of parameters S(f) corresponding to the short-term power spectrum of said signal, each parameter representing the energy in a corresponding frequency band f;
  
  generating a value A corresponding to the average of said set of N parameters, where ##EQU12## and F_o represents the width of each frequency band;
  
  for each parameter in said set, generating a corresponding non-linearly scaled value S_s (f), where ##EQU13##;
  
  generating from these values a set of linearly scaled values L_k, where ##EQU14## where the constant coefficients P_jk enhance the phonetic attributes of the processed speech and are independent of the particular speech patterns represented by the coefficients (a_i, b_i, c), and M equals the number of possible decision choices; and
  
  generating a numerical comparison value X, where ##EQU15## the comparison value being indicative of the match between the audio signal and the speech represented by the recognition coefficients.

8. A speech recognition system as set forth in claim 7 wherein the set of parameters S(f) is generated repeatedly over an interval corresponding to at least one spoken word, each such set of parameters being characterizable as a vector having a coordinate corresponding to each parameter;
- summing over the said set of parameters the magnitudes of the values of the changes that occur between successive evaluations of each parameter, thereby to obtain a value corresponding to the arc length increment traversed by the multi-coordinate vector during the subinterval between successive evaluations;
  
  accumulating the arc length increments over successive subintervals so as to obtain a sequence of arc lengths throughout the said interval and a total arc length for the said interval;
  
  dividing the total arc length into a sequence of equal length segments corresponding in number to the number of vectors in the sequence of reference vectors;
  
  separating said sequence of arc lengths into groups, the cumulative arc length for each group being substantially equal to said equal length segments; and
  
  for each segment, selecting said set of parameter values S(f) defining a representative vector from the vectors associated with the corresponding group of arc lengths and comparing the selected set with the parameter values defining the corresponding recognition vector, the several comparisons so performed being indicative of the match between the audio signal and the speech corresponding to the recognition vectors.

9. A speech recognition system as set forth in claim 8 wherein the set of parameters S(f) is generated repeatedly within an interval corresponding to at least one spoken word;
- for each parameter in the set, determining the maximum occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
  
  smoothing the peak spectrum by averaging each maximum value with values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the normal frequency separation between formant frequencies; and
  
  for each set S(f), dividing each parameter therein by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate a set of frequency band equalized spectra corresponding to a frequency compensated audio signal over said interval.

10. A speech recognition system as set forth in claim 7 wherein the set of values S(f) is generated repeatedly within an interval corresponding to at least one spoken word;
- for each value in the set, determining the maximum occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
  
  smoothing the peak spectrum by averaging each peak value with values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the normal frequency separation between formant frequencies; and
  
  for each set S(f), dividing each value therein by the corresponding value in the smoothed peak spectrum, thereby to generate a set of frequency equalized spectra corresponding to the energy content of said audio signal over said interval, the values in the equalized spectra being utilized to generate the non-linearly scaled values S_s (f).

Current Assignee: Voice Industries Corporation
Original Assignee: Dialog Systems Incorporated
Inventors: Moshier, Stephen L.
Primary Examiner(s): Claffy, Kathleen H.
Assistant Examiner(s): Kemeny, E. S.
Application Number: US05/644,722
Time in Patent Office: 575 Days
Field of Search: 179/1 SA, 179/1 SD
US Class Current: 704/234
CPC Class Codes:
G10L 15/00 Speech recognition G10L17/0...
G10L 25/00 Speech or voice analysis te...
