Speech recognition apparatus
Abstract
In the speech recognition apparatus disclosed herein, an audio signal is digitized and a succession of short-term power spectra is generated over a time interval corresponding to a spoken word. The short-term power spectra are frequency band equalized as a function of the peak amplitude occurring in each frequency band over the word interval. The changes in amplitude in each frequency band are summed to obtain a measure of subjective time, and then a limited number of frequency band equalized spectra are selected as representing equal intervals of subjective time so as to suppress variations in rate of articulation. The selected spectra are then non-linearly scaled in amplitude and transformed so as to maximize the separation between phonetically different sounds. By means of a maximum-likelihood method, the transformed selected spectra are compared with a data base representing a vocabulary to be recognized.
10 Claims

1. In a speech analysis system in which an audio signal is spectrum analyzed to determine the behavior of formant resonances over an interval of time, a frequency compensation method comprising:
repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
smoothing the peak spectrum by averaging each maximum value with values from said set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies; and
for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate over said interval a sequence of frequency band equalized spectra corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the spectrum.
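The equalization steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: spectra are plain lists of band energies, `equalize_bands` and its `smooth_width` and `eps` parameters are hypothetical names, the claim's "approximately equal to the typical frequency separation between formant frequencies" is reduced to an odd count of adjacent bands chosen by the caller, and truncating the averaging window at the spectrum edges is an assumption the claim does not address.

```python
from typing import List


def equalize_bands(spectra: List[List[float]], smooth_width: int,
                   eps: float = 1e-12) -> List[List[float]]:
    """Frequency band equalization following the steps of claim 1.

    spectra: one short-term power spectrum per subinterval, each a list of
        band energies (all rows have the same number of bands).
    smooth_width: odd number of adjacent bands averaged when smoothing the
        peak spectrum (hypothetical stand-in for "about one formant spacing").
    eps: floor on the divisor to avoid dividing by zero in silent bands.
    """
    if smooth_width < 1 or smooth_width % 2 == 0:
        raise ValueError("smooth_width must be a positive odd integer")
    n_bands = len(spectra[0])

    # Peak spectrum: the maximum each band reaches over the whole interval.
    peak = [max(row[f] for row in spectra) for f in range(n_bands)]

    # Smooth the peak spectrum with a centred moving average. Near the ends
    # the window is truncated to the bands that exist (an assumption).
    half = smooth_width // 2
    smoothed = []
    for f in range(n_bands):
        window = peak[max(0, f - half):f + half + 1]
        smoothed.append(sum(window) / len(window))

    # Divide every short-term spectrum by the smoothed peak, band by band.
    return [[row[f] / max(smoothed[f], eps) for f in range(n_bands)]
            for row in spectra]
```

With `smooth_width=1` (no smoothing) every band's maximum over the interval becomes exactly 1, the "same maximum short-term energy content in each of the frequency bands" of the claim. A wider window makes the equalized maxima only approximately equal, presumably so that the compensation follows the broad spectral shape of the channel rather than individual formant peaks.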

2. In a speech analysis system in which an audio signal is analyzed over an interval corresponding to a spoken word to determine the behavior of formant resonances relative to a sequence of reference vectors representing a preselected word, a method of selecting sample points within said interval comprising:
repeatedly over said interval, evaluating a set of parameters corresponding to the energy spectrum of said signal at that time, each such set of values being characterizable as a vector having a coordinate corresponding to each parameter;
summing over the said set of parameters the magnitudes of the values of the changes that occur between successive evaluations of each parameter, thereby to obtain a value corresponding to the arc length increment traversed by the multi-coordinate vector during the subinterval between successive evaluations;
accumulating the arc length increments over successive subintervals so as to obtain a sequence of arc lengths throughout the said interval and a total arc length for the said interval;
dividing the total arc length into a sequence of equal length segments corresponding in number to the number of vectors in the sequence of reference vectors;
separating said sequence of arc lengths into groups, the cumulative arc length for each group being substantially equal to said equal length segments; and
for each segment, selecting a set of parameter values defining a representative vector from the vectors associated with the corresponding group of arc lengths and comparing the selected set with the parameter values defining the corresponding recognition vector, the several comparisons so performed being indicative of the match between the audio signal and the speech corresponding to the recognition vectors.
(Dependent claim: 3)
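Claim 2's subjective-time sampling can be sketched as below. The arc-length increment is the sum of absolute parameter changes between successive evaluations, as the claim describes. The claim does not say how the representative vector is chosen within each group or how the comparison is scored, so this sketch assumes the representative is the frame whose cumulative arc length lies closest to each segment's midpoint, and scores the match with a sum of absolute differences; those choices and the function names `cumulative_arc_length`, `select_frames` and `match_distance` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def cumulative_arc_length(frames: List[Vector]) -> List[float]:
    """Cumulative arc length of the parameter vector, starting at 0.

    Each increment is the sum over parameters of the magnitude of the change
    between successive evaluations (an L1 step), as in claim 2.
    """
    arc = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        arc.append(arc[-1] + sum(abs(c - p) for p, c in zip(prev, cur)))
    return arc


def select_frames(frames: List[Vector], n_segments: int) -> List[int]:
    """Pick one frame index per equal-arc-length segment.

    Assumption: the representative of each segment is the frame whose
    cumulative arc length is closest to the segment's midpoint (ties go to
    the earlier frame).
    """
    arc = cumulative_arc_length(frames)
    seg_len = arc[-1] / n_segments
    picks = []
    for k in range(n_segments):
        target = (k + 0.5) * seg_len
        best = min(range(len(frames)), key=lambda t, tg=target: abs(arc[t] - tg))
        picks.append(best)
    return picks


def match_distance(frames: List[Vector], references: List[Vector]) -> float:
    """Hypothetical match score: sum of L1 distances between each selected
    frame and the corresponding reference vector (lower is better)."""
    picks = select_frames(frames, len(references))
    return sum(sum(abs(x - r) for x, r in zip(frames[t], ref))
               for t, ref in zip(picks, references))
```

For the frames `[[0, 0], [1, 0], [1, 1], [1, 1], [3, 1]]`, the cumulative arc lengths are `[0, 1, 2, 2, 4]` and `select_frames(frames, 3)` returns `[1, 2, 4]`. The fourth frame repeats the third, adds no arc length, and is not selected: a stationary stretch consumes no subjective time, which is how this kind of sampling suppresses variations in rate of articulation.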

4. In a speech analysis system in which an audio signal is analyzed over an interval corresponding to a spoken word to determine the behavior of formant resonances relative to a sequence of reference vectors representing a preselected word, a method of obtaining and selecting sample points within said interval comprising:
repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
smoothing the peak spectrum by averaging each maximum value with values from said set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies;
for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate over said interval a sequence of frequency band equalized spectra corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the spectrum, each such set of equalized parameters being characterizable as a vector having a coordinate corresponding to each parameter;
summing over the said set of equalized parameters the magnitudes of the values of the changes that occur between successive evaluations of each equalized parameter, thereby to obtain a value corresponding to the arc length increment traversed by the multi-coordinate vector during the subinterval between successive evaluations;
accumulating the arc length increments over successive subintervals so as to obtain a sequence of arc lengths throughout the said interval and a total arc length for the said interval;
dividing the total arc length into a sequence of equal length segments corresponding in number to the number of vectors in the sequence of reference vectors;
separating said sequence of arc lengths into groups, the cumulative arc length for each group being substantially equal to said equal length segments; and
for each segment, selecting a set of equalized parameter values defining a representative vector from the vectors associated with the corresponding group of arc lengths and comparing the selected set with the parameter values defining the corresponding reference vector, the several comparisons so performed being indicative of the match between the audio signal and the speech corresponding to the reference vectors.

5. In a speech analysis system, a method of enhancing the information content of the spectrum of an audio signal representing speech, said method comprising:
generating a set of values S(f) corresponding to the energy spectrum of said signal, each value representing the energy in a corresponding frequency band f;
generating a value A corresponding to the average of said set of N values, where ##EQU8## and Fo represents the width of each frequency band; and
for each value in said set, generating a corresponding nonlinearly scaled value Ss(f), where ##EQU9##
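The formulas of claim 5 are not reproduced in this text (##EQU8## and ##EQU9## are placeholders), so the following is only a sketch under explicit assumptions: A is taken as the plain arithmetic mean of the N band values, ignoring however Fo enters EQU8, and the scaling is assumed to be Ss(f) = (S(f) - A) / (S(f) + A), one compressive function that scales each band relative to the spectrum average. Neither assumption is confirmed by the claim text, and `nonlinear_scale` is a hypothetical name.

```python
from typing import List


def nonlinear_scale(spectrum: List[float]) -> List[float]:
    """Sketch of claim 5's amplitude scaling under assumed formulas.

    ASSUMPTIONS (the claim's own EQU8/EQU9 are not reproduced here):
      A     = arithmetic mean of the N band energies
      Ss(f) = (S(f) - A) / (S(f) + A)
    """
    n = len(spectrum)
    a = sum(spectrum) / n
    if a == 0:
        # All-zero (silent) frame: nothing to scale against.
        return [0.0] * n
    # For nonnegative energies and A > 0 the denominator is always positive.
    return [(s - a) / (s + a) for s in spectrum]
```

Under this assumed form the output lies between -1 and 1 for nonnegative energies, is 0 for a band at the average level, and is unchanged when the whole spectrum is multiplied by a constant gain: `[1, 3]` and `[2, 6]` both scale to `[-1/3, 0.2]`.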

6. In a speech analysis system in which an audio signal is spectrum analyzed to determine the behavior of formant resonances over an interval of time, a frequency compensation and amplitude scaling method comprising:
repeatedly within said interval, evaluating a set of parameters determining the short-term power spectrum of said audio signal in a subinterval within the said interval, thereby to generate a sequence of short-term power spectra;
for each parameter in the set, determining the maximum value of the parameter occurring over the interval, the set of maximum values thereby determined corresponding to a peak spectrum over the interval;
smoothing the peak spectrum by averaging each maximum value with values from the set of maximum values corresponding to adjacent frequencies, the width of the band of frequencies contributing to each averaged value being approximately equal to the typical frequency separation between formant frequencies;
for each short-term power spectrum in said sequence of spectra, dividing the value for each parameter in the set by the corresponding smoothed maximum value in the smoothed peak spectrum, thereby to generate, for each spectrum, a corresponding frequency band equalized spectrum comprising a set of equalized parameters S(f);
generating a value A corresponding to the average of said set of N values, where ##EQU10## and Fo represents the width of each frequency band; and
non-linearly scaling each spectrum by generating, for each value S(f) in each frequency band equalized spectrum, a corresponding value Ss(f), where ##EQU11##

7. In a speech recognition system, a method of comparing the spectrum of an audio signal representing speech with a vector of recognition coefficients (ai, bi, c), said method comprising:
generating a set of parameters S(f) corresponding to the short-term power spectrum of said signal, each parameter representing the energy in a corresponding frequency band f;
generating a value A corresponding to the average of said set of N parameters, where ##EQU12## and Fo represents the width of each frequency band;
for each parameter in said set, generating a corresponding non-linearly scaled value Ss(f), where ##EQU13##;
generating from these values a set of linearly scaled values Lk, where ##EQU14##, in which the constant coefficients Pjk enhance the phonetic attributes of the processed speech and are independent of the particular speech patterns represented by the coefficients (ai, bi, c), and M equals the number of possible decision choices; and
generating a numerical comparison value X, where ##EQU15##, the comparison value being indicative of the match between the audio signal and the speech represented by the recognition coefficients.
(Dependent claims: 8, 9, 10)
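Claim 7's linear transform and comparison value appear only as placeholders (##EQU14##, ##EQU15##). One plausible reading, sketched here purely as an assumption, treats Lk as a weighted sum of the scaled values through the constant coefficients Pjk, and X as a weighted squared distance X = sum_i b_i (L_i - a_i)^2 + c. If a_i are read as means, b_i as inverse variances and c as a constant absorbing the log-variance terms, this X is twice the negative log-likelihood of L under a diagonal Gaussian model, which would fit the abstract's maximum-likelihood comparison. The formulas and the names `linear_transform` and `comparison_value` are hypothetical, and the role of M in the claim is not modelled.

```python
from typing import List, Sequence


def linear_transform(ss: Sequence[float],
                     p: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED form of EQU14: L_k = sum_j P[j][k] * Ss_j.

    p is indexed [input band j][output k]; its values are fixed constants,
    independent of the vocabulary being recognized.
    """
    n_out = len(p[0])
    return [sum(ss[j] * p[j][k] for j in range(len(ss))) for k in range(n_out)]


def comparison_value(l: Sequence[float], a: Sequence[float],
                     b: Sequence[float], c: float) -> float:
    """ASSUMED form of EQU15: X = sum_i b_i * (L_i - a_i)**2 + c.

    Read as a diagonal-Gaussian score (a_i = mean, b_i = 1 / variance,
    c = sum of log(2 * pi * variance)), X is twice the negative
    log-likelihood, so a smaller X indicates a better match.
    """
    return sum(bi * (li - ai) ** 2 for li, ai, bi in zip(l, a, b)) + c
```

Under this reading, recognition would compute X against each vocabulary entry's coefficients (ai, bi, c) and choose the entry with the smallest X.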
Specification