Distance measure in a speech recognition system for speech recognition using frequency shifting factors to compensate for input signal frequency shifts
First Claim
1. A speech recognition system comprising:
a line spectral pair generator to generate line spectral pair frequencies from a speech input signal; and
a first speech classifier comprising:
a quantizer for determining a distance measure between an ith speech input signal line spectral pair frequency and an ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=1 to N1, is proportional to (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a shift of the difference by an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise, wherein N1 is greater than or equal to one and less than or equal to P;
wherein the first speech classifier is capable of receiving output data based on the distance measures and is capable of generating speech classification output data for classifying the speech input signal.
Abstract
One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented by a vector ƒ of line spectral pair frequencies and are fuzzy matrix quantized to respective vector ƒ̂ entries in a codebook of the FMQ. A distance measure between ƒ and ƒ̂, d(ƒ,ƒ̂), is defined as ##EQU1## where the constants α1, α2, β1 and β2 are set to substantially minimize quantization error, and ei is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal. The speech recognition system may also include hidden Markov model and neural network speech classifiers, such as a multilevel perceptron neural network.
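As a rough illustration of the kind of distance measure the abstract describes, the Python sketch below combines the two per-frequency operations recited in the claims: for i = 1 to N1 the difference is shifted by subtracting a factor, and for i = N1+1 to P it is weighted by a factor. The exact formula sits behind the equation placeholder and is not reproduced in this text, so the squaring, the placement of α1, α2, β1 and β2, and the function name `lsp_distance` are assumptions made only for illustration.

```python
from typing import Sequence


def lsp_distance(
    f: Sequence[float],
    f_ref: Sequence[float],
    e: Sequence[float],
    n1: int,
    alpha1: float = 1.0,
    alpha2: float = 1.0,
    beta1: float = 1.0,
    beta2: float = 1.0,
) -> float:
    """Illustrative distance between an input LSP vector and a codeword.

    ASSUMPTION: the way alpha, beta and e are combined here (squared terms,
    e[i] ** beta as the shifting or weighting factor, alpha as a scale) is
    hypothetical; the source equation is not available in this text.
    e[i] is expected to be non-negative (it stands for a power spectrum value).
    """
    p = len(f)
    if not (len(f_ref) == p and len(e) == p):
        raise ValueError("f, f_ref and e must all have length P")
    if not 1 <= n1 <= p:
        raise ValueError("N1 must satisfy 1 <= N1 <= P")

    total = 0.0
    # i = 1..N1: shift the difference by subtracting the shifting factor.
    for i in range(n1):
        shifted = (f[i] - f_ref[i]) - e[i] ** beta1
        total += (alpha1 * shifted) ** 2
    # i = N1+1..P: weight the difference by the weighting factor.
    for i in range(n1, p):
        weighted = (f[i] - f_ref[i]) * e[i] ** beta2
        total += (alpha2 * weighted) ** 2
    return total
```

Under these assumptions, a call such as `lsp_distance([0.3, 0.6], [0.2, 0.5], [0.1, 2.0], n1=1)` shifts the first difference and weights the second. Setting `n1` equal to the vector length applies the shift to every frequency.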
37 Citations
36 Claims
1. A speech recognition system comprising:
a line spectral pair generator to generate line spectral pair frequencies from a speech input signal; and
a first speech classifier comprising:
a quantizer for determining a distance measure between an ith speech input signal line spectral pair frequency and an ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=1 to N1, is proportional to (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a shift of the difference by an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise, wherein N1 is greater than or equal to one and less than or equal to P;
wherein the first speech classifier is capable of receiving output data based on the distance measures and is capable of generating speech classification output data for classifying the speech input signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A speech recognition system comprising:
means for generating P order line spectral pair frequencies for a speech input signal;
means for determining a difference, for i=1 to N1, between the ith line spectral pair frequency and an ith line spectral frequency of a reference speech signal;
means for shifting the difference by an ith frequency shifting factor, for i=1 to N1, to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise;
means for determining a difference, for i=N1+1 to P, between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency;
means for weighting of the difference by an ith frequency weighting factor, for i=N1+1 to P, wherein the ith frequency shifting and weighting factor is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal; and
means for utilizing the shifted difference to classify the speech input signal.
Dependent claims: 15, 16, 17
18. A method of generating a robust distance measure in a speech recognition system comprising the steps of:
generating P order line spectral pair frequencies for a speech input signal;
determining a difference, for i=1 to N1, between the ith line spectral pair frequency and an ith line spectral frequency of a reference speech signal;
shifting the difference, for i=1 to N1, by an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise; and
utilizing the shifted difference to classify the speech input signal.
Dependent claims: 19, 20, 21, 22
23. A method of robust speech recognition in an automotive environment comprising the steps of:
receiving a speech input signal;
representing the speech input signal with a vector ƒ of P line spectral pair frequencies;
representing a codeword in a quantizer codebook as a vector ƒ̂ of P line spectral pair frequencies; and
determining a distance measure between the vector ƒ and the vector ƒ̂, wherein the distance measure, d(ƒ,ƒ̂), is defined by:
##EQU29##
wherein the constants α1, α2, β1 and β2 are set to substantially minimize quantization error, and ei is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
Dependent claims: 24, 25, 26
27. An apparatus comprising:
a quantizer that receives speech input signal line spectral pair frequencies, that includes at least one codebook having reference speech signal line spectral pair frequencies, and that determines a distance measure between an ith speech input signal line spectral pair frequency and an ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=1 to N1, is proportional to (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a subtraction from the difference of an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise, wherein N1 is greater than or equal to one and less than or equal to P.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36
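The method steps of claim 18 and the quantizer of claim 27 (determine a difference, subtract a shifting factor, then use the result to classify) can be sketched in Python as a plain nearest-codeword search. This is a deliberate simplification: the abstract describes a fuzzy matrix quantizer, whereas the sketch uses a hard minimum over a small codebook. The names `shifted_difference_cost` and `nearest_codeword`, the squared-sum cost, and the sample values are hypothetical and are not taken from the patent.

```python
from typing import Sequence, Tuple


def shifted_difference_cost(
    f: Sequence[float],
    f_ref: Sequence[float],
    shift: Sequence[float],
    n1: int,
) -> float:
    """Sum of squared shifted differences over the first N1 frequencies.

    ASSUMPTION: squaring and summing is an illustrative choice; the claims
    only say the measure is proportional to the shifted difference.
    """
    return sum(((f[i] - f_ref[i]) - shift[i]) ** 2 for i in range(n1))


def nearest_codeword(
    f: Sequence[float],
    codebook: Sequence[Sequence[float]],
    shift: Sequence[float],
    n1: int,
) -> Tuple[int, float]:
    """Return (index, cost) of the codeword with the smallest cost.

    ASSUMPTION: a hard nearest-neighbour decision stands in for the fuzzy
    matrix quantization named in the abstract.
    """
    costs = [shifted_difference_cost(f, cw, shift, n1) for cw in codebook]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


# Hypothetical one-dimensional example: the shifting factor changes the winner.
codebook = [[0.30], [0.20]]
index_without_shift, _ = nearest_codeword([0.30], codebook, [0.0], n1=1)
index_with_shift, _ = nearest_codeword([0.30], codebook, [0.1], n1=1)
```

In this toy case the unshifted search picks the first codeword, while a shifting factor of 0.1 makes the second codeword the closest. That is the intended effect of compensating for a noise-induced frequency shift.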
Specification