Line spectral frequencies and energy features in a robust signal recognition system
Abstract
One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented in a matrix by a vector f of line spectral pair frequencies and energy coefficients and are fuzzy matrix quantized to respective vector f̂ entries of a matrix codeword in a codebook of the FMQ. The energy coefficients include the original energy and the first and second derivatives of the original energy, which increase recognition accuracy by, for example, being generally distinctive speech input signal parameters and providing noise signal suppression, especially when the noise signal has a relatively constant energy over at least two time frame intervals. To reduce data while maintaining sufficient resolution, the energy coefficients may be normalized and logarithmically represented. A distance measure between f and f̂, d(f, f̂), is defined as ##EQU1## where the constants α1, α2, β1 and β2 are set to substantially minimize quantization error, ei is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal, the first G LSP frequencies are most likely to be frequency shifted by noise, and the last P+3 coefficients represent the three energy coefficients. This robust distance measure can be used to enhance speech recognition performance in generally any speech recognition system using line spectral pair based distance measures.
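The energy coefficients named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the backward-difference approximation of the derivatives, and the normalization to the loudest frame are all assumptions made here for illustration. The sketch also shows why a noise energy that stays constant across frames drops out of the derivative terms.

```python
import math

def frame_energies(frames):
    """Original energy of each frame, taken here as the sum of squared samples."""
    return [sum(s * s for s in frame) for frame in frames]

def log_normalized(energies, floor=1e-12):
    """Log energy relative to the loudest frame (an assumed normalization)."""
    peak = max(max(energies), floor)
    return [math.log10(max(e, floor) / peak) for e in energies]

def first_difference(values):
    """Backward difference as a stand-in for a time derivative.

    The first frame has no predecessor, so its difference is set to 0.
    """
    return [0.0] + [values[t] - values[t - 1] for t in range(1, len(values))]

def energy_coefficients(energies):
    """Per-frame tuples: (energy, first derivative, second derivative)."""
    d1 = first_difference(energies)
    d2 = first_difference(d1)
    return list(zip(energies, d1, d2))

# Three short hypothetical frames of samples.
frames = [[1, 1], [2, 2], [3, 3]]
clean = frame_energies(frames)

# Assume noise adds the same energy to every frame.
noisy = [e + 5 for e in clean]

# The derivative terms are identical for the clean and noisy energies.
clean_derivatives = [c[1:] for c in energy_coefficients(clean)]
noisy_derivatives = [c[1:] for c in energy_coefficients(noisy)]
```

In this sketch the original energy shifts with the noise level while the first and second differences do not, which is the noise-suppression property the abstract attributes to the derivative coefficients.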
59 Citations
35 Claims
1. A speech recognition system comprising:
- a line spectral pair frequency coefficient generator;
- an energy coefficients generator; and
- a first speech classifier capable of using Nth order vectors to generate first speech classification output data for classifying a speech input signal as recognized speech, wherein the speech input signal is represented by a number of frames with each frame represented by one of the Nth order vectors, wherein components of each Nth order vector include respective line spectral pair frequency coefficients for P orders generated by the line spectral pair frequency coefficient generator, a first energy coefficient generated by the energy coefficients generator and representing original energy of the speech input signal for the respective frame, and a second energy coefficient generated by the energy coefficients generator and representing a first derivative of the original energy of the speech input signal for the respective frame, wherein N and P are integers.
Dependent claims: 2-19.
20. An apparatus comprising:
- means for generating P order line spectral pair frequencies for an acoustic input signal;
- means for determining a difference, for i=1 to G, between the ith line spectral pair frequency and an ith line spectral frequency of a reference acoustic signal;
- means for shifting the difference by an ith frequency shifting factor, for i=1 to G, to at least partially compensate for frequency shifting of the ith acoustic input signal line spectral pair frequency by acoustic noise;
- means for determining a difference, for i=G+1 to P, between ith acoustic input signal line spectral pair frequency and the ith reference acoustic signal line spectral pair frequency;
- means for weighting of the difference by an ith frequency weighting factor, for i=G+1 to P, wherein ith frequency shifting and weighting factor is the error power spectrum of the acoustic input signal and a predicted acoustic input signal at the ith line spectral pair frequency of the acoustic input signal;
- means for determining an energy of the acoustic input signal;
- means for determining a first derivative of the acoustic input signal energy; and
- means for utilizing the shifted and weighted differences for each of the P line spectral pair frequencies, the energy of the acoustic input signal, and the first derivative of the acoustic input signal energy to classify the acoustic input signal.
21. A method of generating a robust distance measure in a speech recognition system comprising the steps of:
- determining energy coefficients of each of X frames of a speech input signal, wherein the step of determining energy coefficients comprises the steps of:
  - determining a first energy coefficient for each of the X frames, wherein the first energy coefficient represents original energy of the speech input signal for a respective one of the X frames; and
  - determining a second energy coefficient for each of the X frames, wherein the second energy coefficient represents a first derivative of the original energy of the respective one of the X frames;
- determining P order line spectral pair frequencies for the speech input signal;
- representing the energy coefficients and line spectral pair frequencies as components of a vector;
- determining respective differences between the energy coefficients of the speech input signal and corresponding energy coefficients of a plurality of reference codewords;
- determining respective differences between the respective P line spectral frequencies of the speech input signal and corresponding P line spectral frequencies of the reference codewords; and
- utilizing the energy coefficients and line spectral pair frequencies respective differences to classify the speech input signal as one of the reference codewords.
Dependent claims: 22-25.
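The steps of claim 21 can be sketched in Python for a single frame. This is only an illustration under stated assumptions: the function names are hypothetical, the differences are combined as a plain sum of squares rather than the weighted distance measure defined elsewhere in the patent, and P is set to 2 with made-up values to keep the example short.

```python
def feature_vector(lsp_frequencies, energy, energy_derivative):
    """Represent P LSP frequencies and two energy coefficients as one vector."""
    return list(lsp_frequencies) + [energy, energy_derivative]

def differences(vector, codeword):
    """Component-wise differences between an input vector and a reference codeword."""
    if len(vector) != len(codeword):
        raise ValueError("vector and codeword must have the same order")
    return [v - c for v, c in zip(vector, codeword)]

def classify(vector, codebook):
    """Index of the codeword with the smallest sum of squared differences.

    The unweighted sum of squares is an assumption of this sketch.
    """
    def distance(codeword):
        return sum(d * d for d in differences(vector, codeword))
    return min(range(len(codebook)), key=lambda k: distance(codebook[k]))

# Hypothetical reference codewords: two LSP frequencies (radians), energy, energy derivative.
codebook = [
    feature_vector([0.30, 1.10], 0.9, 0.10),
    feature_vector([0.50, 1.60], 0.4, -0.20),
]

# A hypothetical input frame that lies close to the first codeword.
frame = feature_vector([0.32, 1.05], 0.8, 0.05)
best = classify(frame, codebook)
```

A full system would apply this per frame across the X frames of the input; the sketch stops at one frame to keep the difference and classification steps visible.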
26. A method of robust speech recognition in an automotive environment comprising the steps of:
- receiving a speech input signal corrupted by automotive environment noise;
- representing each frame of the speech input signal with a vector f of P line spectral pair frequencies and X energy coefficients;
- representing each of n codewords in a quantizer codebook as a respective vector f̂ of P line spectral pair frequencies and X energy coefficients, wherein n is a nonnegative integer; and
- determining a distance measure between the vector f and each respective vector f̂, wherein the distance measure, d(f, f̂), is defined by: ##EQU33##
- using the distance measure to classify the speech input signal as recognized speech;
wherein the constants α1, α2, α3, β1 and β2 are set to substantially minimize quantization error, and ei is the error power spectrum of the input signal and a predicted input signal at the ith line spectral pair frequency of the input signal.
Dependent claims: 27-30.
31. An apparatus comprising:
- a first classifier capable of using Nth order vectors to generate first speech classification output data for classifying the input signal, wherein the input signal is represented by a number of frames with each frame represented by an Nth order vector, wherein components of each Nth order vector include respective line spectral pair frequency coefficients for P orders, a first energy coefficient representing original energy of the input signal for the respective frame, and a second energy coefficient representing a first derivative of the original energy of the speech input signal for the respective frame, wherein N and P are integers.
Dependent claims: 32-34.
35. A method comprising the steps of:
- determining energy coefficients of each of X frames of an input signal, wherein the step of determining energy coefficients comprises the steps of:
  - determining a first energy coefficient for each of the X frames, wherein the first energy coefficient represents original energy of the input signal for a respective one of the X frames; and
  - determining a second energy coefficient for each of the X frames, wherein the second energy coefficient represents a first derivative of the original energy of the respective one of the X frames;
- determining P order line spectral pair frequencies for the input signal;
- representing the energy coefficients and line spectral pair frequencies as components of a vector;
- determining respective differences between the energy coefficients of the input signal and corresponding energy coefficients of a plurality of reference codewords;
- determining respective differences between the respective P line spectral frequencies of the input signal and corresponding P line spectral frequencies of the reference codewords; and
- utilizing the energy coefficients and line spectral pair frequencies respective differences to classify the input signal as one of the reference codewords.
Specification