On-line parametric histogram normalization for noise robust speech recognition
Abstract
A method for improving noise robustness in speech recognition, wherein a front-end is used for extracting speech features from an input speech and for providing a plurality of scaled spectral coefficients. The histogram of the scaled spectral coefficients is normalized to the histogram of a training set using Gaussian approximations. The normalized spectral coefficients are then converted into a set of cepstrum coefficients by a decorrelation module and further subjected to cepstral-domain feature-vector normalization.
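The normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each spectral band is summarized by a single Gaussian (mean and standard deviation), computes batch rather than on-line statistics, and uses a plain DCT-II as a stand-in for the decorrelation module. The names `gaussian_normalize`, `dct_ii`, `front_end`, `ref_means`, and `ref_stds` are hypothetical and do not appear in the patent. The final cepstral-domain feature-vector normalization is omitted.

```python
import math
from statistics import fmean, pstdev


def gaussian_normalize(band_values, ref_mean, ref_std):
    """Shift and scale one band so its mean/std match the reference Gaussian.

    Assumption: the band's histogram is approximated by a single Gaussian,
    so matching mean and standard deviation matches the histograms.
    """
    mu = fmean(band_values)
    sigma = pstdev(band_values) or 1.0  # guard against a constant band
    return [ref_mean + ref_std * (v - mu) / sigma for v in band_values]


def dct_ii(vector):
    """Unnormalized DCT-II, standing in for the decorrelation module."""
    n = len(vector)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(vector))
        for k in range(n)
    ]


def front_end(frames, ref_means, ref_stds):
    """frames: list of frames, each a list of scaled spectral coefficients.

    Normalizes every band against its (hypothetical) training-set statistics
    and only then applies the decorrelation transform, frame by frame.
    """
    bands = list(zip(*frames))  # regroup: one value sequence per band
    normalized = [
        gaussian_normalize(list(band), m, s)
        for band, m, s in zip(bands, ref_means, ref_stds)
    ]
    normalized_frames = [list(frame) for frame in zip(*normalized)]
    return [dct_ii(frame) for frame in normalized_frames]


if __name__ == "__main__":
    # Three frames, two spectral bands; reference statistics are made up.
    frames = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
    cepstra = front_end(frames, ref_means=[0.0, 0.0], ref_stds=[1.0, 1.0])
    for frame in cepstra:
        print(frame)
```

The point of the ordering is that the adjustment happens in the spectral domain, before decorrelation, so each band is matched to its own reference statistics.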
34 Citations
28 Claims
1. A method of improving noise robustness in a speech recognition system, the system including a front-end for extracting speech features from an input speech and a back-end for speech recognition based on the extracted features, wherein the front-end comprises:
means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
means, responsive to the data segments, for spectrally converting the data segments into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
means, responsive to the spectral data, for performing decorrelation conversion on the spectral coefficients for providing the extracted features, characterized by obtaining a parametric representation of the probability distribution of values of the spectral coefficients;
modifying the parametric representation based on one or more reference values; and
adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the decorrelation conversion.
Dependent claims: 2, 3, 4, 5, 6, 7
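One way to read the three characterizing steps of claim 1 is as a per-band linear mapping. The notation below is an assumption made for illustration and is not taken from the claim: $x_t$ is a spectral coefficient of one band at time instant $t$, the parametric representation is taken to be a Gaussian with mean $\mu_x$ and standard deviation $\sigma_x$, the reference values are $\mu_{\mathrm{ref}}$ and $\sigma_{\mathrm{ref}}$, and $D$ denotes the decorrelation conversion. In the simplest case the modified parameters are set equal to the reference values.

```latex
% Step 1: obtain the parametric representation of the band's distribution
p(x) \approx \mathcal{N}(\mu_x, \sigma_x^2)

% Step 2: modify the parameters based on the reference values (simplest case)
(\mu_x, \sigma_x) \;\longrightarrow\; (\mu_{\mathrm{ref}}, \sigma_{\mathrm{ref}})

% Step 3: adjust the coefficient so that it follows the modified distribution
\hat{x}_t = \mu_{\mathrm{ref}} + \frac{\sigma_{\mathrm{ref}}}{\sigma_x}\,(x_t - \mu_x)

% Only then apply the decorrelation conversion to the adjusted coefficient vector
\mathbf{c}_t = D\,\hat{\mathbf{x}}_t
```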
8. A speech recognition front-end for use in a speech recognition system having a back-end, the front-end extracting speech features from an input speech so as to allow the back-end to recognize the input speech based on the extracted features, the front-end comprising:
means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
means for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
means for performing decorrelation conversion on the spectral coefficients for providing the extracted features to the back-end, characterized by means, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A network element in a communication system including a back-end for receiving speech data from the network element, the network element comprising:
a voice input device to receive input speech; and
a front-end, responsive to the input speech, for extracting speech features from the input speech for providing speech data indicative of the speech features so as to allow the back-end to recognize the input speech based on the speech features, wherein the front-end comprises:
means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
means for spectrally converting the data into a plurality of spectral coefficients for providing spectral data indicative of the spectral coefficients having a related probability distribution of values; and
means for performing decorrelation conversion on the spectral coefficients for providing the extracted features, said network element characterized in that the front-end further comprises means, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
Dependent claims: 16, 17, 18, 19, 20, 21
22. A computer program for use in a speech recognition front-end for extracting speech features from an input speech so as to allow a speech recognition back-end to recognize the input speech based on the extracted features, wherein the front-end comprises:
means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
means for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
means for performing decorrelation conversion on the spectral coefficients for providing the extracted features, said computer program characterized by an algorithm for generating a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
Dependent claims: 23, 24, 25, 26, 27, 28
Specification