On-line parametric histogram normalization for noise robust speech recognition
Abstract
A method for improving noise robustness in speech recognition, wherein a front-end is used for extracting speech features from an input speech and for providing a plurality of scaled spectral coefficients. The histogram of the scaled spectral coefficients is normalized to the histogram of a training set using Gaussian approximations. The normalized spectral coefficients are then converted into a set of cepstral coefficients by a decorrelation module and further subjected to cepstral-domain feature-vector normalization.
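As an illustration of what normalizing one histogram to another "using Gaussian approximations" can look like: if both the observed coefficients and the training set are each summarized by a mean and a standard deviation, matching the two Gaussian distributions reduces to an affine mapping y = mu_ref + sigma_ref * (x - mu) / sigma. The sketch below assumes that reading; the function name, the `floor` guard, and the use of batch statistics are illustrative choices, not details taken from the patent.

```python
import statistics


def gaussian_histogram_normalize(values, ref_mean, ref_std, floor=1e-6):
    """Map values so their Gaussian approximation matches a reference Gaussian.

    values   -- observed spectral coefficients for one frequency band
    ref_mean -- mean of the reference (training-set) distribution (assumed given)
    ref_std  -- standard deviation of the reference distribution (assumed given)
    floor    -- hypothetical lower bound on the estimated std, to avoid
                division by zero when the input is constant
    """
    # Parametric representation of the observed histogram: mean and std.
    mean = statistics.fmean(values)
    std = max(statistics.pstdev(values), floor)
    # Standardize with the observed parameters, then rescale to the reference.
    return [ref_mean + ref_std * (v - mean) / std for v in values]


observed = [1.0, 2.0, 3.0, 4.0]
normalized = gaussian_histogram_normalize(observed, ref_mean=0.0, ref_std=1.0)
```

After the mapping, the values in `normalized` have the reference mean and standard deviation, which is the sense in which the two (Gaussian-approximated) histograms agree.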
32 Claims
1. A method, comprising:
providing in a speech recognition system speech data indicative of an input speech at a plurality of time instants based on the input speech, the speech data comprising a plurality of data segments;
spectrally converting the data segments into a plurality of spectral coefficients having a probability distribution of values in spectral domain for providing spectral data indicative of the spectral coefficients based on the data segments;
obtaining a parametric representation of the probability distribution of values of the spectral coefficients based on the spectral data;
modifying the parametric representation based on one or more reference values for providing a modified parametric representation;
adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data; and
performing decorrelation conversion on the changed spectral data for providing extracted features of the input speech.
Dependent claims: 2, 3, 4, 5, 6, 7

8. A speech recognition front-end comprising:
a processing module, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
a transform module for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values in a spectral domain for providing spectral data indicative of the spectral coefficients;
a software program, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data; and
a decorrelation module, responsive to the modified parametric representation, for providing extracted features based on the changed spectral data.
Dependent claims: 9, 10, 11, 12, 13, 14

15. A network element in a communication system comprising:
a voice input device to receive input speech; and
a speech recognition front-end, responsive to the input speech, for extracting speech features from the input speech for providing speech data indicative of the speech features so as to allow the back-end to recognize the input speech based on the speech features, wherein the front-end comprises;
a processing module, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
a transform module for spectrally converting the data into a plurality of spectral coefficients for providing spectral data indicative of the spectral coefficients having a related probability distribution of values in spectral domain;
a computation module for performing decorrelation conversion on the spectral coefficients for providing the extracted features, and
a software program, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
Dependent claims: 16, 17, 18, 19, 20, 21

22. A software application product comprising a storage medium having a software application for use in a speech recognition front-end, the front end configured for extracting speech features from an input speech so as to allow a speech recognition back-end to recognize the input speech based on the extracted features, wherein the front-end is configured to provide data indicative of the input speech at a plurality of time instants;
to spectrally convert the data into a plurality of spectral coefficients having a related probability distribution of values in spectral domain for providing spectral data indicative of the spectral coefficients; and
to perform decorrelation conversion on the spectral coefficients for providing the extracted feature, said software application comprising an algorithm for generating a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
Dependent claims: 23, 24, 25, 26, 27, 28

29. An electronic module, comprising:
means, responsive to an input speech in a speech recognition front-end, for providing data indicative of the input speech at a plurality of time instants, the speech data comprising a plurality of data segments;
means for spectrally converting the data segments into a plurality of spectral coefficients having a probability distribution of values in a spectral domain for providing spectral data indicative of the spectral coefficients;
means for performing decorrelation conversion on the spectral coefficients for providing extracted features based on the data segments;
means for obtaining a parametric representation of the probability distribution of values of the spectral coefficients,
means for modifying the parametric representation based on one or more reference values, and
means, for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the decorrelation conversion on the spectral coefficients.
Dependent claims: 30, 31, 32
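
The sequence of steps recited in claim 1 (segmenting, spectral conversion, parametric fit, modification toward reference values, adjustment in the spectral domain, decorrelation) can be sketched end to end as follows. This is a minimal illustration under several assumptions that are not taken from the claims: the spectral conversion is a naive DFT log power spectrum with no frequency scaling, the parametric representation is a per-bin mean and standard deviation computed over the whole utterance, the "modification" simply replaces those parameters with the reference ones, and the decorrelation is an unnormalized DCT-II. All function names and default parameters are hypothetical.

```python
import cmath
import math
import statistics


def frame_signal(samples, frame_len, hop):
    """Split the input into overlapping data segments (frames)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_power_spectrum(frame):
    """Naive DFT of one frame; returns log power for bins 0..N/2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        out.append(math.log(abs(acc) ** 2 + 1e-10))  # small offset avoids log(0)
    return out


def normalize_bins(spectra, ref_means, ref_stds):
    """Fit a Gaussian (mean, std) per bin, then map each bin to the reference.

    Assumption: the modified parametric representation is taken to be the
    reference parameters themselves.
    """
    adjusted = [row[:] for row in spectra]
    for b in range(len(spectra[0])):
        column = [row[b] for row in spectra]
        mean = statistics.fmean(column)
        std = max(statistics.pstdev(column), 1e-6)
        for row in adjusted:
            row[b] = ref_means[b] + ref_stds[b] * (row[b] - mean) / std
    return adjusted


def dct_ii(vector, n_out):
    """Unnormalized DCT-II, used here as the decorrelation conversion."""
    n = len(vector)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(vector))
            for k in range(n_out)]


def extract_features(samples, ref_means, ref_stds,
                     frame_len=16, hop=8, n_ceps=4):
    """Run the full sketch: frames -> spectra -> normalization -> DCT."""
    frames = frame_signal(samples, frame_len, hop)
    spectra = [log_power_spectrum(f) for f in frames]
    adjusted = normalize_bins(spectra, ref_means, ref_stds)
    return [dct_ii(row, n_ceps) for row in adjusted]


# Synthetic input; a 16-sample frame yields 9 spectral bins.
samples = [math.sin(0.3 * t) * (1 + 0.05 * t) for t in range(64)]
features = extract_features(samples, ref_means=[0.0] * 9, ref_stds=[1.0] * 9)
```

The point of the ordering is that the adjustment happens on the spectral coefficients before the decorrelation step, so the cepstral features are computed from already-normalized spectra.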
Specification