Data-driven filtering of cepstral time trajectories for robust speech recognition
Abstract
A method and apparatus for speech processing in a distributed speech recognition system having a front-end and a back-end. In the front-end, speech features are extracted from a speech signal and normalized so as to alter the power of the noise component in the modulation spectrum relative to the power of the signal component, especially at frequencies above 10 Hz. A low-pass filter is then applied to the normalized modulation spectrum to improve the signal-to-noise ratio (SNR) of the speech signal. Combining feature vector normalization with low-pass filtering is effective at removing noise, especially in a low-SNR environment.
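The two front-end steps named in the abstract, normalization followed by low-pass filtering of each feature trajectory over time, can be sketched as below. This is only an illustration under stated assumptions, not the patented filter design: the per-utterance mean and variance normalization, the 100 Hz frame rate, the 10 Hz cutoff, the 21-tap windowed-sinc FIR filter, and the function names `normalize_features` and `lowpass_trajectories` are all hypothetical choices that do not come from the text above.

```python
import numpy as np

def normalize_features(feats, eps=1e-8):
    """Mean and variance normalization of each feature trajectory.

    feats has shape (num_frames, num_coeffs); statistics are taken over
    the whole utterance (an assumption made for this sketch).
    """
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)

def lowpass_trajectories(feats, frame_rate_hz=100.0, cutoff_hz=10.0, num_taps=21):
    """Low-pass filter every trajectory along the time (frame) axis.

    Uses a Hamming-windowed sinc FIR filter; the cutoff is expressed in
    modulation frequency, i.e. cycles per second of the feature trajectory.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / frame_rate_hz              # cutoff in cycles per frame
    h = 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)
    h /= h.sum()                                # unity gain at 0 Hz
    # mode="same" keeps the frame count; the first and last few frames
    # see zero padding and are therefore less reliable.
    return np.apply_along_axis(lambda x: np.convolve(x, h, mode="same"), 0, feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(300, 13))          # stand-in for cepstral features
    filtered = lowpass_trajectories(normalize_features(feats))
    print(filtered.shape)
```

Normalizing before filtering follows the order given in the abstract; a symmetric FIR filter is used here so that all modulation frequencies are delayed equally, which is a design choice of this sketch.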
20 Claims
1. A method for speech processing in a distributed-speech recognition system having a front-end and a back-end for recognizing words from speech signals, said method comprising the steps of:
extracting speech features from the speech signals, wherein the speech features contain a speech-to-noise ratio;
normalizing the speech features;
filtering the normalized speech features in a frequency domain; and
conveying the filtered speech features from the front-end to the back-end.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A distributed speech recognition front-end comprising:
first means, responsive to a speech signal, for extracting speech features from said speech signal and for providing a first signal indicative of the extracted speech features;
second means, responsive to the first signal, for normalizing the extracted speech features and for providing a second signal indicative of the normalized speech features;
third means, responsive to the second signal, for filtering the normalized speech features in a frequency domain in order to reduce noise in the second signal and for providing a third signal indicative of the filtered speech features; and
means for conveying the third signal to a distributed speech recognition back-end in order for the back-end to recognize words representative of the speech signal from the third signal.
Dependent claims: 10, 11, 12, 13
14. A distributed speech recognition system for processing a speech signal, said system comprising:
a front-end, responsive to the speech signal, for extracting speech features from the speech signal and for providing a first signal indicative of the extracted speech features; and
a back-end, responsive to the first signal, for recognizing words representative of the speech signals and for providing a second signal indicative of the recognized words, wherein the front-end has means to normalize the extracted-speech features and means to filter the normalized speech features in order to reduce noise in the speech signal.
Dependent claims: 15, 16
17. A speech recognition feature extractor for extracting speech features from a speech signal, comprising:
a time-to-frequency domain transformer for generating spectral magnitude values in a frequency domain of the speech signal and for providing a first signal indicative of the spectral magnitude values;
a feature generator, responsive to the first signal, for generating a plurality of feature vectors and for providing a second signal indicative of the generated speech features;
a normalizing means, responsive to the second signal, for normalizing the generated feature vectors and for providing a third signal indicative of the normalized feature vectors; and
a frequency filtering means, responsive to the first signal, for reducing noise in the normalized feature vectors and for providing the speech features indicative of the noise-reduction feature vectors.
Dependent claims: 18, 19
20. A communication device having a voice input unit to allow a user to input speech signals to the device, and means for providing speech data to an external apparatus, wherein the external apparatus includes a distributed-speech recognition back-end capable of recognizing speech based on the speech data, said communication device comprising:
a front-end unit, responsive to the speech signals, for extracting speech features from the speech signals for providing a first signal indicative of the extracted speech features, wherein the front-end includes:
means, responsive to the first signal, for normalizing the extracted-speech features for providing a second signal indicative of the normalized speech features, and
means, responsive to the second signal, for filtering the normalized speech features in order to reduce noise in the speech signals and for including the filtered speech features in the speech data.
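The first two elements recited in claim 17, a time-to-frequency domain transformer that yields spectral magnitude values and a feature generator that turns them into feature vectors, might look roughly like the sketch below. Everything specific here is an assumption for illustration and is not taken from the claims: the 8 kHz sampling rate implied by the 200-sample frame and 80-sample hop, the 256-point FFT, the uniform pooling of power into 16 bands (a simplification standing in for the mel-spaced filters that real front-ends typically use), the DCT-II step, and the hypothetical names `spectral_magnitudes` and `cepstral_features`.

```python
import numpy as np

def spectral_magnitudes(signal, frame_len=200, hop=80, n_fft=256):
    """Time-to-frequency transform: one magnitude spectrum per frame."""
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # rfft zero-pads each frame to n_fft samples and keeps n_fft // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def cepstral_features(mags, num_bands=16, num_coeffs=13, floor=1e-10):
    """Feature generator: log band energies followed by a DCT-II."""
    power = mags ** 2
    # Uniform band pooling (assumption; not a mel filterbank)
    bands = np.array_split(power, num_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    log_energies = np.log(np.maximum(energies, floor))
    k = np.arange(num_coeffs)[:, None]
    n = np.arange(num_bands)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / num_bands)
    return log_energies @ dct_basis.T           # (num_frames, num_coeffs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=8000)              # one second of noise at 8 kHz
    feats = cepstral_features(spectral_magnitudes(signal))
    print(feats.shape)
```

The resulting array has one row per frame, so each column is a time trajectory of one coefficient, which is the form that the normalizing and filtering elements of the claim would then operate on.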
Specification