Speech recognition method using time-frequency masking mechanism
First Claim
1. A speech recognition method in which an input speech is converted to a time sequence of feature vectors, each feature vector including one of a spectrum and a cepstrum, and in which a distance or probability between the input speech time sequence and a reference time sequence of feature vectors, or a statistical model thereof, is calculated for recognition, the method comprising the steps of:
- effecting time-frequency masking by obtaining a masked speech spectrum, that is, by subtracting from the present speech spectrum a masking pattern, which is a function of frequency obtained by smoothing the immediately preceding speech spectra over time and frequency; and
- recognizing the speech by using the masked speech spectrum obtained by the above operation at every time point.
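The masking step of claim 1 can be sketched as follows. This is a minimal illustration under assumptions that do not come from the claim: log spectra are held in a NumPy array of shape (frames, frequency bins), frequency smoothing uses a hypothetical three-tap kernel, time smoothing uses exponentially decaying weights over a hypothetical number `n_prev` of preceding frames, and a hypothetical `gain` scales the masking pattern before it is subtracted. The function name `time_frequency_mask` is illustrative only.

```python
import numpy as np


def time_frequency_mask(log_spec, n_prev=4, decay=0.6,
                        freq_kernel=(0.25, 0.5, 0.25), gain=0.5):
    """Return masked log spectra for a (T, F) array of log spectra.

    n_prev, decay, freq_kernel and gain are hypothetical illustration values.
    """
    log_spec = np.asarray(log_spec, dtype=float)
    n_frames, n_bins = log_spec.shape
    kernel = np.asarray(freq_kernel, dtype=float)
    if len(kernel) % 2 == 0:
        raise ValueError("freq_kernel must have odd length")
    # Smooth each frame along frequency; edge bins are padded by repetition
    # so the output keeps n_bins values per frame.
    pad = len(kernel) // 2
    padded = np.pad(log_spec, ((0, 0), (pad, pad)), mode="edge")
    freq_smoothed = np.stack(
        [np.convolve(row, kernel, mode="valid") for row in padded])
    # Exponentially decaying weights for the n_prev preceding frames,
    # normalized to sum to 1.
    weights = decay ** np.arange(n_prev)
    weights = weights / weights.sum()
    masked = np.empty_like(log_spec)
    for t in range(n_frames):
        # Masking pattern: time-weighted sum of frequency-smoothed past frames.
        # Frames before the start of the utterance contribute nothing.
        pattern = np.zeros(n_bins)
        for n in range(1, n_prev + 1):
            if t - n >= 0:
                pattern += weights[n - 1] * freq_smoothed[t - n]
        masked[t] = log_spec[t] - gain * pattern
    return masked
```

With a stationary input, every frame after the first few is reduced by the same fraction of its level, while a frame whose spectrum rises above the preceding ones is reduced less, so the masked spectrum tends to emphasize spectral change over steady-state level.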
Abstract
A speech recognition method in which input speech signals are converted to digital signals and then converted, at every prescribed time interval, to cepstrum coefficients or logarithmic spectra. A dynamic cepstrum time sequence is obtained by time-frequency filtering of the cepstrum coefficients, or a masked spectrum time sequence is obtained by time-frequency masking of the logarithmic spectrum time sequence. Speech is then recognized on the basis of the dynamic cepstrum time sequence or the masked spectrum time sequence obtained in this manner.
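Claim 1 leaves open how the distance or probability used for recognition is computed. Purely as an illustration, and assuming template matching rather than a statistical model, dynamic time warping (DTW) with a Euclidean local distance is one common way to compare an input feature sequence, such as a masked spectrum or dynamic cepstrum sequence, with stored reference sequences. The functions `dtw_distance` and `recognize` and the dictionary-of-templates layout are hypothetical.

```python
import numpy as np


def dtw_distance(query, reference):
    """Accumulated Euclidean distance along the best DTW alignment path.

    query: (N, D) input feature sequence; reference: (M, D) template.
    """
    q = np.asarray(query, dtype=float)
    r = np.asarray(reference, dtype=float)
    n, m = len(q), len(r)
    # Local distance between every input frame and every reference frame.
    local = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=2)
    # acc[i, j]: best accumulated distance aligning q[:i] with r[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def recognize(features, templates):
    """Return the label whose reference sequence has the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

DTW absorbs differences in speaking rate, so a time-stretched version of a template aligns with it at zero cost.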
29 Citations
13 Claims
2. A speech recognition method, comprising the steps of:
converting an input speech to a digitized speech signal; converting said digitized speech signal to cepstrum coefficients at every prescribed time interval; obtaining a time sequence of dynamic cepstrum by subtracting a masking pattern from the input speech cepstrum at the present time; and recognizing the speech by using said dynamic cepstrum.
(Dependent claims: 3, 4, 5, 6, 7, 8, 9)
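Claim 2 obtains the dynamic cepstrum by subtracting a masking pattern directly in the cepstral domain. One way to sketch this rests on an assumption: because the cepstrum is a Fourier transform of the log spectrum, smoothing the log spectrum along frequency by convolution corresponds to multiplying the cepstrum by a lifter, so the frequency smoothing can be applied as a lifter on a time-weighted average of preceding cepstra. The Gaussian lifter shape and the parameters `lifter_width`, `gain`, `n_prev` and `decay` are hypothetical choices, not values from the claims.

```python
import numpy as np


def dynamic_cepstrum(cep, n_prev=4, decay=0.6, lifter_width=3.0, gain=0.5):
    """Return a (T, K) dynamic cepstrum sequence from cepstra c_t(k), k = 0..K-1.

    n_prev, decay, lifter_width and gain are hypothetical illustration values.
    """
    cep = np.asarray(cep, dtype=float)
    n_frames, n_coef = cep.shape
    k = np.arange(n_coef)
    # Assumed Gaussian lifter: multiplying the cepstrum by it stands in for
    # Gaussian smoothing of the log spectrum along frequency; gain scales it.
    lifter = gain * np.exp(-0.5 * (k / lifter_width) ** 2)
    # Exponentially decaying weights over the preceding frames, summing to 1.
    weights = decay ** np.arange(n_prev)
    weights = weights / weights.sum()
    dyn = np.empty_like(cep)
    for t in range(n_frames):
        past = np.zeros(n_coef)
        for n in range(1, n_prev + 1):
            if t - n >= 0:
                past += weights[n - 1] * cep[t - n]
        # Dynamic cepstrum: present cepstrum minus the liftered masking pattern.
        dyn[t] = cep[t] - lifter * past
    return dyn
```

Since the lifter decays with quefrency, the masking mainly removes the slowly varying, low-quefrency part of the spectral envelope carried over from preceding frames, while higher-order coefficients pass through almost unchanged.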
10. A speech recognition method, comprising the steps of:
converting an input speech to a digitized speech signal; segmenting said digitized speech signal at every prescribed time interval to obtain a logarithmic spectrum time sequence by Fourier transform; effecting time-frequency masking by obtaining a masked speech spectrum, that is, by subtracting from the present speech spectrum a masking pattern, which is a function of frequency obtained by smoothing the immediately preceding speech spectra over time and frequency, thereby obtaining a masked spectrum time sequence; and recognizing the speech by using said masked spectrum time sequence.
(Dependent claims: 11, 12, 13)
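The front end recited in claim 10 (segmenting the digitized signal at a prescribed interval and taking a Fourier transform to obtain log spectra) might look like the sketch below. The 25 ms frame length, 10 ms shift, Hamming window, 512-point FFT, and the small `floor` that avoids taking the log of zero are common but hypothetical choices; the claim does not specify them. The resulting (frames, bins) array is the kind of logarithmic spectrum time sequence on which the masking step of claim 10 operates.

```python
import numpy as np


def log_spectrum_sequence(signal, sample_rate=16000, frame_ms=25.0,
                          shift_ms=10.0, n_fft=512, floor=1e-10):
    """Return a (T, n_fft // 2 + 1) log-power spectrum time sequence."""
    x = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    if frame_len > n_fft:
        raise ValueError("n_fft must be at least the frame length")
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Segment the signal every `shift` samples and apply a Hamming window.
    n_frames = 1 + (len(x) - frame_len) // shift
    window = np.hamming(frame_len)
    frames = np.stack([x[i * shift:i * shift + frame_len] * window
                       for i in range(n_frames)])
    # Fourier transform of each frame (zero-padded to n_fft), then log power.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(np.maximum(power, floor))
```

For one second of a 1 kHz tone sampled at 16 kHz, this yields 98 frames of 257 bins, with the peak of each frame at bin 32 (1000 Hz / 31.25 Hz per bin).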
Specification