Speech recognition
First Claim
1. A method of recognizing an unknown speech utterance comprising the steps of:
(i) representing said unknown speech utterance as a first sequence of parameter frames, each parameter frame representing a corresponding time frame of said utterance;
(ii) providing a plurality of reference templates, each comprising a second sequence of parameter frames expressed in the same kind of parameters as the first sequence of parameter frames;
each parameter frame of the first sequence and second sequence of parameter frames comprising a set of primary parameters and a set of secondary parameters, each of the secondary parameters representing the signed difference between corresponding primary parameters in respective parameter frames derived for different time frames;
(iii) computing a dynamic loudness component ΔC_{i,0} from said unknown speech utterance as a secondary parameter, and providing a corresponding dynamic loudness component ΔT_{.,0} in each of said secondary parameter frames, said dynamic loudness components being a signed rate of change in overall amplitude between frames;
(iv) comparing each of the primary and secondary parameters in the sequence of parameter frames of the unknown utterance with each reference template and determining which of the reference templates most closely resembles the unknown utterance.
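The secondary parameters recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function name `add_dynamic_parameters`, the `offset` argument, the clamping at the utterance edges, and the convention that index 0 of each primary frame holds the loudness term are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]


def add_dynamic_parameters(primary: List[Frame], offset: int) -> List[Frame]:
    """Append signed frame-to-frame differences to each primary frame.

    Assumption: index 0 of every primary frame is the loudness term, so
    index 0 of the appended part plays the role of the dynamic loudness
    component.
    """
    if offset < 1:
        raise ValueError("offset must be at least 1")
    last = len(primary) - 1
    augmented: List[Frame] = []
    for i, frame in enumerate(primary):
        # Assumption: frames beyond either end are replaced by the edge frame.
        ahead = primary[min(i + offset, last)]
        behind = primary[max(i - offset, 0)]
        # Signed difference between corresponding primary parameters.
        secondary = [a - b for a, b in zip(ahead, behind)]
        augmented.append(frame + secondary)
    return augmented
```

Each returned frame holds the primary parameters followed by the same number of secondary parameters, so an utterance and its templates can be augmented in the same way before comparison.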
Abstract
In a speech recognizer, for recognizing unknown utterances in isolated-word speech or continuous speech, improved recognition accuracy is obtained by augmenting the usual spectral representation of the unknown utterance with a dynamic component. A corresponding dynamic component is provided in the templates with which the spectral representation of the utterance is compared. In preferred embodiments, the representation is mel-based cepstral and the dynamic components comprise vector differences between pairs of primary cepstra. Preferably the time interval between each pair is about 50 milliseconds. It is also preferable to compute a dynamic perceptual loudness component along with the dynamic parameters.
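As a rough illustration of the comparison step, the sketch below picks the reference template closest to an utterance. The abstract does not name a matching algorithm or a distance measure, so dynamic time warping and the squared Euclidean frame distance are assumptions here, as are all function names and the frame period passed to `frames_for_interval`.

```python
from typing import Dict, List

Frame = List[float]


def frames_for_interval(interval_ms: float, frame_period_ms: float) -> int:
    """Convert a time interval (e.g. about 50 ms) into a whole frame offset."""
    return max(1, round(interval_ms / frame_period_ms))


def frame_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance over all parameters (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(utterance: List[Frame], template: List[Frame]) -> float:
    """Accumulated distance along the best time alignment (assumed technique)."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the label of the template with the smallest accumulated distance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))
```

With a hypothetical 10 ms frame period, `frames_for_interval(50.0, 10.0)` gives an offset of 5 frames between the pair of cepstra being differenced. Because the dynamic components sit in the same frame as the primary ones, they contribute to the frame distance without any change to the matching loop.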
10 Claims
1. A method of recognizing an unknown speech utterance, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. Apparatus for recognizing an unknown speech utterance in a speech signal comprising:
(i) means for representing an unknown speech utterance as a first sequence of parameter frames, each parameter frame representing a corresponding time frame of said utterance;
(ii) means for providing a plurality of parameter frames expressed in the same kind of parameters as the first sequence of parameter frames;
each parameter frame of the first sequence and second sequence of parameter frames comprising a set of primary parameters and a set of secondary parameters, each of the secondary parameters representing the signed difference between corresponding primary parameters in respective parameter frames derived from different time frames;
(iii) means responsive to said unknown speech utterance for computing a dynamic loudness component ΔC_{i,0} for said first sequence of parameter frames and means for providing a dynamic loudness component ΔT_{l,0} for said second sequence of parameter frames, each component being one of the secondary parameters said dynamic loudness components being a signed rate of change in overall amplitude between frames;
(iv) means for comparing each of the primary and secondary parameters in the sequence of parameter frames of the utterance with each reference template and for determining which of the reference templates most nearly resembles the unknown utterance.
Dependent claims: 7, 8, 9, 10
Specification