Method of and apparatus for speech recognition wherein decisions are made based on phonemes
First Claim
1. A method for recognizing speech comprising:
(a) performing a linear prediction analysis of plural phonemes including the vowels and a nasal sound to calculate pth order LPC cepstrum coefficients in response to periodic frame derived for plural word utterances by plural speakers;
(b) in response to the calculated LPC cepstrum coefficients calculating a covariance matrix W that is a function of all the phonemes and a mean value m_i for each of the particular phonemes, where i represents the particular phoneme;
(c) deriving a weighting coefficient ##EQU25## where j = 1, 2, ..., p, and δ_jj' = value of element jj' of inverse matrix W^-1 of covariance matrix W;
(d) deriving the values a_ij, δ_jj', m_ij', and m_i^t W^-1 m_i for each of said phonemes as coefficient values for the phonemes;
(e) in response to known phoneme sounds being uttered by a speaker deriving the value of an LPC cepstrum coefficient for each phoneme;
(f) storing these LPC cepstrum coefficients with the previously stored coefficient values of the corresponding phonemes to derive standard patterns for the phonemes;
(g) during a recognition mode while replicas of unknown words including the phonemes are derived;
(i) performing phoneme segmentation of each unknown word and
(ii) for each segmented phoneme determining the similarity of LPC cepstrum coefficients of each segmented phoneme of the unknown words with the stored coefficient values of the standard patterns for the phonemes in accordance with ##EQU26## where t is a matrix transportation factor;
(h) selecting the standard phoneme most similar to the uttered phoneme in response to the value of L_i;
(i) combining the selected standard phonemes to form a phoneme string for an uttered word; and
(j) comparing the formed phoneme string for an uttered word with stored phoneme strings for known words to determine which of the known words is the uttered word.
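The equations referenced as ##EQU25## and ##EQU26## are not reproduced in this text. As a minimal sketch only, the following assumes they take the form obtained by expanding a Mahalanobis distance with a covariance matrix shared by all phonemes: a_ij = 2 Σ_j' δ_jj' m_ij' and L_i = Σ_j a_ij C_j - m_i^t W^-1 m_i, where C_j is the j-th LPC cepstrum coefficient of the input. That form, the choice of a pooled within-phoneme covariance for W, the function names, and the two-dimensional toy data are all assumptions for illustration, not the claimed equations.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(samples):
    """Steps (a)-(b), assumed form: per-phoneme means m_i and one covariance W
    pooled over all phonemes. samples maps phoneme -> list of p-dim vectors."""
    means = {ph: mean_vector(rows) for ph, rows in samples.items()}
    p = len(next(iter(means.values())))
    total = sum(len(rows) for rows in samples.values())
    W = [[0.0] * p for _ in range(p)]
    for ph, rows in samples.items():
        m = means[ph]
        for x in rows:
            d = [xj - mj for xj, mj in zip(x, m)]
            for j in range(p):
                for k in range(p):
                    W[j][k] += d[j] * d[k] / total
    return W, means

def invert(M):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(M)
    A = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        A[c], A[piv] = A[piv], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def train(samples):
    """Steps (c)-(d), assumed form: weights a_ij and the term m_i^t W^-1 m_i."""
    W, means = pooled_covariance(samples)
    delta = invert(W)  # delta[j][k] plays the role of δ_jj'
    p = len(delta)
    patterns = {}
    for ph, m in means.items():
        a = [2.0 * sum(delta[j][k] * m[k] for k in range(p)) for j in range(p)]
        bias = sum(m[j] * delta[j][k] * m[k] for j in range(p) for k in range(p))
        patterns[ph] = (a, bias)
    return patterns

def similarity(x, pattern):
    """Assumed L_i = sum_j a_ij * x_j - m_i^t W^-1 m_i."""
    a, bias = pattern
    return sum(aj * xj for aj, xj in zip(a, x)) - bias

def classify(x, patterns):
    """Step (h): pick the phoneme whose standard pattern gives the largest L_i."""
    return max(patterns, key=lambda ph: similarity(x, patterns[ph]))

# Toy 2-dimensional "cepstrum" vectors for two hypothetical phonemes.
samples = {
    "a": [[1.0, 0.0], [1.2, 0.2], [0.8, -0.1], [1.0, 0.3]],
    "n": [[-1.0, 0.1], [-1.2, -0.2], [-0.8, 0.0], [-1.0, 0.3]],
}
patterns = train(samples)
print(classify([0.9, 0.1], patterns))   # -> a
print(classify([-1.1, 0.0], patterns))  # -> n
```

Under these assumptions, the term shared by all phonemes (x^t W^-1 x) drops out of the comparison, so recognition needs only one dot product and one stored constant per phoneme, which is consistent with the per-phoneme coefficient values listed in step (d).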
Abstract
Linear prediction coefficients of a speech signal including unknown words are derived for each of successive periodic frame intervals. For every frame over the duration of an individual phoneme of the speech signal, the degree of similarity between stored coefficients of known words and the derived coefficients of the unknown words is calculated, so that the degree of similarity for the phoneme is available at the end of that phoneme. Phoneme segmentation data are derived in response to the speech signal and combined with the degree of similarity calculated over each individual phoneme to derive phoneme strings of the speech signal. The derived phoneme strings are compared with stored phoneme strings to indicate the words in a word dictionary having the greatest similarity to the derived phoneme strings.
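The abstract does not state how the similarity between a derived phoneme string and a stored one is measured. As a hedged illustration, the sketch below assumes a plain Levenshtein edit distance over phoneme symbols and returns the dictionary word with the smallest distance. The dictionary entries, the function names, and the distance measure itself are hypothetical and are not taken from the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def best_word(phonemes, dictionary):
    """Return the dictionary word whose stored phoneme string is closest."""
    return min(dictionary, key=lambda w: edit_distance(phonemes, dictionary[w]))

# Hypothetical word dictionary: word -> stored phoneme string.
dictionary = {
    "sakura": ["s", "a", "k", "u", "r", "a"],
    "sakana": ["s", "a", "k", "a", "n", "a"],
    "kuruma": ["k", "u", "r", "u", "m", "a"],
}

# A derived string with one misrecognized phoneme (final "o" instead of "a").
derived = ["s", "a", "k", "u", "r", "o"]
print(best_word(derived, dictionary))  # -> sakura
```

A distance-based match of this kind tolerates isolated phoneme recognition errors, since a single substituted, dropped, or extra phoneme changes the distance by only one.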
75 Citations
7 Claims
Specification