Method for utilizing formant frequencies in speech recognition
First Claim
1. A method for recognizing speech signals organized into a sequence of frames, said method comprising:
storing a plurality of reference frames of formant frequencies representative of linguistic units providing a vocabulary of words;
generating a plurality of formant frequency candidates for each frame of the speech signals desired to be recognized;
creating a plurality of all possible subsets of optimum formant frequencies for each frame based upon the plurality of formant frequency candidates generated for the respective frame, wherein each subset comprises some but not all of the formant frequency candidates generated for that frame;
comparing each subset of the plurality of all possible subsets of optimum formant frequencies for each frame of the speech signals to be recognized with each of the plurality of reference frames of formant frequencies;
selecting one subset from said plurality of all possible subsets of optimum formant frequencies for each frame from said formant frequency candidates which best matches the stored formant frequencies of a corresponding reference frame of formant frequencies in accordance with predetermined criteria; and
recognizing said speech signals in response to the selected one subset of optimum formant frequencies for respective frames.
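The creating, comparing and selecting steps of claim 1 can be pictured for a single frame with the minimal Python sketch below. It is an illustration under stated assumptions, not the patented implementation: the function names, the sum-of-absolute-differences measure standing in for the "predetermined criteria", the rule that only subsets of the same size as a reference frame are compared, and all frequency values are hypothetical.

```python
from itertools import combinations


def proper_subsets(candidates):
    """Yield every non-empty subset that omits at least one candidate."""
    ordered = sorted(candidates)
    for size in range(1, len(ordered)):
        yield from combinations(ordered, size)


def distance(subset, reference):
    """Assumed matching criterion: sum of absolute differences in Hz."""
    return sum(abs(f - r) for f, r in zip(subset, reference))


def select_best_subset(candidates, reference_frames):
    """Return (subset, reference_index, score) with the lowest score."""
    best = None
    for subset in proper_subsets(candidates):
        for index, reference in enumerate(reference_frames):
            if len(subset) != len(reference):
                continue  # assumption: only equal-sized sets are comparable
            score = distance(subset, reference)
            if best is None or score < best[2]:
                best = (subset, index, score)
    return best


# Hypothetical formant candidates (Hz) for one frame, and two reference frames.
candidates = [310.0, 950.0, 2150.0, 2900.0, 3600.0]
references = [(300.0, 2200.0, 3000.0), (700.0, 1100.0, 2500.0)]
subset, ref_index, score = select_best_subset(candidates, references)
```

With five candidates there are 30 proper non-empty subsets to test, so exhaustive comparison stays cheap at the candidate counts a single frame typically yields.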
Abstract
A speech recognizer which utilizes hypothesis testing to determine formant frequencies for use in speech recognition. A pre-processor (36) receives speech signal frames and utilizes linear predictive coding to generate all formant frequency candidates. An optimum formant selector (38) operates with a comparator (40) to select from the formant candidates those formants which best match stored reference formants. A dynamic time warper (42) and high level recognition logic (44) operate to determine whether or not to declare a recognized word.
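The role of the dynamic time warper (42) and the recognition logic (44) can be sketched as follows. This is a textbook dynamic time warping recurrence, not necessarily the one used in the patent; the frame distance, the acceptance threshold, the function names and the two-word vocabulary are all assumptions made for illustration.

```python
def frame_distance(a, b):
    """Assumed per-frame cost: sum of absolute formant differences in Hz."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(observed, template):
    """Classic dynamic time warping cost between two frame sequences."""
    inf = float("inf")
    rows, cols = len(observed), len(template)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(observed[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # observed frame repeated
                                 cost[i][j - 1],      # template frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]


def recognize(observed, vocabulary, threshold):
    """Return the best-matching word, or None if no template is close enough."""
    best_word, best_cost = None, float("inf")
    for word, template in vocabulary.items():
        c = dtw_distance(observed, template)
        if c < best_cost:
            best_word, best_cost = word, c
    return best_word if best_cost <= threshold else None


# Hypothetical templates: each frame holds two selected formants (Hz).
vocabulary = {
    "a": [(300, 2200), (310, 2250), (320, 2300)],
    "b": [(700, 1100), (650, 1000)],
}
# Same frames as template "a", with the first frame spoken twice as long.
observed = [(300, 2200), (300, 2200), (310, 2250), (320, 2300)]
word = recognize(observed, vocabulary, threshold=100.0)
```

The threshold is what lets the logic decline to declare a word when even the best template is a poor match.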
96 Citations
7 Claims
1. Independent claim, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6.
7. A method for recognizing speech signals organized into a sequence of frames, said method comprising:
storing a plurality of reference frames of formant frequencies representative of linguistic units providing a vocabulary of words in which each reference frame contains a predetermined number of primary formant frequencies;
generating a plurality of formant frequency candidates for each frame of the speech signals desired to be recognized;
grouping the formant frequency candidates for each frame into all possible combinations of subsets thereof having the same predetermined number of formant frequencies of each of the plurality of reference frames, wherein each subset comprises some but not all of the formant frequency candidates generated for that frame;
comparing each of the possible combinations of subsets of formant frequency candidates for each frame of the speech signals to be recognized with the formant frequencies contained in each of the plurality of reference frames;
selecting one subset of all possible subsets of optimum formant frequencies for each frame from said formant frequency candidates for the respective frame which best matches the stored formant frequencies of a respective reference frame in accordance with predetermined criteria; and
recognizing said speech signals in response to the selected optimum formant frequencies comprising the selected subset of formant frequencies for each respective frame.
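Claim 7 fixes the subset size at the predetermined number of formants held in each reference frame, so a frame with N candidates yields "N choose K" groups. The Python sketch below shows that grouping applied across a short sequence of frames. The pairing of frame i with reference frame i, the absolute-difference measure, the function names and the frequency values are assumptions for illustration only.

```python
from itertools import combinations
from math import comb


def group_candidates(candidates, k):
    """All k-element combinations of the sorted candidates for one frame."""
    if not 0 < k < len(candidates):
        raise ValueError("k must be positive and below the candidate count")
    return list(combinations(sorted(candidates), k))


def select_per_frame(frames, references):
    """Pick, for each frame, the group closest to its paired reference frame."""
    selected = []
    for candidates, reference in zip(frames, references):
        groups = group_candidates(candidates, len(reference))
        best = min(
            groups,
            key=lambda g: sum(abs(f - r) for f, r in zip(g, reference)),
        )
        selected.append(best)
    return selected


# Hypothetical candidates (Hz) for two frames; each reference holds K = 3 formants.
frames = [[310, 950, 2150, 2900], [290, 1200, 2250, 3100, 3500]]
references = [(300, 2200, 3000), (300, 2200, 3000)]
selected = select_per_frame(frames, references)
group_count = len(group_candidates(frames[1], 3))  # equals comb(5, 3)
```

Because K is strictly smaller than the candidate count, every group leaves out at least one candidate, matching the "some but not all" limitation in the claim.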
Specification