Method for utilizing formant frequencies in speech recognition

US 5,146,539 A
Filed: 11/08/1988
Issued: 09/08/1992
Est. Priority Date: 11/30/1984
Status: Expired due to Term

Abstract

A speech recognizer which utilizes hypothesis testing to determine formant frequencies for use in speech recognition. A pre-processor (36) receives speech signal frames and utilizes linear predictive coding to generate all formant frequency candidates. An optimum formant selector (38) operates with a comparator (40) to select from the formant candidates those formants which best match stored reference formants. A dynamic time warper (42) and high level recognition logic (44) operate to determine whether or not to declare a recognized word.
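
The pre-processor described above derives formant candidates from a linear predictive coding analysis. As a rough illustration only, and not the patent's implementation, the sketch below applies the standard mapping from a complex LPC pole to a candidate centre frequency and bandwidth. The function names, the 8 kHz sample rate, and the example resonances are assumptions made for this example. The root-finding step that produces the poles is not shown.

```python
import cmath
import math


def resonance_pole(freq_hz, bandwidth_hz, sample_rate_hz):
    """Build the upper-half-plane pole of a resonance (hypothetical helper for the demo)."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * freq_hz / sample_rate_hz
    return cmath.rect(radius, theta)


def poles_to_formant_candidates(poles, sample_rate_hz):
    """Map complex LPC poles to sorted (frequency_hz, bandwidth_hz) candidates.

    Each resonance of a real signal appears as a complex-conjugate pair,
    so only the pole with positive imaginary part is kept.
    """
    candidates = []
    for z in poles:
        if z.imag <= 0.0:
            continue  # skip real poles and the conjugate half of each pair
        radius = abs(z)
        if radius >= 1.0:
            continue  # a pole on or outside the unit circle has no positive bandwidth
        freq_hz = cmath.phase(z) * sample_rate_hz / (2.0 * math.pi)
        bandwidth_hz = -math.log(radius) * sample_rate_hz / math.pi
        candidates.append((freq_hz, bandwidth_hz))
    return sorted(candidates)


if __name__ == "__main__":
    fs = 8000.0  # assumed sample rate
    poles = []
    for f, b in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]:
        p = resonance_pole(f, b, fs)
        poles += [p, p.conjugate()]
    for freq, bw in poles_to_formant_candidates(poles, fs):
        print(f"candidate: {freq:.1f} Hz, bandwidth {bw:.1f} Hz")
```

An analysis of order p yields at most p/2 such candidates per frame, which is typically more than the number of true formants. This surplus is why a later selection stage is needed.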

7 Claims

1. A method for recognizing speech signals organized into a sequence of frames, said method comprising:
- storing a plurality of reference frames of formant frequencies representative of linguistic units providing a vocabulary of words;
  
  generating a plurality of formant frequency candidates for each frame of the speech signals desired to be recognized;
  
  creating a plurality of all possible subsets of optimum formant frequencies for each frame based upon the plurality of formant frequency candidates generated for the respective frame, wherein each subset comprises some but not all of the formant frequency candidates generated for that frame;
  
  comparing each subset of the plurality of all possible subsets of optimum formant frequencies for each frame of the speech signals to be recognized with each of the plurality of reference frames of formant frequencies;
  
  selecting one subset from said plurality of all possible subsets of optimum formant frequencies for each frame from said formant frequency candidates which best matches the stored formant frequencies of a corresponding reference frame of formant frequencies in accordance with predetermined criteria; and
  
  recognizing said speech signals in response to the selected one subset of optimum formant frequencies for respective frames.

2. The method of claim 1 wherein said step of generating a plurality of formant frequency candidates for each frame of the speech signals to be recognized utilizes linear predictive coding to generate said formant frequency candidates for the respective frame.

3. The method of claim 2 wherein said step of generating a plurality of formant frequency candidates for each frame of the speech signals to be recognized further includes:
- factoring the output of said linear predictive coding with a Bairstow algorithm to provide roots indicative of respective formant frequency candidates for each respective frame.

4. The method of claim 1 wherein said step of selecting utilizes the pitch frequency of the speech signals as an aspect of said predetermined criteria.

5. The method of claim 1 wherein said step of selecting comprises:
- generating representations of said formant frequency candidates as mel frequency and log bandwidth; and
  
  modeling said representations as multivariate Gaussian random variables.

6. The method of claim 5 wherein said step of recognizing said speech signals further includes:
- computing a likelihood function of the best match of a selected subset of optimum formant frequencies for a respective frame with a corresponding reference frame of formant frequencies utilizing a covariance matrix being correct speech signal recognition.
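
Claims 5 and 6 describe representing formants as mel frequency and log bandwidth and scoring them with a Gaussian likelihood. The following is a minimal sketch of that idea under stated assumptions: the mel formula used is one common convention (the patent record does not give one), the covariance matrix is taken to be diagonal for brevity, and all function names and example values are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """One common mel-scale convention (an assumption, not taken from the patent)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def to_features(formants):
    """Flatten [(freq_hz, bandwidth_hz), ...] into [mel, log_bandwidth, ...]."""
    features = []
    for freq_hz, bandwidth_hz in formants:
        features.append(hz_to_mel(freq_hz))
        features.append(math.log(bandwidth_hz))
    return features


def diag_gaussian_log_likelihood(x, mean, variances):
    """Log-density of x under a Gaussian with diagonal covariance.

    A full covariance matrix would add cross terms; the diagonal form
    is used here only to keep the sketch short.
    """
    if not (len(x) == len(mean) == len(variances)):
        raise ValueError("x, mean and variances must have the same length")
    total = 0.0
    for xi, mi, vi in zip(x, mean, variances):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


if __name__ == "__main__":
    # Hypothetical reference frame and two competing formant hypotheses.
    reference = to_features([(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)])
    variances = [2500.0, 0.25] * 3  # assumed per-dimension variances
    near = to_features([(520.0, 65.0), (1480.0, 85.0), (2450.0, 130.0)])
    far = to_features([(300.0, 65.0), (1100.0, 85.0), (3300.0, 130.0)])
    print(diag_gaussian_log_likelihood(near, reference, variances))
    print(diag_gaussian_log_likelihood(far, reference, variances))
```

The hypothesis closer to the reference in mel and log-bandwidth space receives the higher log-likelihood, which is the quantity a selector would maximise.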

7. A method for recognizing speech signals organized into a sequence of frames, said method comprising:
- storing a plurality of reference frames of formant frequencies representative of linguistic units providing a vocabulary of words in which each reference frame contains a predetermined number of primary formant frequencies;
  
  generating a plurality of formant frequency candidates for each frame of the speech signals desired to be recognized;
  
  grouping the formant frequency candidates for each frame into all possible combinations of subsets thereof having the same predetermined number of formant frequencies of each of the plurality of reference frames, wherein each subset comprises some but not all of the formant frequency candidates generated for that frame;
  
  comparing each of the possible combinations of subsets of formant frequency candidates for each frame of the speech signals to be recognized with the formant frequencies contained in each of the plurality of reference frames;
  
  selecting one subset of all possible subsets of optimum formant frequencies for each frame from said formant frequency candidates for the respective frame which best matches the stored formant frequencies of a respective reference frame in accordance with predetermined criteria; and
  
  recognizing said speech signals in response to the selected optimum formant frequencies comprising the selected subset of formant frequencies for each respective frame.
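
The grouping, comparing, and selecting steps of claim 7 can be sketched as an exhaustive search over fixed-size subsets of the candidates. This is an illustrative reading, not the patented implementation: the squared-distance cost stands in for the unspecified "predetermined criteria", and the function name, the candidate values, and the reference frames are invented for the example.

```python
from itertools import combinations


def select_formants(candidates_hz, reference_frames, num_formants=3):
    """For each reference frame, pick the candidate subset that matches it best.

    Returns a list of (best_subset, cost) pairs, one per reference frame.
    Every subset has exactly num_formants members and must be a proper
    subset of the candidates ("some but not all").
    """
    candidates = sorted(candidates_hz)
    if num_formants >= len(candidates):
        raise ValueError("need more candidates than formants per subset")

    results = []
    for reference in reference_frames:
        best_subset, best_cost = None, None
        # combinations() of a sorted list yields each subset in ascending order,
        # so subset[i] lines up with the i-th reference formant.
        for subset in combinations(candidates, num_formants):
            cost = sum((c - r) ** 2 for c, r in zip(subset, reference))
            if best_cost is None or cost < best_cost:
                best_subset, best_cost = subset, cost
        results.append((best_subset, best_cost))
    return results


if __name__ == "__main__":
    frame_candidates = [300, 520, 1100, 1480, 2450, 3300]  # hypothetical frame
    references = [(500, 1500, 2500), (300, 2300, 3000)]    # hypothetical references
    for ref, (subset, cost) in zip(references, select_formants(frame_candidates, references)):
        print(ref, "->", subset, "cost", cost)
```

With six candidates and three formants per reference frame there are only 20 subsets per frame, so exhaustive enumeration stays cheap. The per-reference costs are the kind of frame-level scores that a dynamic time warper could then accumulate across frames.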

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Chen, Yeunung; Leonard, R. Gary; Doddington, George R.
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/270,427
Time in Patent Office: 1,400 Days
Field of Search: 381/36-39, 381/43, 381/50
US Class Current: 704/241
CPC Class Codes:
G10L 15/02 Feature extraction for spee...
G10L 25/15 the extracted parameters be...
