Mel-frequency linear prediction speech recognition apparatus and method

US 20020065649A1
Filed: 08/15/2001
Published: 05/30/2002
Est. Priority Date: 08/25/2000
Status: Abandoned Application

Abstract

The present invention is an apparatus and method for generating a parametric representation of input speech based on a mel-frequency warping of the vocal tract spectrum. The approach is computationally efficient and provides increased recognition accuracy over conventional LP cepstrum approaches. It is capable of rapid processing and is operable in many different devices. The invention is a speech recognition system comprising a linear prediction (LP) signal processor and a mel-frequency linear prediction (MFLP) generator for mel-frequency warping the LP parameters to generate MFLP parametric representations for robust, perceptually modeled speech recognition requiring minimal computation and storage.

25 Citations

19 Claims

1. A speech recognition system comprising:
- microphone means for receiving acoustic waves and converting the acoustic waves into electronic signals;
  
  linear prediction (LP) signal processing means, coupled to said microphone means, for processing the electronic signals to generate LP parametric representations of the electronic signals;
  
  mel-frequency linear prediction (MFLP) generating means, coupled to said LP signal processing means, for mel-frequency warping said LP parametric representations to generate MFLP parametric representations of the electronic signals; and
  
  word comparison means coupled to said MFLP means, for comparing said MFLP parametric representations of the electronic signals to parametric representation of words in a database.
- - 2. The speech recognition system of claim 1 wherein said mel-frequency linear prediction (MFLP) generating means comprises:
    - non-uniform discrete Fourier transform (NDFT) generator means for generating the NDFT of said LP parametric representations of the electronic signals;
      
      warper means, coupled to said NDFT generator means, for mel-frequency warping said NDFT;
      
      smoothing means, coupled to said warper means, for smoothing said mel-frequency warped NDFT; and
      
      cepstral parameter converter means, coupled to said smoothing means, for converting said LP parametric representations of the electronic signals to cepstral parameters.
  - 3. The speech recognition system of claim 2 wherein said smoothing means utilizes a low-order all-pole LP generator.
  - 4. The speech recognition system of claim 1 wherein said word comparison means is a dynamic time warper speech recognition system.
  - 5. The speech recognition system of claim 1 wherein said word comparison means is a hidden Markov model speech recognition system.
  - 6. The speech recognition system of claim 1 wherein said word comparison means is a neural network speech recognition system.
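
The cepstral parameter converter of claim 2 and the dynamic time warper of claim 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (lp_to_cepstrum, dtw_distance, recognize), the Euclidean frame distance, and the dictionary of word templates are assumptions made for illustration. The conversion uses the standard recursion for the cepstrum of an all-pole model 1/A(z).

```python
import numpy as np

def lp_to_cepstrum(a, n_ceps):
    """Convert LP coefficients a = [1, a1, ..., ap] of the all-pole model
    1/A(z) into cepstral coefficients c1..c_n_ceps (gain term c0 omitted)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two feature sequences,
    each of shape (frames, coefficients)."""
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    n, m = len(seq_a), len(seq_b)
    # Pairwise Euclidean distance between every frame of a and of b
    # (assumed local distance; the claims do not specify one).
    d = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = d[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    return cost[n, m]

def recognize(features, templates):
    """Return the key of the stored template closest to the input features.
    templates is a hypothetical dict: word -> (frames, coefficients) array."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

In use, each analysis frame's LP parameters would be converted with lp_to_cepstrum, the per-frame vectors stacked into one array per utterance, and that array passed to recognize together with the stored word templates.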

7. A speech recognition system for recognizing a speech signal, comprising:
- a pre-emphasizer for spectrally flattening the speech signal;
  
  a frame blocker, coupled to said pre-emphasizer, for frame blocking the speech signal;
  
  a windower, coupled to said frame blocker, for windowing each blocked frame;
  
  a pre-warp LP generator, coupled to said windower, for generating a plurality of pre-warp LP parameters;
  
  a mel-NDFT warper, coupled to said pre-warp LP generator, for utilizing a non-uniform discrete Fourier transform (NDFT) to warp said pre-warp LP parameters on a mel scale to generate a plurality of mel scale-warped LP parameters;
  
  a power spectrum generator, coupled to said mel-NDFT warper, for generating a warped vocal-tract power spectrum from said mel scale-warped LP parameters;
  
  an IDFT generator, coupled to said power spectrum generator, for generating an inverse discrete Fourier transform of the warped vocal-tract power spectrum;
  
  a post-warp LP generator, coupled to said IDFT generator, for generating a plurality of post-warp LP parameters; and
  
  a cepstrum converter, coupled to said post-warp LP generator, for converting said post-warp LP parameters to a plurality of MFLP cepstral coefficients.
- - 8. The speech recognition system of claim 7 wherein said pre-emphasizer is a fixed low-order digital filter.
  - 9. The speech recognition system of claim 7 wherein said windower is a Hamming window.
  - 10. The speech recognition system of claim 7 wherein said warped vocal-tract power spectrum is modeled utilizing a predetermined number of peaks.
  - 11. The speech recognition system of claim 7 further comprising:
    - a word template for storing a plurality of cepstral coefficient parametric representations of word pronunciations;
      
      a dynamic time warper for dynamic behavior analysis of said MFLP cepstral coefficients; and
      
      a word comparator, coupled to said cepstrum converter, to said word template, and to said dynamic time warper, for comparing said plurality of MFLP cepstral coefficients with said plurality of cepstral coefficient parametric representations of word pronunciations.
  - 14. The method of claim 13 wherein step (a) comprises the steps of:
    - (a) calculating the discrete-time Fourier transform (DTFT) of the finite impulse response LP parameters;
      
      (b) taking a predetermined number of samples of said DTFT of the finite impulse response LP parameters;
      
      (c) utilizing a non-uniform grid for said DTFT of the LP vocal-tract spectrum to generate a non-uniform discrete Fourier transform (NDFT); and
      
      (d) oversampling a mel filterbank to generate a warped grid for said NDFT of the finite impulse response LP parameters.
  - 15. The method of claim 13 wherein said non-uniform grid of step (c) is substantially similar to the mel frequency scale.
  - 16. The method of claim 14 wherein said oversampling of step (d) is linear from 0 to 1000 Hz and frequency samples in the octaves greater than 1000 Hz are sampled at equal spaces in the log domain.
  - 17. The method of claim 13 wherein said predetermined number of peaks in step (b) is two.
  - 18. The method of claim 13 wherein said step (c) comprises the steps of:
    - computing the inverse discrete Fourier transform (IDFT) of said modeled mel-frequency warped LP vocal-tract spectrum;
      
      generating a predetermined number of samples of an autocorrelation sequence of said modeled mel-frequency warped LP vocal-tract spectrum; and
      
      performing linear prediction to generate a plurality of LP parameters from said modeled mel-frequency warped LP vocal-tract spectrum.
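
The warped sampling grid of claims 14 and 16 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patent's implementation: the 8 kHz sampling rate, the even split of points between the linear and logarithmic regions, and the names warped_grid and ndft_lp_spectrum are all hypothetical. The non-uniform DFT is computed here simply by evaluating the LP polynomial A at the chosen frequencies.

```python
import numpy as np

def warped_grid(n_points, fs=8000.0, f_break=1000.0):
    """Sample frequencies in Hz: linearly spaced from 0 up to f_break,
    then equally spaced in the log domain from f_break to fs/2."""
    nyquist = fs / 2.0
    n_lin = n_points // 2  # assumed split; the claims give no point counts
    lin = np.linspace(0.0, f_break, n_lin, endpoint=False)
    log = np.geomspace(f_break, nyquist, n_points - n_lin)
    return np.concatenate([lin, log])

def ndft_lp_spectrum(a, freqs_hz, fs=8000.0):
    """Evaluate the LP vocal-tract power spectrum 1/|A(e^{jw})|^2 at
    arbitrary (non-uniform) frequencies, with a = [1, a1, ..., ap]."""
    a = np.asarray(a, dtype=float)
    w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    k = np.arange(len(a))
    # Each row of the matrix is one NDFT basis vector exp(-j*w*k).
    A = np.exp(-1j * np.outer(w, k)) @ a
    return 1.0 / np.abs(A) ** 2
```

Calling ndft_lp_spectrum(a, warped_grid(64)) would give 64 samples of the spectrum that are dense below 1000 Hz and progressively sparser above it, which is the warping effect the claims describe.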

12. A mobile communication device comprising:
- a flash memory;
  
  a microprocessor, coupled to said flash memory;
  
  a DSP processor, coupled to said flash memory and said microprocessor, and responsive to said flash memory and said microprocessor, for performing mel-frequency linear prediction (MFLP) speech recognition;
  
  a read-only-memory (ROM) device, coupled to said DSP processor, for storage of data; and
  
  a random access memory (RAM) device 505, for storage of data.

13. A method for modifying the linear prediction (LP) vocal-tract spectrum comprising the steps of:
- (a) mel-frequency warping the LP vocal-tract spectrum to generate a mel-frequency warped LP vocal-tract spectrum;
  
  (b) modeling said mel-frequency warped LP vocal-tract spectrum utilizing a predetermined number of peaks; and
  
  (c) performing linear prediction on said modeled mel-frequency warped LP vocal-tract spectrum to generate an LP mel-frequency warped LP vocal-tract spectrum.
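
Steps (b) and (c) of claim 13, as elaborated in claim 18 (inverse DFT, autocorrelation samples, linear prediction), can be sketched in Python as below. This is a sketch under assumptions rather than the patent's procedure: the input is taken to be warped power-spectrum samples treated as uniformly spaced from 0 to pi on the warped axis, and the function names are hypothetical. The model order is left as a parameter; as a rule of thumb each spectral peak needs one complex-conjugate pole pair, so the two peaks of claim 17 would suggest an order of about 4, but the claims do not state an order.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation lags r[0..order].
    Returns (a, err) with a = [1, a1, ..., ap] and the prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this stage
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_from_power_spectrum(power, order):
    """Fit a low-order all-pole model to a sampled power spectrum.
    power holds samples from 0 to pi inclusive on the (warped) axis."""
    # The inverse DFT of a power spectrum is an autocorrelation sequence.
    r = np.fft.irfft(np.asarray(power, dtype=float))
    return levinson_durbin(r[:order + 1], order)
```

Because only the first order + 1 autocorrelation lags are kept, the fitted model is a smoothed version of the warped spectrum, which matches the role of the low-order all-pole smoothing named in claim 3.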

19. A method for processing speech acoustic signals, comprising the steps of:
- (a) receiving the speech acoustic waves utilizing a microphone;
  
  (b) converting the speech acoustic waves into electronic signals;
  
  (c) parameterizing the electronic signals utilizing linear prediction (LP);
  
  (d) mel-frequency warping said linear prediction parametric representations; and
  
  (e) comparing said mel-frequency warped linear prediction parametric representation with parametric representations of words in a database.
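
Step (c) above, parameterizing the signal by linear prediction, can be sketched with the front end named in claims 7 to 9 (pre-emphasis, frame blocking, Hamming windowing, LP analysis). The Python below is illustrative only: the function name lp_frames, the pre-emphasis coefficient of 0.95, the frame length and hop, and the LP order are assumptions, since the claims give no numeric values. It uses the autocorrelation method and solves the normal equations directly.

```python
import numpy as np

def lp_frames(signal, frame_len=240, hop=80, order=10, preemph=0.95):
    """Return one LP coefficient vector [1, a1, ..., ap] per frame."""
    x = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order FIR filter y[n] = x[n] - preemph * x[n-1].
    y = np.append(x[0], x[1:] - preemph * x[:-1])
    window = np.hamming(frame_len)
    # Index pattern |i - j| used to build the Toeplitz autocorrelation matrix.
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len] * window
        # Autocorrelation lags 0..order of the windowed frame.
        r = np.array([np.dot(frame[:frame_len - k], frame[k:])
                      for k in range(order + 1)])
        # Normal equations R a = -r[1:]; assumes the frame is not all zeros.
        a = np.linalg.solve(r[idx], -r[1:])
        coeffs.append(np.concatenate([[1.0], a]))
    return np.array(coeffs)
```

Each row of the result is one frame's LP parametric representation, which would then be mel-frequency warped in step (d) before the comparison of step (e).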

Current Assignee: Verbaltek Incorporated
Original Assignee: Verbaltek Incorporated
Inventors: Kim, Yoon
Application Number: US09/929,944
Publication Number: US 20020065649A1
US Class Current: 704/219
CPC Class Codes: G10L 19/08 Determination or coding of ...