Mel-frequency linear prediction speech recognition apparatus and method
Abstract
The present invention is an apparatus and method for generating a parametric representation of input speech based on a mel-frequency warping of the vocal-tract spectrum. The approach is computationally efficient and provides increased recognition accuracy over conventional LP cepstrum approaches, and its rapid processing makes it operable in many different devices. The invention is a speech recognition system comprising a linear prediction (LP) signal processor and a mel-frequency linear prediction (MFLP) generator that mel-frequency warps the LP parameters to generate MFLP parametric representations for robust, perceptually modeled speech recognition requiring minimal computation and storage.
19 Claims
1. A speech recognition system comprising:
microphone means for receiving acoustic waves and converting the acoustic waves into electronic signals;
linear prediction (LP) signal processing means, coupled to said microphone means, for processing the electronic signals to generate LP parametric representations of the electronic signals;
mel-frequency linear prediction (MFLP) generating means, coupled to said LP signal processing means, for mel-frequency warping said LP parametric representations to generate MFLP parametric representations of the electronic signals; and
word comparison means coupled to said MFLP means, for comparing said MFLP parametric representations of the electronic signals to parametric representations of words in a database.
Dependent claims: 2, 3, 4, 5, 6
7. A speech recognition system for recognizing a speech signal, comprising:
a pre-emphasizer for spectrally flattening the speech signal;
a frame blocker, coupled to said pre-emphasizer, for frame blocking the speech signal;
a windower, coupled to said frame blocker, for windowing each blocked frame;
a pre-warp LP generator, coupled to said windower, for generating a plurality of pre-warp LP parameters;
a mel-NDFT warper, coupled to said pre-warp LP generator, for utilizing a non-uniform discrete Fourier transform (NDFT) to warp said pre-warp LP parameters on a mel scale to generate a plurality of mel scale-warped LP parameters;
a power spectrum generator, coupled to said mel-NDFT warper, for generating a warped vocal-tract power spectrum from said mel scale-warped LP parameters;
an IDFT generator, coupled to said power spectrum generator, for generating an inverse discrete Fourier transform of the warped vocal-tract power spectrum;
a post-warp LP generator, coupled to said IDFT generator, for generating a plurality of post-warp LP parameters; and
a cepstrum converter, coupled to said post-warp LP generator, for converting said post-warp LP parameters to a plurality of MFLP cepstral coefficients.
Dependent claims: 8, 9, 10, 11, 14, 15, 16, 17, 18
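The chain of elements recited in claim 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pre-emphasis coefficient of 0.95, the Hamming window, the 2595·log10(1 + f/700) mel formula, the frame size, hop, bin count, and model order, and every function name are choices made for this example and do not come from the claims.

```python
import cmath
import math

def pre_emphasize(x, alpha=0.95):
    """Spectrally flatten: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_block(x, size, hop):
    """Split the signal into overlapping frames of `size` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def hamming(frame):
    """Apply a Hamming window (window type is an assumption)."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

def autocorr(frame, order):
    """Autocorrelation lags 0..order of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin: return a[0..order] (a[0] = 1) and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_warped_power(a, gain2, fs, n_bins):
    """NDFT step: evaluate G^2 / |A(e^{jw})|^2 at n_bins + 1 frequencies
    spaced equally on the mel scale between 0 and fs / 2."""
    top = hz_to_mel(fs / 2.0)
    power = []
    for n in range(n_bins + 1):
        w = 2.0 * math.pi * mel_to_hz(top * n / n_bins) / fs
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        power.append(gain2 / abs(A) ** 2)
    return power

def idft_autocorr(power, order):
    """Inverse DFT of a real, even spectrum given on bins 0..N."""
    N = len(power) - 1
    r = []
    for k in range(order + 1):
        s = power[0] + (-1) ** k * power[N]
        s += 2.0 * sum(power[n] * math.cos(math.pi * k * n / N)
                       for n in range(1, N))
        r.append(s / (2 * N))
    return r

def lp_to_cepstrum(a, n_ceps):
    """Standard recursion from LP coefficients to cepstral coefficients c[1..]."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def mflp_cepstra(signal, fs, order=10, size=240, hop=80, n_bins=64, n_ceps=12):
    """Run every claim 7 stage on each frame; all defaults are assumptions."""
    out = []
    for frame in frame_block(pre_emphasize(signal), size, hop):
        a, err = levinson(autocorr(hamming(frame), order), order)  # pre-warp LP
        power = mel_warped_power(a, err, fs, n_bins)               # warp + spectrum
        a_warp, _ = levinson(idft_autocorr(power, order), order)   # post-warp LP
        out.append(lp_to_cepstrum(a_warp, n_ceps))                 # cepstrum
    return out
```

With these assumed defaults and an 8 kHz sampling rate, `mflp_cepstra(samples, 8000)` returns one list of 12 cepstral coefficients for each 30 ms frame, advancing 10 ms per frame.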
12. A mobile communication device comprising:
a flash memory;
a microprocessor, coupled to said flash memory;
a DSP processor, coupled to said flash memory and said microprocessor, and responsive to said flash memory and said microprocessor, for performing mel-frequency linear prediction (MFLP) speech recognition;
a read-only-memory (ROM) device, coupled to said DSP processor, for storage of data; and
a random access memory (RAM) device, for storage of data.
13. A method for modifying the linear prediction (LP) vocal-tract spectrum comprising the steps of:
(a) mel-frequency warping the LP vocal-tract spectrum to generate a mel-frequency warped LP vocal-tract spectrum;
(b) modeling said mel-frequency warped LP vocal-tract spectrum utilizing a predetermined number of peaks; and
(c) performing linear prediction on said modeled mel-frequency warped LP vocal-tract spectrum to generate an LP mel-frequency warped LP vocal-tract spectrum.
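One way to read steps (a) through (c) of claim 13 in equations is given below. The mel mapping shown is a commonly used formula and is an assumption here, as are the symbols: $a_k$ and $G$ for the order-$p$ LP coefficients and gain, $f_s$ for the sampling rate, $N$ for the number of warped frequency bins, and $q$ for the order of the second LP model. An all-pole model of order $q$ can represent at most $q/2$ resonant peaks, which is one plausible interpretation of the "predetermined number of peaks" in step (b).

```latex
\begin{align*}
m(f) &= 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)
  && \text{(assumed mel mapping)} \\
P(\omega) &= \frac{G^2}{\left|1 + \sum_{k=1}^{p} a_k e^{-j\omega k}\right|^2}
  && \text{(LP vocal-tract spectrum)} \\
\omega_n &= \frac{2\pi}{f_s}\, m^{-1}\!\left(\frac{n}{N}\, m\!\left(\tfrac{f_s}{2}\right)\right),
  \quad \tilde{P}_n = P(\omega_n), \quad n = 0, \dots, N
  && \text{(step (a))} \\
\tilde{r}[k] &= \frac{1}{2N}\left[\tilde{P}_0 + (-1)^k \tilde{P}_N
  + 2\sum_{n=1}^{N-1} \tilde{P}_n \cos\frac{\pi k n}{N}\right]
  && \text{(autocorrelation of the warped spectrum)} \\
\sum_{j=1}^{q} \tilde{a}_j\, \tilde{r}[\,|i-j|\,] &= -\tilde{r}[i],
  \quad i = 1, \dots, q
  && \text{(steps (b) and (c))}
\end{align*}
```

Solving the last set of equations for $\tilde{a}_1, \dots, \tilde{a}_q$ gives an all-pole model fitted to the warped spectrum rather than to the original linear-frequency spectrum.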
19. A method for processing speech acoustic signals, comprising the steps of:
(a) receiving the speech acoustic waves utilizing a microphone;
(b) converting the speech acoustic waves into electronic signals;
(c) parameterizing the electronic signals utilizing linear prediction (LP);
(d) mel-frequency warping said linear prediction parametric representations; and
(e) comparing said mel-frequency warped linear prediction parametric representation with parametric representations of words in a database.
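The comparison in step (e) can be illustrated with a short Python sketch. The claim does not say how the comparison is performed. Dynamic time warping (DTW) over Euclidean frame distances is assumed here purely for illustration, and the function names and the dictionary-based database are hypothetical.

```python
import math

def frame_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two feature sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[rows][cols]

def recognize(features, database):
    """Return the word whose stored template is closest to `features`.

    `database` maps each word to a list of feature vectors (assumed layout).
    """
    return min(database, key=lambda word: dtw_distance(features, database[word]))
```

In this sketch each database entry is a single template per word. A statistical model per word could be substituted for the template without changing the overall flow of steps (a) through (e).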
Specification