Systems and methods for speech recognition using frequency domain linear prediction polynomials to form temporal and spectral envelopes from frequency domain representations of signals

US 7,672,838 B1
Filed: 12/01/2004
Issued: 03/02/2010
Est. Priority Date: 12/01/2003
Status: Active Grant

Abstract

In accordance with the present invention, computer implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation is performed. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
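
As a rough illustration of the FDLP step described in the abstract, the sketch below estimates a temporal envelope by fitting an all-pole (linear prediction) model to a frequency domain representation of the signal. It is not taken from the patent text shown here: the use of a DCT as the time-to-frequency transformation, the autocorrelation method for the linear prediction fit, the model order, and the helper names `dct_ii`, `lp_coefficients`, and `fdlp_envelope` are all assumptions made for illustration.

```python
import numpy as np

def dct_ii(x):
    """Unnormalised DCT-II via an explicit cosine matrix (O(N^2), adequate for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    return np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x

def lp_coefficients(seq, order):
    """Autocorrelation-method linear prediction; returns (a, gain) with a[0] == 1."""
    r = np.array([seq[: len(seq) - k] @ seq[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])   # normal equations R a = -r
    gain = r[0] + coeffs @ r[1:]          # residual (prediction error) energy
    return np.concatenate(([1.0], coeffs)), gain

def fdlp_envelope(x, order=20):
    """Assumed FDLP sketch: linear prediction applied to DCT coefficients.

    Because the prediction runs along the frequency axis, the all-pole
    'spectrum' gain / |A|^2 is read as a function of time: angle pi*t/N on
    the upper unit circle corresponds to sample index t.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    a, gain = lp_coefficients(dct_ii(x), order)
    A = np.fft.rfft(a, 2 * n)[:n]         # A evaluated at angles pi*t/n, t = 0..n-1
    return gain / np.abs(A) ** 2
```

Under these assumptions, a signal whose energy is concentrated in a short burst should yield an envelope that peaks near the burst, and a higher `order` lets the envelope follow finer temporal detail.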

9 Claims

1. An electronic method of extracting speech features from signals for use in performing automatic speech recognition, the method comprising using a computer processor to perform:
- receiving a signal;
- performing a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation;
- dividing the frequency domain representation into a plurality of frequency bands;
- fitting a FDLP polynomial to each of the plurality of frequency bands;
- performing a frequency-to-time domain transformation to extract temporal envelopes from each of the plurality of frequency bands using the fitted FDLP polynomial;
- constructing spectral envelopes by taking a plurality of points at each of a plurality of time values in the temporal envelopes, wherein each of the spectral envelopes has points taken at a particular one of the time values;
- fitting a smooth envelope to each of the spectral envelopes; and
- generating at least one speech feature based at least in part on the temporal and spectral envelopes of each of the plurality of frequency bands.

2. The method of claim 1, wherein the step of fitting a smooth envelope to each of the spectral envelopes further comprises iterating the fitting in frequency and time domains.

3. The method of claim 1, wherein the smooth envelope is fitted by fitting a linear prediction polynomial to each of the spectral envelopes.

4. The method of claim 3, wherein the linear prediction polynomial is fitted by calculating the inverse Fourier transform of the magnitude Fourier transform of the spectral envelope raised to a given power.

5. The method of claim 1, further comprising modifying each of the spectral envelopes by nonlinearly warping the frequency axis.

6. The method of claim 1, further comprising modifying each of the spectral envelopes by nonlinearly warping the time axis.
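
The computation recited in claim 4 can be read in more than one way. One possible reading, sketched below, is the familiar route from a sequence to its linear prediction polynomial: take the Fourier transform of the sequence, raise its magnitude to a given power, apply the inverse Fourier transform to obtain autocorrelation-like coefficients, and solve for the predictor with the Levinson-Durbin recursion. With a power of 2 this reduces to the ordinary autocorrelation (Wiener-Khinchin). The function names, the zero-padding length, and the interpretation itself are assumptions, not the patent's stated implementation.

```python
import numpy as np

def compressed_autocorrelation(seq, power=2.0, n_lags=None):
    """Inverse FFT of |FFT(seq)|**power; equals the autocorrelation when power == 2."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    n_fft = 2 * n                          # zero-pad so lags do not wrap around
    mag = np.abs(np.fft.rfft(seq, n_fft))
    r = np.fft.irfft(mag ** power, n_fft)
    return r[: (n_lags or n)]

def levinson_durbin(r, order):
    """Solve the linear prediction normal equations; returns (a, error) with a[0] == 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m - 1:0:-1]   # r[m] + sum_i a[i] * r[m - i]
        k = -acc / err                         # reflection coefficient
        prev = a.copy()
        a[1:m] = prev[1:m] + k * prev[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k                     # updated prediction error
    return a, err
```

A power below 2 compresses the dynamic range before the fit, which tends to give a smoother envelope that follows spectral peaks less tightly. Whether that matches the claim's intended use of "a given power" is not confirmed by the text shown here.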

7. A system for extracting speech features from signals for use in performing automatic speech recognition, the system comprising:
- means for receiving a signal;
- means for performing a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation;
- means for dividing the frequency domain representation into a plurality of frequency bands;
- means for fitting a FDLP polynomial to each of the plurality of frequency bands;
- means for performing a frequency-to-time domain transformation to extract temporal envelopes from each of the plurality of frequency bands using the fitted FDLP polynomial;
- means for constructing spectral envelopes by taking a plurality of points at each of a plurality of time values in the temporal envelopes, wherein each of the spectral envelopes has points taken at a particular one of the time values;
- means for fitting a smooth envelope to each of the spectral envelopes; and
- means for generating at least one speech feature based at least in part on the temporal and spectral envelopes of each of the plurality of frequency bands.

8. A system for extracting speech features from signals for use in performing automatic speech recognition, the system comprising:
- a processor at least partially executing a modeling application configured to:
  - receive a signal;
  - perform a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation;
  - divide the frequency domain representation into a plurality of frequency bands;
  - fit a FDLP polynomial to each of the plurality of frequency bands;
  - perform a frequency-to-time domain transformation to extract temporal envelopes from each of the plurality of frequency bands using the fitted FDLP polynomial;
  - construct spectral envelopes by taking a plurality of points at each of a plurality of time values in the temporal envelopes, wherein each of the spectral envelopes has points taken at a particular one of the time values;
  - fit a smooth envelope to each of the spectral envelopes; and
  - generate at least one speech feature based at least in part on the temporal and spectral envelopes of each of the plurality of frequency bands.

9. A computer readable medium for storing computer executable instructions for extracting speech features from signals, the executable instructions comprising the steps of:
- receiving a signal;
- performing a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation;
- dividing the frequency domain representation into a plurality of frequency bands;
- fitting a FDLP polynomial to each of the plurality of frequency bands;
- performing a frequency-to-time domain transformation to extract temporal envelopes from each of the plurality of frequency bands using the fitted FDLP polynomial;
- constructing spectral envelopes by taking a plurality of points at each of a plurality of time values in the temporal envelopes, wherein each of the spectral envelopes has points taken at a particular one of the time values;
- fitting a smooth envelope to each of the spectral envelopes; and
- generating at least one speech feature based at least in part on the temporal and spectral envelopes of each of the plurality of frequency bands.
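
The band-splitting and envelope-sampling steps shared by the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes a DCT as the time-to-frequency transformation, equal-width non-overlapping bands, an autocorrelation-method all-pole fit per band, and hypothetical names (`dct_ii`, `all_pole_envelope`, `subband_temporal_envelopes`, `spectral_envelopes`). The later steps of smoothing each spectral envelope and generating features are left out.

```python
import numpy as np

def dct_ii(x):
    """Unnormalised DCT-II via an explicit cosine matrix (O(N^2), adequate for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x

def all_pole_envelope(seq, order, n_points):
    """Fit a linear prediction polynomial to seq and evaluate gain / |A|^2 on n_points."""
    r = np.array([seq[: len(seq) - k] @ seq[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])
    gain = r[0] + coeffs @ r[1:]
    A = np.fft.rfft(np.concatenate(([1.0], coeffs)), 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2

def subband_temporal_envelopes(x, n_bands=4, order=8):
    """One temporal envelope per frequency band; result has shape (n_bands, len(x))."""
    x = np.asarray(x, dtype=float)
    bands = np.array_split(dct_ii(x), n_bands)   # assumed: equal-width rectangular bands
    return np.stack([all_pole_envelope(b, order, len(x)) for b in bands])

def spectral_envelopes(temporal_env, time_indices):
    """One spectral envelope per chosen time value: a point from every band at that time."""
    return temporal_env[:, time_indices].T       # shape (n_times, n_bands)
```

Stacking the per-band temporal envelopes into a band-by-time array makes the spectral envelope at a given time simply a column of that array, which is the sense in which each spectral envelope "has points taken at a particular one of the time values".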

Current Assignee: Trustees Of Columbia University In The City Of New York (Columbia University)
Original Assignee: Trustees Of Columbia University In The City Of New York (Columbia University)
Inventors: Athineos, Marios; Ellis, Daniel P. W.; Hermansky, Hynek
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Yen, Eric L
Application Number: US11/000,874
Time in Patent Office: 1,917 Days
Field of Search: 704/231, 704/235, 704/236, 704/251, 704/219, 704/209
US Class Current: 704/209
CPC Class Codes:
G10L 15/02 Feature extraction for spee...
G10L 25/12 the extracted parameters be...
