Discrimination of components of audio signals based on multiscale spectro-temporal modulations

US 7,505,902 B2
Filed: 07/28/2005
Issued: 03/17/2009
Est. Priority Date: 07/28/2004
Status: Expired due to Fees


4 Assignments

0 Petitions

Abstract

An audio signal (172) representative of an acoustic signal is provided to an auditory model (105). The auditory model (105) produces a high-dimensional feature set based on physiological responses, as simulated by the auditory model (105), to the acoustic signal. A multidimensional analyzer (106) orthogonalizes and truncates the feature set based on contributions by components of the orthogonal set to a cortical representation of the acoustic signal. The truncated feature set is then provided to a classifier (108), where a predetermined sound is discriminated from the acoustic signal.
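
The orthogonalize-and-truncate step in the abstract can be illustrated with a truncated higher-order SVD (HOSVD), which is one common multilinear way to find an orthogonal basis for each mode of a tensor. This is a minimal sketch and not the patented implementation. It assumes the cortical features of a training set are stacked as a NumPy array of shape (samples, scale, rate, frequency). In each feature mode it keeps the components whose share of that mode's energy exceeds a contribution threshold, loosely following the thresholding described in claim 7. The function names and the default threshold are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(data, threshold=0.01):
    """Orthogonalize and truncate a (samples, scale, rate, freq) tensor.

    For each feature mode, the left singular vectors of the mode unfolding form
    an orthogonal basis. A component is kept if its share of the mode's energy
    (squared singular value / total) exceeds `threshold` (at least one is kept).
    Returns flattened projected features and the per-mode bases.
    """
    bases = []
    core = data
    for mode in range(1, data.ndim):  # mode 0 indexes samples and is not reduced
        u, s, _ = np.linalg.svd(unfold(data, mode), full_matrices=False)
        share = s ** 2 / np.sum(s ** 2)          # sorted in descending order
        keep = max(1, int(np.sum(share > threshold)))
        basis = u[:, :keep]
        bases.append(basis)
        core = mode_product(core, basis.T, mode)  # project onto kept axes
    return core.reshape(data.shape[0], -1), bases


rng = np.random.default_rng(0)
cubes = rng.standard_normal((20, 4, 5, 6))  # stand-in for 20 cortical cubes
feats, bases = truncated_hosvd(cubes, threshold=0.05)
print(feats.shape, [b.shape for b in bases])
```

Projecting new cortical cubes onto the same `bases` with `mode_product` would yield feature vectors of matching length for the classifier. The threshold trades feature size against retained energy.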

20 Claims

1. A method for discriminating sounds in an audio signal comprising the steps of:
- forming an auditory spectrogram from the audio signal, said auditory spectrogram characterizing a physiological response to sound represented by the audio signal;
  
  establishing a plurality of modulation-selective filters tuned to a range of frequency and temporal modulations of said auditory spectrogram;
  
  filtering said auditory spectrogram into a plurality of multidimensional, time-varying cortical response signals, each of said cortical response signals indicative of the frequency modulations of said auditory spectrogram over a corresponding predetermined range of scales and of the temporal modulations of said auditory spectrogram over a corresponding predetermined range of rates;
  
  decomposing said cortical response signals into orthogonal multidimensional component signals;
  
  said cortical response signals existing in a cubic representation of rate, scale, and frequency components prior to the step of decomposition;
  
  said orthogonal multidimensional component signals including multiple scales of time and spectral resolution;
  
  truncating said orthogonal multidimensional component signals; and
  
  classifying said truncated component signals to discriminate therefrom a signal corresponding to a predetermined sound.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method for discriminating sounds in an audio signal as recited in claim 1, where said filtering step includes the step of convolving in both requisite time and requisite frequency said auditory spectrogram with each of a plurality of spectro-temporal response fields.
  - 3. The method for discriminating sounds in an audio signal as recited in claim 2, where said filtering step further includes the step of providing a corresponding wavelet as each said spectro-temporal response field.
  - 4. The method for discriminating sounds in an audio signal as recited in claim 1 further including the step of averaging with respect to time over a predetermined number of time increments said cortical response signals prior to said decomposing step.
  - 5. The method for discriminating sounds in an audio signal as recited in claim 4, where said decomposing step includes the step of decomposing said cortical response signals into orthogonal scale, rate and frequency components.
  - 6. The method for discriminating sounds in an audio signal as recited in claim 1 further including the steps of:
    - forming a training auditory spectrogram from a known audio signal, said known audio signal associated with a corresponding known sound;
      
      establishing a plurality of modulation-selective filters tuned to a range of frequency and temporal modulations of said training auditory spectrogram;
      
      filtering said training auditory spectrogram into a plurality of multidimensional, time-varying training cortical response signals, each of said training cortical response signals indicative of the frequency modulations of said training auditory spectrogram over a corresponding predetermined range of scales and of the temporal modulations of said training auditory spectrogram over a corresponding predetermined range of rates;
      
      decomposing said training cortical response signals into orthogonal multidimensional component training signals;
      
      said cortical response signals existing in a cubic representation of rate, scale, and frequency components prior to the step of decomposition;
      
      said orthogonal multidimensional component training signals including multiple scales of time and spectral resolution;
      
      determining a signal size corresponding to each of said orthogonal multidimensional component training signals, said signal size setting a size of said corresponding orthogonal multidimensional component training signal to retain for classification;
      
      truncating said orthogonal multidimensional component training signals to said signal size;
      
      classifying said truncated orthogonal multidimensional component training signals;
      
      comparing said classification of said truncated orthogonal multidimensional component training signals with a classification of said known sound; and
      
      increasing said signal size and repeating the method at said training signal truncating step if said classification of said truncated orthogonal multidimensional component training signals does not match said classification of said known sound to within a predetermined tolerance.
  - 7. The method for discriminating sounds in an audio signal as recited in claim 6, where said signal size determining step includes the steps of:
    - establishing a contribution threshold;
      
      determining a contribution to each said orthogonal component training signals by a corresponding signal component thereof;
      
      selecting as said signal size a number of said corresponding signal components whose contribution to each said orthogonal component training signals is greater than said contribution threshold.
  - 8. The method for discriminating sounds in an audio signal as recited in claim 6, where said orthogonal multidimensional component signal truncating step includes the step of truncating each of said orthogonal component signals to said corresponding signal size.
  - 9. The method for discriminating sounds in an audio signal as recited in claim 1, where said classifying step includes the step of specifying human speech as said predetermined sound.
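
Claims 1 through 4 describe filtering the auditory spectrogram with a bank of rate- and scale-tuned modulation filters, then averaging the responses over time to obtain a cube indexed by scale, rate, and frequency. The sketch below assumes the auditory spectrogram is already available as a NumPy array of shape (time frames, log-frequency channels). In place of the wavelet spectro-temporal response fields of claim 3, it substitutes simple separable Gaussian band-pass filters in the 2-D modulation (FFT) domain. The filter shape, the bandwidth parameter `bw`, and the function name are illustrative assumptions.

```python
import numpy as np


def cortical_cube(spec, frame_rate, chans_per_oct, rates, scales, bw=0.5):
    """Filter an auditory spectrogram into a (scale, rate, frequency) cube.

    spec: array of shape (T, F), time frames by log-frequency channels.
    frame_rate: frames per second; chans_per_oct: channels per octave.
    Each filter is a separable Gaussian band-pass in the 2-D modulation
    domain, centered on |temporal modulation| = rate (Hz) and
    |spectral modulation| = scale (cycles/octave), with a standard
    deviation of `bw` times the center value (an assumed shape).
    """
    T, F = spec.shape
    wt = np.fft.fftfreq(T, d=1.0 / frame_rate)     # temporal modulation axis, Hz
    wf = np.fft.fftfreq(F, d=1.0 / chans_per_oct)  # spectral modulation axis, cyc/oct
    WT, WF = np.meshgrid(wt, wf, indexing="ij")    # both shaped (T, F)
    S = np.fft.fft2(spec)
    cube = np.empty((len(scales), len(rates), F))
    for i, scale in enumerate(scales):
        h_scale = np.exp(-0.5 * ((np.abs(WF) - scale) / (bw * scale)) ** 2)
        for j, rate in enumerate(rates):
            h_rate = np.exp(-0.5 * ((np.abs(WT) - rate) / (bw * rate)) ** 2)
            # Magnitude of the filtered (time, channel) response.
            response = np.abs(np.fft.ifft2(S * h_rate * h_scale))
            cube[i, j] = response.mean(axis=0)  # average over time frames
    return cube


# A ripple at 4 Hz and 1 cycle/octave, at 64 frames/s and 8 channels/octave.
t = np.arange(64)[:, None]
x = np.arange(32)[None, :]
ripple = np.cos(2 * np.pi * (4 * t / 64 + x / 8))
cube = cortical_cube(ripple, 64, 8, rates=[2, 4, 8, 16], scales=[0.5, 1, 2])
print(cube.shape)  # (3, 4, 32)
```

Averaging over every frame matches claim 4's averaging "over a predetermined number of time increments" only when the spectrogram passed in spans exactly that window. A sliding window would be needed to keep the output time-varying.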

10. A method for discriminating sounds in an acoustic signal comprising the steps of:
- providing a known audio signal associated with a known sound having a known sound classification;
  
  forming a training auditory spectrogram from said known audio signal;
  
  establishing a plurality of modulation-selective filters tuned to a range of frequency and temporal modulations of said training auditory spectrogram;
  
  filtering said training auditory spectrogram into a plurality of multidimensional, time-varying training cortical response signals, each of said training cortical response signals indicative of the frequency modulations of said training auditory spectrogram over a corresponding predetermined range of scales and of the temporal modulations of said training auditory spectrogram over a corresponding predetermined range of rates;
  
  decomposing said training cortical response signals into orthogonal multidimensional component training signals;
  
  said training cortical response signals existing in a cubic representation of rate, scale, and frequency components prior to the step of decomposition;
  
  said orthogonal multidimensional component training signals including multiple scales of time and spectral resolution;
  
  determining a signal size corresponding to each of said orthogonal multidimensional component training signals, said signal size setting a size of said corresponding orthogonal multidimensional component training signal to retain for classification;
  
  truncating said orthogonal multidimensional component training signals to said signal size;
  
  classifying said truncated orthogonal multidimensional component training signals;
  
  comparing said classification of said truncated orthogonal multidimensional component training signals with a classification of said known sound;
  
  increasing said signal size and repeating the method at said training signal truncating step if said classification of said truncated orthogonal multidimensional component training signals does not match said classification of said known sound to within a predetermined tolerance;
  
  converting the acoustic signal to an audio signal;
  
  forming an auditory spectrogram from said audio signal, said auditory spectrogram characterizing a physiological response to sound represented by the audio signal;
  
  establishing a plurality of modulation-selective filters tuned to a range of frequency and temporal modulations of said auditory spectrogram;
  
  filtering said auditory spectrogram into a plurality of multidimensional, time-varying cortical response signals, each of said cortical response signals indicative of the frequency modulations of said auditory spectrogram over a corresponding predetermined range of scales and the temporal modulations of said auditory spectrogram over a corresponding predetermined range of rates;
  
  decomposing said cortical response signals into orthogonal multidimensional component signals;
  
  said cortical response signals existing in a cubic representation of rate, scale, and frequency components prior to the step of decomposition;
  
  said orthogonal multidimensional component signals including multiple scales of time and spectral resolution;
  
  truncating said orthogonal multidimensional component signals to said signal size; and
  
  classifying said truncated component signals to discriminate therefrom a signal corresponding to a predetermined sound.
- Dependent Claims (11, 12, 13, 14)
  - 11. The method for discriminating sounds in an acoustic signal as recited in claim 10, where said training auditory spectrogram filtering step and said auditory spectrogram filtering step both include the step of filtering via directional selective filters said auditory spectrogram into directional components of said plurality of multidimensional cortical response signals.
  - 12. The method for discriminating sounds in an acoustic signal as recited in claim 11, where said training auditory spectrogram filtering step and said auditory spectrogram filtering step both include the step of selecting maximally directed cortical response signals as said plurality of multidimensional cortical response signals.
  - 13. The method for discriminating sounds in an acoustic signal as recited in claim 11, where said training auditory spectrogram filtering step and said auditory spectrogram filtering step both include the step of providing downward selective filters and upward selective filters as said directional selective filters.
  - 14. The method for discriminating sounds in an acoustic signal as recited in claim 10, where said classifying step includes the step of specifying human speech as said predetermined sound.
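
Claims 11 through 13 add directional (upward and downward) selectivity. In the 2-D Fourier domain of a spectrogram whose channel index increases with frequency, a ripple drifting toward lower channels concentrates its energy where the temporal and spectral modulation frequencies share a sign. An upward ripple concentrates its energy where they have opposite signs. One way to build a directional filter is therefore to mask quadrants, which is what the sketch below does, reusing the assumed Gaussian band-pass shape. The sketch reads claim 12's "maximally directed" selection as a per-channel maximum over the two directions. That reading is an interpretation; the claims do not spell it out, and the function names are hypothetical.

```python
import numpy as np


def directional_responses(spec, frame_rate, chans_per_oct, rate, scale, bw=0.5):
    """Time-averaged (downward, upward) responses for one rate/scale pair.

    spec has shape (T, F) with channel index increasing with frequency.
    A ripple cos(2*pi*(rate*t + scale*x)), t in seconds and x in octaves,
    drifts toward lower channels over time (a downward sweep); its 2-D FFT
    energy sits where temporal and spectral modulation share a sign.
    Upward sweeps occupy the opposite-sign quadrants.
    """
    T, F = spec.shape
    WT, WF = np.meshgrid(np.fft.fftfreq(T, d=1.0 / frame_rate),
                         np.fft.fftfreq(F, d=1.0 / chans_per_oct), indexing="ij")
    # Assumed separable Gaussian band-pass around (|rate|, |scale|).
    band = (np.exp(-0.5 * ((np.abs(WT) - rate) / (bw * rate)) ** 2)
            * np.exp(-0.5 * ((np.abs(WF) - scale) / (bw * scale)) ** 2))
    S = np.fft.fft2(spec)
    averaged = []
    for quadrants in (WT * WF > 0, WT * WF < 0):  # downward, then upward
        response = np.abs(np.fft.ifft2(S * band * quadrants))
        averaged.append(response.mean(axis=0))    # average over time, per channel
    return averaged[0], averaged[1]


def maximally_directed(down, up):
    """Per channel, keep whichever direction responds more strongly."""
    return np.maximum(down, up)


t = np.arange(64)[:, None]
x = np.arange(32)[None, :]
downward = np.cos(2 * np.pi * (4 * t / 64 + x / 8))  # 64 frames/s, 8 channels/octave
down, up = directional_responses(downward, 64, 8, rate=4, scale=1)
print(down.mean() > up.mean())  # True
```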

15. A system to discriminate sounds in an acoustic signal comprising:
- an early auditory model execution unit operable to produce at an output thereof an auditory spectrogram of an audio signal provided as an input thereto, said audio signal being a representation of said acoustic signal;
  
  a cortical model execution unit coupled to said output of said auditory model execution unit so as to receive said auditory spectrogram and to produce therefrom at an output thereof a time-varying signal representative of a cortical response to the acoustic signal;
  
  said cortical response signal existing in a cubic representation of rate, scale, and frequency components;
  
  a multi-linear analyzer coupled to said output of said cortical model execution unit and operable to determine a set of multidimensional orthogonal axes from said cortical representations, said multi-linear analyzer further operable to produce a reduced data set relative to said set of multidimensional orthogonal axes; and
  
  a classifier for determining speech from said reduced data set.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The system for discriminating sounds in an acoustic signal as recited in claim 15, wherein said cortical model execution unit includes a bank of spectro-temporal modulation selective filters.
  - 17. The system for discriminating sounds in an acoustic signal as recited in claim 16, wherein each of said spectro-temporal modulation selective filters is characterized by a wavelet.
  - 18. The system for discriminating sounds in an acoustic signal as recited in claim 16, wherein each of said spectro-temporal modulation selective filters is directionally selective.
  - 19. The system for discriminating sounds in an acoustic signal as recited in claim 15, wherein said classifier includes at least one support vector machine.
  - 20. The system for discriminating sounds in an acoustic signal as recited in claim 15, where said classifier is operable to discriminate human speech.
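
Claim 19 requires only that the classifier include at least one support vector machine. As a hedged illustration, the sketch below trains a linear SVM in plain NumPy using Pegasos-style stochastic sub-gradient descent. It works on reduced feature vectors (for example, flattened truncated tensor features) labeled +1 for speech and -1 for non-speech. The linear kernel, the regularization value, the label encoding, and the function names are all assumptions. A library SVM such as scikit-learn's `SVC` could stand in.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM with Pegasos-style stochastic sub-gradient steps.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    A constant column is appended so the bias is learned as an extra weight
    (and is therefore regularized along with the other weights).
    Objective: lam/2 * ||w||^2 + mean(max(0, 1 - y * (Xb @ w))).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(Xb.shape[0]):
            step += 1
            eta = 1.0 / (lam * step)             # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1.0  # margin test uses pre-update w
            w *= 1.0 - eta * lam                 # shrink: regularizer gradient
            if violates:
                w += eta * y[i] * Xb[i]          # hinge-loss sub-gradient step
    return w


def predict(X, w):
    """Return +1 or -1 per row of X, e.g. speech versus non-speech."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)), rng.normal(-2.0, 0.5, (100, 2))])
y = np.array([1] * 100 + [-1] * 100)
w = train_linear_svm(X, y)
print(np.mean(predict(X, w) == y))
```

A linear kernel keeps the sketch short. Whether the reduced cortical features are linearly separable enough for a given task is an empirical question that this example does not settle.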

Current Assignee: University of Maryland
Original Assignee: University of Maryland
Inventors: Shamma, Shihab A.; Mesgarani, Nima
Primary Examiner: Smits, Talivaldis I.
Assistant Examiner: Kovacek, David M.
Application Number: US11/190,933
Publication Number: US 20060025989A1
Time in Patent Office: 1,328 days
Field of Search: 704200-2001, 704/204, 704/205-206, 704/207, 704/229, 704/231-257, 704/220-228, 704/500-504, 381/110, 600/300-301, 600/372, 600/379, 600/382-383
US Class Current: 704/231
CPC Class Codes: G10L 21/0272 Voice signal separating
