Multi-source phoneme classification for noise-robust automatic speech recognition

US 7,319,959 B1
Filed: 05/14/2003
Issued: 01/15/2008
Est. Priority Date: 05/14/2002
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A system and method are disclosed for processing an audio signal including separating the audio signal into a plurality of streams which group sounds from a same source prior to classification and analyzing each separate stream to determine phoneme-level classification. One or more words of the audio signal may then be outputted.
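
The abstract does not say which cue is used to group sounds from the same source. As a purely illustrative sketch, the example below assumes harmonicity as the grouping cue: each spectral peak is assigned to the candidate fundamental frequency whose harmonic series it fits best. The function name `group_by_harmonicity`, the `tolerance` parameter, and the example frequencies are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def group_by_harmonicity(peaks_hz, candidate_f0s_hz, tolerance=0.03):
    """Assign each spectral peak to the candidate fundamental it fits best.

    A peak fits a fundamental f0 when it lies within `tolerance` (relative
    deviation) of an integer multiple of f0. Peaks that fit no candidate
    are collected under the key None. Harmonicity is an assumed cue here.
    """
    streams = defaultdict(list)
    for peak in peaks_hz:
        best_f0, best_err = None, tolerance
        for f0 in candidate_f0s_hz:
            harmonic = max(1, round(peak / f0))  # nearest harmonic number
            err = abs(peak - harmonic * f0) / (harmonic * f0)
            if err < best_err:
                best_f0, best_err = f0, err
        streams[best_f0].append(peak)
    return dict(streams)


# Hypothetical mixture: two harmonic sources plus one unrelated peak.
mixture_peaks_hz = [100, 200, 300, 130, 260, 390, 445]
streams = group_by_harmonicity(mixture_peaks_hz, candidate_f0s_hz=[100, 130])
for f0, members in streams.items():
    print(f0, members)
```

Each resulting stream could then be passed to a phoneme classifier on its own, which is the ordering the abstract describes: separation first, classification second.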

42 Citations

15 Claims

1. A method of processing an audio signal comprising:
- computing 600 spectral values on a logarithmic frequency scale from the audio signal;
- separating the 600 spectral values into a plurality of streams which group sounds from a same source prior to classification;
- analyzing each separated stream to determine phoneme-level classification; and
- outputting one or more words of the audio signal.
- Dependent claims (2 to 15):
  - 2. The method of processing an audio signal as recited in claim 1 wherein phoneme-level classification accuracy is enhanced by providing as an input to a classifier a spectral envelope.
  - 3. The method of processing an audio signal as recited in claim 1 wherein phoneme-level classification accuracy is enhanced by providing as an input to a classifier detected transients.
  - 4. The method of processing an audio signal as recited in claim 1 wherein phoneme-level classification accuracy is enhanced by providing as an input to a classifier pitch and voicing information.
  - 5. The method of processing an audio signal as recited in claim 1 further comprising normalizing for speaker characteristics prior to classification.
  - 6. The method of processing an audio signal as recited in claim 1 further comprising performing noise-threshold tracking.
  - 7. The method of processing an audio signal as recited in claim 6 further comprising adjusting gain and setting noise floor reference levels based on the noise-threshold tracking.
  - 8. The method of processing an audio signal as recited in claim 1 further comprising training with a full phoneme target set.
  - 9. The method of processing an audio signal as recited in claim 1 further comprising incorporating a model of syllabic stress.
  - 10. The method of processing an audio signal as recited in claim 1 wherein the spectral values are computed with a 6 microsecond resolution post-interpolation.
  - 11. The method of processing an audio signal as recited in claim 1 wherein the spectral values are updated every 22 microseconds.
  - 12. The method of processing an audio signal as recited in claim 1 further comprising using the output for automatic voice-dialing for phones.
  - 13. The method of processing an audio signal as recited in claim 1 further comprising using the output for automatic command of a system or device.
  - 14. The method of processing an audio signal as recited in claim 1 further comprising using the output as an interface to a device.
  - 15. The method of processing an audio signal as recited in claim 1 wherein the output comprises a meeting transcription.
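
The first step of claim 1, computing 600 spectral values on a logarithmic frequency scale, can be sketched as follows. This is a minimal illustration, not the patented method: it projects a short signal onto complex sinusoids at 600 logarithmically spaced frequencies. Only the count of 600 comes from the claim; the frequency range, sample rate, test tone, and function names are assumptions chosen for the example.

```python
import cmath
import math


def log_spaced_frequencies(f_min_hz, f_max_hz, count):
    """Return `count` frequencies spaced evenly on a logarithmic scale."""
    ratio = (f_max_hz / f_min_hz) ** (1.0 / (count - 1))
    return [f_min_hz * ratio ** k for k in range(count)]


def log_frequency_magnitudes(samples, sample_rate_hz, freqs_hz):
    """Magnitude of the signal's projection onto a sinusoid at each frequency."""
    n = len(samples)
    magnitudes = []
    for f in freqs_hz:
        step = -2j * math.pi * f / sample_rate_hz  # phase advance per sample
        acc = sum(x * cmath.exp(step * i) for i, x in enumerate(samples))
        magnitudes.append(abs(acc) / n)
    return magnitudes


# Assumed test setup: a 440 Hz tone sampled at 8 kHz for 50 ms.
sample_rate_hz = 8000
tone = [math.sin(2 * math.pi * 440 * i / sample_rate_hz) for i in range(400)]

freqs = log_spaced_frequencies(50.0, 3500.0, 600)  # 600 values, as in claim 1
mags = log_frequency_magnitudes(tone, sample_rate_hz, freqs)
peak_freq = freqs[mags.index(max(mags))]
print(len(mags), round(peak_freq, 1))
```

Direct projection is used here for clarity; it costs one pass over the samples per frequency, so a practical system would more likely use a filter bank or a transform designed for logarithmic spacing.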

Current Assignee: Knowles Electronics LLC (Knowles Corporation)
Original Assignee: Audience Corporation
Inventors: Watts, Lloyd
Primary Examiner(s): Storm; Donald L.
Application Number: US10/439,284
Time in Patent Office: 1,707 Days
Field of Search: None
US Class Current: 704/254
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/16   using artificial neural net...
G10L 15/20   Speech recognition techniqu...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 25/18   the extracted parameters be...
