Systems and methods for classifying audio into broad phoneme classes

US 7,424,427 B2
Filed: 10/16/2003
Issued: 09/09/2008
Est. Priority Date: 10/17/2002
Status: Active Grant

Abstract

An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.
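
As a rough illustration of the two-part structure the abstract describes, the Python sketch below pairs a signal analysis step with a decoder that holds one scoring model per class. The features (frame energy and zero-crossing rate), the scoring rules, and every name in the code are assumptions made for illustration only; they are not the models [310-316] of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def signal_analysis(samples: Sequence[float], frame_len: int = 4) -> List[Features]:
    """Hypothetical stand-in for component 301: split the samples into frames
    and return (mean energy, zero-crossing rate) for each frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        frames.append((energy, crossings / (frame_len - 1)))
    return frames


@dataclass
class Decoder:
    """Hypothetical stand-in for decoder 302: one scoring function per class."""
    models: Dict[str, Callable[[Features], float]]

    def classify(self, features: Features) -> str:
        # Pick the class whose model gives the highest score.
        return max(self.models, key=lambda name: self.models[name](features))


# Toy scoring rules (assumptions, not taken from the patent).
decoder = Decoder(models={
    "silence": lambda f: 1.0 - 10.0 * f[0],
    "vowel": lambda f: f[0] * (1.0 - f[1]),
    "fricative": lambda f: f[0] * f[1],
})

samples = [0.0, 0.0, 0.0, 0.0,     # no energy
           0.9, 0.8, 0.9, 0.8,     # energy, no zero crossings
           0.9, -0.9, 0.9, -0.9]   # energy, crossing on every sample
labels = [decoder.classify(f) for f in signal_analysis(samples)]
```

Keeping the models in a dictionary means a further class, such as coughing, could be added without changing the decoder itself.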

31 Claims

1. A method for classifying an audio signal containing speech information, the method comprising:
- receiving the audio signal;
  
  classifying a sound in the audio signal as a vowel class when a first phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define vowels;
  
  classifying the sound in the audio signal as a fricative class when a second phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define consonants; and
  
  classifying the sound in the audio signal based on at least one non-phoneme based model, the at least one non-phoneme based model including at least one model for classifying the sound in the audio signal based on bandwidth.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, further comprising:
    - classifying the sound in the audio signal as belonging to one of the vowel class, the fricative class, a coughing class, and a silence class;
      
      classifying the sound in the audio signal as belonging to one of a narrowband class and a wideband class after classifying the sound in the audio signal in the one of the vowel class, the fricative class, the coughing class, and the silence class; and
      
      classifying the sound in the audio signal as belonging to one of a male class and a female class after classifying the sound in the audio signal in the one of the narrowband class and the wideband class;
      
      wherein the at least one non-phoneme based model includes models for classifying the sound in the audio signal based on speaker gender.
  - 3. The method of claim 1, wherein the at least one non-phoneme based model includes a model for classifying the sound in the audio signal as silence.
  - 4. The method of claim 1, further comprising:
    - initially converting the audio signal into a frequency domain signal.
  - 5. The method of claim 1, further comprising:
    - generating cepstral features for the audio signal.
  - 6. The method of claim 1, wherein the fricative class includes phonemes that relate to fricatives and obstruents.
  - 7. The method of claim 1, wherein the first and second phoneme-based models are Hidden Markov Models.
  - 8. The method of claim 1, further comprising:
    - classifying the sound in the audio signal as a coughing class when the sound corresponds to a non-speech sound.
  - 9. The method of claim 8, wherein the non-speech sound includes at least one of coughing, laughter, breath, and lip-smack.
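
The ordering recited in claim 2 (sound class first, then bandwidth, then gender) can be sketched as a three-stage cascade. This is a minimal sketch under stated assumptions: the feature layout `(energy, zero-crossing rate, high-band energy ratio, pitch in Hz)`, the thresholds, and the `toy_*` functions are hypothetical, and the toy sound stage never emits the coughing class.

```python
from typing import Callable, NamedTuple, Sequence

Features = Sequence[float]
Stage = Callable[[Features], str]


class Labels(NamedTuple):
    sound: str      # vowel, fricative, coughing, or silence
    bandwidth: str  # narrowband or wideband
    gender: str     # male or female


def classify_in_order(features: Features, sound_stage: Stage,
                      bandwidth_stage: Stage, gender_stage: Stage) -> Labels:
    """Run the three stages in the order given in claim 2."""
    sound = sound_stage(features)
    bandwidth = bandwidth_stage(features)   # after the sound class
    gender = gender_stage(features)         # after the bandwidth class
    return Labels(sound, bandwidth, gender)


# Hypothetical stage rules; all thresholds are illustrative assumptions.
def toy_sound(f: Features) -> str:
    if f[0] < 0.01:
        return "silence"
    return "fricative" if f[1] > 0.5 else "vowel"


def toy_bandwidth(f: Features) -> str:
    return "wideband" if f[2] > 0.1 else "narrowband"


def toy_gender(f: Features) -> str:
    return "female" if f[3] > 165.0 else "male"


result = classify_in_order((0.5, 0.1, 0.02, 120.0),
                           toy_sound, toy_bandwidth, toy_gender)
```

Passing the stages in as functions keeps the ordering logic separate from whatever models actually perform each decision.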

10. A method of training audio classification models, the method comprising:
- receiving a training audio signal;
  
  receiving phoneme classes corresponding to the training audio signal;
  
  training a first Hidden Markov Model (HMM), based on the training audio signal and the phoneme classes, to classify speech as belonging to a vowel class when the first HMM determines that the speech corresponds to a sound represented by a set of phonemes that define vowels;
  
  training a second HMM, based on the training audio signal and the phoneme classes, to classify speech as belonging to a fricative class when the second HMM determines that the speech corresponds to a sound represented by a set of phonemes that define consonants; and
  
  training at least one model to classify the sound based on a bandwidth of the sound.
- Dependent claims (11, 12, 13, 14, 31)
- - 11. The method of claim 10, wherein the phoneme classes include information that defines word boundaries.
  - 12. The method of claim 11, wherein the method further comprises:
    - receiving a sequence of transcribed words corresponding to the audio signal; and
      
      generating the information that defines the word boundaries based on the transcribed words.
  - 13. The method of claim 10, further comprising:
    - training at least one model to classify the sound based on gender of a speaker of the sound.
  - 14. The method of claim 10, wherein the fricative class includes phonemes that relate to fricatives and obstruents.
  - 31. The method of claim 10, further comprising:
    - training at least one model to classify the sound as belonging to one of the vowel class, the fricative class, a coughing class, and a silence class;
      
      training at least one model to classify the sound as belonging to one of a narrowband class and a wideband class after classifying the sound in the one of the vowel class, the fricative class, the coughing class, and the silence class; and
      
      training at least one model to classify the sound as belonging to one of a male class and a female class after classifying the sound in the audio signal in the one of the narrowband class and the wideband class.
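
The training flow of claim 10 takes a training signal plus phoneme classes and fits one model per broad class. The sketch below shows only the grouping step, and it deliberately swaps the claimed Hidden Markov Models for a much simpler stand-in: a single mean and variance per class over a scalar feature. The phoneme sets, the scalar feature values, and all function names are assumptions; the bandwidth model of claim 10 is left out.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical phoneme inventories, chosen only for this example.
VOWELS = {"aa", "iy", "uw"}
FRICATIVES = {"s", "f", "z", "t"}  # claim 14: fricatives and obstruents


def broad_class(phoneme: str) -> Optional[str]:
    """Map a phoneme label to its broad class, or None if it has none."""
    if phoneme in VOWELS:
        return "vowel"
    if phoneme in FRICATIVES:
        return "fricative"
    return None


def train(frames: Sequence[float],
          phonemes: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Group frames by broad class and estimate (mean, variance) per class.
    This Gaussian summary is a stand-in for HMM training, not an HMM."""
    grouped = defaultdict(list)
    for value, phoneme in zip(frames, phonemes):
        cls = broad_class(phoneme)
        if cls is not None:
            grouped[cls].append(value)
    return {cls: (fmean(vals), pvariance(vals)) for cls, vals in grouped.items()}


models = train([1.0, 3.0, 10.0, 12.0, 5.0], ["aa", "iy", "s", "f", "sil"])
```

Frames whose phoneme label falls outside both sets (here `"sil"`) are skipped rather than forced into a class.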

15. An audio classification device comprising:
- a signal analysis component configured to receive an audio signal and process the audio signal by at least one of the converting the audio signal to the frequency domain and generating cepstral features for the audio signal; and
  
  a decoder configured to classify portions of the audio signal as belonging to at least one of the plurality of classes, the classes including a first phoneme-based class that applies to the audio signal when a portion of the audio signal corresponds to a sound represented by a set of phonemes that define vowels, a second phoneme-based class that applies to the audio signal when a portion of the audio signal corresponds to a sound represented by a set of phonemes that define consonants, and at least one non-phoneme class;
  
  wherein the decoder determines the at least one non-phoneme class using models that classify the portions of the audio signal based on bandwidth.
- Dependent claims (16, 17, 18, 19, 20, 21)
- - 16. The audio classification device of claim 15, wherein the second phoneme-based class includes fricative phonemes and obstruent phonemes.
  - 17. The audio classification device of claim 15, wherein the first and second phoneme-based classes are determined based on hidden Markov Models.
  - 18. The audio classification device of claim 15, wherein the decoder determines the at least one non-phoneme class using models that classify the portions of the audio signal based on speaker gender;
    - wherein the decoder is configured to classify portions of the audio signal as belonging to one of the vowel class, the fricative class, a coughing class, and a silence class;
      
      wherein the decoder is configured to classify the portions of the audio signal as belonging to one of a narrowband class and a wideband class after the decoder classifies the portions of the audio signal in one of the vowel class, fricative class, coughing class, and silence class; and
      
      wherein the decoder is configured to classify the portions of the audio signal as belonging to one of a male class and a female class after the decoder classifies the portions of the audio signal in one of the narrowband class and wideband class.
  - 19. The audio classification device of claim 15, wherein the decoder determines the at least one non-phoneme class using a model that classifies the portions of the audio signal as silence.
  - 20. The audio classification device of claim 15, wherein the plurality of classes additionally include:
    - a third phoneme-based class that applies to the audio signal when a portion of the audio signal corresponds to a non-speech sound.
  - 21. The audio classification device of claim 20, wherein the non-speech sound includes at least one of the coughing, laughter, breath, and lip-smack.
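
Claim 15 has the signal analysis component convert the audio to the frequency domain and generate cepstral features. The claims do not say which cepstral variant is used, so the sketch below assumes the plain real cepstrum (inverse DFT of the log magnitude spectrum) and uses a naive O(n²) DFT to stay within the standard library. The function names and the `floor` parameter are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform: time domain to frequency domain."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame: Sequence[float], floor: float = 1e-10) -> List[float]:
    """Real cepstrum of one frame: inverse DFT of the log magnitude spectrum.
    `floor` guards against taking the log of zero."""
    n = len(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# A scaled unit impulse has a flat spectrum of magnitude e, so its log
# magnitude is 1 everywhere and only the first cepstral coefficient is nonzero.
impulse = [math.e] + [0.0] * 7
cepstrum = real_cepstrum(impulse)
```

A production front end would normally use an FFT and often a mel-scaled filter bank; neither is implied by the claim text.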

22. A system comprising:
- an indexer configured to receive input audio data and generate a rich transcription from the audio data, the indexer including;
  
  audio classification logic configured to classify the input audio data into at least one of a plurality of broad audio classes, the broad audio classes including a phoneme-based vowel class, a phoneme-based fricative class, a non-phoneme based bandwidth class, and a non-phoneme based gender class, a speech recognition component configured to generate the rich transcription based on the broad audio classes determined by the audio classification logic;
  
  a memory system for storing the rich transcription; and
  
  a server configured to receive requests for documents and respond to the requests by transmitting one or more of the rich transcriptions that match the requests.
- Dependent claims (23, 24, 25, 26)
- - 23. The system of claim 22, wherein the broad audio classes further include a phoneme-based coughing class;
    - wherein the audio classification logic is configured to classify the input audio data as belonging to one of the vowel class, the fricative class, the coughing class, and a silence class;
      
      wherein the audio classification logic is configured to classify the input audio data as belonging to one of a narrowband class and a wideband class after the audio classification logic classifies input audio data in one of the vowel class, fricative class, coughing class, and silence class; and
      
      wherein the audio classification logic is configured to classify the input audio data as belonging to one of a male class and a female class after the audio classification logic classifies the input audio data in one of the narrowband class and wideband class.
  - 24. The system of claim 23, wherein the coughing class includes sounds relating to coughing, laughter, breath, and lip-smack.
  - 25. The system of claim 22, wherein the phoneme-based fricative class includes phonemes that define fricative or obstruent sounds.
  - 26. The system of claim 22, wherein the indexer further includes at least one of:
    - a speaker clustering component, a speaker identification component, a name spotting component, and a topic classification component.
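
The storage and retrieval side of claim 22 (a memory system holding rich transcriptions and a server returning those that match a request) might be sketched as follows. The claims do not define the contents of a rich transcription or the matching rule, so the record fields, the in-memory dictionary, and the exact-word match below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RichTranscription:
    """Hypothetical record: recognized words plus the broad audio classes."""
    doc_id: str
    words: List[str]
    audio_classes: List[str]


@dataclass
class TranscriptStore:
    """Hypothetical memory system with a simple request handler."""
    items: Dict[str, RichTranscription] = field(default_factory=dict)

    def save(self, transcription: RichTranscription) -> None:
        self.items[transcription.doc_id] = transcription

    def search(self, query: str) -> List[RichTranscription]:
        # Return every transcription containing the query as a whole word,
        # ignoring case.
        q = query.lower()
        return [t for t in self.items.values()
                if q in (w.lower() for w in t.words)]


store = TranscriptStore()
store.save(RichTranscription("a", ["Weather", "report"],
                             ["vowel", "wideband", "female"]))
store.save(RichTranscription("b", ["sports", "news"],
                             ["fricative", "narrowband", "male"]))
hits = store.search("weather")
```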

27. A device comprising:
- means for classifying a sound in an audio signal as a vowel class when a first phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define vowels;
  
  means for classifying the sound in the audio signal as a fricative class when a second phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define consonants; and
  
  means for classifying the sound in the audio signal based on at least one non-phoneme based model, the at least one non-phoneme based model including at least one model for classifying the sound in the audio signal based on bandwidth.
- Dependent claims (28, 29, 30)
- - 28. The device of claim 27, further comprising:
    - means for converting the audio signal into a frequency domain signal.
  - 29. The device of claim 27, further comprising:
    - means for generating cepstral features for the audio signal.
  - 30. The device of claim 27, further comprising:
    - means for classifying the sound in the audio signal as belonging to one of the vowel class, the fricative class, a coughing class, and a silence class when the sound corresponds to a non-speech sound;
      
      means for classifying the sound in the audio signal as belonging to one of a narrowband class and a wideband class after classifying the sound in the audio signal in the one of the vowel class, the fricative class, the coughing class, and the silence class; and
      
      means for classifying the sound in the audio signal as belonging to one of a male class and a female class after classifying the sound in the audio signal in the one of the narrowband class and the wideband class.

Current Assignee: Verizon Patent and Licensing Incorporated (Verizon Communications Inc.)
Original Assignee: BBN Technologies Corporation (Rtx Corporation), Verizon Corporate Services Group Incorporated (Verizon Communications Inc.)
Inventors: Kubala, Francis G.; Liu, Daben
Primary Examiner(s): Dorvil; Richemond
Assistant Examiner(s): SIEDLER, DOROTHY S
Application Number: US10/685,585
Publication Number: US 20040230432A1
Time in Patent Office: 1,790 Days
Field of Search: 704/206, 704/208, 704/246, 704/256.1, 704/256.2
US Class Current: 704/256.1
CPC Class Codes:
G10L 15/28 Constructional details of s...
G10L 15/32 Multiple recognisers used i...
