Systems and methods for classifying audio into broad phoneme classes
Abstract
An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.
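The abstract splits the system into a signal analysis component that produces features and a decoder that scores them against one model per class. The sketch below illustrates only that split, under simplifying assumptions that do not come from the patent. The features are just per-frame log energy and zero-crossing rate. Each class is a single diagonal Gaussian, not an HMM. Only three of the seven listed classes (silence, vowels, fricatives) are modeled. All parameter values are hypothetical. The names `GaussianClassModel`, `frame_features`, `MODELS`, and `decode` are illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GaussianClassModel:
    """Hypothetical per-class model: a diagonal Gaussian over a feature vector."""
    name: str
    means: tuple[float, ...]
    variances: tuple[float, ...]

    def log_likelihood(self, features: tuple[float, ...]) -> float:
        # Sum of independent 1-D Gaussian log densities (diagonal covariance).
        return sum(
            -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
            for x, mu, var in zip(features, self.means, self.variances)
        )


def frame_features(frame: list[float]) -> tuple[float, float]:
    """Signal-analysis stand-in: (log10 mean energy, zero-crossing rate)."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (math.log10(energy + 1e-10), crossings / (len(frame) - 1))


# Illustrative parameters only; a real system would train these from labeled audio.
MODELS = [
    GaussianClassModel("silence", (-8.0, 0.0), (4.0, 0.05)),
    GaussianClassModel("vowel", (-1.0, 0.02), (1.0, 0.01)),
    GaussianClassModel("fricative", (-2.0, 0.5), (1.0, 0.05)),
]


def decode(samples: list[float], frame_len: int = 200) -> list[str]:
    """Label each complete frame with the highest-scoring class model."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        feats = frame_features(samples[start:start + frame_len])
        best = max(MODELS, key=lambda m: m.log_likelihood(feats))
        labels.append(best.name)
    return labels


if __name__ == "__main__":
    sr = 8000  # assumed sample rate; 200 samples = 25 ms frames
    rng = random.Random(0)
    silence = [0.0] * 200
    vowel_like = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(200)]
    fricative_like = [rng.uniform(-0.3, 0.3) for _ in range(200)]
    print(decode(silence + vowel_like + fricative_like))
```

Scoring each frame independently ignores temporal context. HMMs like those recited in claim 10 would instead score state sequences spanning many frames, for example with Viterbi decoding.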
31 Claims
1. A method for classifying an audio signal containing speech information, the method comprising:
receiving the audio signal;
classifying a sound in the audio signal as a vowel class when a first phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define vowels;
classifying the sound in the audio signal as a fricative class when a second phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define consonants; and
classifying the sound in the audio signal based on at least one non-phoneme based model, the at least one non-phoneme based model including at least one model for classifying the sound in the audio signal based on bandwidth.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.

10. A method of training audio classification models, the method comprising:
receiving a training audio signal;
receiving phoneme classes corresponding to the training audio signal;
training a first Hidden Markov Model (HMM), based on the training audio signal and the phoneme classes, to classify speech as belonging to a vowel class when the first HMM determines that the speech corresponds to a sound represented by a set of phonemes that define vowels;
training a second HMM, based on the training audio signal and the phoneme classes, to classify speech as belonging to a fricative class when the second HMM determines that the speech corresponds to a sound represented by a set of phonemes that define consonants; and
training at least one model to classify the sound based on a bandwidth of the sound.
Dependent claims: 11, 12, 13, 14, 31.

15. An audio classification device comprising:
a signal analysis component configured to receive an audio signal and process the audio signal by at least one of converting the audio signal to the frequency domain and generating cepstral features for the audio signal; and
a decoder configured to classify portions of the audio signal as belonging to at least one of a plurality of classes, the classes including a first phoneme-based class that applies to the audio signal when a portion of the audio signal corresponds to a sound represented by a set of phonemes that define vowels, a second phoneme-based class that applies to the audio signal when a portion of the audio signal corresponds to a sound represented by a set of phonemes that define consonants, and at least one non-phoneme class;
wherein the decoder determines the at least one non-phoneme class using models that classify the portions of the audio signal based on bandwidth.
Dependent claims: 16, 17, 18, 19, 20, 21.

22. A system comprising:
an indexer configured to receive input audio data and generate a rich transcription from the audio data, the indexer including:
audio classification logic configured to classify the input audio data into at least one of a plurality of broad audio classes, the broad audio classes including a phoneme-based vowel class, a phoneme-based fricative class, a non-phoneme based bandwidth class, and a non-phoneme based gender class; and
a speech recognition component configured to generate the rich transcription based on the broad audio classes determined by the audio classification logic;
a memory system for storing the rich transcription; and
a server configured to receive requests for documents and respond to the requests by transmitting one or more of the rich transcriptions that match the requests.
Dependent claims: 23, 24, 25, 26.

27. A device comprising:
means for classifying a sound in an audio signal as a vowel class when a first phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define vowels;
means for classifying the sound in the audio signal as a fricative class when a second phoneme-based model determines that the sound corresponds to a sound represented by a set of phonemes that define consonants; and
means for classifying the sound in the audio signal based on at least one non-phoneme based model, the at least one non-phoneme based model including at least one model for classifying the sound in the audio signal based on bandwidth.
Dependent claims: 28, 29, 30.
Specification