Hidden Markov model for speech processing with training method
Abstract
A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
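The classification step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes Viterbi decoding (the abstract does not name a decoding algorithm), a hypothetical two-state model in which state 0 stands for language speech and state 1 for a non-language sound, and made-up per-segment emission likelihoods in place of scores computed from real feature vectors.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path for a sequence of segments.

    log_init[s]     : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log-likelihood of segment t's features under state s
    """
    n_states = len(log_init)
    # Best log-score of any path ending in each state at the current segment.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def logs(row):
    return [math.log(p) for p in row]

# Hypothetical toy model: all numbers below are illustrative assumptions.
NON_LANGUAGE_STATES = {1}
LOG_INIT = logs([0.5, 0.5])
LOG_TRANS = [logs([0.8, 0.2]), logs([0.2, 0.8])]
LOG_EMIT = [logs(row) for row in
            [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]]

def classify(path):
    """Label each segment according to the state it was associated with."""
    return ["non-language" if s in NON_LANGUAGE_STATES else "language"
            for s in path]

if __name__ == "__main__":
    print(classify(viterbi(LOG_INIT, LOG_TRANS, LOG_EMIT)))
```

In a real system the rows of `LOG_EMIT` would come from a per-state observation model evaluated on each segment's feature vector; the decoding and labeling steps would stay the same.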
27 Claims
1. A computerized method of detecting non-language speech sounds in an audio signal, comprising:
realizing with a computer a hidden Markov model comprising a plurality of states, wherein at least one of the plurality of states is associated with a non-language speech sound;
isolating a segment of the audio signal;
extracting a first feature set consisting of mel-frequency cepstral coefficients (MFCCs), pitch confidence, cepstral stationarity, and cepstral variance from the segment;
using the first feature set to associate the segment with one or more of the plurality of states of the hidden Markov model; and
classifying the segment as a language speech sound or a non-language speech sound accordingly.
Dependent claims: 2-22.
23. A computerized method of classifying sounds in an audio signal into language speech sounds and non-language speech sounds, the method comprising:
realizing in a computer a hidden Markov model comprising a plurality of hidden Markov states;
providing a plurality of classification labels such that there is a one-to-many mapping between each of the plurality of classification labels and the plurality of hidden Markov states;
training the hidden Markov model, comprising:
providing a plurality of input observation sequences, wherein each of the plurality of input observation sequences comprises a plurality of input observations;
providing correct classification labels for each input observation sequence, such that one correct label is assigned to each of the plurality of input observations;
determining an observation sequence associated with a plurality of segments isolated from the audio signal, wherein the observation sequence comprises at least one observation for each one of the plurality of segments isolated from the audio signal; and
associating the observation sequence with a sequence of hidden Markov states, whereby the one-to-many mapping determines a classification label for each one of the plurality of segments isolated from the audio signal, wherein the plurality of classification labels comprises a label for non-language speech sounds, and wherein the at least one observation consists of:
mel-frequency cepstral coefficients (MFCCs), a pitch confidence measurement, a cepstral stationarity measurement, and a cepstral variance measurement.
Dependent claims: 24, 25.
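The label mapping recited in claim 23 can be illustrated with a short sketch. Each classification label owns one or more hidden Markov states, so a decoded state sequence determines one label per segment. The label names and state indices below are hypothetical, chosen only for illustration; the claim itself requires only that at least one label covers non-language speech sounds.

```python
# Hypothetical labels and state indices (assumptions, not from the claim).
LABEL_TO_STATES = {
    "language": [0, 1, 2],
    "breath": [3],
    "filled_pause": [4, 5],
}

def invert_mapping(label_to_states):
    """Build the state -> label lookup, rejecting states with two labels."""
    state_to_label = {}
    for label, states in label_to_states.items():
        for state in states:
            if state in state_to_label:
                raise ValueError(f"state {state} has more than one label")
            state_to_label[state] = label
    return state_to_label

def label_segments(state_path, state_to_label):
    """Map the decoded state of each segment to its classification label."""
    return [state_to_label[state] for state in state_path]

if __name__ == "__main__":
    lookup = invert_mapping(LABEL_TO_STATES)
    # One decoded state per segment, e.g. as produced by an HMM decoder.
    print(label_segments([0, 1, 3, 3, 4, 5, 2], lookup))
```

Giving a label several states lets the model capture internal structure of a sound class (for example, onset and steady portions) while still reporting a single label per segment.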
26. An apparatus for detecting non-language speech sounds in an audio signal, comprising:
a programmed processor; and
computer-readable media storing instructions that, when executed on the programmed processor,
provide a hidden Markov model comprising a plurality of states, wherein at least one of the plurality of states is associated with a non-language speech sound;
isolate a segment of an audio signal;
extract a first feature set consisting of mel-frequency cepstral coefficients (MFCCs), pitch confidence, cepstral stationarity, and cepstral variance from the segment;
use the first feature set to associate the segment with one or more of the plurality of states of the hidden Markov model; and
classify the segment as a language speech sound or a non-language speech sound accordingly.
27. A computerized speech recognition system for detecting non-language speech sounds comprising:
a pre-processor adapted to isolate a plurality of segments from an audio signal;
a signal processor, the signal processor adapted to extract from each of the plurality of segments isolated from the audio signal the following feature set: mel-frequency cepstral coefficients (MFCCs), a pitch confidence measurement, a cepstral stationarity measurement, and a cepstral variance measurement;
a computerized hidden Markov model comprising a plurality of hidden Markov states and many-to-one mappings between the plurality of hidden Markov states and a plurality of classification labels, at least one of the plurality of classification labels comprising at least one non-language speech sound, whereby the computerized hidden Markov model is adapted to use the feature set to associate each of the plurality of segments with one or more of the plurality of hidden Markov states and to classify each of the plurality of segments as a language speech sound or a non-language speech sound; and
a post-processor coupled to the computerized hidden Markov model.
Specification