Hidden Markov Model for Speech Processing with Training Method
Abstract
A method, system, and apparatus are described for identifying non-language speech sounds in a speech or audio signal. The audio signal is divided into segments, and feature vectors are extracted from each segment. Each segment is then classified using a hidden Markov model (HMM) trained on sequences of these feature vectors. Post-processing components can be used to improve the classification. In one described embodiment, the HMM classifies a segment as either a language speech sound or one of several kinds of non-language speech sound. In another, the HMM is trained using discriminative learning.
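As a rough illustration of the first step the abstract describes (segmenting the audio signal before features are extracted), the Python sketch below splits a sample sequence into overlapping fixed-length frames. The function name `frame_signal`, the frame length, and the hop size are illustrative assumptions; the abstract does not say how segments are sized or whether they overlap.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping fixed-length frames (hypothetical helper).

    Trailing samples that do not fill a complete frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    # Each frame starts `hop` samples after the previous one.
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

At a 16 kHz sampling rate, for example, `frame_signal(samples, 400, 160)` would yield 25 ms frames every 10 ms, a common choice in speech front ends, though not one stated in this document.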
27 Claims
1. A computerized method of detecting non-language speech sounds in an audio signal, comprising:
realizing with a computer a hidden Markov model comprising a plurality of states, wherein at least one of the plurality of states is associated with a non-language speech sound;
isolating a segment of the audio signal;
providing a first feature set consisting of mel-frequency cepstral coefficients (MFCCs), pitch confidence, cepstral stationarity, and cepstral variance;
determining a first feature from the segment, wherein the first feature belongs to the first feature set;
using the first feature to associate the segment with one or more of the plurality of states of the hidden Markov model; and
classifying the segment as a language speech sound or a non-language speech sound.
Dependent claims: 2 through 22.
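Claim 1 names cepstral variance and cepstral stationarity as features but does not define them in this section. The sketch below assumes one plausible reading of each: cepstral variance as the average per-coefficient variance of frame cepstra within a segment, and cepstral stationarity as the mean Euclidean distance between consecutive frame cepstra (smaller values meaning a steadier spectrum). For brevity it computes a plain real cepstrum with a direct DFT instead of true MFCCs, which would add a mel filterbank and a DCT. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def real_cepstrum(frame: Sequence[float], n_coeffs: int) -> List[float]:
    """Low-order real cepstrum of one frame (a stand-in for MFCCs in this sketch)."""
    n = len(frame)
    # Log-magnitude spectrum from a direct O(n^2) DFT; the small floor avoids log(0).
    log_mag = []
    for k in range(n):
        bin_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(bin_k) + 1e-10))
    # Inverse DFT of the log-magnitude spectrum; keep the real part of the first coefficients.
    ceps = []
    for q in range(n_coeffs):
        s = sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n))
        ceps.append(s.real / n)
    return ceps


def cepstral_variance(cepstra: List[List[float]]) -> float:
    """Assumed definition: mean over coefficients of the variance across frames."""
    n_frames, dims = len(cepstra), len(cepstra[0])
    total = 0.0
    for d in range(dims):
        values = [c[d] for c in cepstra]
        mean = sum(values) / n_frames
        total += sum((v - mean) ** 2 for v in values) / n_frames
    return total / dims


def cepstral_stationarity(cepstra: List[List[float]]) -> float:
    """Assumed definition: mean distance between consecutive cepstral vectors.

    Requires at least two frames; smaller values indicate a steadier spectrum.
    """
    steps = [math.dist(a, b) for a, b in zip(cepstra, cepstra[1:])]
    return sum(steps) / len(steps)
```

A segment's feature vector could then combine, for instance, the per-frame cepstra with these two segment-level statistics; how the patent actually combines them is not stated here.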
23. A computerized method of classifying sounds in an audio signal, comprising:
realizing in a computer a hidden Markov model comprising a plurality of hidden Markov states;
providing a plurality of classification labels such that there is a one-to-many mapping between each of the plurality of classification labels and the plurality of hidden Markov states;
determining an observation sequence associated with a plurality of segments isolated from the audio signal, wherein the observation sequence comprises at least one observation for each one of the plurality of segments isolated from the audio signal; and
associating the observation sequence with a sequence of hidden Markov states, whereby the one-to-many mapping determines a classification label for the plurality of segments isolated from the audio signal.
Dependent claims: 24, 25.
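Claim 23 maps each classification label to one or more hidden Markov states and reads the label off the decoded state sequence. A minimal sketch of that idea follows, assuming Viterbi decoding in log space and a hypothetical three-state model in which states 0 and 1 both carry the label "speech" and state 2 carries "laughter". The claim itself does not name a decoding algorithm, and the probabilities below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path under an HMM, computed in log space.

    log_trans[p][s] = log P(state s | previous state p);
    log_emit[t][s]  = log P(observation t | state s).
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        ptrs, new_score = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptrs.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
        backptrs.append(ptrs)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def states_to_labels(path: Sequence[int], state_label: Dict[int, str]) -> List[str]:
    """Apply the many-to-one state-to-label mapping to a decoded path."""
    return [state_label[s] for s in path]


# Hypothetical 3-state model: states 0 and 1 are both "speech", state 2 is "laughter".
STATE_LABEL = {0: "speech", 1: "speech", 2: "laughter"}
uniform = math.log(1 / 3)
log_init = [uniform] * 3
log_trans = [[uniform] * 3 for _ in range(3)]
log_emit = [[math.log(p) for p in row]
            for row in ([0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8])]
path = viterbi(log_init, log_trans, log_emit)
print(states_to_labels(path, STATE_LABEL))  # ['speech', 'speech', 'laughter']
```

Giving one label several states lets the model represent, for example, the onset and body of a sound separately while still reporting a single class.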
26. An apparatus for detecting non-language speech sounds in an audio signal, comprising:
a programmed processor; and
computer-readable media storing instructions that, when executed on the programmed processor,
provide a hidden Markov model comprising a plurality of states, wherein at least one of the plurality of states is associated with a non-language speech sound;
isolate a segment of an audio signal;
provide a first feature set consisting of mel-frequency cepstral coefficients (MFCCs), pitch confidence, cepstral stationarity, and cepstral variance;
determine a first feature from the segment, wherein the first feature belongs to the first feature set;
use the first feature to associate the segment with one or more of the plurality of states of the hidden Markov model; and
classify the segment as a language speech sound or a non-language speech sound.
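Pitch confidence appears in claims 1 and 26 without a formula in this section. One common way to obtain such a measure, assumed here rather than taken from the patent, is the peak of the normalized autocorrelation over lags corresponding to plausible speaking pitches: strongly voiced speech tends to score near 1, while noise-like sounds such as breaths score lower. The function name and the 60 to 400 Hz search range are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def pitch_confidence(frame: Sequence[float], sample_rate: int,
                     f_min: float = 60.0, f_max: float = 400.0) -> Tuple[float, float]:
    """Return (confidence, f0_hz) from the peak normalized autocorrelation.

    Confidence lies in [0, 1]; f0_hz is 0.0 if no positive peak is found.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]                   # remove DC offset
    min_lag = max(1, int(sample_rate / f_max))      # shortest period searched
    max_lag = min(n - 1, int(sample_rate / f_min))  # longest period searched
    best_r, best_lag = 0.0, 0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:n - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        r = num / den if den > 0 else 0.0
        # Require a clear improvement so that, among near-equal peaks at
        # multiples of the period, the shortest lag wins.
        if r > best_r + 1e-9:
            best_r, best_lag = r, lag
    f0_hz = sample_rate / best_lag if best_lag else 0.0
    return best_r, f0_hz
```

Keeping the shortest lag among near-equal peaks avoids reporting a subharmonic (half or a third of the true pitch) for strongly periodic input.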
27. A computerized speech recognition system for detecting non-language speech sounds comprising:
a pre-processor adapted to isolate a plurality of segments from an audio signal;
a signal processor, the signal processor adapted to extract from each of the plurality of segments isolated from the audio signal one or more of the following features: mel-frequency cepstral coefficients (MFCCs), a pitch confidence measurement, a cepstral stationarity measurement, or a cepstral variance measurement;
a computerized hidden Markov model comprising a plurality of hidden Markov states and many-to-one mappings between the plurality of hidden Markov states and a plurality of classification labels, at least one of the plurality of classification labels comprising at least one non-language speech sound, whereby the computerized hidden Markov model is adapted to use the one or more features to associate each of the plurality of segments with one or more of the plurality of hidden Markov states and to classify each of the plurality of segments as a language speech sound or a non-language speech sound; and
a post-processor coupled to the computerized hidden Markov model.
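Claim 27 includes a post-processor coupled to the hidden Markov model but, in this section, does not say what it does. One simple possibility, offered purely as an assumption, is minimum-duration smoothing: label runs shorter than a threshold are absorbed into a neighboring run, on the theory that a single isolated segment of, say, laughter inside speech is more likely a classification error than a real event. The function name and threshold are hypothetical.

```python
from itertools import groupby
from typing import List


def smooth_labels(labels: List[str], min_run: int) -> List[str]:
    """Minimum-duration smoothing of a per-segment label sequence (hypothetical).

    Any run shorter than min_run segments takes the label of the run before it;
    a short run at the very start takes the label of the run after it.
    """
    # Collapse the sequence into [label, run_length] pairs.
    runs = [[label, len(list(group))] for label, group in groupby(labels)]
    if len(runs) > 1:
        for i, run in enumerate(runs):
            if run[1] < min_run:
                run[0] = runs[i - 1][0] if i > 0 else runs[i + 1][0]
    # Expand the (possibly relabeled) runs back to one label per segment.
    smoothed: List[str] = []
    for label, length in runs:
        smoothed.extend([label] * length)
    return smoothed
```

The threshold trades responsiveness for stability: a larger `min_run` suppresses more spurious flips but can also erase genuinely brief non-language sounds.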
Specification