System and method for classifying a speech signal
Abstract
A system and method for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes are provided. Stochastic models include a plurality of states having state transitions and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of a speech signal. The method includes extracting a frame sequence, and determining a state sequence for each stochastic model with each state sequence having full state segmentation. Representative frames are determined to provide speech signal time normalization. A likely speech signal class is determined from a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes. An output signal is generated based on the likely stochastic model.
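The full state segmentation described above can be illustrated with a forced alignment. The following is a minimal sketch, not the patented implementation: it assumes a left-to-right model without state skips, log-domain probabilities, and a path that must start in the first state and end in the last. The function name and parameters (forced_alignment, log_out, log_stay, log_next) are hypothetical.

```python
import math

NEG_INF = float("-inf")

def forced_alignment(log_out, log_stay, log_next):
    """Viterbi alignment for a left-to-right model without skips (assumed topology).

    log_out[t][s]: log output probability of frame t in state s.
    log_stay[s]:   log probability of remaining in state s.
    log_next[s]:   log probability of moving from state s to state s + 1.
    Returns (best_log_score, state_index_per_frame).
    """
    n_frames, n_states = len(log_out), len(log_out[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_out[0][0]  # the path must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay[s]
            move = score[t - 1][s - 1] + log_next[s - 1] if s > 0 else NEG_INF
            if move > stay:
                score[t][s], back[t][s] = move, s - 1
            else:
                score[t][s], back[t][s] = stay, s
            score[t][s] += log_out[t][s]
    # The path must end in the last state, so every state receives a frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][-1], path

# Usage: four frames, two states; the first two frames favor state 0.
p, q, h = math.log(0.9), math.log(0.1), math.log(0.5)
best, states = forced_alignment([[p, q], [p, q], [q, p], [q, p]], [h, h], [h])
```

Because skips are disallowed and the end state is fixed, each state is assigned at least one frame whenever the frame sequence is at least as long as the state sequence.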
19 Claims
1. A method of classifying a speech signal within a likely speech signal class of a plurality of speech signal classes corresponding to a plurality of stochastic models, each stochastic model including a plurality of states having state transition and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of the speech signal, the method comprising:
extracting a frame sequence from the speech signal;
determining a state sequence for each stochastic model, each state sequence corresponding to the frame sequence and having full state segmentation wherein each state corresponds to at least one frame;
determining a likely stochastic model of the plurality of stochastic models based on the state transition and output probabilities associated with the state sequences;
determining representative frames for the state sequence of the likely stochastic model to provide speech signal time normalization; and
generating an output signal indicative of the likely speech signal class based on a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes, and further based on the likely stochastic model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
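The time normalization step of claim 1 can be illustrated as follows. This is a sketch under a stated assumption: the claim does not say how a representative frame is computed, so the code simply averages the frames aligned to each state. The function name representative_frames and its parameters are hypothetical.

```python
def representative_frames(frames, path, n_states):
    """Return one representative frame per state.

    frames: list of feature vectors, one per time frame.
    path:   state index assigned to each frame (same length as frames).
    Assumption: the representative frame is the mean of the frames in a state.
    """
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(n_states)]
    counts = [0] * n_states
    for frame, state in zip(frames, path):
        counts[state] += 1
        for d, value in enumerate(frame):
            sums[state][d] += value
    if 0 in counts:
        raise ValueError("full state segmentation requires a frame in every state")
    return [[v / c for v in row] for row, c in zip(sums, counts)]

# Usage: three frames aligned to two states give two fixed-size vectors.
reps = representative_frames([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1], 2)
```

Whatever the utterance length, the result has exactly one vector per state, which is what allows a neural network with a fixed number of inputs to receive it.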
11. A system for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes, the system comprising:
stochastic model logic for representing a plurality of stochastic models, each stochastic model including a plurality of states having state transition and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of the speech signal;
frame logic for extracting a frame sequence from the speech signal;
segmentation logic for determining a state sequence for each stochastic model, each state sequence corresponding to the frame sequence and having full state segmentation wherein each state corresponds to at least one frame;
model selection logic for determining a likely stochastic model of the plurality of stochastic models based on the state transition and output probabilities associated with the state sequences;
time normalization logic for determining representative frames for the state sequence of the likely stochastic model to provide speech signal time normalization;
a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes; and
processing logic for generating an output signal indicative of the likely speech signal class based on the neural network outputs, and further based on the likely stochastic model.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
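The model selection logic and processing logic of claim 11 can be sketched as below. The claim does not specify how the neural network outputs and the likely stochastic model are combined, so the fusion rule here is purely an assumption for illustration: the class whose model scored best receives a fixed bonus on its log posterior. The sketch also assumes one stochastic model per class, sharing the same index. All names (likely_model, log_softmax, classify, bonus) are hypothetical.

```python
import math

def likely_model(model_scores):
    """Index of the stochastic model with the highest alignment log score."""
    return max(range(len(model_scores)), key=lambda m: model_scores[m])

def log_softmax(logits):
    """Numerically stable log posteriors from raw network outputs."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_norm for x in logits]

def classify(model_scores, nn_logits, bonus=1.0):
    """Hypothetical fusion rule (not taken from the claim).

    Adds `bonus` to the network log posterior of the class whose
    stochastic model scored best, then returns the top class index.
    """
    fused = log_softmax(nn_logits)
    fused[likely_model(model_scores)] += bonus
    return max(range(len(fused)), key=lambda c: fused[c])

# Usage: the network slightly prefers class 1, but model 0 scored best.
decision = classify([-10.0, -12.0], [0.0, 0.5])
```

With this rule the network can still override the likely model when its preference exceeds the bonus; other combinations are equally consistent with the claim language.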
Specification