System and method for classifying a speech signal

US 5,924,066 A
Filed: 09/26/1997
Issued: 07/13/1999
Est. Priority Date: 09/26/1997
Status: Expired due to Term

Abstract

A system and method for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes are provided. Stochastic models include a plurality of states having state transitions and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of a speech signal. The method includes extracting a frame sequence, and determining a state sequence for each stochastic model with each state sequence having full state segmentation. Representative frames are determined to provide speech signal time normalization. A likely speech signal class is determined from a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes. An output signal is generated based on the likely stochastic model.
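
The first step in the abstract, extracting a frame sequence, can be illustrated with a minimal sketch. The function name `extract_frames`, the frame length, and the hop size are assumptions chosen for illustration; the abstract does not state how frames are formed or which features each frame carries.

```python
def extract_frames(samples, frame_len=4, hop=2):
    """Split a sampled signal into overlapping fixed-length frames.

    frame_len and hop are illustrative values, not taken from the patent.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames


# Toy signal: nine samples yield three overlapping frames of four samples.
signal = [0.0, 0.1, 0.3, 0.2, -0.1, -0.4, -0.2, 0.0, 0.1]
frames = extract_frames(signal)
```

In a real recognizer each frame would typically be converted to a feature vector before modeling; that step is left out here because the abstract does not describe it.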

19 Claims

1. A method of classifying a speech signal within a likely speech signal class of a plurality of speech signal classes corresponding to a plurality of stochastic models, each stochastic model including a plurality of states having state transition and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of the speech signal, the method comprising:
- extracting a frame sequence from the speech signal;
- determining a state sequence for each stochastic model, each state sequence corresponding to the frame sequence and having full state segmentation wherein each state corresponds to at least one frame;
- determining a likely stochastic model of the plurality of stochastic models based on the state transition and output probabilities associated with the state sequences;
- determining representative frames for the state sequence of the likely stochastic model to provide speech signal time normalization; and
- generating an output signal indicative of the likely speech signal class based on a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes, and further based on the likely stochastic model.
- Dependent claims (2 through 10):
  - 2. The method of claim 1 wherein the stochastic models are hidden Markov models.
  - 3. The method of claim 2 wherein the hidden Markov models are first order hidden Markov models.
  - 4. The method of claim 3 wherein the hidden Markov models have left-to-right topology.
  - 5. The method of claim 2 wherein determining a state sequence for each stochastic model further comprises:
    - determining an optimal state sequence corresponding to the frame sequence; and
    - when the optimal state sequence does not have full state segmentation, determining sub-optimal state sequences corresponding to the frame sequence to determine a state sequence corresponding to the frame sequence that has full state segmentation.
  - 6. The method of claim 1 wherein the output probabilities are defined by probability distribution functions.
  - 7. The method of claim 1 wherein determining representative frames further comprises:
    - determining the representative frame for each state of the state sequence of the likely stochastic model, wherein each component of the representative frame is a mean of corresponding components of the at least one frame for each state.
  - 8. The method of claim 1 wherein determining representative frames further comprises:
    - determining a representative frame for each state of the state sequence of the likely stochastic model; and
    - determining at least one additional representative frame based on durational probabilities of the states.
  - 9. The method of claim 1 wherein determining a likely stochastic model further comprises:
    - determining several likely stochastic models, wherein representative frames are determined for each likely stochastic model.
  - 10. The method of claim 1 wherein the neural network further includes a plurality of neural networks, each neural network of the plurality of neural networks being configured based on a corresponding pair of the stochastic models for interclass distinction between a corresponding pair of speech signal classes, and wherein generating an output signal further comprises:
    - generating an output signal indicative of the likely speech signal class based on outputs of the plurality of neural networks and the likely stochastic model.
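
The segmentation and time-normalization steps of claims 1, 5, and 7 can be sketched as follows, under stated assumptions. The sketch uses a standard Viterbi search to find the optimal state sequence of one first-order, left-to-right hidden Markov model, checks whether that sequence has full state segmentation, and takes the component-wise mean of the frames in each state as the representative frame. The function names, the unit-variance Gaussian output probability, and the toy parameters are all hypothetical; the search for sub-optimal sequences described in claim 5 is not implemented, only the condition that would trigger it.

```python
import math

NEG_INF = float("-inf")


def make_gaussian_emit(means):
    """Assumed output probability: unit-variance Gaussian, constant dropped."""
    def log_emit(state, frame):
        return -0.5 * sum((x - m) ** 2 for x, m in zip(frame, means[state]))
    return log_emit


def viterbi(frames, log_init, log_trans, log_emit):
    """Return (log score, state path) of the optimal state sequence."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit(s, frames[0]) for s in range(n_states)]
    back = []
    for frame in frames[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s for this frame.
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit(s, frame))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def has_full_segmentation(path, n_states):
    """True when every state is assigned at least one frame."""
    return set(path) == set(range(n_states))


def representative_frames(frames, path, n_states):
    """One frame per state: the component-wise mean of that state's frames."""
    if not has_full_segmentation(path, n_states):
        raise ValueError("state sequence lacks full state segmentation")
    reps = []
    for s in range(n_states):
        members = [f for f, st in zip(frames, path) if st == s]
        dim = len(members[0])
        reps.append([sum(f[d] for f in members) / len(members) for d in range(dim)])
    return reps


# Toy two-state left-to-right model with one-dimensional frames.
frames = [[0.1], [0.0], [2.9], [3.1], [3.0]]
log_init = [0.0, NEG_INF]                      # must start in state 0
log_trans = [[math.log(0.5), math.log(0.5)],   # state 0: stay or advance
             [NEG_INF, 0.0]]                   # state 1: stay only
log_emit = make_gaussian_emit([[0.0], [3.0]])

score, path = viterbi(frames, log_init, log_trans, log_emit)
reps = representative_frames(frames, path, n_states=2)
```

Whatever the number of input frames, `reps` always holds one frame per state, which is the sense in which the representative frames give the neural network a fixed-size, time-normalized input.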

11. A system for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes, the system comprising:
- stochastic model logic for representing a plurality of stochastic models, each stochastic model including a plurality of states having state transition and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of the speech signal;
- frame logic for extracting a frame sequence from the speech signal;
- segmentation logic for determining a state sequence for each stochastic model, each state sequence corresponding to the frame sequence and having full state segmentation wherein each state corresponds to at least one frame;
- model selection logic for determining a likely stochastic model of the plurality of stochastic models based on the state transition and output probabilities associated with the state sequences;
- time normalization logic for determining representative frames for the state sequence of the likely stochastic model to provide speech signal time normalization;
- a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes; and
- processing logic for generating an output signal indicative of the likely speech signal class based on the neural network outputs, and further based on the likely stochastic model.
- Dependent claims (12 through 19):
  - 12. The system of claim 11 wherein the stochastic models are hidden Markov models.
  - 13. The system of claim 12 wherein the hidden Markov models are first order hidden Markov models.
  - 14. The system of claim 13 wherein the hidden Markov models have left-to-right topology.
  - 15. The system of claim 12 wherein the segmentation logic further comprises:
    - segmentation logic for determining an optimal state sequence corresponding to the frame sequence, and for determining sub-optimal state sequences corresponding to the frame sequence to determine a state sequence corresponding to the frame sequence that has full state segmentation, when the optimal state sequence does not have full state segmentation.
  - 16. The system of claim 11 wherein the output probabilities are defined by probability distribution functions.
  - 17. The system of claim 11 wherein the time normalization logic further comprises:
    - time normalization logic for determining the representative frame for each state of the state sequence of the likely stochastic model, wherein each component of the representative frame is a mean of corresponding components of the at least one frame for each state.
  - 18. The system of claim 11 wherein the time normalization logic further comprises:
    - time normalization logic for determining a representative frame for each state of the state sequence of the likely stochastic model, and for determining at least one additional representative frame based on durational probabilities of the states.
  - 19. The system of claim 11 wherein the model selection logic further comprises:
    - model selection logic for determining several likely stochastic models, wherein representative frames are determined for each likely stochastic model.
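
The model selection logic of claims 11 and 19 can be illustrated with a short sketch. It ranks the stochastic models by the log score of their state sequences and keeps the best one or the best few. The claims do not say how the neural network outputs and the likely stochastic model are combined, so `pick_class` shows only one plausible reading, in which the network's answer is restricted to the classes of the retained models. Both function names and all scores are hypothetical.

```python
def rank_models(path_scores, n_best=1):
    """Return the class labels of the n_best highest-scoring models.

    path_scores maps a class label to the log score of that model's
    state sequence (higher is more likely).
    """
    ranked = sorted(path_scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:n_best]]


def pick_class(nn_outputs, candidates):
    """Assumed combination rule: best network output among candidate classes."""
    return max(candidates, key=lambda label: nn_outputs[label])


# Made-up scores for three speech signal classes.
scores = {"yes": -12.4, "no": -15.1, "maybe": -13.0}
nn_outputs = {"yes": 0.30, "no": 0.45, "maybe": 0.40}

likely = rank_models(scores)                 # single likely model
several = rank_models(scores, n_best=2)      # several likely models
decision = pick_class(nn_outputs, several)
```

In this toy run the network's highest output belongs to "no", but that class is excluded because its model was not among the two most likely, so the decision falls to "maybe".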

Current Assignee: Qwest Communications International Incorporated (Lumen Technologies, Inc.)
Original Assignee: MediaOne, Inc. (Comcast Corporation), U S West Inc.
Inventors: Kundu, Amlan
Primary Examiner(s): Dorvil, Richemond
Application Number: US08/938,221
Time in Patent Office: 655 Days
Field of Search: 704/256, 704/232, 704/254, 704/249, 704/242, 704/239, 704/240, 704/245, 704/234, 704/231, 704/246, 704/250, 704/251, 704/255, 704/236
US Class Current: 704/232
CPC Class Codes:
- G10L 15/142 Hidden Markov Models [HMMs]
- G10L 15/16 using artificial neural net...