Method of speech recognition using multimodal variational inference with switching state space models

US 20050159951A1
Filed: 01/20/2004
Published: 07/21/2005
Est. Priority Date: 01/20/2004
Status: Active Grant

Abstract

A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames in a sequence of frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution while reducing computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.

17 Claims

1. A method of setting posterior probability parameters for a switching state space model, the posterior probability providing the likelihood of a set of hidden states for a sequence of frames based upon input values associated with the sequence of frames, the method comprising:
- defining a window containing at least two but fewer than all of the frames in the sequence of frames;
- determining a separate posterior probability parameter for each frame in the window;
- shifting the window so that it includes at least one subsequent frame in the sequence of frames to form a shifted window; and
- determining a separate posterior probability parameter for each frame in the shifted window.

2. The method of claim 1 wherein the shifted window includes at least one frame that was present in the window before shifting.
3. The method of claim 1 wherein determining a separate posterior probability parameter for each frame in a window comprises solving a set of simultaneous equations for all of the frames in the window.
4. The method of claim 3 wherein the hidden states are continuous.
5. The method of claim 4 wherein determining a separate posterior probability parameter for each frame further comprises determining a separate posterior probability parameter for each of a set of discrete hidden states that are different from the continuous hidden states.
6. The method of claim 4 wherein the posterior probability provides the probability of a continuous hidden state given a discrete hidden state and an input value.
7. The method of claim 5 further comprising before shifting the window, using the posterior probability parameter determined for a frame to generate a path score for entering a discrete hidden state during the frame.
8. The method of claim 7 wherein generating a path score comprises generating a path score as part of a Viterbi decoder.
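
The windowed procedure of claims 1 to 3 can be illustrated with a short sketch. It is not the patent's implementation: the claims do not state the simultaneous equations, so the sketch assumes a hypothetical scalar smoothing model in which each frame's parameter is coupled to its immediate neighbours, giving a tridiagonal system per window. The names `solve_tridiagonal`, `solve_window`, `sliding_window_parameters`, `lam`, and `shift` are invented for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_window(y, start, stop, lam, left_value=None):
    """Jointly solve the coupled equations for frames start..stop-1.

    ASSUMPTION (not from the patent): frame t obeys
        (1 + lam * k) * m[t] - lam * (sum of neighbouring m) = y[t],
    where k is the number of neighbours visible to the window.
    left_value, if given, is the already-fixed parameter of frame start-1.
    """
    n = stop - start
    lower, diag = [0.0] * n, [1.0] * n
    upper, rhs = [0.0] * n, [y[start + i] for i in range(n)]
    for i in range(n):
        if i > 0:                      # neighbour inside the window
            lower[i] = -lam
            diag[i] += lam
        elif left_value is not None:   # neighbour fixed by an earlier window
            diag[i] += lam
            rhs[i] += lam * left_value
        if i < n - 1:                  # right neighbour inside the window
            upper[i] = -lam
            diag[i] += lam
    return solve_tridiagonal(lower, diag, upper, rhs)


def sliding_window_parameters(y, window, shift=1, lam=1.0):
    """Determine one parameter per frame by solving window after window."""
    total = len(y)
    if not 2 <= window < total:
        raise ValueError("window must hold at least two but fewer than all frames")
    if not 1 <= shift <= window:
        raise ValueError("shift must be between 1 and the window length")
    params = [None] * total
    start = 0
    while True:
        stop = start + window
        left = params[start - 1] if start > 0 else None
        # A separate parameter for every frame in the current window.
        params[start:stop] = solve_window(y, start, stop, lam, left)
        if stop == total:
            return params
        # Shift left to right; with shift < window the windows overlap (claim 2).
        start = min(start + shift, total - window)
```

Calling `solve_window(y, 0, len(y), lam)` solves the same hypothetical model over all frames at once, which gives a reference against which the windowed result can be compared. Each windowed solve touches only `window` frames, which is where the saving over a whole-sequence solve would come from in this toy setting.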

9. A method of decoding a speech signal to identify a sequence of phonetic units, the method comprising:
- storing model parameters for a switching state space model in which there are discrete hidden states and continuous hidden states, the continuous hidden states being dependent on the discrete hidden states;
- converting the speech signal into a set of observation vectors, each observation vector associated with a separate frame of the speech signal;
- for each frame of the speech signal, determining a path score for at least one path into each discrete hidden state in the frame; and
- using the path score to select a single path into each discrete hidden state of the frame.

10. The method of claim 9 wherein the discrete hidden states represent phonetic units.
11. The method of claim 9 wherein determining a path score comprises determining a path score based on a posterior probability parameter that describes the probability of a continuous hidden state given a discrete hidden state and an observation vector.
12. The method of claim 11 further comprising determining a posterior probability parameter for a discrete hidden state in the current frame.
13. The method of claim 12 wherein determining a posterior probability parameter comprises defining a window of frames that contains fewer than all of the frames of the speech signal.
14. The method of claim 13 further comprising determining a separate posterior probability parameter for each discrete hidden state in each frame in the window by solving a set of simultaneous equations.
15. The method of claim 9 further comprising determining a path score for each path into a discrete hidden state in the current frame from the set of discrete hidden states in a previous frame.
16. The method of claim 15 further comprising determining path scores for each discrete hidden state in the current frame.
17. The method of claim 16 further comprising pruning at least one selected path into a state so that the path is no longer considered as part of a possible path through a sequence of discrete hidden states.
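
The frame-by-frame decoding of claims 9 and 15 to 17 has the shape of a Viterbi search with pruning. The following is a minimal sketch under stated assumptions, not the patent's decoder: all scores are log-domain, and `frame_scores[t][s]` is a hypothetical stand-in for the per-frame score that the claims derive from posterior probability parameters. The names `viterbi_decode`, `log_trans`, `log_init`, and `beam` are likewise invented for illustration.

```python
import math


def viterbi_decode(frame_scores, log_trans, log_init, beam=None):
    """Left-to-right search over discrete hidden states.

    frame_scores[t][s]: ASSUMED log score of state s at frame t.
    log_trans[p][s]:    log score of moving from state p to state s.
    log_init[s]:        log score of starting in state s.
    beam:               optional pruning width in log units.
    Returns (best state sequence, its log score).
    """
    n_states = len(log_init)
    scores = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(frame_scores)):
        if beam is not None:
            # Prune paths that fall too far below the current best path.
            best = max(scores)
            scores = [sc if sc >= best - beam else -math.inf for sc in scores]
        new_scores, pointers = [], []
        for s in range(n_states):
            # Path score for each path into s from every previous state.
            candidates = [scores[p] + log_trans[p][s] for p in range(n_states)]
            # Keep a single path into each state of the frame.
            best_prev = max(range(n_states), key=candidates.__getitem__)
            new_scores.append(candidates[best_prev] + frame_scores[t][s])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=scores.__getitem__)
    best_score = scores[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

If the discrete states stand for phonetic units, as in claim 10, the returned path would be read as a frame-level sequence of those units. Setting `beam` trades search effort against the risk of discarding the path that would have won.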

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Deng, Li; Lee, Leo; Attias, Hagai
Granted Patent: US 7,480,615 B2
US Class Current: 704/240
CPC Class Codes:
G10L 15/14 using statistical models, e...
G10L 2015/0638 Interactive procedures
