Method of speech recognition using multimodal variational inference with switching state space models
Abstract
A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames in a sequence of frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution while reducing computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
17 Claims
1. A method of setting posterior probability parameters for a switching state space model, the posterior probability providing the likelihood of a set of hidden states for a sequence of frames based upon input values associated with the sequence of frames, the method comprising:
defining a window containing at least two but fewer than all of the frames in the sequence of frames;
determining a separate posterior probability parameter for each frame in the window;
shifting the window so that it includes at least one subsequent frame in the sequence of frames to form a shifted window; and
determining a separate posterior probability parameter for each frame in the shifted window.
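The windowing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented estimation procedure: the claim does not say how each posterior probability parameter is computed, so `placeholder_estimate` is a hypothetical stand-in, and the names `sliding_windows`, `set_posterior_parameters`, `window_size` and `shift` are assumptions made for illustration. The choice to let a later window overwrite parameters for overlapping frames is also an assumption.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(num_frames: int, window_size: int, shift: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame ranges, end exclusive, moving left to right in time."""
    if not 2 <= window_size < num_frames:
        raise ValueError("window must hold at least two but fewer than all frames")
    if not 1 <= shift <= window_size:
        raise ValueError("shift must be between 1 and window_size")
    start = 0
    while True:
        end = start + window_size
        yield start, end
        if end >= num_frames:
            break
        # Clamp the last shift so the final window ends on the last frame.
        start = min(start + shift, num_frames - window_size)


def placeholder_estimate(window: Sequence[float]) -> List[float]:
    """Hypothetical stand-in estimator: one neighbor-averaged value per frame."""
    out = []
    for i in range(len(window)):
        lo, hi = max(0, i - 1), min(len(window), i + 2)
        out.append(sum(window[lo:hi]) / (hi - lo))
    return out


def set_posterior_parameters(
    observations: Sequence[float],
    window_size: int,
    estimate_window: Callable[[Sequence[float]], List[float]],
    shift: int = 1,
) -> List[float]:
    """Determine a separate parameter for each frame in each window position."""
    params: List[float] = [0.0] * len(observations)
    for start, end in sliding_windows(len(observations), window_size, shift):
        window_params = estimate_window(observations[start:end])
        if len(window_params) != end - start:
            raise ValueError("estimator must return one parameter per frame")
        # Assumption: parameters from the later window replace earlier ones.
        params[start:end] = window_params
    return params
```

For example, `list(sliding_windows(5, 3))` visits frames 0 to 2, then 1 to 3, then 2 to 4, so each shift brings one subsequent frame into the window.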
9. A method of decoding a speech signal to identify a sequence of phonetic units, the method comprising:
storing model parameters for a switching state space model in which there are discrete hidden states and continuous hidden states, the continuous hidden states being dependent on the discrete hidden states;
converting the speech signal into a set of observation vectors, each observation vector associated with a separate frame of the speech signal;
for each frame of the speech signal, determining a path score for at least one path into each discrete hidden state in the frame; and
using the path score to select a single path into each discrete hidden state of the frame.
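The frame-by-frame path selection recited in claim 9 resembles a Viterbi-style search, which can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed scoring method: the claim text here does not define how a path score is computed, so the additive log-domain score, the transition table `log_trans`, the callable `frame_log_score(t, s)` and the function name `decode` are all hypothetical. A uniform prior over the initial discrete state is also assumed.

```python
from typing import Callable, List, Sequence


def decode(
    num_frames: int,
    num_states: int,
    log_trans: Sequence[Sequence[float]],
    frame_log_score: Callable[[int, int], float],
) -> List[int]:
    """Return one discrete state per frame, keeping a single best path into each state."""
    if num_frames < 1 or num_states < 1:
        raise ValueError("need at least one frame and one state")
    # score[s]: score of the single surviving path that ends in state s.
    score = [frame_log_score(0, s) for s in range(num_states)]
    backptrs: List[List[int]] = []
    for t in range(1, num_frames):
        new_score: List[float] = []
        ptr: List[int] = []
        for s in range(num_states):
            # Path score for every path entering state s at frame t.
            candidates = [score[p] + log_trans[p][s] for p in range(num_states)]
            best_prev = max(range(num_states), key=candidates.__getitem__)
            new_score.append(candidates[best_prev] + frame_log_score(t, s))
            ptr.append(best_prev)  # select a single path into state s
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full sequence.
    state = max(range(num_states), key=score.__getitem__)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Because only one predecessor is kept per discrete state at each frame, the work per frame grows with the square of the number of discrete states rather than with the number of possible state sequences.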
Specification