Method of speech recognition using multimodal variational inference with switching state space models

US 7,480,615 B2
Filed: 01/20/2004
Issued: 01/20/2009
Est. Priority Date: 01/20/2004
Status: Expired due to Fees

Abstract

A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames in a sequence of frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution while reducing computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
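
The window schedule described in the abstract can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `sliding_windows`, the `shift` parameter, and the choice to clamp the final window to the end of the sequence are all assumptions made here.

```python
def sliding_windows(n_frames, window_size, shift=1):
    """Return (start, end) frame ranges, end exclusive, moving left to right in time.

    Hypothetical helper: the names and the clamping of the last window are
    assumptions for illustration, not details taken from the patent.
    """
    if not 2 <= window_size < n_frames:
        raise ValueError("window must hold at least two but fewer than all frames")
    if shift < 1:
        raise ValueError("shift must bring in at least one subsequent frame")
    last_start = n_frames - window_size
    starts = list(range(0, last_start + 1, shift))
    # Make sure the final frames are covered even when shift does not divide evenly.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, s + window_size) for s in starts]


if __name__ == "__main__":
    for start, end in sliding_windows(n_frames=6, window_size=3, shift=1):
        print(f"estimate posterior parameters for frames {start}..{end - 1}")
```

With `shift` smaller than `window_size`, consecutive windows overlap, so each shifted window reuses at least one frame from the previous window.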

7 Claims

1. A method of setting posterior probability means for posterior probability distributions in a switching state space model, the posterior probability providing the likelihood of a set of hidden states for a sequence of frames based upon input values associated with the sequence of frames, the method comprising:
- inputting a speech signal;
  
  identifying input values of a sequence of frames from the speech signal;
  
  defining a window containing at least two but fewer than all of the frames in the sequence of frames;
  
  determining a separate posterior probability mean for each frame in the window, each posterior probability mean providing a mean value for a continuous hidden state given at least an input value, wherein determining a separate posterior probability mean for each frame further comprises determining a separate posterior probability mean for each of a set of discrete hidden states that are different from the continuous hidden states;
  
  shifting the window so that it includes at least one subsequent frame in the sequence of frames to form a shifted window; and
  
  determining a separate posterior probability mean for each frame in the shifted window; and
  
  using the posterior probability means to decode a speech signal.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1 wherein the shifted window includes at least one frame that was present in the window before shifting.
  - 3. The method of claim 1 wherein determining a separate posterior probability mean for each frame in a window comprises solving a set of simultaneous equations for all of the frames in the window.
  - 4. The method of claim 1 wherein the posterior probability provides the probability of a continuous hidden state given a discrete hidden state and an input value.
  - 5. The method of claim 1 further comprising, before shifting the window, using the posterior probability mean determined for the frame to generate a path score for entering a discrete hidden state during the frame.
  - 6. The method of claim 5 wherein generating the path score comprises generating the path score as part of a Viterbi decoder.
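
The step in claim 3 of solving a set of simultaneous equations for all frames in a window can be illustrated with a deliberately simplified model. The sketch below assumes a single discrete state, a scalar continuous hidden state with dynamics x_t ≈ a·x_{t-1} + b (precision `B`), and a direct observation y_t ≈ x_t (precision `D`). None of these names or simplifications come from the claims. Under those assumptions the posterior means in a window satisfy a tridiagonal linear system, which is solved here with the Thomas algorithm.

```python
def window_posterior_means(y, a=1.0, b=0.0, B=1.0, D=1.0):
    """Posterior means of a scalar hidden state for every frame in one window.

    Assumed toy model (not from the patent): x_t ~ a*x_{t-1} + b with
    precision B, and y_t ~ x_t with precision D. Setting the gradient of the
    quadratic log-posterior to zero gives one linear equation per frame.
    """
    n = len(y)
    if n < 2:
        raise ValueError("a window needs at least two frames")
    diag = [D] * n
    rhs = [D * v for v in y]
    lower = [0.0] * n  # coefficient of x[t-1] in row t
    upper = [0.0] * n  # coefficient of x[t+1] in row t
    for t in range(n):
        if t > 0:  # coupling to the previous frame
            diag[t] += B
            rhs[t] += B * b
            lower[t] = -a * B
        if t < n - 1:  # coupling to the next frame
            diag[t] += a * a * B
            rhs[t] -= a * B * b
            upper[t] = -a * B

    # Thomas algorithm: forward elimination, then back substitution.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for t in range(1, n):
        denom = diag[t] - lower[t] * c[t - 1]
        c[t] = upper[t] / denom
        d[t] = (rhs[t] - lower[t] * d[t - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for t in range(n - 2, -1, -1):
        x[t] = d[t] - c[t] * x[t + 1]
    return x


if __name__ == "__main__":
    print(window_posterior_means([0.0, 3.0]))
```

Because only the frames in the window are coupled, the system stays small no matter how long the full utterance is, which is the source of the cost saving the abstract describes. A model with several discrete states would solve one such set of means per state.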

7. A method of decoding a speech signal to identify a sequence of phonetic units, the method comprising:
- storing model parameters for a switching state space model in which there are discrete hidden states and continuous hidden states, the continuous hidden states being dependent on the discrete hidden states;
  
  converting the speech signal into a set of observation vectors, each observation vector associated with a separate frame of the speech signal;
  
  for each frame of the speech signal:
  
  determining a posterior probability mean for each discrete hidden state, the posterior probability mean defining a mean value for a continuous hidden state given a discrete hidden state and an observation vector, wherein determining a posterior probability mean comprises defining a window of frames that contains fewer than all of the frames of the speech signal and determining a separate posterior probability mean for each discrete hidden state in each frame in the window by solving a set of simultaneous equations; and
  
  determining a path score for at least one path into each discrete hidden state in the frame based on the posterior probability mean for the respective discrete hidden state; and
  
  using the path score to select a single path into each discrete hidden state of the frame.
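
The per-frame path scoring and single-path selection in claim 7 follow the usual Viterbi recursion. The sketch below is illustrative only: it assumes scalar observations, a Gaussian observation score centered on each state's posterior mean with a shared variance `var`, and precomputed values `post_means[t][s]`. These names and the form of the score are assumptions, not details taken from the claims.

```python
import math


def gaussian_loglik(y, pred, var):
    """Log-density of y under a Gaussian with mean pred and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - pred) ** 2 / var)


def viterbi_decode(obs, post_means, log_init, log_trans, var=1.0):
    """Pick the best discrete-state path, left to right in time.

    obs[t]           : scalar observation for frame t (assumed scalar here)
    post_means[t][s] : posterior mean of the continuous state for state s
    log_init[s]      : log prior of starting in state s
    log_trans[p][s]  : log probability of moving from state p to state s
    """
    n_states = len(log_init)
    scores = [log_init[s] + gaussian_loglik(obs[0], post_means[0][s], var)
              for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs)):
        new_scores, ptrs = [], []
        for s in range(n_states):
            # Keep a single best path into state s for this frame.
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + log_trans[p][s])
            path_score = (scores[best_prev] + log_trans[best_prev][s]
                          + gaussian_loglik(obs[t], post_means[t][s], var))
            new_scores.append(path_score)
            ptrs.append(best_prev)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    half = math.log(0.5)
    print(viterbi_decode([0.0, 0.1, 5.0, 5.1], [[0.0, 5.0]] * 4,
                         [half, half], [[half, half], [half, half]]))
```

Because the score for a frame depends only on that frame's observation and the posterior means already computed for it, the recursion can run frame by frame as each window is processed, without waiting for the whole utterance.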

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Lee, Leo; Deng, Li; Attias, Hagai
Primary Examiner(s): Edouard, Patrick N.
Assistant Examiner(s): Shah, Paras D
Application Number: US10/760,937
Publication Number: US 20050159951A1
Time in Patent Office: 1,827 Days
Field of Search: 704/236, 704/240, 704/256, 704/243
US Class Current: 704/240
CPC Class Codes:
- G10L 15/14 using statistical models, e...
- G10L 2015/0638 Interactive procedures
