Factorial hidden Markov model for audiovisual speech recognition
Abstract
A speech recognition method uses synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
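To make the two-stream idea concrete, the following is a minimal sketch of how a sequence of audiovisual frames could be scored under a factorial hidden Markov model with one audio chain and one video chain. It is not taken from the patent. The function name `fhmm_log_likelihood`, the parameter names, and the choice to pass precomputed per-frame emission likelihoods `emis[t][i][j]` (likelihood of frame t given audio state i and video state j) are all assumptions made for illustration. The sketch assumes each chain has its own transition matrix, and that the two chains evolve independently given their previous states.

```python
import math


def fhmm_log_likelihood(pi_a, pi_v, A_a, A_v, emis):
    """Scaled forward pass over the joint (audio, video) state space.

    pi_a, pi_v : initial state probabilities of the audio and video chains
    A_a, A_v   : row-stochastic transition matrices, A[prev][next]
    emis       : emis[t][i][j] = likelihood of frame t given audio state i
                 and video state j (hypothetical, precomputed by the caller)
    Returns the log-likelihood of the whole frame sequence.
    """
    n_a, n_v = len(pi_a), len(pi_v)
    log_lik = 0.0
    alpha = None  # normalised forward probabilities over joint states
    for frame in emis:
        if alpha is None:
            # First frame: the chains start independently.
            pred = [[pi_a[i] * pi_v[j] for j in range(n_v)] for i in range(n_a)]
        else:
            # Factorised transition: advance the audio chain, then the video chain.
            tmp = [[sum(alpha[k][j] * A_a[k][i] for k in range(n_a))
                    for j in range(n_v)] for i in range(n_a)]
            pred = [[sum(tmp[i][l] * A_v[l][j] for l in range(n_v))
                     for j in range(n_v)] for i in range(n_a)]
        # Weight each joint state by how well it explains the current frame.
        alpha = [[pred[i][j] * frame[i][j] for j in range(n_v)] for i in range(n_a)]
        scale = sum(sum(row) for row in alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [[x / scale for x in row] for row in alpha]
        log_lik += math.log(scale)  # accumulate per-frame scaling factors
    return log_lik
```

Under these assumptions, one such model could be trained per word, and recognition would pick the word whose model returns the highest log-likelihood for the observed frames. Updating the two chains one after the other costs on the order of n_a * n_v * (n_a + n_v) operations per frame, rather than the (n_a * n_v)^2 needed for a single flat transition matrix over joint states.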
27 Claims
-
1. A speech recognition method for audiovisual data comprising
constructing a distributed state representation hidden Markov model for audiovisual data, and providing maximum likelihood training for the distributed state representation hidden Markov model to identify words.
-
7. A speech recognition method comprising
using an audio and a video data set that respectively provide a first data stream of speech data and a second data stream of face image data, and applying a two stream factorial hidden Markov model to the first and second data streams for speech recognition.
-
14. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to
use an audio and a video data set that respectively provide a first data stream of speech data and a second data stream of face image data, and apply a two stream factorial hidden Markov model to the first and second data streams for speech recognition.
-
21. A speech recognition system comprising
an audiovisual capture module to capture an audio and a video data set that respectively provide a first data stream of speech data and a second data stream of face image data, and a speech recognition module that applies a two stream factorial hidden Markov model to the first and second data streams for speech recognition.
Specification