Coupled hidden Markov model for audiovisual speech recognition
Abstract
A speech recognition method uses synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
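To make the two stream coupling concrete, the following is a minimal sketch of Viterbi decoding over the joint (audio state, video state) space, where each stream's next state depends on the previous states of both streams. It is an illustration under stated assumptions, not the patented implementation: the function name `chmm_viterbi`, the table layouts, and the use of discrete emission tables (instead of whatever observation densities the specification uses) are all hypothetical choices made for brevity.

```python
import math
from itertools import product


def chmm_viterbi(obs_a, obs_v, init_a, init_v, trans_a, trans_v, emit_a, emit_v):
    """Most likely joint state path for a toy two-stream coupled HMM.

    Assumed (hypothetical) table layout:
      trans_a[(i, j)][k] = P(audio state k at t | audio i, video j at t-1)
      trans_v[(i, j)][l] = P(video state l at t | audio i, video j at t-1)
      emit_a[k][o], emit_v[l][o] = discrete emission probabilities
    Returns (path, log_probability), where path is a list of (audio, video) pairs.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    states = list(product(range(len(init_a)), range(len(init_v))))

    # Initialisation: independent priors times both streams' emissions.
    delta = {
        (k, l): log(init_a[k]) + log(init_v[l])
        + log(emit_a[k][obs_a[0]]) + log(emit_v[l][obs_v[0]])
        for k, l in states
    }
    backpointers = []

    # Recursion: each stream's transition is conditioned on BOTH previous states.
    for t in range(1, len(obs_a)):
        new_delta, ptr = {}, {}
        for k, l in states:
            best_prev, best_score = states[0], float("-inf")
            for i, j in states:
                score = delta[(i, j)] + log(trans_a[(i, j)][k]) + log(trans_v[(i, j)][l])
                if score > best_score:
                    best_prev, best_score = (i, j), score
            new_delta[(k, l)] = (
                best_score + log(emit_a[k][obs_a[t]]) + log(emit_v[l][obs_v[t]])
            )
            ptr[(k, l)] = best_prev
        delta = new_delta
        backpointers.append(ptr)

    # Backtracking from the best final joint state.
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

Because the audio and video states are tracked separately, the decoded path can place the two streams in different states at the same time step, which is one way to model the asynchrony between audio and video mentioned above. A segmental k-means style training loop would alternate between a decoding pass of this kind and re-estimating the tables from the resulting alignment; that loop is not shown here.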
12 Claims
1. A speech recognition method comprising:
- obtaining a first data stream of speech data and a second data stream of face image data while a speaker is speaking;
- extracting visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- applying a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
Dependent claims: 2, 3, 4.

5. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to:
- obtain a first data stream of speech data and a second data stream of face image data while a speaker is speaking;
- extract visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- apply a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
Dependent claims: 6, 7, 8.

9. A speech recognition system comprising:
- an audiovisual capture module to capture an audio and a video data set that respectively provide a first data stream of speech data and a second data stream of face image data, the audiovisual capture module extracting visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- a speech recognition module to apply a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
Dependent claims: 10, 11, 12.
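The visual feature extraction step recited in claims 1, 5, and 9 can be illustrated with a small sketch that normalizes a mouth-region image and takes its two-dimensional discrete cosine transform. This is only an assumption-laden illustration: it covers the normalization and DCT parts, and leaves out masking, resizing, and rotation. The helper names (`dct_1d`, `dct_2d`, `normalize`, `mouth_features`), the zero-mean unit-variance normalization, and the choice to keep a low-frequency block of coefficients are hypothetical and are not taken from the claims.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(img):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def normalize(img):
    """Shift to zero mean and scale to unit variance (assumed normalization)."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on a flat image
    return [[(p - mean) / std for p in row] for row in img]


def mouth_features(img, keep=4):
    """Return the keep x keep low-frequency DCT coefficients as a flat list."""
    coeffs = dct_2d(normalize(img))
    rows = min(keep, len(coeffs))
    cols = min(keep, len(coeffs[0]))
    return [coeffs[u][v] for u in range(rows) for v in range(cols)]


# Example: a tiny 4x4 grayscale patch standing in for a cropped mouth region.
patch = [
    [10, 12, 12, 10],
    [40, 90, 90, 40],
    [40, 90, 90, 40],
    [10, 12, 12, 10],
]
features = mouth_features(patch, keep=2)
```

Keeping only the low-frequency coefficients gives a compact description of the coarse mouth shape; a feature vector of this kind, computed per video frame, would form the second observation stream alongside the audio features.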
Specification