Visual feature extraction procedure useful for audiovisual continuous speech recognition
Abstract
A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
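The linear support vector machine step in the abstract can be illustrated with a minimal sketch. It assumes that candidate windows inside the detected face have already been flattened into feature vectors and that a weight vector and bias have been trained elsewhere. The function names, the toy weights, and the four-element windows are hypothetical and are not taken from the patent.

```python
def linear_svm_score(weights, bias, features):
    """Signed decision value w . x + b of a linear SVM (assumed pre-trained)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def best_mouth_window(candidates, weights, bias):
    """Return (label, score) for the highest-scoring candidate window.

    Returns None when no candidate lies on the positive side of the
    decision boundary, i.e. no window is classified as a mouth.
    """
    scored = [
        (label, linear_svm_score(weights, bias, feats))
        for label, feats in candidates
    ]
    label, score = max(scored, key=lambda item: item[1])
    return (label, score) if score > 0 else None


# Hypothetical toy data: three 4-element windows from a detected face.
weights = [0.5, -0.25, 0.5, -0.25]
bias = -0.5
candidates = [
    ("chin", [0.0, 1.0, 0.0, 1.0]),
    ("mouth", [1.0, 0.0, 1.0, 0.0]),
    ("nose", [1.0, 1.0, 0.0, 0.0]),
]
result = best_mouth_window(candidates, weights, bias)
print(result)
```

In this toy setup only the window labelled "mouth" receives a positive decision value, so it is the one selected. A real system would score many windows at several scales and would use features far larger than four elements.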
70 Citations
20 Claims
1. A speech recognition method comprising generation of an audio vector representing detected audio data, detection of a face in a video data stream linked to audio data, discriminating a mouth region in the detected face, applying a linear support vector machine analysis to the mouth region, generating vector data for the mouth region, and fusing audio and visual vector data with a hidden Markov model.
11. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to detect a face in video data, discriminate a mouth region in the detected face, applying a linear support vector machine analysis to the mouth region, and fuse audio and visual vector data with a hidden Markov model.
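The final step of the claims, fusing audio and visual vector data with a hidden Markov model, can be sketched as follows. The claims do not say here how the fusion is performed, so this sketch assumes the simplest synchronous case: the audio and visual vectors of each frame are concatenated, and the fused sequence is scored by the forward algorithm of an HMM with diagonal-covariance Gaussian emissions. The model names ("rising", "falling"), all parameters, and the observations are hypothetical.

```python
import math

NEG_INF = float("-inf")


def fuse(audio_vec, visual_vec):
    """Assumed fusion: concatenate the audio and visual vectors of one frame."""
    return list(audio_vec) + list(visual_vec)


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(obs_seq, log_start, log_trans, means, variances):
    """Log likelihood of a fused observation sequence under one HMM."""
    n = len(log_start)
    alpha = [
        log_start[s] + log_gaussian(obs_seq[0], means[s], variances[s])
        for s in range(n)
    ]
    for obs in obs_seq[1:]:
        alpha = [
            log_sum_exp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_gaussian(obs, means[s], variances[s])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


# Two hypothetical 2-state left-to-right word models over 2-D fused vectors.
log_start = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
unit_var = [[1.0, 1.0], [1.0, 1.0]]
models = {
    "rising": [[0.0, 0.0], [1.0, 1.0]],
    "falling": [[1.0, 1.0], [0.0, 0.0]],
}

# One audio value and one visual (mouth) value per frame, two frames.
audio_frames = [[0.1], [0.9]]
visual_frames = [[0.0], [1.0]]
fused = [fuse(a, v) for a, v in zip(audio_frames, visual_frames)]

scores = {
    name: forward_log_likelihood(fused, log_start, log_trans, means, unit_var)
    for name, means in models.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

Concatenation forces the two streams to share one state sequence, so it covers only the synchronous case. Handling the asynchronous audio mentioned in the abstract would need a model that lets the audio and visual streams occupy different states at the same time, which this sketch does not attempt.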
Specification