Coupled hidden Markov model for audiovisual speech recognition

US 7,165,029 B2
Filed: 05/09/2002
Issued: 01/16/2007
Est. Priority Date: 05/09/2002
Status: Expired due to Fees


Abstract

A speech recognition method uses synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
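
To make the two stream coupled hidden Markov model more concrete, here is a minimal sketch of Viterbi decoding over such a model. It is not taken from the patent: it assumes the common coupled-HMM formulation in which each stream keeps its own hidden state, and each stream's next state depends on the previous states of both streams. The function name `chmm_viterbi`, the dictionary layout of the parameters, and the use of discrete observation symbols are all illustrative assumptions.

```python
import math
from itertools import product


def chmm_viterbi(obs_a, obs_v, init, trans_a, trans_v, emit_a, emit_v):
    """Most likely joint (audio_state, video_state) path (illustrative sketch).

    Assumed parameter layout (hypothetical, not from the patent):
      init[(i, j)]       = P(a_0 = i, v_0 = j)
      trans_a[(i, j)][k] = P(a_t = k | a_{t-1} = i, v_{t-1} = j)
      trans_v[(i, j)][l] = P(v_t = l | a_{t-1} = i, v_{t-1} = j)
      emit_a[i][o], emit_v[j][o] = per-stream discrete emission probabilities
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    states = list(product(range(len(emit_a)), range(len(emit_v))))

    # Initial scores: joint prior times both streams' emission probabilities.
    score = {
        (i, j): log(init[(i, j)]) + log(emit_a[i][obs_a[0]]) + log(emit_v[j][obs_v[0]])
        for (i, j) in states
    }
    back = []

    for t in range(1, len(obs_a)):
        new_score, pointers = {}, {}
        for (k, l) in states:
            best_prev, best = states[0], float("-inf")
            for prev in states:
                # Coupling: both transitions condition on the full previous pair.
                cand = score[prev] + log(trans_a[prev][k]) + log(trans_v[prev][l])
                if cand > best:
                    best_prev, best = prev, cand
            new_score[(k, l)] = best + log(emit_a[k][obs_a[t]]) + log(emit_v[l][obs_v[t]])
            pointers[(k, l)] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final joint state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Toy model: two states per stream, uniform transitions, sharp emissions.
pairs = list(product(range(2), range(2)))
toy_init = {s: 0.25 for s in pairs}
toy_trans = {s: [0.5, 0.5] for s in pairs}
toy_emit = [[0.9, 0.1], [0.1, 0.9]]
toy_path, toy_logp = chmm_viterbi(
    [0, 1, 1], [0, 0, 1], toy_init, toy_trans, toy_trans, toy_emit, toy_emit
)
```

Because the audio and video states are tracked separately, the decoded path may place the two streams in different states at the same frame, which is one way a model of this kind can tolerate asynchrony between the streams.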

76 Citations

12 Claims

1. A speech recognition method comprising:
- obtaining a first data stream of speech data and a second data stream of face image data while a speaker is speaking;
- extracting visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- applying a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
- Dependent claims (2, 3, 4)
  - 2. The method of claim 1, wherein the audio and video data sets providing the first and second data streams are asynchronous.
  - 3. The method of claim 1, further comprising parallel processing of the first and second data streams.
  - 4. The method of claim 1, wherein extracting visual features comprises using linear discriminant analysis.
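
The visual feature step in claim 1 ends with a two-dimensional discrete cosine transform of the mouth region. The sketch below illustrates that one step on an already cropped, resized, and rotated patch. It is an illustration under assumptions, not the patented procedure: the claim does not say how the region is normalized or which coefficients are kept, so the zero-mean, unit-variance normalization in `normalize` and the choice of the top-left `k` by `k` coefficients in `low_freq_features` are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a rectangular list of lists of floats."""
    rows, cols = len(block), len(block[0])

    def alpha(k, n):
        # Scaling that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (
                        block[x][y]
                        * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                        * math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    )
            out[u][v] = alpha(u, rows) * alpha(v, cols) * total
    return out


def normalize(block):
    """Assumed normalization: shift to zero mean and scale to unit variance."""
    values = [p for row in block for p in row]
    mean = sum(values) / len(values)
    var = sum((p - mean) ** 2 for p in values) / len(values)
    std = math.sqrt(var)
    if std == 0.0:
        # A flat patch carries no contrast; return all zeros.
        return [[0.0 for _ in row] for row in block]
    return [[(p - mean) / std for p in row] for row in block]


def low_freq_features(block, k):
    """Assumed feature choice: the k x k lowest-frequency DCT coefficients."""
    coeffs = dct2(normalize(block))
    return [coeffs[u][v] for u in range(k) for v in range(k)]


# Hypothetical 4 x 4 grayscale mouth patch (values are made up for illustration).
patch = [
    [10.0, 12.0, 12.0, 10.0],
    [40.0, 80.0, 80.0, 40.0],
    [40.0, 80.0, 80.0, 40.0],
    [10.0, 12.0, 12.0, 10.0],
]
features = low_freq_features(patch, 2)
```

Keeping only low-frequency coefficients is a common way to compress a small image patch into a short feature vector. A production system would typically use an optimized DCT routine rather than this direct quadruple loop.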

5. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to:
- obtain a first data stream of speech data and a second data stream of face image data while a speaker is speaking;
- extract visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- apply a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
- Dependent claims (6, 7, 8)
  - 6. The article comprising a computer readable medium to store computer executable instructions of claim 5, wherein the instructions further cause a computer to process asynchronous first and second data streams.
  - 7. The article comprising a computer readable medium to store computer executable instructions of claim 5, wherein the instructions further cause a computer to process in parallel the first and second data streams.
  - 8. The article comprising a computer readable medium to store computer executable instructions of claim 5, wherein the instructions further cause a computer to provide visual feature extraction from the video data set using linear discriminant analysis.

9. A speech recognition system comprising:
- an audiovisual capture module to capture an audio and a video data set that respectively provide a first data stream of speech data and a second data stream of face image data, the audiovisual capture module extracting visual features from the second data stream by masking, resizing, rotating, and normalizing a mouth region in a face image, and by using a two-dimensional discrete cosine transform; and
- a speech recognition module to apply a two stream coupled hidden Markov model to the first and second data streams for speech recognition, wherein a Viterbi-based method and a segmental k-means method are used to determine parameters of the two stream coupled hidden Markov model.
- Dependent claims (10, 11, 12)
  - 10. The speech recognition system of claim 9, further comprising asynchronous audio and video data sets.
  - 11. The speech recognition system of claim 9, further comprising parallel processing of the first and second data streams by the speech recognition module.
  - 12. The speech recognition system of claim 9, wherein the audiovisual capture module is further adapted to extract visual features from the second data stream by using linear discriminant analysis.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Nefian, Ara V.
Primary Examiner(s): Lerner; Martin
Application Number: US10/142,468
Publication Number: US 20030212557A1
Time in Patent Office: 1,713 Days
Field of Search: 704/256, 704/256.1, 704/256.2, 704/256.3, 704/256.4, 704/256.5, 704/256.6, 704/256.7, 704/256.8, 704/236, 704/270, 704/271, 382/116, 345/626, 345/649, 345/660
US Class Current: 704/236
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/25   using position of the lips,...
