Audio-visual feature fusion and support vector machine useful for continuous speech recognition

US 7,472,063 B2
Filed: 12/19/2002
Issued: 12/30/2008
Est. Priority Date: 12/19/2002
Status: Expired due to Fees

Abstract

A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

20 Claims

1. A method for recognizing speech by fusing audio and visual features, comprising:
- generating an audio vector representing detected audio data of a speech utterance,
- detecting a face in a video data stream linked to the audio data of the speech utterance,
- applying a cascade of linear support vector machine classifiers to the detected face to locate a mouth region,
- generating vector data for the mouth region,
- training a hidden Markov model (HMM) by fusing audio and visual vector data with the HMM, and
- recognizing an input speech by extracting audio and visual features and by comparing the extracted audio and visual features with HMMs obtained at least in part through audio and visual fusion.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein the linear support vector machine classifiers are trained using positive and negative training sets.
  - 3. The method of claim 1, further comprising use of a mouth and beard classifier.
  - 4. The method of claim 1, wherein a cascade algorithm operates on a multidimensional mouth feature space obtained through principal component analysis decomposition.
  - 5. The method of claim 1, further comprising mirroring, rotating, rescaling, and normalizing the mouth region.
  - 6. The method of claim 1, further comprising applying a smoothing filter to the mouth region.
  - 7. The method of claim 1, further comprising fusing audio and visual vector data using a two stream coupled hidden Markov model.
  - 8. The method of claim 1, further comprising using asynchronous audio and video data.
  - 9. The method of claim 1, further comprising applying a graph decoder and a Viterbi beam search.
  - 10. The method of claim 1, further comprising modifying audio and video probabilities as a function of noise.
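
As an illustration of the "cascade of linear support vector machine classifiers" recited in claim 1, the following is a minimal Python sketch, not the patented implementation. It assumes each stage is a linear decision function given as a weight list and a bias, and that a candidate window is rejected at the first stage whose score is not positive. The names `cascade_accepts` and `locate_mouth`, the two-dimensional features, and the hand-picked weights are all hypothetical; in practice the weights would come from training on positive and negative sets (claim 2) and the features could be PCA coefficients of the candidate region (claim 4).

```python
def linear_score(weights, bias, features):
    """Decision value w.x + b of one linear classifier stage."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def cascade_accepts(features, stages):
    """Return True only if the feature vector passes every stage.

    A candidate is rejected at the first stage with a non-positive
    score, so later stages are never evaluated for it.
    """
    for weights, bias in stages:
        if linear_score(weights, bias, features) <= 0.0:
            return False
    return True


def locate_mouth(candidates, stages):
    """Index of the first candidate window accepted by all stages, or None."""
    for index, features in enumerate(candidates):
        if cascade_accepts(features, stages):
            return index
    return None


# Toy stages with hand-picked (not trained) weights, for illustration only.
stages = [
    ([1.0, 0.0], -0.5),  # stage 1: passes when features[0] > 0.5
    ([0.0, 1.0], -0.5),  # stage 2: passes when features[1] > 0.5
]
candidates = [[0.2, 0.9], [0.8, 0.1], [0.9, 0.7]]
best = locate_mouth(candidates, stages)
```

Ordering cheap, permissive stages first is the usual motivation for a cascade: most non-mouth windows are discarded early and only a few reach the later stages.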

11. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to recognize speech by fusing audio and visual features via operations including:
- generating an audio vector representing detected audio data of a speech utterance,
- detecting a face in a video data stream linked to the audio data of the speech utterance,
- applying a cascade of linear support vector machine classifiers to the detected face to locate a mouth region,
- generating vector data for the mouth region,
- training a hidden Markov model (HMM) by fusing audio and visual vector data with the HMM, and
- recognizing an input speech by extracting audio and visual features and by comparing the extracted audio and visual features with HMMs obtained at least in part through audio and visual fusion.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The article of claim 11, wherein the linear support vector machine classifiers are trained using positive and negative training sets.
  - 13. The article of claim 11, wherein the operations further comprise use of a mouth and beard classifier.
  - 14. The article of claim 11, wherein a cascade algorithm operates on a multidimensional mouth feature space obtained through principal component analysis decomposition.
  - 15. The article of claim 11, wherein the operations further comprise mirroring, rotating, rescaling, and normalizing the mouth region.
  - 16. The article of claim 11, wherein the operations further comprise applying a smoothing filter to the mouth region.
  - 17. The article of claim 11, wherein the operations further comprise fusing audio and visual vector data using a two stream coupled hidden Markov model.
  - 18. The article of claim 11, wherein the operations further comprise using asynchronous audio and video data.
  - 19. The article of claim 11, wherein the operations further comprise applying a graph decoder and a Viterbi beam search.
  - 20. The article of claim 11, wherein the operations further comprise modifying audio and video probabilities as a function of noise.
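
Claims 10 and 20 recite modifying audio and video probabilities as a function of noise without stating a formula. One common way to do this in audio-visual recognition is to weight the per-stream log-likelihoods, trusting audio more when the signal-to-noise ratio (SNR) is high. The Python sketch below shows that idea under explicit assumptions: the clamped linear mapping from SNR to weight, the 0 dB and 30 dB limits, and the names `audio_weight` and `fused_log_likelihood` are hypothetical and are not taken from the patent.

```python
def audio_weight(snr_db, low_db=0.0, high_db=30.0):
    """Map an SNR in dB to an audio stream weight in [0, 1].

    Assumed mapping: a linear ramp between low_db and high_db,
    clamped to 0 below the range and to 1 above it.
    """
    ramp = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, ramp))


def fused_log_likelihood(log_p_audio, log_p_video, snr_db):
    """Weighted sum of stream log-likelihoods.

    Equivalent to raising the audio probability to the power lam and
    the video probability to the power (1 - lam), then multiplying.
    """
    lam = audio_weight(snr_db)
    return lam * log_p_audio + (1.0 - lam) * log_p_video


# With equal trust in both streams (15 dB under the assumed ramp),
# the fused score is the mean of the two log-likelihoods.
score = fused_log_likelihood(-2.0, -4.0, 15.0)
```

In clean audio the fused score follows the audio stream alone, and in very noisy audio it falls back to the video stream, which is the behavior the claims' wording suggests.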

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Luhong Liang, Ara V. Nefian, Yibao Zhao, Xiaoxing Liu, Xiaobo Pi
Primary Examiner(s)
Smits; Talivaldis Ivars

Application Number
US10/326,368
Publication Number
US 20040122675A1
Time in Patent Office
2,203 Days
Field of Search
None
US Class Current
704/256.1
CPC Class Codes
G06F 18/256 of results relating to diff...
G10L 15/25 using position of the lips,...
