Visual feature extraction procedure useful for audiovisual continuous speech recognition

US 20040122675A1
Filed: 12/19/2002
Published: 06/24/2004
Est. Priority Date: 12/19/2002
Status: Active Grant

First Claim


1. A speech recognition method comprising generation of an audio vector representing detected audio data, detection of a face in a video data stream linked to audio data, discriminating a mouth region in the detected face, applying a linear support vector machine analysis to the mouth region, generating vector data for the mouth region, and fusing audio and visual vector data with a hidden Markov model.
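The core step of claim 1, applying a linear support vector machine to the mouth region, can be illustrated with a minimal sketch. It assumes raw grayscale pixels as the feature vector and a sliding-window scan over the lower half of an already detected face. The weights and bias are hypothetical stand-ins for a classifier trained on positive and negative mouth samples (as in claim 2), and the function names are illustrative, not taken from the patent.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]  # grayscale face crop, row-major


def window_features(img: Image, top: int, left: int, h: int, w: int) -> List[float]:
    """Flatten an h x w patch into a feature vector (raw pixels, an assumption)."""
    return [img[r][c] for r in range(top, top + h) for c in range(left, left + w)]


def linear_svm_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear SVM decision function f(x) = w . x + b; f(x) > 0 means 'mouth'."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias


def locate_mouth(
    face: Image, weights: Sequence[float], bias: float, h: int, w: int, step: int = 1
) -> Optional[Tuple[float, int, int]]:
    """Scan the lower half of a detected face for the best-scoring mouth window.

    Returns (score, top, left), or None if no window is classified as a mouth.
    The weights are hypothetical; a real system would learn them from training data.
    """
    rows, cols = len(face), len(face[0])
    best: Optional[Tuple[float, int, int]] = None
    for top in range(rows // 2, rows - h + 1, step):
        for left in range(0, cols - w + 1, step):
            score = linear_svm_score(window_features(face, top, left, h, w), weights, bias)
            if best is None or score > best[0]:
                best = (score, top, left)
    if best is None or best[0] <= 0:
        return None
    return best
```

For example, on a 6 x 4 face patch whose only bright 2 x 2 block sits at rows 4-5, columns 1-2, weights of all ones with a bias of -2 select the window at (4, 1) with score 2.0.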


1 Assignment

0 Petitions

Abstract

A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

70 Citations


20 Claims

1. A speech recognition method comprising generation of an audio vector representing detected audio data, detection of a face in a video data stream linked to audio data, discriminating a mouth region in the detected face, applying a linear support vector machine analysis to the mouth region, generating vector data for the mouth region, and fusing audio and visual vector data with a hidden Markov model.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The speech recognition method of claim 1, wherein the linear support vector machine is trained using positive and negative training sets.
  - 3. The speech recognition method of claim 1, further comprising use of a mouth and beard classifier.
  - 4. The speech recognition method of claim 1, wherein a cascade algorithm operates on a multidimensional mouth feature space obtained through principal component analysis decomposition.
  - 5. The speech recognition method of claim 1, further comprising mirroring, rotating, and rescaling normalizing the mouth region.
  - 6. The speech recognition method of claim 1, further comprising application of a smoothing filter to the mouth region.
  - 7. The speech recognition method of claim 1, further comprising fusing audio and visual vector data using a two stream coupled hidden Markov model.
  - 8. The speech recognition method of claim 1, further comprising use of asynchronous audio and video data.
  - 9. The speech recognition method of claim 1, further comprising application of a graph decoder and a Viterbi beam search.
  - 10. The speech recognition method of claim 1, further comprising modification of audio and video probabilities as a function of noise.
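Claims 7 and 10 fuse two streams and adjust the audio and video probabilities according to noise. One common way to do this in audiovisual recognizers is to raise each stream's likelihood to an exponent (equivalently, to weight its log-likelihood) and to shift weight toward the video stream as the signal-to-noise ratio drops. The sketch below assumes that scheme; the linear SNR-to-weight map, its thresholds (0 dB, 30 dB), and its bounds (0.2, 0.9) are hypothetical, since the claims do not specify them.

```python
def audio_weight_from_snr(
    snr_db: float,
    low_db: float = 0.0,
    high_db: float = 30.0,
    min_weight: float = 0.2,
    max_weight: float = 0.9,
) -> float:
    """Hypothetical map from SNR (dB) to the audio stream weight.

    Linear between low_db and high_db, clamped to [min_weight, max_weight].
    """
    if snr_db <= low_db:
        return min_weight
    if snr_db >= high_db:
        return max_weight
    frac = (snr_db - low_db) / (high_db - low_db)
    return min_weight + frac * (max_weight - min_weight)


def fused_log_likelihood(log_p_audio: float, log_p_video: float, snr_db: float) -> float:
    """Combine per-stream log-likelihoods with weights that sum to one.

    Equivalent to p_audio**w_audio * p_video**w_video in the linear domain.
    """
    w_audio = audio_weight_from_snr(snr_db)
    w_video = 1.0 - w_audio
    return w_audio * log_p_audio + w_video * log_p_video
```

At -5 dB the audio weight bottoms out at 0.2, so a frame with audio log-likelihood -10 and video log-likelihood -2 scores 0.2(-10) + 0.8(-2) = -3.6, dominated by the video stream.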

11. An article comprising a computer readable medium to store computer executable instructions, the instructions defined to cause a computer to detect a face in video data, discriminate a mouth region in the detected face, apply a linear support vector machine analysis to the mouth region, and fuse audio and visual vector data with a hidden Markov model.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The article comprising a computer readable medium to store computer executable instructions of claim 11, wherein a linear support vector machine is trained using positive and negative training sets.
  - 13. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising use of a mouth and beard classifier.
  - 14. The article comprising a computer readable medium to store computer executable instructions of claim 11, wherein a cascade algorithm operates on a multidimensional mouth feature space obtained through principal component analysis decomposition.
  - 15. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising mirroring, rotating, and rescaling normalizing the mouth region.
  - 16. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising application of a smoothing filter to the mouth region.
  - 17. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising fusing audio and visual vector data using a two stream coupled hidden Markov model.
  - 18. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising use of asynchronous audio and video data.
  - 19. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising application of a graph decoder and a Viterbi beam search.
  - 20. The article comprising a computer readable medium to store computer executable instructions of claim 11, further comprising modification of audio and video probabilities as a function of noise.
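Claims 9 and 19 recite a graph decoder with a Viterbi beam search. The sketch below shows only the beam-pruned Viterbi recursion over a small HMM in the log domain, not a full graph decoder over a word network. The pruning rule (drop any state scoring more than `beam` below the best state at that frame) is a standard choice assumed here, all names are hypothetical, and the code assumes at least one state survives every frame.

```python
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def prune(scores: Dict[int, float], beam: float) -> Dict[int, float]:
    """Keep only states whose score is within `beam` of the best score."""
    best = max(scores.values())
    return {s: v for s, v in scores.items() if v >= best - beam}


def viterbi_beam(
    obs_loglik: Sequence[Sequence[float]],
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    beam: float,
) -> Tuple[float, List[int]]:
    """Beam-pruned Viterbi decoding in the log domain.

    obs_loglik[t][j]: log p(o_t | state j)
    log_init[j]:      log prior of starting in state j
    log_trans[i][j]:  log probability of moving from state i to state j
    Returns the best path score and its state sequence.
    """
    n = len(log_init)
    scores = prune({j: log_init[j] + obs_loglik[0][j] for j in range(n)}, beam)
    backptrs: List[Dict[int, int]] = []
    for t in range(1, len(obs_loglik)):
        new_scores: Dict[int, float] = {}
        bp: Dict[int, int] = {}
        for j in range(n):
            # Best predecessor among the states that survived pruning.
            best_i, best = -1, NEG_INF
            for i, s in scores.items():
                cand = s + log_trans[i][j]
                if cand > best:
                    best_i, best = i, cand
            if best_i >= 0:
                new_scores[j] = best + obs_loglik[t][j]
                bp[j] = best_i
        scores = prune(new_scores, beam)
        backptrs.append(bp)
    # Trace back from the best surviving final state.
    last = max(scores, key=scores.__getitem__)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return scores[last], path
```

With two states, log_init = [0.0, -5.0], log-transitions of -1.0 everywhere, and frame log-likelihoods that favor state 0 and then state 1 twice, the decoder returns a score of -2.0 and the path [0, 1, 1] whether the beam is 5 or 100; the narrow beam simply discards the losing states earlier.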


Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Liang, Luhong; Zhao, Yibao; Liu, Xiaoxing; Nefian, Ara Victor; Pi, Xiaobo

Granted Patent

US 7,472,063 B2
Field of Search
US Class Current

704/276
CPC Class Codes

G06F 18/256 of results relating to diff...

G10L 15/25 using position of the lips,...
