System and method for microphone activation using visual speech cues

US 6,754,373 B1
Filed: 07/14/2000
Issued: 06/22/2004
Est. Priority Date: 07/14/2000
Status: Expired due to Fees

Abstract

A system for activating a microphone based on visual speech cues, in accordance with the invention, includes a feature tracker coupled to an image acquisition device. The feature tracker tracks features in an image of a user. A region of interest extractor is coupled to the feature tracker. The region of interest extractor extracts a region of interest from the image of the user. A visual speech activity detector is coupled to the region of interest extractor and measures changes in the region of interest to determine if a visual speech cue has been generated by the user. A microphone is turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector. Methods for activating a microphone based on visual speech cues are also included.

32 Claims

1. A system for activating a microphone based on visual speech cues, comprising:
- a feature tracker coupled to an image acquisition device, the feature tracker for tracking features in an image of a user;
  
  a region of interest extractor coupled to the feature tracker, the region of interest extractor for extracting a region of interest from the image of the user, wherein the region of interest comprises a mouth portion of the image of the user;
  
  a visual speech activity detector coupled to the region of interest extractor for measuring changes in the region of interest to determine if a visual speech cue has been generated by the user; and
  
  a microphone turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector.
  - 2. The system as recited in claim 1, wherein the feature tracker tracks facial features of the user, the feature tracker including a feature detector for detecting facial features of the user.
  - 3. The system as recited in claim 1, wherein the visual speech cue includes movement between successive images of one of a mouth region and eyelids of the user.
  - 4. The system as recited in claim 1, wherein the visual speech cue is determined in image space.
  - 5. The system as recited in claim 1, wherein the visual speech activity detector includes a threshold value such that the visual speech cue is determined by a standard deviation calculation between regions of interest in successive images which exceeds the threshold value.
  - 6. The system as recited in claim 1, wherein the visual speech cue is determined in feature vector space.
  - 7. The system as recited in claim 1, wherein the visual speech activity detector provides a feature vector describing the extracted region of interest and includes a classifier for classifying the feature vector as a visual speech cue.
  - 8. The system as recited in claim 7, wherein the feature vector is determined by a discrete wavelet transform.
  - 9. The system as recited in claim 7, wherein the classifier includes a Gaussian mixture model classifier.
  - 10. The system as recited in claim 1, further comprising an image difference operator coupled to the image acquisition device for receiving image data and detecting whether an image has changed.
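
The image-space test of claim 5 can be illustrated with a minimal sketch. It assumes one possible reading of the claim language: the standard deviation is taken over the pixel-wise differences between the regions of interest of two successive images and compared against a threshold. The function names, the nested-list image representation, and the threshold value of 5.0 are illustrative assumptions and do not come from the patent.

```python
from statistics import pstdev


def roi_difference_std(prev_roi, curr_roi):
    """Standard deviation of pixel-wise differences between two ROIs.

    Each ROI is a list of rows of grayscale values (an assumed format).
    """
    same_rows = len(prev_roi) == len(curr_roi)
    if not same_rows or any(len(p) != len(c) for p, c in zip(prev_roi, curr_roi)):
        raise ValueError("ROIs must have the same shape")
    diffs = [c - p
             for prow, crow in zip(prev_roi, curr_roi)
             for p, c in zip(prow, crow)]
    return pstdev(diffs)


def is_visual_speech_cue(prev_roi, curr_roi, threshold=5.0):
    """True when the difference spread exceeds the (assumed) threshold."""
    return roi_difference_std(prev_roi, curr_roi) > threshold


# Toy 2x3 mouth regions: the centre column darkens as the mouth opens.
closed_mouth = [[100, 100, 100], [100, 100, 100]]
open_mouth = [[100, 60, 100], [100, 40, 100]]

print(is_visual_speech_cue(closed_mouth, closed_mouth))  # no change
print(is_visual_speech_cue(closed_mouth, open_mouth))    # local change
```

Under this reading, a uniform brightness shift across the whole region yields a standard deviation of zero, so only spatially uneven changes, such as lip movement, would count toward the threshold.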

11. A system for activating a microphone based on visual speech cues, comprising:
- a camera for acquiring images of a user;
  
  an image difference operator coupled to the camera for receiving image data from the camera and detecting whether a change in the image has occurred;
  
  a feature tracker coupled to the image difference operator, the feature tracker being activated if a change in the image is detected by the image difference operator to track facial features in an image of a user;
  
  a region of interest extractor coupled to the feature tracker and the image difference operator, the region of interest extractor for extracting a region of interest from the image of the user;
  
  a visual speech activity detector coupled to the region of interest extractor for measuring changes in the region of interest to determine if a visual speech cue has been generated by the user; and
  
  a microphone turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector.
  - 12. The system as recited in claim 11, wherein the feature tracker tracks facial features of the user, the feature tracker including a feature detector for detecting facial features of the user.
  - 13. The system as recited in claim 11, wherein the region of interest extractor extracts a mouth portion of the image of the user.
  - 14. The system as recited in claim 11, wherein the visual speech cue includes movement between successive images of one of a mouth region and eyelids of the user.
  - 15. The system as recited in claim 11, wherein the visual speech cue is determined in image space.
  - 16. The system as recited in claim 11, wherein the visual speech activity detector includes a threshold value such that the visual speech cue is determined by a standard deviation calculation between regions of interest in successive images which exceeds the threshold value.
  - 17. The system as recited in claim 11, wherein the visual speech cue is determined in feature vector space.
  - 18. The system as recited in claim 11, wherein the visual speech activity detector provides a feature vector describing the extracted region of interest and includes a classifier for classifying the feature vector as a visual speech cue.
  - 19. The system as recited in claim 18, wherein the feature vector is determined by a discrete wavelet transform.
  - 20. The system as recited in claim 11, wherein the classifier includes a Gaussian mixture model classifier.
  - 21. The system as recited in claim 11, further comprising a microphone logic circuit for turning the microphone on when the visual speech cue is determined and turning the microphone off when no speech is determined.
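
Two elements of claims 11 and 21 lend themselves to a short sketch: the image difference operator that gates the feature tracker, and the microphone logic circuit that switches the microphone on and off. The version below is only one way these could behave. The mean-absolute-difference test, its floor of 2.0, and the `hangover` count of cue-free frames before switch-off are all assumptions made for illustration; the claims do not specify them.

```python
def image_changed(prev_frame, curr_frame, min_mean_abs_diff=2.0):
    """Image difference operator (assumed form).

    Returns True when the mean absolute pixel difference between two
    equally sized grayscale frames exceeds the given floor.
    """
    diffs = [abs(c - p)
             for prow, crow in zip(prev_frame, curr_frame)
             for p, c in zip(prow, crow)]
    return sum(diffs) / len(diffs) > min_mean_abs_diff


class MicrophoneLogic:
    """Turns the microphone on at a cue and off after a quiet period.

    `hangover` is a hypothetical parameter: the number of consecutive
    cue-free frames tolerated before the microphone is switched off.
    """

    def __init__(self, hangover=3):
        self.hangover = hangover
        self.is_on = False
        self._quiet_frames = 0

    def update(self, cue_detected):
        if cue_detected:
            self.is_on = True
            self._quiet_frames = 0
        elif self.is_on:
            self._quiet_frames += 1
            if self._quiet_frames >= self.hangover:
                self.is_on = False
        return self.is_on


mic = MicrophoneLogic(hangover=2)
states = [mic.update(cue) for cue in [False, True, False, False, False]]
print(states)
```

In a full system the feature tracker would run only on frames for which `image_changed` returns True, which avoids tracking work while the scene is static.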

22. A method for activating a microphone based on visual speech cues, comprising the steps of:
- acquiring a current image of a face;
  
  updating face parameters when the current image of the face indicates a change from a previous image of the face;
  
  extracting a region of interest from the current image as dictated by the face parameters;
  
  computing visual speech activity based on the extracted region of interest; and
  
  activating a microphone for inputting speech when the visual speech activity has been determined.
  - 23. The method as recited in claim 22, wherein the step of updating face parameters includes the step of invoking a feature tracker to detect and track facial features of the user.
  - 24. The method as recited in claim 22, wherein the region of interest includes a mouth portion of the image of the user.
  - 25. The method as recited in claim 22, wherein the step of computing visual speech activity includes calculating movement between successive images of one of a mouth region and eyelids of the user.
  - 26. The method as recited in claim 22, wherein the visual speech activity is computed in image space.
  - 27. The method as recited in claim 22, wherein the step of computing visual speech activity includes:
  - 28. The method as recited in claim 22, wherein the visual speech activity is computed in feature vector space.
  - 29. The method as recited in claim 22, wherein the step of computing visual speech activity includes:
    - determining a feature vector based on the region of interest in the current image; and
      
      classifying the feature vector to determine if visual speech activity is present.
  - 30. The method as recited in claim 29, wherein the feature vector is determined by a discrete wavelet transform.
  - 31. The method as recited in claim 22, wherein the step of activating a microphone for inputting speech when the visual speech activity has been determined includes:
    - marking an event when the visual speech activity is determined; and
      
      activating the microphone in accordance with the event.
  - 32. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps as recited in claim 22.
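
The feature-vector path of claims 29 to 31 can be sketched as follows, under several stated assumptions. The feature vector is built from one level of a row-wise Haar transform, which is one simple discrete wavelet transform; the claims do not name a wavelet. The classifier is a stand-in that thresholds the Euclidean distance between successive feature vectors; it is not the classifier of the patent. The function names, the threshold of 10.0, and the event list of frame indices are likewise hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2
              for i in range(0, len(signal), 2)]
    return approx, detail


def feature_vector(roi):
    """Concatenate the row-wise Haar approximation coefficients of an ROI."""
    features = []
    for row in roi:
        approx, _ = haar_dwt_1d(row)
        features.extend(approx)
    return features


def classify(features, prev_features, threshold=10.0):
    """Stand-in classifier: distance between successive feature vectors."""
    return math.dist(features, prev_features) > threshold


def run(rois, threshold=10.0):
    """Mark an event per active frame and activate the microphone on it."""
    events = []      # indices of frames where activity was marked
    mic_on = False
    prev = None
    for index, roi in enumerate(rois):
        feats = feature_vector(roi)
        if prev is not None and classify(feats, prev, threshold):
            events.append(index)   # marking an event (claim 31)
            mic_on = True          # activating in accordance with the event
        prev = feats
    return events, mic_on


closed = [[100, 100, 100, 100]]
opened = [[100, 40, 40, 100]]
events, mic_on = run([closed, closed, opened])
print(events, mic_on)
```

Keeping the event list separate from the microphone state means the moment of detection stays available to later stages, for example to time-align the start of audio capture.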

Current Assignee
Uniloc 2017 LLC (FIG LLC (d/b/a Fortress Investment Group LLC))
Original Assignee
International Business Machines Corporation
Inventors
Iyengar, Giridharan R., Potamianos, Gerasimos, Neti, Chalapathy V., de Cuetos, Philippe
Primary Examiner(s)
Mehta, Bhavesh M.
Assistant Examiner(s)
CARTER, AARON W

Application Number

US09/616,229
Time in Patent Office

1,439 Days
Field of Search

382/118, 382/190-208, 704/275, 704/235, 704/271
US Class Current

382/118
CPC Class Codes

G06V 40/20   Movements or behaviour, e.g...

G10L 15/24   Speech recognition using no...

G10L 25/78   Detection of presence or ab...
