System and method for microphone activation using visual speech cues
Abstract
A system for activating a microphone based on visual speech cues, in accordance with the invention, includes a feature tracker coupled to an image acquisition device. The feature tracker tracks features in an image of a user. A region of interest extractor is coupled to the feature tracker. The region of interest extractor extracts a region of interest from the image of the user. A visual speech activity detector is coupled to the region of interest extractor and measures changes in the region of interest to determine if a visual speech cue has been generated by the user. A microphone is turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector. Methods for activating a microphone based on visual speech cues are also included.
32 Claims
1. A system for activating a microphone based on visual speech cues, comprising:
a feature tracker coupled to an image acquisition device, the feature tracker for tracking features in an image of a user;
a region of interest extractor coupled to the feature tracker, the region of interest extractor for extracting a region of interest from the image of the user, wherein the region of interest comprises a mouth portion of the image of the user;
a visual speech activity detector coupled to the region of interest extractor for measuring changes in the region of interest to determine if a visual speech cue has been generated by the user; and
a microphone turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector.
11. A system for activating a microphone based on visual speech cues, comprising:
a camera for acquiring images of a user;
an image difference operator coupled to the camera for receiving image data from the camera and detecting whether a change in the image has occurred;
a feature tracker coupled to the image difference operator, the feature tracker being activated if a change in the image is detected by the image difference operator to track facial features in an image of a user;
a region of interest extractor coupled to the feature tracker and the image difference operator, the region of interest extractor for extracting a region of interest from the image of the user;
a visual speech activity detector coupled to the region of interest extractor for measuring changes in the region of interest to determine if a visual speech cue has been generated by the user; and
a microphone turned on by the visual speech activity detector when a visual speech cue has been determined by the visual speech activity detector.
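As an illustration of how the image difference operator in claim 11 might gate the feature tracker, the following is a minimal sketch and not the patented implementation. Frames are assumed to be lists of rows of grayscale integers. The names `frame_changed`, `extract_roi`, `process_frame`, the `track_features` callable, and the two threshold parameters are hypothetical and do not come from the claims.

```python
def frame_changed(prev_frame, curr_frame, pixel_delta=10, min_changed_fraction=0.01):
    """Image difference operator (assumed form): report a change when enough
    pixels differ by more than pixel_delta between the two frames."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(c - p) > pixel_delta:
                changed += 1
    return total > 0 and changed / total >= min_changed_fraction


def extract_roi(frame, box):
    """Cut the region of interest out of the frame.
    box is (top, left, height, width); this layout is an assumption."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def process_frame(prev_frame, curr_frame, prev_box, track_features):
    """Run the (hypothetical) tracker only when the image changed;
    otherwise reuse the previous region-of-interest box."""
    if frame_changed(prev_frame, curr_frame):
        box = track_features(curr_frame)
    else:
        box = prev_box
    return box, extract_roi(curr_frame, box)
```

Skipping the tracker on unchanged frames is the point of the gating: the costlier tracking step only runs when the difference operator reports a change.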
22. A method for activating a microphone based on visual speech cues, comprising the steps of:
acquiring a current image of a face;
updating face parameters when the current image of the face indicates a change from a previous image of the face;
extracting a region of interest from the current image as dictated by the face parameters;
computing visual speech activity based on the extracted region of interest; and
activating a microphone for inputting speech when the visual speech activity has been determined.
determining a standard deviation between regions of interest in the current image and the previous image; and
comparing the standard deviation to a threshold value such that if the threshold value is exceeded, visual speech activity is determined.
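The two steps above leave open exactly how the standard deviation "between" two regions of interest is computed. The sketch below takes one plausible reading, the standard deviation of the pixel-wise difference between the previous and current regions, and should be treated as an assumption rather than the method of the patent. The function names and the threshold used in any call are hypothetical.

```python
import math


def roi_difference_std(prev_roi, curr_roi):
    """Population standard deviation of the pixel-wise difference between
    two equally sized grayscale regions (lists of rows)."""
    if len(prev_roi) != len(curr_roi) or any(
        len(p) != len(c) for p, c in zip(prev_roi, curr_roi)
    ):
        raise ValueError("regions of interest must have the same shape")
    diffs = [
        c - p
        for prev_row, curr_row in zip(prev_roi, curr_roi)
        for p, c in zip(prev_row, curr_row)
    ]
    if not diffs:
        raise ValueError("regions of interest must not be empty")
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return math.sqrt(variance)


def visual_speech_activity(prev_roi, curr_roi, threshold):
    """Activity is determined when the standard deviation exceeds the threshold."""
    return roi_difference_std(prev_roi, curr_roi) > threshold
```

Under this reading, a uniform brightness shift across the whole region gives a standard deviation of zero, so only uneven changes such as lip movement can exceed the threshold.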
28. The method as recited in claim 22, wherein the visual speech activity is computed in feature vector space.
29. The method as recited in claim 22, wherein the step of computing visual speech activity includes:
determining a feature vector based on the region of interest in the current image; and
classifying the feature vector to determine if visual speech activity is present.
30. The method as recited in claim 29, wherein the feature vector is determined by a discrete wavelet transform.
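Claims 29 and 30 can be illustrated with a small sketch, assuming a single-level 2-D Haar wavelet and a nearest-centroid classifier. The claims name neither a wavelet family nor a classifier, so both choices, along with every function name and the centroid vectors, are hypothetical stand-ins.

```python
import math


def haar_step(values):
    """One level of the 1-D Haar DWT: returns (approximation, detail)."""
    if len(values) % 2:
        raise ValueError("length must be even")
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def haar_ll(roi):
    """Low-low (approximation) subband of a one-level 2-D Haar DWT."""
    # Low-pass along each row, then along each column of the result.
    rows = [haar_step(list(row))[0] for row in roi]
    cols = [haar_step(list(col))[0] for col in zip(*rows)]
    # cols holds columns; transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


def feature_vector(roi):
    """Flatten the approximation subband into a feature vector."""
    return [value for row in haar_ll(roi) for value in row]


def classify(vector, speech_centroid, silence_centroid):
    """Nearest-centroid decision (assumed classifier): True means speech."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return squared_distance(vector, speech_centroid) < squared_distance(
        vector, silence_centroid
    )
```

Keeping only the approximation subband reduces a region of width W and height H to a vector of length (W/2)(H/2). In practice the centroids would be learned from labeled mouth images.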
31. The method as recited in claim 22, wherein the step of activating a microphone for inputting speech when the visual speech activity has been determined includes:
marking an event when the visual speech activity is determined; and
activating the microphone in accordance with the event.
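Claim 31 separates detecting visual speech activity from switching on the microphone by way of a marked event. The following is a minimal sketch of that separation. `Microphone`, `SpeechEvent`, and `MicrophoneActivator` are hypothetical stand-ins, and a real system would call an actual audio input API in `turn_on`.

```python
from dataclasses import dataclass


@dataclass
class Microphone:
    """Stand-in for a real audio input device (hypothetical)."""
    is_on: bool = False

    def turn_on(self):
        self.is_on = True


@dataclass
class SpeechEvent:
    """Marks the frame at which visual speech activity was determined."""
    frame_index: int


class MicrophoneActivator:
    def __init__(self, microphone):
        self.microphone = microphone
        self.events = []

    def mark_event(self, frame_index):
        """Record that visual speech activity was determined at this frame."""
        event = SpeechEvent(frame_index)
        self.events.append(event)
        return event

    def handle(self, event):
        """Activate the microphone in accordance with the marked event."""
        self.microphone.turn_on()

    def process_frame(self, frame_index, activity_detected):
        if activity_detected:
            self.handle(self.mark_event(frame_index))
```

Recording the event before acting on it keeps the frame index of the visual cue available, which could later be used to align the audio with the video.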
32. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps as recited in claim 22.
Specification