SYSTEM AND METHOD FOR ENHANCING SPEECH ACTIVITY DETECTION USING FACIAL FEATURE DETECTION
First Claim
1. A method comprising:
monitoring, via a processor of a computing device, an image feed of a user interacting with the computing device;
identifying an audio start event in the image feed based on face detection of the user looking at the computing device; and
based on the audio start event, initiating processing of a received audio signal.
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing audio. A system configured to practice the method monitors, via a processor of a computing device, an image feed of a user interacting with the computing device and identifies an audio start event in the image feed based on face detection of the user looking at the computing device or a specific region of the computing device. The image feed can be a video stream. The audio start event can be based on a head size, orientation or distance from the computing device, eye position or direction, device orientation, mouth movement, and/or other user features. Then the system initiates processing of a received audio signal based on the audio start event. The system can also identify an audio end event in the image feed and end processing of the received audio signal based on the end event.
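The flow described in the abstract (monitor an image feed, identify a start event from face detection, process audio until an end event) can be illustrated with a minimal sketch. It is not the patented implementation. The `FrameObservation` type, the `gate_audio` function, the one-to-one alignment of frames with audio chunks, and the `start_frames` / `end_frames` debounce thresholds are all assumptions introduced for illustration; the source does not specify them. The face detector itself is out of scope and is represented only by its hypothetical per-frame output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FrameObservation:
    """Hypothetical per-frame face detector output (not specified by the source)."""
    face_detected: bool
    looking_at_device: bool


def gate_audio(
    frames: Iterable[FrameObservation],
    audio_chunks: Iterable[bytes],
    start_frames: int = 2,
    end_frames: int = 2,
) -> Tuple[List[bytes], List[Tuple[int, str]]]:
    """Return the audio chunks selected for processing and the detected events.

    Assumes frames and audio chunks are time-aligned one-to-one.
    start_frames and end_frames are illustrative debounce thresholds.
    """
    processing = False
    run = 0  # consecutive frames that contradict the current state
    selected: List[bytes] = []
    events: List[Tuple[int, str]] = []

    for index, (frame, chunk) in enumerate(zip(frames, audio_chunks)):
        # The user counts as attentive only if a face is found and it faces the device.
        attentive = frame.face_detected and frame.looking_at_device
        run = run + 1 if attentive != processing else 0

        if not processing and run >= start_frames:
            # Audio start event: begin processing the received audio.
            processing = True
            run = 0
            events.append((index, "audio_start"))
        elif processing and run >= end_frames:
            # Audio end event: stop processing the received audio.
            processing = False
            run = 0
            events.append((index, "audio_end"))

        if processing:
            selected.append(chunk)

    return selected, events
```

In this sketch a single attentive frame does not open the gate; the user must be seen looking at the device for `start_frames` consecutive frames, and must look away for `end_frames` consecutive frames before processing stops. Other cues named in the abstract, such as head size, distance, or mouth movement, could be folded into the `attentive` test in the same way.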
20 Claims
1. A method comprising:
monitoring, via a processor of a computing device, an image feed of a user interacting with the computing device;
identifying an audio start event in the image feed based on face detection of the user looking at the computing device; and
based on the audio start event, initiating processing of a received audio signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
a processor;
a display;
a microphone;
a camera; and
a memory storing instructions for controlling the processor to perform steps comprising:
monitoring an image feed of a user received via the camera;
identifying an audio start event in the image feed based on face detection of the user looking at the display; and
initiating processing of an audio signal received via the microphone based on the audio start event.
Dependent claims: 13, 14, 15, 16
17. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
monitoring an image feed of a user interacting with the computing device;
identifying an audio start event in the image feed based on face detection of the user looking at the computing device; and
initiating processing of a received audio signal based on the audio start event.
Dependent claims: 18, 19, 20
Specification