System and method for enhancing speech activity detection using facial feature detection
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing audio. A system configured to practice the method monitors, via a processor of a computing device, an image feed of a user interacting with the computing device and identifies an audio start event in the image feed based on face detection of the user looking at the computing device or at a specific region of the computing device. The image feed can be a video stream. The audio start event can be based on head size, head orientation, or head distance from the computing device; eye position or gaze direction; device orientation; mouth movement; and/or other user features. The system then initiates processing of a received audio signal based on the audio start event. The system can also identify an audio end event in the image feed and end processing of the received audio signal based on the audio end event.
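The start-event / processing / end-event flow in the abstract can be sketched as a small state machine over per-frame face-detection results. This is a minimal sketch under stated assumptions, not the patented implementation: the `FrameObservation` fields, the debounce counts `start_frames` and `end_frames`, and the helper names are all hypothetical, and an upstream face detector that reports whether the user is looking at the device and moving their mouth is assumed to exist.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FrameObservation:
    """Hypothetical per-frame result from an upstream face detector."""
    timestamp: float         # seconds since the image feed started
    looking_at_device: bool  # face oriented toward the device or a region of it
    mouth_moving: bool       # lip-motion detector fired on this frame


def detect_speech_segments(
    frames: List[FrameObservation],
    start_frames: int = 3,
    end_frames: int = 5,
) -> List[Tuple[float, float]]:
    """Return (start, end) times delimited by audio start and end events.

    A start event fires after `start_frames` consecutive engaged frames
    (looking at the device with a moving mouth) and is back-dated to the
    first frame of that run. An end event fires after `end_frames`
    consecutive non-engaged frames and is back-dated the same way.
    """
    segments: List[Tuple[float, float]] = []
    active_start: Optional[float] = None
    run_on = run_off = 0
    run_start = run_end = 0.0
    for f in frames:
        engaged = f.looking_at_device and f.mouth_moving
        if active_start is None:
            if engaged:
                if run_on == 0:
                    run_start = f.timestamp  # first frame of a candidate run
                run_on += 1
                if run_on >= start_frames:
                    active_start = run_start  # audio start event
                    run_off = 0
            else:
                run_on = 0
        elif engaged:
            run_off = 0
        else:
            if run_off == 0:
                run_end = f.timestamp  # first frame of a candidate end run
            run_off += 1
            if run_off >= end_frames:
                segments.append((active_start, run_end))  # audio end event
                active_start = None
                run_on = 0
    if active_start is not None:
        # Feed ended while the user was still engaged: close at the last frame.
        segments.append((active_start, frames[-1].timestamp))
    return segments


def gate_audio(
    samples: List[float], sample_rate: int, segments: List[Tuple[float, float]]
) -> List[List[float]]:
    """Keep only the audio samples that fall inside each visual segment."""
    return [
        samples[round(s * sample_rate):round(e * sample_rate)] for s, e in segments
    ]
```

The consecutive-frame counts act as a debounce, so a single glance at the screen or a brief lip movement does not by itself trigger processing; the threshold values here are illustrative only.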
21 Claims
1. A method for detecting use of a mobile computing device, comprising:
identifying, by a processor of a remote computing device, in an image feed from a surveillance camera: a user; and the mobile computing device;
identifying, by the remote computing device and within the image feed, an interaction between the user and the mobile computing device;
receiving, at the remote computing device and while monitoring the image feed, an audio signal having a first voice signal associated with the user and a second voice signal associated with a non-user;
identifying, at the remote computing device, an audio start event of the first voice signal based on a distance of the user to a screen of the mobile computing device and on mouth movement of the user in the image feed; and
based on the audio start event, initiating processing of the first voice signal by the processor of the remote computing device.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19.)
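Claim 1 conditions the audio start event on two visual cues (the user's distance to the screen and mouth movement) and then processes only the user's voice, not the non-user's. A minimal sketch of that decision, under loudly stated assumptions: the distance and lip-motion measurements are taken as given from some upstream estimator, the thresholds `max_distance_m` and `min_mouth_motion` are hypothetical, and attributing a voice to the user by matching its onset time to the visual start event is a swapped-in illustrative technique, not something the claim specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical per-frame measurements extracted from the surveillance feed."""
    timestamp: float             # seconds since the feed started
    distance_to_screen_m: float  # estimated user-to-screen distance (assumed input)
    mouth_motion: float          # lip-motion score, 0 (still) to 1 (moving)


@dataclass
class VoiceSignal:
    """A voice signal assumed to be already separated from the received audio."""
    label: str
    onset: float  # time (s) at which this voice first becomes active


def find_audio_start_event(
    frames: List[Frame],
    max_distance_m: float = 0.6,
    min_mouth_motion: float = 0.5,
) -> Optional[float]:
    """Return the first time the user is near the screen AND moving their mouth."""
    for f in frames:
        near = f.distance_to_screen_m <= max_distance_m
        talking = f.mouth_motion >= min_mouth_motion
        if near and talking:
            return f.timestamp
    return None


def select_user_voice(
    voices: List[VoiceSignal], start_time: float, tolerance_s: float = 0.3
) -> Optional[VoiceSignal]:
    """Attribute to the user the voice whose onset best matches the start event."""
    candidates = [v for v in voices if abs(v.onset - start_time) <= tolerance_s]
    return min(candidates, key=lambda v: abs(v.onset - start_time), default=None)
```

Requiring both cues means a user who is talking far from the phone, or who is close to the screen but silent, does not trigger processing. Once a voice is selected, it alone would be handed to the downstream processor, and the non-user's voice is left unprocessed.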
11. A system for detecting use of a mobile computing device, comprising:
a processor; and
a computer-readable storage medium having instructions stored thereon which, when executed by the processor, cause the processor to perform operations comprising:
identifying, by a remote computing device, in an image feed from a surveillance camera: a user; and the mobile computing device;
identifying, by the remote computing device and within the image feed, an interaction between the user and the mobile computing device;
receiving, at the remote computing device and while monitoring the image feed, an audio signal having a first voice signal associated with the user and a second voice signal associated with a non-user;
identifying, at the remote computing device, an audio start event of the first voice signal based on a distance of the user to a screen of the mobile computing device and on mouth movement of the user in the image feed; and
based on the audio start event, initiating processing of the first voice signal by the remote computing device.
(Dependent claims: 12, 13, 14, 20.)
15. A computer-readable storage device having stored thereon instructions for detecting use of a mobile computing device which, when executed by a computing device, cause the computing device to perform operations comprising:
identifying, by a remote computing device, in an image feed from a surveillance camera: a user; and the mobile computing device;
identifying, by the remote computing device and within the image feed, an interaction between the user and the mobile computing device;
receiving, at the remote computing device and while monitoring the image feed, an audio signal having a first voice signal associated with the user and a second voice signal associated with a non-user;
identifying, at the remote computing device, an audio start event of the first voice signal based on a distance of the user to a screen of the mobile computing device and on mouth movement of the user in the image feed; and
based on the audio start event, initiating processing of the first voice signal by the remote computing device.
(Dependent claims: 16, 17, 18, 21.)
Specification