METHOD AND SYSTEM FOR VOICE CAPTURE USING FACE DETECTION IN NOISY ENVIRONMENTS
First Claim
1. An automated method of audio signal acquisition, said method comprising:
detecting a subject of interest within an environment using computer-implemented face detection procedures applied to image data captured by a camera system;
determining a face direction associated with said subject of interest relative to said camera system within a 3 dimensional space using said image data associated with said subject of interest; and
producing an output audio signal using an audio capture arrangement by focusing an audio beam of said audio capture arrangement in said face direction, wherein said output audio signal enhances audio originating from said subject of interest relative to other audio of said environment.
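The claim does not say how a face direction is computed from image data. The following is a minimal sketch that back-projects the centre of a detected face box into a 3D direction. It assumes a calibrated pinhole camera without lens distortion and a face detector that returns a pixel bounding box. It also uses a hypothetical average face width of 0.16 m for the range estimate. None of these details come from the patent.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical average adult face width used for a rough range estimate (assumption).
ASSUMED_FACE_WIDTH_M = 0.16


@dataclass
class CameraIntrinsics:
    fx: float  # focal length in pixels (horizontal)
    fy: float  # focal length in pixels (vertical)
    cx: float  # principal point x, pixels
    cy: float  # principal point y, pixels


@dataclass
class FaceBox:
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float  # width, pixels
    h: float  # height, pixels


def face_position_3d(box: FaceBox, cam: CameraIntrinsics) -> tuple[float, float, float]:
    """Estimate the face centre in camera coordinates (metres; x right, y down, z forward)."""
    u = box.x + box.w / 2.0
    v = box.y + box.h / 2.0
    # Normalised ray through the face centre (pinhole back-projection, no distortion model).
    rx = (u - cam.cx) / cam.fx
    ry = (v - cam.cy) / cam.fy
    # Similar triangles: apparent width in pixels shrinks in proportion to distance.
    z = cam.fx * ASSUMED_FACE_WIDTH_M / box.w
    return rx * z, ry * z, z


def face_direction(box: FaceBox, cam: CameraIntrinsics) -> tuple[float, float]:
    """Azimuth and elevation in degrees; positive azimuth = right, positive elevation = up."""
    x, y, z = face_position_3d(box, cam)
    azimuth = math.degrees(math.atan2(x, z))
    # Image y grows downward, so negate it to make "up" positive.
    elevation = math.degrees(math.atan2(-y, math.hypot(x, z)))
    return azimuth, elevation
```

Take intrinsics `CameraIntrinsics(500, 500, 320, 240)` and a 40-pixel face box centred at pixel (820, 240). The sketch places that face about 2 m away, at roughly 45 degrees azimuth and 0 degrees elevation. The range figure is only as good as the assumed face width. A depth camera or stereo pair would be a more reliable source for that step.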
Abstract
Embodiments of the present invention use face detection procedures to determine a face direction associated with one or more detected subjects of interest within a 3D space, while avoiding pickup of other environmental sounds. In addition, if more than one face is detected, embodiments of the present invention can automatically identify the active speaker among the subjects whose faces were detected by recognizing facial movements consistent with speaking (e.g., by tracking mouth movements). Once the face direction of the detected subject has been determined, embodiments of the present invention may dynamically adjust the audio acquisition of the audio capture device (e.g., microphone devices) toward the location of the detected subject, for instance by using beamforming techniques. As such, embodiments of the present invention can detect the direction of the “talking object” and guide the audio subsystem to filter out any sound not coming from that direction.
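The abstract describes choosing the active speaker among several detected faces by tracking mouth movements. The sketch below picks the face whose mouth opening fluctuates most over a short window. It assumes that some landmark detector already supplies a per-frame mouth-openness value for each face. The input format, the variance criterion, and the `min_variance` threshold are all illustrative assumptions and are not taken from the patent.

```python
from __future__ import annotations

import statistics


def pick_active_speaker(
    mouth_openness: dict[str, list[float]],
    min_variance: float = 1e-3,  # hypothetical threshold separating "talking" from "still"
) -> str | None:
    """Return the id of the face whose mouth opening varies most, or None if none exceeds min_variance."""
    best_id: str | None = None
    best_var = min_variance
    for face_id, series in mouth_openness.items():
        if len(series) < 2:
            continue  # not enough frames to measure movement
        var = statistics.pvariance(series)
        if var > best_var:
            best_id, best_var = face_id, var
    return best_id
```

A still face yields near-zero variance and is never selected. If nobody is moving their mouth, the function returns `None`, and a system could simply keep its previous beam direction.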
87 Citations
21 Claims
1. An automated method of audio signal acquisition, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system of audio signal acquisition, said system comprising:
an image capture module operable to detect a subject of interest using computer-implemented face detection procedures applied to image data, wherein said image capture module is operable to determine a face direction associated with said subject of interest relative to a camera system within a 3 dimensional space using said image data associated with said subject of interest;
a directional audio capture arrangement operable to produce an output audio signal using a directional audio beam; and
a beamforming module operable to direct said audio beam in said face direction, wherein said audio signal enhances audio originating from said subject of interest relative to other audio.
Dependent claims: 9, 10, 11, 12, 13, 14.
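Claim 8 names a beamforming module that directs the audio beam in the face direction, but it does not fix an algorithm. Delay-and-sum is the textbook choice, and the sketch below uses it. The sketch assumes a far-field source and a linear microphone array lying along the camera's horizontal axis. It also assumes a speed of sound of 343 m/s and integer-sample delays. A real system would usually interpolate fractional delays.

```python
from __future__ import annotations

import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def steering_delays(mic_x: list[float], azimuth_deg: float, fs: float) -> list[int]:
    """Non-negative integer sample delays that time-align a far-field source at azimuth_deg.

    mic_x holds microphone positions (metres) along the array axis; positive azimuth points toward +x.
    """
    s = math.sin(math.radians(azimuth_deg))
    # Arrival delay of the plane wave at each mic relative to the origin, in samples.
    # Mics further toward the source (larger x * s) hear it earlier, hence the minus sign.
    arrival = [round(-x * s / SPEED_OF_SOUND * fs) for x in mic_x]
    latest = max(arrival)
    # Delay early channels so that every channel lines up with the latest arrival.
    return [latest - a for a in arrival]


def delay_and_sum(channels: list[list[float]], delays: list[int]) -> list[float]:
    """Delay each channel by its sample count, sum, and average (channels must share one length)."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]
```

A typical call is `delay_and_sum(channels, steering_delays(mic_x, azimuth, fs))`. Sound from the steered azimuth adds coherently after alignment. Sound from other directions stays misaligned and is partially averaged out, which is the enhancement relative to other audio that the claim describes.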
15. A method of audio signal acquisition, said method comprising:
detecting a plurality of subjects of interest using computer-implemented face detection procedures applied to image data;
determining a respective face direction associated with each subject of said plurality of subjects of interest relative to a camera system within a 3 dimensional space using said image data associated with said plurality of subjects of interest; and
producing a respective output audio signal for each subject of said plurality of subjects of interest using a directional audio capture arrangement by focusing a plurality of audio beams in said face directions of said plurality of subjects of interest, wherein said audio output signals enhance audio originating from said plurality of subjects of interest relative to other audio.
Dependent claims: 16, 17, 18, 19, 20, 21.
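Claim 15 forms one beam per detected subject. Each face direction is measured relative to the camera, but beams are steered relative to the microphone array. A multi-subject setup therefore has to convert every face position into the array's frame first. The sketch below makes several assumptions:

- face positions are given in the camera frame, in metres (x right, y down, z forward);
- the array centre sits at a known offset from the camera, with axes parallel to the camera's;
- the array is linear along x, so the steering angle is the arcsine of the direction component along the array axis.

```python
from __future__ import annotations

import math


def beam_angles(
    face_positions: dict[str, tuple[float, float, float]],
    array_offset: tuple[float, float, float],
) -> dict[str, float]:
    """Steering angle in degrees from array broadside for each face; positive toward +x."""
    ox, oy, oz = array_offset
    angles = {}
    for face_id, (x, y, z) in face_positions.items():
        # Vector from the array centre to the face, in the shared camera/array axes.
        dx, dy, dz = x - ox, y - oy, z - oz
        r = math.hypot(dx, dy, dz)
        # For a linear array along x, inter-microphone delays depend only on dx / r.
        angles[face_id] = math.degrees(math.asin(dx / r))
    return angles
```

Each returned angle would then drive its own beam, for example one delay-and-sum pass per subject. That would give the separate output signals the claim recites.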
Specification