METHOD AND SYSTEM FOR VOICE CAPTURE USING FACE DETECTION IN NOISY ENVIRONMENTS

US 20150022636A1
Filed: 07/19/2013
Published: 01/22/2015
Est. Priority Date: 07/19/2013
Status: Abandoned Application

Abstract

Embodiments of the present invention are capable of determining a face direction associated with a detected subject (or multiple detected subjects) of interest within a 3D space using face detection procedures, while simultaneously avoiding the pick up of other environmental sounds. In addition, if more than one face is detected, embodiments of the present invention can automatically detect an active speaker based on the recognition of facial movements consistent with the performance of providing audio (e.g., tracking mouth movements) by those subjects whose faces were detected. Once determinations are made regarding face direction of the detected subject, embodiments of the present invention may dynamically adjust the audio acquisition capabilities of the audio capture device (e.g., microphone devices) relative to the location of the detected subject using beamforming techniques for instance. As such, embodiments of the present invention can detect the direction of the “talking object” and guide the audio subsystem to filter out any sound not coming from that direction.
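The abstract mentions beamforming without naming a specific algorithm. As a rough illustration only, the sketch below uses delay-and-sum, one common beamforming technique, on a linear microphone array. The function names, the far-field plane-wave assumption, the 343 m/s speed of sound, and the rounding of delays to whole samples are all assumptions made for this example and are not taken from the application.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed approximate value for air at room temperature


def steering_delays(mic_positions_m, angle_rad):
    """Delays (seconds) that time-align a far-field plane wave on a linear array.

    mic_positions_m: microphone positions along the array axis, in meters.
    angle_rad: arrival angle measured from broadside (0 = straight ahead).
    """
    raw = [x * math.sin(angle_rad) / SPEED_OF_SOUND for x in mic_positions_m]
    base = min(raw)
    # Shift so every delay is non-negative (a causal delay line).
    return [d - base for d in raw]


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Delay each channel by its steering delay, then average the channels.

    channels: list of equal-length sample lists, one per microphone.
    Delays are rounded to whole samples, which is a simplification.
    """
    shifts = [round(d * sample_rate_hz) for d in delays_s]
    n = len(channels[0])
    out = [0.0] * n
    for samples, shift in zip(channels, shifts):
        for t in range(shift, n):
            out[t] += samples[t - shift]
    return [v / len(channels) for v in out]
```

Sound arriving from the steered direction adds coherently after alignment, while sound from other directions is averaged with mismatched delays and is attenuated. A practical system would likely use fractional delays or frequency-domain weights rather than whole-sample shifts.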

87 Citations

21 Claims

1. An automated method of audio signal acquisition, said method comprising:
- detecting a subject of interest within an environment using computer-implemented face detection procedures applied to image data captured by a camera system;
  
  determining a face direction associated with said subject of interest relative to said camera system within a 3 dimensional space using said image data associated with said subject of interest; and
  
  producing an output audio signal using an audio capture arrangement by focusing an audio beam of said audio capture arrangement in said face direction, wherein said output audio signal enhances audio originating from said subject of interest relative to other audio of said environment.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of audio signal acquisition as described in claim 1, wherein said detecting further comprises automatically selecting an actively speaking subject as said subject of interest from a plurality of subjects based on recorded images of facial movements performed by said actively speaking subject.
  - 3. The method of audio signal acquisition as described in claim 1, wherein said face direction comprises an angle and a depth.
  - 4. The method of audio signal acquisition as described in claim 3, wherein said determining a face direction further comprises using camera system focusing features to locate said subject of interest.
  - 5. The method of audio signal acquisition as described in claim 1, wherein said determining a face direction further comprises determining a 3 dimensional coordinate position for said subject of interest using stereoscopic cameras.
  - 6. The method of audio signal acquisition as described in claim 1, wherein said focusing further comprises electronically steering said audio beam to filter out directionally inapposite audio received relative to said face direction using beamforming procedures.
  - 7. The method of audio signal acquisition as described in claim 1, wherein said audio capture arrangement comprises an array of microphones.
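Claims 1 and 3 describe a face direction made up of an angle and a depth, but the claims do not say how the angle is computed. One plausible reading, sketched below, converts the center of a detected face bounding box into azimuth and elevation angles using an ideal pinhole camera model. The `FaceDirection` type, the function names, the bounding-box layout, and the field-of-view parameters are hypothetical, and the depth is simply passed in (for example from a focus or stereo estimate).

```python
import math
from dataclasses import dataclass


@dataclass
class FaceDirection:
    azimuth_rad: float    # horizontal angle, positive to the right of the optical axis
    elevation_rad: float  # vertical angle, positive above the optical axis
    depth_m: float        # distance to the subject, supplied by the caller


def focal_length_px(image_size_px, fov_deg):
    """Focal length in pixels for an ideal pinhole camera with the given field of view."""
    return (image_size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def face_direction(box, image_w, image_h, hfov_deg, vfov_deg, depth_m):
    """Estimate a face direction from a bounding box.

    box is (left, top, width, height) in pixels, as a face detector might report.
    """
    left, top, width, height = box
    cx = left + width / 2.0
    cy = top + height / 2.0
    fx = focal_length_px(image_w, hfov_deg)
    fy = focal_length_px(image_h, vfov_deg)
    azimuth = math.atan((cx - image_w / 2.0) / fx)
    elevation = math.atan((image_h / 2.0 - cy) / fy)  # image y grows downward
    return FaceDirection(azimuth, elevation, depth_m)
```

A face centered in the frame maps to zero azimuth and elevation, and a face centered on the right edge of a 60 degree horizontal field of view maps to an azimuth of about 30 degrees. Lens distortion and any offset between the camera and the microphone array are ignored in this sketch.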

8. A system of audio signal acquisition, said system comprising:
- an image capture module operable to detect a subject of interest using computer-implemented face detection procedures applied to image data, wherein said image capture module is operable to determine a face direction associated with said subject of interest relative to a camera system within a 3 dimensional space using said image data associated with said subject of interest;
  
  a directional audio capture arrangement operable to produce an output audio signal using a directional audio beam; and
  
  a beamforming module operable to direct said audio beam in said face direction, wherein said audio signal enhances audio originating from said subject of interest relative to other audio.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of audio signal acquisition as described in claim 8, wherein said image capture module is further operable to automatically select an actively speaking subject as said subject of interest from a plurality of subjects based on recorded images of facial movements performed by said actively speaking subject.
  - 10. The system of audio signal acquisition as described in claim 8, wherein said face direction comprises an angle and a depth.
  - 11. The system of audio signal acquisition as described in claim 10, wherein said image capture module is further operable to determine said depth using camera system focusing features to focus on said subject of interest.
  - 12. The system of audio signal acquisition as described in claim 8, wherein said image capture module is further operable to determine a 3 dimensional coordinate position for said subject of interest using stereoscopic cameras.
  - 13. The system of audio signal acquisition as described in claim 8, wherein said directional audio capture arrangement is further operable to filter out directionally inapposite audio received relative to said face direction using beamforming procedures.
  - 14. The system of audio signal acquisition as described in claim 8, wherein said directional audio capture arrangement comprises an array of microphones.
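Claim 12 refers to determining a 3 dimensional coordinate position using stereoscopic cameras. A minimal sketch of one standard way to do this follows, assuming a rectified stereo pair with identical cameras, a known baseline, and a face feature matched at the same image row in both views. The function name and all parameters are illustrative assumptions, not details from the application.

```python
def triangulate(x_left_px, x_right_px, y_px, focal_px, baseline_m, cx_px, cy_px):
    """Recover (X, Y, Z) in meters, in the left camera frame, from a rectified stereo match.

    x_left_px, x_right_px: column of the same face feature in the left and right images.
    y_px: row of the feature (identical in both images after rectification).
    focal_px: focal length in pixels; baseline_m: distance between the two cameras.
    cx_px, cy_px: principal point of the left camera.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    x = (x_left_px - cx_px) * z / focal_px
    y = (y_px - cy_px) * z / focal_px
    return (x, y, z)
```

With a focal length of 800 pixels, a 0.1 m baseline, and a 40 pixel disparity, the depth works out to 800 × 0.1 / 40 = 2 m. The resulting X and Z values could then be turned into a steering angle for the audio beam, for example with `math.atan2(x, z)`.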

15. A method of audio signal acquisition, said method comprising:
- detecting a plurality of subjects of interest using computer-implemented face detection procedures applied to image data;
  
  determining a respective face direction associated with each subject of said plurality of subjects of interest relative to a camera system within a 3 dimensional space using said image data associated with said plurality of subjects of interest; and
  
  producing a respective output audio signal for each subject of said plurality of subjects of interest using a directional audio capture arrangement by focusing a plurality of audio beams in said face directions of said plurality of subjects of interest, wherein said audio output signals enhance audio originating from said plurality of subjects of interest relative to other audio.
- Dependent claims (16, 17, 18, 19, 20, 21):
  - 16. The method of audio signal acquisition as described in claim 15, wherein said detecting further comprises automatically selecting an actively speaking subject as said subject of interest based on recorded images of facial movements performed by said actively speaking subject.
  - 17. The method of audio signal acquisition as described in claim 15, wherein said detecting further comprises automatically detecting said plurality of subjects of interest using computer-implemented facial recognition procedures that recognize eye and nose positions.
  - 18. The method of audio signal acquisition as described in claim 15, wherein said determining further comprises using camera system focusing features to locate said plurality of subjects of interest.
  - 19. The method of audio signal acquisition as described in claim 15, wherein said determining a face direction further comprises determining a respective 3 dimensional coordinate position for each subject of said plurality of subjects of interest using stereoscopic cameras.
  - 20. The method of audio signal acquisition as described in claim 15, wherein said focusing further comprises electronically steering said plurality of audio beams to filter out directionally inapposite audio received relative to said respective face direction of each subject of said plurality of subjects of interest using beamforming procedures.
  - 21. The method of audio signal acquisition as described in claim 15, wherein said directional audio capture arrangement comprises an array of microphones.
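Claims 2, 9, and 16 describe selecting an actively speaking subject from recorded facial movements, and the abstract gives mouth-movement tracking as an example. The sketch below shows one simple heuristic under stated assumptions: each detected subject has a short history of mouth-opening measurements (a hypothetical per-frame lip gap in pixels), and the subject whose measurements vary the most, above a threshold, is treated as the speaker. The input format, the variance criterion, and the threshold value are assumptions for illustration and are not described in the application.

```python
from statistics import pvariance


def select_active_speaker(mouth_openings, min_variance=1.0):
    """Pick the subject whose mouth opening varies most across recent frames.

    mouth_openings: dict mapping a subject id to a list of per-frame
    mouth-opening measurements (hypothetical units: pixels).
    Returns the chosen subject id, or None if no subject exceeds min_variance.
    """
    best_id, best_var = None, min_variance
    for subject_id, samples in mouth_openings.items():
        if len(samples) < 2:
            continue  # not enough frames to measure movement
        var = pvariance(samples)
        if var > best_var:
            best_id, best_var = subject_id, var
    return best_id
```

The selected subject's face direction could then be handed to the beamforming stage. For the multi-beam case in claim 15, the same measurements could instead be used to rank or weight several beams rather than pick a single one.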

Current Assignee: NVIDIA Corporation
Original Assignee: NVIDIA Corporation
Inventors: SAVRANSKY, Guillermo
Application Number: US13/946,383
Publication Number: US 20150022636A1
US Class Current: 348/46
CPC Class Codes:
- G06V 40/161   Detection; Localisation; No...
- H04R 2430/20   Processing of the output si...
- H04R 2499/11   Transducers incorporated or...
- H04R 3/00   Circuits for transducers, ...
- H04R 3/005   for combining the signals o...
