Speaker and Person Backlighting For Improved AEC and AGC

US 20090322915A1
Filed: 06/27/2008
Published: 12/31/2009
Est. Priority Date: 06/27/2008
Status: Active Grant


Abstract

Regions of interest in video image capture for communication purposes are selected based on one or more inputs based on sound source localization, multi-person detection, and active speaker detection using audio and/or visual cues. Exposure and/or gain for the selected region are automatically enhanced for improved video quality focusing on people or inanimate objects of interest.
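
As a rough illustration of selecting a region from audio and visual cues, the sketch below scores each detected person with a weighted sum of an audio score and an image score and picks the highest. The `Candidate` type, the score ranges, and the `audio_weight` default are hypothetical; the source does not specify how the cues are combined numerically.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A detected person: bounding box plus per-cue scores in [0, 1] (assumed scale)."""
    box: tuple          # (left, top, right, bottom) in pixels
    audio_score: float  # e.g. derived from sound source localization
    image_score: float  # e.g. derived from a person detector


def select_region_of_interest(candidates, audio_weight=0.6):
    """Return the box of the candidate with the highest combined score.

    audio_weight is a hypothetical tuning value; the source gives none.
    Returns None when nothing was detected.
    """
    if not candidates:
        return None
    image_weight = 1.0 - audio_weight
    best = max(
        candidates,
        key=lambda c: audio_weight * c.audio_score + image_weight * c.image_score,
    )
    return best.box


people = [
    Candidate(box=(40, 60, 120, 200), audio_score=0.1, image_score=0.9),
    Candidate(box=(300, 50, 380, 190), audio_score=0.8, image_score=0.7),
]
roi = select_region_of_interest(people)  # the second person is the likelier speaker
```

Here the second candidate wins because its audio cue dominates; lowering `audio_weight` would favor purely visual detections instead.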

20 Claims

1. A method to be executed at least in part in a computing device for improving image quality of a selected region in a video frame, the method comprising:
- receiving a captured video frame;
- determining a region of interest based on input through at least one from a set of: sound source localization, multi-person detection, and active speaker detection;
- automatically adjusting at least one of an exposure parameter and a gain parameter for the determined region of interest such that the image quality of the region of interest is improved; and
- encoding the video frame for at least one of transmission and storage.
  - 2. The method of claim 1, further comprising:
    - prior to determining the region of interest, dividing the video frame into at least two backlighting bands;
    - assigning different weight factors to the backlighting bands; and
    - adjusting a backlighting of each band based on the assigned weight factors.
  - 3. The method of claim 2, further comprising:
    - dividing the video frame into the backlighting bands based on an expected position of one of persons and objects of interest within the video frame, wherein the weight factors are assigned such that a backlighting band containing one of the persons and objects of interest is rendered more prominent than other bands.
  - 4. The method of claim 2, wherein the backlighting bands are horizontal bands in a video frame captured by one of: a wide field-of-view camera and a 360-deg panorama camera.
  - 5. The method of claim 1, wherein automatically adjusting at least one of the exposure parameter and the gain parameter includes:
    - computing an image statistical distribution based on pixel values of the region of interest weighted based on backlighting and the input from the sound source localization, multi-person detection, and active speaker detection;
    - comparing the computed image statistical distribution to a threshold value; and
    - adjusting at least one of a gain and an exposure for processing the pixel value based on the comparison.
  - 6. The method of claim 5, wherein the image statistical distribution includes one of: an image mean and an image median.
  - 7. The method of claim 1, wherein the sound source localization provides input for determination of a region of interest based on audio features, and the multi-person detection provides input based on image features.
  - 8. The method of claim 1, wherein the active speaker detection provides input based on a weighted comparison of at least one from a set of audio feature detection through a plurality of microphones and image feature detection.
  - 9. The method of claim 1, further comprising determining a region of interest containing an inanimate object based on the inputs from sound source localization, multi-person detection, and active speaker detection.
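
The band weighting of claims 2 to 4 and the weighted statistic of claims 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the equal-height band split, the per-row weights, the choice to multiply each pixel by its row weight before averaging, and the `threshold` value are all assumptions made for the example.

```python
from statistics import mean, median


def band_weights(height, weights):
    """Split a frame of `height` rows into len(weights) equal horizontal bands
    and return one weight per row (equal-height bands are an assumption)."""
    n = len(weights)
    return [weights[min(row * n // height, n - 1)] for row in range(height)]


def roi_statistic(frame, roi, row_weights, use_median=False):
    """Mean (or median) of the weighted pixel values inside the region of interest.

    frame: list of rows of luma values (0..255)
    roi:   (left, top, right, bottom), with right and bottom exclusive
    """
    left, top, right, bottom = roi
    values = [
        frame[y][x] * row_weights[y]
        for y in range(top, bottom)
        for x in range(left, right)
    ]
    return median(values) if use_median else mean(values)


# A 4x4 frame: bright upper half (e.g. a window), dim lower half (people).
frame = [[200] * 4, [200] * 4, [40] * 4, [60] * 4]
weights = band_weights(4, [0.5, 1.0])   # lower band is made more prominent
roi = (1, 2, 3, 4)                      # a region inside the lower band
stat = roi_statistic(frame, roi, weights)
threshold = 100                         # hypothetical comparison value
too_dark = stat < threshold             # would call for more gain or exposure
```

The same function returns the median when `use_median=True`, which is less sensitive to a few saturated pixels inside the region.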

10. A computing device for improving image quality of a region of interest in a video communication application, comprising:
- a memory;
- a video capture device configured to capture frames of video;
- a processor coupled to the memory and the video capture device, and configured to execute a video processing application, the video processing application comprising:
  - a pre-processing module for receiving a captured video frame;
  - a selection module for determining the region of interest based on input through at least one from a set of: sound source localization, multi-person detection, and active speaker detection;
  - an automatic gain/exposure control module for adjusting at least one of a gain and an exposure for a portion of the video frame containing the region of interest by computing an image mean for pixel values of the portion of the video frame weighted based on fixed backlighting for the portion of the video frame and comparing the computed image mean to a threshold value for determining at least one of a new gain parameter and a new exposure parameter; and
  - an encoding module for encoding the processed video frame for subsequent transmission to a video rendering application; and
- a communication device configured to transmit encoded frames to another computing device over a network for one of rendering and storage.
  - 11. The computing device of claim 10, wherein the image mean is computed by adding a mean of a product of the pixel values for the whole video frame with a fixed backlighting value and a mean of the pixel values of the portion of the video frame in a weighted manner.
  - 12. The computing device of claim 11, wherein the new gain parameter and the new exposure parameter are determined by:
    - if the computed image mean is greater than a sum of a target value and the threshold value and if an original gain parameter is greater than zero, setting the new gain parameter as the original gain parameter minus a predefined value; and
    - if the computed image mean is greater than the sum of the target value and the threshold value, the original gain parameter is less than zero, and an original exposure parameter is greater than zero, setting the new exposure parameter as the original exposure parameter minus the predefined value.
  - 13. The computing device of claim 12, wherein the new gain parameter and the new exposure parameter are further determined by:
    - if the computed image mean is less than a difference of the target value and the threshold value and if the original gain parameter is less than a maximum gain value, setting the new gain parameter as the original gain parameter plus the predefined value; and
    - if the computed image mean is less than the sum of the target value and the threshold value, the original exposure parameter is less than a maximum exposure value, setting the new exposure parameter as the original exposure parameter plus the predefined value.
  - 14. The computing device of claim 13, wherein the predefined value is determined dynamically based on video capture device characteristics.
  - 15. The computing device of claim 10, wherein the sound source localization, multi-person detection, and active speaker detection are performed by a combination of hardware and software external to the computing device.
  - 16. The computing device of claim 10, wherein the selection module is an integral part of the automatic gain/exposure control module.
  - 17. The computing device of claim 10, wherein the pre-processing module is further configured to perform filtering of the captured video frame.
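
One possible reading of claims 11 to 13 is sketched below, under several stated assumptions. The blend weight `a`, the per-row `backlight` values, and the defaults for `target`, `threshold`, `step` (the "predefined value"), `max_gain`, and `max_exposure` are hypothetical. The sketch also assumes that exposure is lowered whenever gain is not greater than zero, that the dark-side test uses the target minus the threshold for both gain and exposure, and that only one parameter changes per frame; the claim wording does not settle those points.

```python
def combined_image_mean(frame, backlight, roi, a=0.5):
    """a * mean(frame * backlight) + (1 - a) * mean(pixels in roi).

    frame:     list of rows of luma values (0..255)
    backlight: one fixed weight per row (assumed layout)
    roi:       (left, top, right, bottom), with right and bottom exclusive
    """
    height, width = len(frame), len(frame[0])
    whole = sum(
        frame[y][x] * backlight[y] for y in range(height) for x in range(width)
    ) / (height * width)
    left, top, right, bottom = roi
    roi_pixels = [frame[y][x] for y in range(top, bottom) for x in range(left, right)]
    return a * whole + (1 - a) * (sum(roi_pixels) / len(roi_pixels))


def update_gain_exposure(image_mean, gain, exposure, target=128, threshold=10,
                         step=1, max_gain=16, max_exposure=16):
    """Return (new_gain, new_exposure) after one control step."""
    if image_mean > target + threshold:      # too bright: back off gain first
        if gain > 0:
            gain -= step
        elif exposure > 0:
            exposure -= step
    elif image_mean < target - threshold:    # too dark: raise gain, then exposure
        if gain < max_gain:
            gain += step
        elif exposure < max_exposure:
            exposure += step
    return gain, exposure                    # unchanged inside the dead band


frame = [[200] * 4, [200] * 4, [40] * 4, [60] * 4]
backlight = [0.5, 0.5, 1.0, 1.0]
image_mean = combined_image_mean(frame, backlight, roi=(1, 2, 3, 4))
new_gain, new_exposure = update_gain_exposure(image_mean, gain=3, exposure=5)
```

The dead band of width `2 * threshold` around `target` keeps the controller from oscillating when the measured mean is already close to the desired brightness.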

18. A computer-readable storage medium with instructions stored thereon for improving image quality of a selected person in a video conference application, the instructions comprising:
- receiving a captured video frame;
- dividing the video frame into at least two backlighting bands;
- assigning different weight factors to the backlighting bands; and
- adjusting a backlighting of each band based on the assigned weight factors such that the backlighting band containing at least the selected person is rendered more prominent than other bands;
- determining the selected person based on input through at least one from a set of: sound source localization, multi-person detection, and active speaker detection;
- determining at least one of a new gain parameter and a new exposure parameter for a portion of the video frame containing the selected person by computing an image mean for pixel values of the video frame and the portion of the video frame weighted based on corresponding backlighting bands and comparing the computed image mean to a target value and threshold value such that the selected person becomes prominent within the video frame based on the application of the new gain parameter and the new exposure parameter;
- encoding the processed video frame; and
- transmitting encoded frames to another computing device over a network for one of rendering and storage.
  - 19. The computer-readable storage medium of claim 18, wherein the instructions further comprise:
    - if a new person is selected during the video conference based on new input through at least one from the set of: sound source localization, multi-person detection, and active speaker detection, re-determining a new gain parameter and a new exposure parameter for another portion of the video frame containing the newly selected person.
  - 20. The computer-readable storage medium of claim 18, wherein the video frame is an omni-directional image captured by a panoramic video conferencing camera.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Cutler, Ross G.
Granted Patent: US 8,130,257 B2
US Class Current: 348/251
CPC Class Codes: H04N 23/611 where the recognised object...
