Multiple video camera processing for teleconferencing

US 8,358,328 B2
Filed: 11/20/2008
Issued: 01/22/2013
Est. Priority Date: 11/20/2008
Status: Active Grant

Abstract

A method, an apparatus, and a storage medium with executable code to execute a method including accepting camera views of at least some participants of a teleconference, each view from a corresponding video camera, with the camera views together including at least one view of each participant. The method includes accepting audio from a plurality of microphones, and processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones. The method further includes generating one or more candidate people views, with each people view being of an area enclosing a head and shoulders view of at least one participant. The method also includes making a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints.
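
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each candidate people view covers a fixed angular sector of the room and that the direction information is a single horizontal angle in degrees. The `PeopleView` class, its fields, and `select_people_view` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PeopleView:
    """Hypothetical candidate people view covering an angular sector."""
    name: str
    min_angle_deg: float  # inclusive lower bound of the sector (assumption)
    max_angle_deg: float  # exclusive upper bound of the sector (assumption)


def select_people_view(
    views: Sequence[PeopleView],
    direction_deg: float,
    current: Optional[PeopleView] = None,
) -> Optional[PeopleView]:
    """Return the view whose sector contains the sound direction.

    If no sector matches, keep the currently selected view so the
    transmitted picture does not change on unlocalized sound.
    """
    for view in views:
        if view.min_angle_deg <= direction_deg < view.max_angle_deg:
            return view
    return current


views = [
    PeopleView("left", -60.0, -20.0),
    PeopleView("center", -20.0, 20.0),
    PeopleView("right", 20.0, 60.0),
]
active = select_people_view(views, 5.0)
print(active.name)  # center
```

Keeping the current view when no sector matches is a design choice of this sketch; the abstract states only that the selection is made according to the direction information.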

20 Claims

1. An apparatus comprising:
- a plurality of video cameras each configured to capture a respective camera view of at least some participants of a conference, the camera views together including at least one view of each participant;
  
  a plurality of microphones;
  
  an audio processing module coupled to the plurality of microphones and configured to generate audio data and direction information indicative of the direction of sound received at the microphones;
  
  a face detection element coupled to the video cameras and configured to determine the location of each participant's face in each camera view and to determine which one or more faces are in more than one camera view;
  
  a composition element coupled to the video cameras and the face detection element and configured to generate one or more candidate people views, each people view being of an area enclosing a head and shoulders view of at least one participant; and
  
  a video director element coupled to the composition element and to the audio processing module and configured to make a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints, wherein each camera view is not necessarily a people view, wherein each people view provides an image of a size and layout and includes at least one participant, wherein each participant may appear in more than one people view, and wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated.
- 2. An apparatus as recited in claim 1,
- 3. An apparatus as recited in claim 1, further comprising an electronic pan-tilt-zoom element coupled to the video director and to the video cameras and configured to generate, according to the selected view information, video corresponding to the selected at least one of the candidate views for compression and transmission to one or more remote endpoints.
- 4. An apparatus as recited in claim 3, wherein each participant appears in only one people view.
- 5. An apparatus as recited in claim 3, wherein each participant may appear in more than one people view, and wherein the composition element includes a first composition element configured to compose people views, and a second composition element configured to select the candidate people views from the composed people view, such that each participant appears in only one candidate people view.
- 6. An apparatus as recited in claim 3, wherein the electronic pan-tilt-zoom element jointly with the composition element is further configured to construct head-on people views including correcting for at least some of the distortions that occur because the camera view corresponding to each people view does not include (a) head-on view(s) of the participant(s) in the people view.
- 7. An apparatus as recited in claim 3, wherein the composition element is further configured to carry out perspective correction.
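
As a rough illustration of what an electronic pan-tilt-zoom element (claim 3) might do, the sketch below crops a rectangular region from a camera frame and resamples it to a fixed output size. It assumes a frame stored as a list of pixel rows and uses nearest-neighbour resampling. The function name and parameters are hypothetical, and the claims do not prescribe this or any other resampling method.

```python
def electronic_ptz(frame, left, top, width, height, out_w, out_h):
    """Crop a region of `frame` and resample it to out_w x out_h pixels.

    `frame` is a list of rows, each row a list of pixel values.
    Nearest-neighbour resampling is an assumption made for brevity.
    """
    if min(width, height, out_w, out_h) <= 0:
        raise ValueError("region and output sizes must be positive")
    rows, cols = len(frame), len(frame[0])
    if left < 0 or top < 0 or left + width > cols or top + height > rows:
        raise ValueError("region lies outside the camera view")

    output = []
    for y in range(out_h):
        # Map each output row back to a source row inside the region.
        src_y = top + (y * height) // out_h
        output.append(
            [frame[src_y][left + (x * width) // out_w] for x in range(out_w)]
        )
    return output


# A 4x4 test frame whose pixel value encodes its row and column.
frame = [[row * 10 + col for col in range(4)] for row in range(4)]
zoomed = electronic_ptz(frame, left=1, top=1, width=2, height=2, out_w=4, out_h=4)
```

Here "pan" and "tilt" correspond to moving `left` and `top`, and "zoom" to shrinking `width` and `height` relative to the output size.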

8. A method of operating a processing system, the method comprising:
- accepting a plurality of camera views of at least some participants of a conference, each camera view from a corresponding video camera, the camera views together including at least one view of each participant;
  
  accepting audio from a plurality of microphones;
  
  processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones;
  
  detecting any faces in the camera views and determining the location of each detected face in each camera view, and also determining which face or faces is or are in more than one camera view;
  
  generating one or more candidate people views, each people view being of an area enclosing a head and shoulders view of at least one participant; and
  
  making a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints, wherein each camera view is not necessarily a people view, wherein each people view provides an image of a size and layout and includes at least one participant, wherein each participant may appear in more than one people view, and wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated.
- 9. A method as recited in claim 8,
- 10. A method as recited in claim 8, further comprising:
  - generating according to the selected view information, video corresponding to the selected at least one of the candidate views for compression and transmission to one or more remote endpoints.
- 11. A method as recited in claim 10, wherein each participant appears in only one people view.
- 12. A method as recited in claim 10, wherein each participant may appear in more than one people view, the method further comprising:
  - composing possible people views, and selecting the candidate people views from the composed possible people view, such that each participant appears in only one candidate people view.
- 13. A method as recited in claim 10, wherein the generating according to the selected view information including correcting for at least some of the distortions that occur because the camera view corresponding to each people view does not include (a) head-on view(s) of the participant(s) in the people view.
- 14. A method as recited in claim 10, wherein the generating according to the selected view information includes perspective correction.
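
Claims 5 and 12 describe composing possible people views and then selecting candidate people views so that each participant appears in only one candidate. One simple way to do this is sketched below, assuming each composed view is represented as a set of participant identifiers. The greedy largest-first strategy and the name `select_candidate_views` are assumptions of this sketch, not the method recited in the claims, and a greedy pass is not guaranteed to cover every participant in all configurations.

```python
def select_candidate_views(composed_views):
    """Pick non-overlapping views so no participant appears twice.

    `composed_views` maps a view name to the set of participant ids
    visible in that view. Larger views are tried first, with ties
    broken by name so the result is deterministic.
    """
    chosen = {}
    covered = set()
    ordered = sorted(composed_views.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, members in ordered:
        # Accept a view only if none of its participants is already covered.
        if members and covered.isdisjoint(members):
            chosen[name] = members
            covered |= members
    return chosen


composed = {
    "ab": {"a", "b"},
    "bc": {"b", "c"},
    "a": {"a"},
    "c": {"c"},
}
candidates = select_candidate_views(composed)
```

With this input, the views `ab` and `c` are selected: `bc` is rejected because participant `b` is already covered by `ab`.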

15. A method of operating a processing system comprising:
- for a plurality of camera views from corresponding video cameras in a room, detecting any faces in the camera view;
  
  determining the location of participants in the room;
  
  determining which face or faces is or are in more than one camera view;
  
  for each subgroup of one or more adjacent faces, composing a people view, each people view being of an area enclosing a head and shoulders view of at least one participant;
  
  selecting respective people views for each respective participant;
  
  mapping each people view to one or more determined voice directions, such that each determined voice direction is associated with one of the people views; and
  
  selecting one or more people views for transmission to remote endpoints, wherein each camera view is not necessarily a people view, wherein each people view provides an image of a size and layout and includes at least one participant, wherein each participant may appear in more than one people view, and wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated, such that video for the people views selected for transmission can be formed.
- 16. A method as recited in claim 15, further comprising when a voice direction changes, switching between people views according to the sound direction.
- 17. A method as recited in claim 15, wherein the face detecting includes determining the position of each face within the camera view, and a measure of the size of the face.
- 18. A method as recited in claim 17, wherein the face detecting includes at least one of eye detection and/or fitting respective elliptical shapes to edges detected in the camera views corresponding to a face, and wherein in the case that only eye detection is used, the measure of size of the face is determined by the distance between the detected eyes of the face, and wherein in the case only elliptical shape fitting is used, the measure of the face is determined from properties of the elliptical shape fitted to the edges of a face.
- 19. A method as recited in claim 17, wherein each camera location is pre-determined, and wherein the method comprises determining each face's approximate distance from the pre-determined camera positions.
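
Claims 18 and 19 refer to measuring face size from the distance between detected eyes and to estimating a face's approximate distance from a camera. A minimal sketch of how such an estimate could be made is shown below, assuming a pinhole camera model, a known focal length expressed in pixels, and an average adult eye spacing of about 63 mm. These three assumptions, and the function names, are illustrative and do not come from the claims.

```python
import math

# Assumed average distance between adult pupils, in metres (illustrative).
ASSUMED_EYE_SPACING_M = 0.063


def eye_distance_px(left_eye, right_eye):
    """Face-size measure: pixel distance between two (x, y) eye positions."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])


def approx_distance_m(left_eye, right_eye, focal_length_px,
                      eye_spacing_m=ASSUMED_EYE_SPACING_M):
    """Estimate camera-to-face distance with a pinhole model.

    distance = focal_length_px * real_size_m / size_in_pixels
    """
    size_px = eye_distance_px(left_eye, right_eye)
    if size_px <= 0:
        raise ValueError("eye positions must be distinct")
    return focal_length_px * eye_spacing_m / size_px


size = eye_distance_px((100, 200), (130, 240))
distance = approx_distance_m((0, 0), (63, 0), focal_length_px=1000)
```

The estimate ignores head rotation, which shortens the apparent eye spacing and would make a turned face appear farther away than it is.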

20. A non-transitory computer-readable medium having encoded thereon executable instructions that when executed by at least one processor of a processing system cause carrying out a method comprising:
- for a plurality of camera views from corresponding video cameras in a room, detecting any faces in the camera views;
  
  determining the location of participants in the room;
  
  determining which face or faces is or are in more than one camera view;
  
  for each subgroup of one or more adjacent faces, composing a people view, each people view being of an area enclosing a head and shoulders view of at least one participant;
  
  selecting respective people views for each respective participant;
  
  mapping each people view to one or more determined voice directions, such that each determined voice direction is associated with one of the people views; and
  
  selecting one or more people views for transmission to remote endpoints, wherein each camera view is not necessarily a people view, wherein each people view provides an image of a size and layout and includes at least one participant, wherein each participant may appear in more than one people view, and wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated, such that video for the people views selected for transmission can be formed.
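
The step of composing a people view for each subgroup of adjacent faces, recited in claims 15 and 20, can be sketched as follows. The sketch assumes each detected face is a tuple `(x_center, y_center, size)` in pixels, treats faces as adjacent when their horizontal centers are within a threshold, and pads each face by fixed multiples of its size to approximate a head and shoulders area. The threshold, the padding factors, and the function names are all hypothetical choices.

```python
def group_adjacent_faces(faces, max_gap_px):
    """Split faces into subgroups of horizontally adjacent faces.

    Faces are sorted left to right; a new subgroup starts whenever the
    gap to the previous face center exceeds `max_gap_px` (assumption).
    """
    groups = []
    for face in sorted(faces):
        if groups and face[0] - groups[-1][-1][0] <= max_gap_px:
            groups[-1].append(face)
        else:
            groups.append([face])
    return groups


def compose_people_view(group, side=1.5, above=1.0, below=2.0):
    """Return (left, top, right, bottom) enclosing head and shoulders.

    Each face is padded by multiples of its size: `side` to the left and
    right, `above` over the face center, `below` under it. The padding
    factors are illustrative guesses, not values from the patent.
    """
    left = min(cx - side * size for cx, _, size in group)
    right = max(cx + side * size for cx, _, size in group)
    top = min(cy - above * size for _, cy, size in group)
    bottom = max(cy + below * size for _, cy, size in group)
    return (left, top, right, bottom)


faces = [(160, 100, 40), (100, 100, 40), (500, 120, 50)]
groups = group_adjacent_faces(faces, max_gap_px=100)
people_views = [compose_people_view(group) for group in groups]
```

For this input the two faces near the left form one subgroup and the face at x = 500 forms another, giving two people views.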

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Friel, Joseph T.; Mauchly, J. William
Primary Examiner(s): WOO, STELLA L
Application Number: US12/275,119
Publication Number: US 20100123770A1
Time in Patent Office: 1,524 Days
Field of Search: 348/14.08, 348/14.09, 348/14.01
US Class Current: 348/14.08
CPC Class Codes: H04N 7/15 Conference systems
