Multiple video camera processing for teleconferencing
First Claim
1. An apparatus comprising:
a plurality of video cameras each configured to capture a respective camera view of at least some participants of a conference, the camera views together including at least one view of each participant;
a plurality of microphones;
an audio processing module coupled to the plurality of microphones and configured to generate audio data and direction information indicative of the direction of sound received at the microphones;
a face detection element coupled to the video cameras and configured to determine the location of each participant's face in each camera view and to determine which one or more faces are in more than one camera view;
a composition element coupled to the video cameras and the face detection element and configured to generate one or more candidate people views, each people view being of an area enclosing a head and shoulders view of at least one participant; and
a video director element coupled to the composition element and to the audio processing module and configured to make a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints,
wherein each camera view is not necessarily a people view,
wherein each people view provides an image of a size and layout and includes at least one participant,
wherein each participant may appear in more than one people view, and
wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated.
Abstract
A method, an apparatus, and a storage medium with executable code to execute a method including accepting camera views of at least some participants of a teleconference, each view from a corresponding video camera, with the camera views together including at least one view of each participant. The method includes accepting audio from a plurality of microphones, and processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones. The method further includes generating one or more candidate people views, with each people view being of an area enclosing a head and shoulders view of at least one participant. The method also includes making a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints.
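The selection step summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes, purely for illustration, that the direction information is a single horizontal angle in degrees and that each candidate people view covers a fixed angular sector. The names `PeopleView`, `select_view`, and the sector fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PeopleView:
    """Hypothetical candidate people view covering an angular sector."""
    view_id: str
    min_angle_deg: float  # assumed left edge of the sector seen from the microphones
    max_angle_deg: float  # assumed right edge of the sector


def select_view(
    views: Sequence[PeopleView],
    sound_direction_deg: float,
    current: Optional[PeopleView] = None,
) -> Optional[PeopleView]:
    """Pick the candidate view whose sector contains the sound direction.

    If no sector contains the direction, keep the currently selected view
    (an assumed fallback policy, not something the source specifies).
    """
    for view in views:
        if view.min_angle_deg <= sound_direction_deg < view.max_angle_deg:
            return view
    return current


views = [
    PeopleView("left", -60.0, -20.0),
    PeopleView("center", -20.0, 20.0),
    PeopleView("right", 20.0, 60.0),
]
selected = select_view(views, sound_direction_deg=5.0)
```

Here a sound arriving from 5 degrees falls in the sector of the hypothetical "center" view, so that view would be the one chosen for transmission.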
20 Claims
1. An apparatus comprising:
a plurality of video cameras each configured to capture a respective camera view of at least some participants of a conference, the camera views together including at least one view of each participant;
a plurality of microphones;
an audio processing module coupled to the plurality of microphones and configured to generate audio data and direction information indicative of the direction of sound received at the microphones;
a face detection element coupled to the video cameras and configured to determine the location of each participant's face in each camera view and to determine which one or more faces are in more than one camera view;
a composition element coupled to the video cameras and the face detection element and configured to generate one or more candidate people views, each people view being of an area enclosing a head and shoulders view of at least one participant; and
a video director element coupled to the composition element and to the audio processing module and configured to make a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints,
wherein each camera view is not necessarily a people view,
wherein each people view provides an image of a size and layout and includes at least one participant,
wherein each participant may appear in more than one people view, and
wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated.
View Dependent Claims (2, 3, 4, 5, 6, 7)
2. An apparatus as recited in claim 1,
wherein the cameras are set to each generate a candidate people view,
wherein the composition element is configured to make a selection of which at least one camera views is to be transmitted to the one or more remote endpoints according to the direction information, and
wherein the apparatus further comprises:
a video selector element coupled to the video director and to the video cameras and configured to switch in, according to the selection by the video director, at least one of the camera views for compression and transmission to one or more remote endpoints.
3. An apparatus as recited in claim 1, further comprising
an electronic pan-tilt-zoom element coupled to the video director and to the video cameras and configured to generate, according to the selected view information, video corresponding to the selected at least one of the candidate views for compression and transmission to one or more remote endpoints.
4. An apparatus as recited in claim 3, wherein each participant appears in only one people view.
5. An apparatus as recited in claim 3, wherein each participant may appear in more than one people view, and wherein the composition element includes a first composition element configured to compose people views, and a second composition element configured to select the candidate people views from the composed people view, such that each participant appears in only one candidate people view.
6. An apparatus as recited in claim 3, wherein the electronic pan-tilt-zoom element jointly with the composition element is further configured to construct head-on people views including correcting for at least some of the distortions that occur because the camera view corresponding to each people view does not include (a) head-on view(s) of the participant(s) in the people view.
7. An apparatus as recited in claim 3, wherein the composition element is further configured to carry out perspective correction.
8. A method of operating a processing system, the method comprising:
accepting a plurality of camera views of at least some participants of a conference, each camera view from a corresponding video camera, the camera views together including at least one view of each participant;
accepting audio from a plurality of microphones;
processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones;
detecting any faces in the camera views and determining the location of each detected face in each camera view, and also determining which face or faces is or are in more than one camera view;
generating one or more candidate people views, each people view being of an area enclosing a head and shoulders view of at least one participant; and
making a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints,
wherein each camera view is not necessarily a people view,
wherein each people view provides an image of a size and layout and includes at least one participant,
wherein each participant may appear in more than one people view, and
wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated.
View Dependent Claims (9, 10, 11, 12, 13, 14)
9. A method as recited in claim 8,
wherein the accepted camera views are each a candidate people view, the method further comprising:
in response to the made selection, switching in at least one of the accepted camera views for compression and transmission to one or more remote endpoints.
10. A method as recited in claim 8, further comprising:
generating according to the selected view information, video corresponding to the selected at least one of the candidate views for compression and transmission to one or more remote endpoints.
11. A method as recited in claim 10, wherein each participant appears in only one people view.
12. A method as recited in claim 10, wherein each participant may appear in more than one people view, the method further comprising:
composing possible people views, and selecting the candidate people views from the composed possible people view, such that each participant appears in only one candidate people view.
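The constraint in claims 5 and 12, that candidate people views are selected from the composed views so that each participant appears in only one, can be sketched as a selection of non-overlapping sets. The greedy rule below (prefer views with more participants, then skip any view that overlaps one already chosen) is an assumption made for illustration; the claims do not state how the selection is made. The function name `select_candidates` and the participant labels are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def select_candidates(
    possible_views: Iterable[FrozenSet[str]],
) -> List[FrozenSet[str]]:
    """Choose views so that no participant appears in two chosen views.

    Each possible view is modeled as the set of participant labels it shows.
    Larger views are tried first (an assumed preference), with a sorted-label
    tiebreak so the result is deterministic.
    """
    chosen: List[FrozenSet[str]] = []
    covered: set = set()
    ordered = sorted(possible_views, key=lambda v: (-len(v), sorted(v)))
    for view in ordered:
        if covered.isdisjoint(view):
            chosen.append(view)
            covered |= view
    return chosen


possible = [
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"C"}),
    frozenset({"A"}),
]
candidates = select_candidates(possible)
```

With these four possible views, the sketch keeps the view showing A and B and the view showing C alone. A greedy pass like this guarantees that the chosen views do not overlap, but it only covers every participant if the possible views allow it, for example when a single-person view exists for each participant.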
13. A method as recited in claim 10,
wherein the generating according to the selected view information includes correcting for at least some of the distortions that occur because the camera view corresponding to each people view does not include (a) head-on view(s) of the participant(s) in the people view.
14. A method as recited in claim 10, wherein the generating according to the selected view information includes perspective correction.
15. A method of operating a processing system comprising:
for a plurality of camera views from corresponding video cameras in a room, detecting any faces in the camera view;
determining the location of participants in the room;
determining which face or faces is or are in more than one camera view;
for each subgroup of one or more adjacent faces, composing a people view, each people view being of an area enclosing a head and shoulders view of at least one participant;
selecting respective people views for each respective participant;
mapping each people view to one or more determined voice directions, such that each determined voice direction is associated with one of the people views; and
selecting one or more people views for transmission to remote endpoints,
wherein each camera view is not necessarily a people view,
wherein each people view provides an image of a size and layout and includes at least one participant,
wherein each participant may appear in more than one people view, and
wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated,
such that video for the people views selected for transmission can be formed.
View Dependent Claims (16, 17, 18, 19)
20. A non-transitory computer-readable medium having encoded thereon executable instructions that when executed by at least one processor of a processing system cause carrying out a method comprising:
for a plurality of camera views from corresponding video cameras in a room, detecting any faces in the camera views;
determining the location of participants in the room;
determining which face or faces is or are in more than one camera view;
for each subgroup of one or more adjacent faces, composing a people view, each people view being of an area enclosing a head and shoulders view of at least one participant;
selecting respective people views for each respective participant;
mapping each people view to one or more determined voice directions, such that each determined voice direction is associated with one of the people views; and
selecting one or more people views for transmission to remote endpoints,
wherein each camera view is not necessarily a people view,
wherein each people view provides an image of a size and layout and includes at least one participant,
wherein each participant may appear in more than one people view, and
wherein each participant appears in only one candidate people view and, when displayed remotely on a remote display screen, is displayed life size and facing the expected audience in the remote location at which the remote display screen is situated,
such that video for the people views selected for transmission can be formed.
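Three of the steps recited in claims 15 and 20 (grouping adjacent faces, composing a people view around each group, and mapping voice directions to people views) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the specification: faces are assumed to be axis-aligned boxes in image coordinates with y increasing downward, adjacency is assumed to mean a horizontal gap below a threshold, the head-and-shoulders margins are arbitrary multiples of the face size, and each face is assumed to carry a known angle relative to the microphones. All names (`Face`, `group_adjacent`, `compose_people_view`, `map_directions`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Face:
    x: float          # left edge of the face box, pixels
    y: float          # top edge of the face box, pixels
    w: float          # face box width
    h: float          # face box height
    angle_deg: float  # assumed direction of the face as seen from the microphones


def group_adjacent(faces: Sequence[Face], max_gap: float) -> List[List[Face]]:
    """Group faces left to right; a gap wider than max_gap starts a new group."""
    groups: List[List[Face]] = []
    for face in sorted(faces, key=lambda f: f.x):
        if groups and face.x - (groups[-1][-1].x + groups[-1][-1].w) <= max_gap:
            groups[-1].append(face)
        else:
            groups.append([face])
    return groups


def compose_people_view(
    group: Sequence[Face], side: float = 0.5, above: float = 0.5, below: float = 1.5
) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) enclosing heads plus assumed shoulder margins."""
    left = min(f.x - side * f.w for f in group)
    top = min(f.y - above * f.h for f in group)
    right = max(f.x + f.w + side * f.w for f in group)
    bottom = max(f.y + f.h + below * f.h for f in group)
    return (left, top, right, bottom)


def map_directions(
    groups: Sequence[Sequence[Face]], voice_directions: Sequence[float]
) -> Dict[float, int]:
    """Associate each voice direction with the index of the nearest group."""
    centers = [sum(f.angle_deg for f in g) / len(g) for g in groups]
    return {
        d: min(range(len(centers)), key=lambda i: abs(centers[i] - d))
        for d in voice_directions
    }


faces = [
    Face(100, 50, 40, 40, -30.0),
    Face(150, 55, 40, 40, -20.0),
    Face(400, 60, 40, 40, 25.0),
]
groups = group_adjacent(faces, max_gap=20)
views = [compose_people_view(g) for g in groups]
direction_to_view = map_directions(groups, [-28.0, 30.0])
```

In this example the first two faces are 10 pixels apart and form one group, while the third face forms its own group, so two people views are composed and each of the two voice directions is associated with exactly one of them.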
Specification