Image processing apparatus
Abstract
Image data from a plurality of cameras 2-1, 2-2, 2-3 showing the movements of a number of people, for example in a meeting, and sound data from a directional microphone array 4 are processed by a computer processing apparatus 24 to archive the data in a meeting archive database 60. The image data is processed to determine the three-dimensional position and orientation of each person's head and to determine at whom each person is looking. The sound data is processed to determine the direction from which the sound came. Processing is carried out to determine who is speaking by determining which person has his head in a position corresponding to the direction from which the sound came. Having determined which person is speaking, the personal speech recognition parameters for that person are selected and used to convert the sound data to text data. Image data to be archived is chosen by selecting the camera which best shows the speaking participant and the participant to whom he is speaking. Image data, sound data, text data and data defining at whom each person is looking are stored in the meeting archive database 60.
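The speaker-identification step described in the abstract (finding the person whose head position corresponds to the direction from which the sound came) can be sketched as follows. This is a minimal illustration, not the patented implementation: the coordinate conventions and the angular-distance criterion are assumptions.

```python
import math

def identify_speaker(head_positions, mic_position, sound_direction):
    """Return the index of the person whose head lies closest to the
    direction from which the microphone array heard the sound.

    head_positions  -- list of (x, y, z) head positions (illustrative)
    mic_position    -- (x, y, z) position of the microphone array
    sound_direction -- unit vector of the estimated sound direction
    """
    best_index, best_angle = None, math.inf
    for i, head in enumerate(head_positions):
        # Direction from the microphone array to this person's head.
        to_head = tuple(h - m for h, m in zip(head, mic_position))
        norm = math.sqrt(sum(c * c for c in to_head))
        if norm == 0:
            continue
        to_head = tuple(c / norm for c in to_head)
        # Angle between the sound direction and the head direction;
        # the speaker is the person minimising this angle.
        dot = sum(a * b for a, b in zip(sound_direction, to_head))
        angle = math.acos(max(-1.0, min(1.0, dot)))
        if angle < best_angle:
            best_index, best_angle = i, angle
    return best_index
```

With two heads at (1, 0, 0) and (0, 1, 0) and a sound direction of (0, 1, 0) from a microphone array at the origin, the function identifies the second person (index 1) as the speaker.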
86 Citations
33 Claims
1. Image processing apparatus, comprising:
an image data receiver for receiving image data recorded by a plurality of cameras showing the movements of a plurality of people;
a speaker identifier for determining which of the people is speaking;
a speech recipient identifier for determining at whom the speaker is looking;
a position calculator for determining the position of the speaker and the position of the person at whom the speaker is looking; and
camera selection means for selecting image data from the received image data on the basis of the determined positions of the speaker and the person at whom the speaker is looking, said camera selection means being arranged to select image data in which both the speaker and the person at whom the speaker is looking appear, and wherein the camera selection means is arranged to generate quality values representing a quality of the views that at least some of the cameras have of the speaker and the person at whom the speaker is looking, and to select the image data on the basis of which camera has the quality value representing the highest quality.
Dependent claims: 2-15.
16. A method of processing image data recorded by a plurality of cameras showing the movements of a plurality of people to select image data for storage, the method comprising:
a speaker identification step of determining which of the people is speaking;
a step of determining at whom the speaker is looking;
a step of determining the position of the speaker and the position of the person at whom the speaker is looking; and
a camera selection step for selecting image data on the basis of the determined positions of the speaker and the person at whom the speaker is looking, wherein, in the camera selection step, image data is selected in which both the speaker and the person at whom the speaker is looking appear, quality values are generated representing a quality of the views that at least some of the cameras have of the speaker and the person at whom the speaker is looking, and the image data is selected on the basis of which camera has the quality value representing the highest quality.
Dependent claims: 17-33.
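The camera-selection logic recited in claims 1 and 16 can be sketched as below: only cameras in whose view both the speaker and the recipient appear are considered, a quality value is generated for each, and the camera whose value represents the highest quality is selected. The `Camera` structure and the distance-based quality heuristic are assumptions for illustration; the claims do not prescribe how the quality values are computed.

```python
import math
from dataclasses import dataclass

@dataclass
class Camera:
    name: str
    position: tuple   # (x, y, z) of the camera (illustrative)
    sees: set         # indices of the people visible to this camera

def select_camera(cameras, speaker, recipient, positions):
    """Pick the camera with the highest quality value among those
    whose view shows both the speaker and the recipient."""
    best_cam, best_quality = None, -math.inf
    for cam in cameras:
        # Per the claims, only consider cameras in which both the
        # speaker and the person being addressed appear.
        if speaker not in cam.sees or recipient not in cam.sees:
            continue
        # Illustrative quality value: a closer camera is assumed to
        # give a better view, so score by negative mean distance to
        # the two participants.
        dists = [math.dist(cam.position, positions[p])
                 for p in (speaker, recipient)]
        quality = -sum(dists) / len(dists)
        if quality > best_quality:
            best_cam, best_quality = cam, quality
    return best_cam
```

Any monotonic scoring of view quality (e.g. based on facing angle or occlusion rather than distance) slots into the same argmax structure; the claims only require that the camera with the value representing the highest quality be selected.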
Specification