Multipoint video conference system
Abstract
A multipoint video conference system includes a plurality of conference terminals, placed at remote points in correspondence with participants in a conference, for transmitting/receiving speech and image signals to/from the participants, and a video conference controller for transmitting/receiving speech and image signals to/from the conference terminals. The video conference controller includes an audio mixer, an identifying section, a frame selecting section, and multiplexers. The audio mixer forms a synthetic speech signal by synthesizing the speech signals from the conference terminals. The identifying section detects speech signals from the conference terminals and identifies the conference terminal through which a participant has made an utterance. On the basis of the identification result obtained by the identifying section, the frame selecting section forms a multi-image signal by selecting, from the image signals of the conference terminals, as many image signals as there are frames in the multiple-frame layout. The multiplexers receive the speech and image signals from the conference terminals and transmit the synthetic speech signal from the audio mixer and the multi-image signal from the frame selecting section to the conference terminals.
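The abstract says only that the audio mixer "synthesizes" the terminals' speech signals. A common way to do this is a saturating sample-wise sum. The sketch below assumes that every terminal delivers a block of 16-bit linear PCM samples of the same length. It illustrates the idea and is not the patented circuit. The function name `mix_speech` is hypothetical.

```python
from typing import Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_speech(frames: Sequence[Sequence[int]]) -> list[int]:
    """Mix equal-length blocks of 16-bit PCM samples, one block per terminal."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(block) != length for block in frames):
        raise ValueError("every terminal must supply the same number of samples")
    mixed = []
    for samples in zip(*frames):          # one sample from each terminal
        total = sum(samples)
        # Saturate rather than wrap, so overlapping loud speech clips instead of
        # turning into large-amplitude noise.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For instance, mixing `[100, -200, 30000]` with `[50, 50, 5000]` yields `[150, -150, 32767]`. The last sample saturates instead of wrapping around. Clipping keeps quiet talkers at full level, at the cost of distortion when several participants speak loudly at once.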
10 Claims
1. A multipoint video conference system comprising a plurality of conference terminals, said conference terminals located at remote points which correspond to participants in a conference, for transmitting/receiving speech and image signals to/from the participants, and a video conference controller for controlling transmitting/receiving of speech utterance signals and image signals through said conference terminals, said video conference controller comprising:
speech synthesizing means for forming a synthetic speech signal by synthesizing speech utterance signals received from said conference terminals;
identifying means for detecting said speech utterance signals from said conference terminals, identifying said conference terminal through which one of said participants has made a speech utterance and producing an identification result;
wherein said identifying means comprises utterance detecting means for detecting the presence/absence of a speech utterance signal from each of said conference terminals by monitoring said speech utterance signal therefrom;
utterance time measuring means for measuring an utterance time of the speech utterance of said one of said participants whose utterance is detected by said utterance detecting means;
memory means for storing utterance times measured by said utterance time measuring means in correspondence with said conference terminals; and
non-utterance time measuring means for measuring a non-utterance time of said one of said participants whose speech utterance has been detected by said utterance detecting means and for subtracting the measured non-utterance time from the utterance time stored in said memory means, wherein said non-utterance time measuring means measures said non-utterance time when said non-utterance time is greater than a predetermined time period, and wherein said multi-image signal contains image signals from said conference terminals which are selected in the order of increasing utterance times stored in said memory means;
frame selecting means for forming a multi-image signal, said multi-image signal comprising a number of image signals equal to a number of multiple frames contained in the image signals of said conference terminals, said number of image signals corresponding to said identification result so that the image signal from the conference terminal through which said one of said participants is making a speech utterance is included in said multi-image signal; and
transmission/reception means for receiving the speech utterance signals and image signals from said conference terminals and transmitting the synthetic speech signal from said speech synthesizing means and the multi-image signal from said frame selecting means to said conference terminals.
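The identifying means of claim 1 can be pictured as a small timer kept for each terminal. The following is a minimal sketch, not the patented implementation. It assumes that a voice-activity flag for every terminal is available every `dt` seconds. The class `UtteranceTracker` and its methods are hypothetical.

Three further assumptions are labeled in the code:

- Only the part of a pause that exceeds the predetermined period is subtracted.
- The stored time never goes below zero.
- The claim's "order of increasing utterance times" is read as favoring terminals with the largest stored time.

The current speaker is always included, as the frame selecting means requires.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceTracker:
    """Hypothetical sketch of the identifying means: per-terminal stored utterance time."""

    pause_threshold: float  # the claim's "predetermined time period", in seconds
    stored: dict[int, float] = field(default_factory=dict)   # memory means
    silence: dict[int, float] = field(default_factory=dict)  # current pause length

    def update(self, active: dict[int, bool], dt: float) -> None:
        """Advance all timers by dt seconds given one voice-activity flag per terminal."""
        for term, speaking in active.items():
            self.stored.setdefault(term, 0.0)
            if speaking:
                self.stored[term] += dt          # utterance time measuring
                self.silence[term] = 0.0
            elif self.stored[term] > 0.0:        # only terminals that have spoken
                before = self.silence.get(term, 0.0)
                after = before + dt
                self.silence[term] = after
                # Assumption: subtract only the part of the pause beyond the threshold.
                excess = max(0.0, after - max(before, self.pause_threshold))
                # Assumption: the stored time never drops below zero.
                self.stored[term] = max(0.0, self.stored[term] - excess)

    def select(self, n: int, speaking_now: set[int]) -> list[int]:
        """Choose n terminals: current speakers first, then largest stored time."""
        # Assumption: "order of ... utterance times" is read as largest first;
        # ties are broken by terminal id so the layout is deterministic.
        ranked = sorted(self.stored, key=lambda t: (-self.stored[t], t))
        chosen = [t for t in ranked if t in speaking_now][:n]
        chosen += [t for t in ranked if t not in chosen][: n - len(chosen)]
        return chosen
```

Consider a 2-second threshold and a terminal that speaks for 5 s and then stays silent for 3 s. That terminal keeps 4 s of stored time, because only the last second of the pause exceeds the threshold.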
4. A multipoint video conference system comprising a plurality of conference terminals, located at remote points which correspond to participants in a conference, for transmitting/receiving speech and image signals to/from the participants, and a video conference controller for controlling transmitting/receiving of speech utterance signals and image signals through said conference terminals, said video conference controller comprising:
speech synthesizing means for forming a synthetic speech signal by synthesizing speech utterance signals from said conference terminals;
image selecting means for selecting a plurality of image signals for constituent frames from said image signals, the multiple frames being constituted by a plurality of constituent frames obtained by dividing a single frame;
image synthesizing means for forming a multi-image signal from multiple frames by using the plurality of image signals selected by said image selecting means;
control means for controlling said image selecting means in accordance with a length of a speech utterance made by one of said participants through one of said conference terminals; and
multiplexing means for receiving and separating multiplexed speech and image signals from said conference terminals, and transmitting said synthetic speech signal from said speech synthesizing means and said multi-image signal from said frame selecting means to said conference terminals upon multiplexing the signals;
wherein said control means comprises utterance detecting means for detecting the presence/absence of speech utterances from the participants, utterance time measuring means for measuring an utterance time of the speech utterance of one of said participants who has made said speech utterance, in accordance with an utterance detection output from said utterance detecting means, and non-utterance time measuring means for measuring a non-utterance time of the one of said participants who has made the speech utterance, in accordance with a non-utterance detection output from said utterance detecting means, and wherein said control means controls said image selecting means in accordance with said utterance time and said non-utterance time.
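Claims 4 and 8 describe the multi-image signal as a single frame divided into constituent frames, each filled with one selected image signal. The following is a minimal sketch of that composition step, not the controller's actual image synthesizing circuit. It assumes uncompressed frames held as lists of rows of 8-bit luminance values and a uniform grid of `rows` by `cols` constituent frames. The name `compose_multi_image` and the image representation are hypothetical.

```python
Image = list[list[int]]  # hypothetical: rows of 8-bit luminance samples


def compose_multi_image(images: list[Image], rows: int, cols: int,
                        height: int, width: int) -> Image:
    """Place up to rows*cols images into one height x width frame, row-major."""
    if len(images) > rows * cols:
        raise ValueError("more images than constituent frames")
    tile_h, tile_w = height // rows, width // cols
    frame = [[0] * width for _ in range(height)]  # unused constituent frames stay black
    for index, img in enumerate(images):
        src_h, src_w = len(img), len(img[0])
        top, left = (index // cols) * tile_h, (index % cols) * tile_w
        for y in range(tile_h):
            src_y = y * src_h // tile_h           # nearest-neighbour row
            for x in range(tile_w):
                src_x = x * src_w // tile_w       # nearest-neighbour column
                frame[top + y][left + x] = img[src_y][src_x]
    return frame
```

Nearest-neighbour scaling keeps the sketch short. A practical implementation would low-pass filter before decimating to avoid aliasing.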
6. A multipoint video conference system comprising a plurality of conference terminals, located at remote points which correspond to participants in a conference, for transmitting/receiving speech and image signals to/from the participants, and a video conference controller for controlling transmitting/receiving of speech utterance signals and image signals through said conference terminals, said video conference controller comprising:
speech synthesizing means for forming a synthetic speech signal by synthesizing speech utterance signals received from said conference terminals;
identifying means for detecting said speech utterance signals from said conference terminals, identifying said conference terminal through which one of said participants has made a speech utterance and producing an identification result;
measuring means for measuring an utterance time of every participant in a conference, from the beginning of the conference to a current time, and measuring a non-utterance time of every participant from the beginning of the conference to the current time;
frame selecting means for forming a multi-image signal by selecting a number of image signals equal to a number of multiple frames from the image signals of said conference terminals in order of the value obtained by subtracting a measured non-utterance time from a measured utterance time; and
transmission/reception means for receiving the speech utterance signals and image signals from said conference terminals and transmitting the synthetic speech signal from said speech synthesizing means and the multi-image signal from said frame selecting means to said conference terminals.
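Claim 6 orders terminals by the value obtained by subtracting the measured non-utterance time from the measured utterance time, both counted from the beginning of the conference. The sketch below is hedged on several points:

- The input format is hypothetical: a list of per-tick voice-activity flags for each terminal.
- Terminals are sorted largest value first.
- A participant's non-utterance time starts counting at that participant's first utterance.

The last assumption matters. If every terminal's silence were counted over the whole conference, the difference would equal twice the utterance time minus the elapsed time. The ranking would then be identical to ranking by utterance time alone. The function name `rank_by_net_utterance` is hypothetical.

```python
def rank_by_net_utterance(activity: dict[int, list[bool]], tick: float, n: int) -> list[int]:
    """Return n terminal ids ordered by utterance time minus non-utterance time.

    activity maps a terminal id to one voice-activity flag per tick since the
    conference began (hypothetical input format).
    """
    net: dict[int, float] = {}
    for term, flags in activity.items():
        spoken = sum(flags) * tick
        # Assumption: silence is counted only after the participant first speaks.
        first = flags.index(True) if True in flags else len(flags)
        silent = (len(flags) - first - sum(flags)) * tick
        net[term] = spoken - silent
    # Largest net value first; ties broken by terminal id so the layout is stable.
    return sorted(net, key=lambda t: (-net[t], t))[:n]
```

Under these assumptions, a participant who spoke early and then fell silent can drop below one who has just started talking. Terminal 1 in the test talks for 3 ticks and then stays silent for 5, so its net value is -2. Terminal 2 talks only in the last 2 ticks, so its net value is +2.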
8. A multipoint video conference system comprising a plurality of conference terminals, located at remote points which correspond to participants in a conference, for transmitting/receiving speech and image signals to/from the participants, and a video conference controller for controlling transmitting/receiving of speech utterance signals and image signals through said conference terminals, said video conference controller comprising:
speech synthesizing means for forming a synthetic speech signal by synthesizing speech utterance signals from said conference terminals;
measuring means for measuring an utterance time of every participant in a conference, from the beginning of the conference to a current time, and measuring a non-utterance time of every participant from the beginning of the conference to the current time;
image selecting means for selecting a plurality of image signals for constituent frames from said image signals, multiple frames being constituted by a plurality of constituent frames obtained by dividing a single frame;
image synthesizing means for forming a multi-image signal from multiple frames by using the plurality of image signals selected by said image selecting means;
control means for controlling said image selecting means in accordance with said utterance time of one of said participants through one of said conference terminals; and
multiplexing means for receiving and separating multiplexed speech and image signals from said conference terminals, and transmitting said synthetic speech signal from said speech synthesizing means and said multi-image signal from said frame selecting means to said conference terminals upon multiplexing the signals.
Specification