METHOD AND APPARATUS FOR VIDEO CONFERENCING HAVING DYNAMIC LAYOUT BASED ON KEYWORD DETECTION
Abstract
The present invention provides a method and system for conferencing. The method includes the steps of connecting at least two sites to a conference; receiving at least two video signals and two audio signals from the connected sites; consecutively analyzing the audio data from the at least two sites connected in the conference by converting at least a part of the audio data to acoustical features and extracting keywords and speech parameters from the acoustical features using speech recognition; comparing said extracted keywords to predefined words; deciding, based on said speech parameters, whether said extracted keywords are to be considered a call for attention; defining an image layout based on said decision; processing the received video signals to provide a video signal according to the defined image layout; and transmitting the processed video signal to at least one of the at least two connected sites.
14 Claims
1. A method of conferencing comprising:
connecting at least two sites to a conference;
receiving at least two video signals and two audio signals from the connected sites;
consecutively analyzing the audio data from the at least two sites connected in the conference by converting at least a part of the audio data to acoustical features and extracting keywords and speech parameters from the acoustical features using speech recognition;
comparing said extracted keywords to predefined words, and deciding if said extracted keywords are to be considered a call for attention based on said speech parameters;
defining an image layout based on said decision;
processing the received video signals to provide a video signal according to the defined image layout; and
transmitting the processed video signal to at least one of the at least two connected sites.
Dependent claims: 2, 3, 4, 5, 6, 7
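The comparing and deciding steps of claim 1 can be illustrated with a minimal sketch. The claim does not name the predefined words or the speech parameters, so the word list, the `loudness` and `confidence` fields, and both thresholds below are hypothetical choices made only for illustration; this is not the patented implementation.

```python
from dataclasses import dataclass

# Hypothetical predefined words; the claim does not list any.
PREDEFINED_WORDS = {"john", "london", "question"}


@dataclass
class Keyword:
    site: str          # site whose audio produced the keyword
    word: str          # keyword text returned by the speech recognizer
    loudness: float    # assumed speech parameter, 0.0 to 1.0
    confidence: float  # assumed recognizer confidence, 0.0 to 1.0


def is_call_for_attention(kw, min_loudness=0.5, min_confidence=0.7):
    """Return True if kw matches a predefined word and its
    (assumed) speech parameters pass both thresholds."""
    # Step 1: compare the extracted keyword to the predefined words.
    if kw.word.lower() not in PREDEFINED_WORDS:
        return False
    # Step 2: decide based on the speech parameters.
    return kw.loudness >= min_loudness and kw.confidence >= min_confidence
```

Under these assumptions, a keyword that matches a predefined word but is spoken quietly is not treated as a call for attention, which is one plausible reading of "based on said speech parameters".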
8. A system for conferencing comprising:
an interface unit for receiving at least audio and video signals from at least two sites connected in a conference;
a speech recognition unit for analyzing the audio data from the at least two sites connected in the conference by converting at least a part of the audio data to acoustical features and extracting keywords and speech parameters from the acoustical features using speech recognition;
a processing unit configured to compare said extracted keywords to predefined words, and to decide if said extracted keywords are to be considered a call for attention based on said speech parameters;
a control processor for dynamically defining an image layout based on said decision;
a video processor for processing the received video signals to provide a processed video signal according to the defined image layout.
Dependent claims: 9, 10, 11, 12, 13, 14
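The role of the control processor in claim 8 can be sketched as a function that maps each site to a tile of the output picture. The claim does not say what the layout looks like, so everything here is an assumption: the function name `define_layout`, the normalized `(x, y, width, height)` tiles, the equal grid used when no call for attention was detected, and the 75/25 split that enlarges the site assumed to be the target of the call.

```python
import math


def define_layout(sites, focus=None):
    """Map each site to a normalized (x, y, width, height) tile.

    focus is the site to emphasise after a positive decision,
    or None when no call for attention was detected (assumption).
    """
    if not sites:
        return {}
    if focus is None or focus not in sites:
        # No call for attention: equal-sized tiles in a grid.
        cols = math.ceil(math.sqrt(len(sites)))
        rows = math.ceil(len(sites) / cols)
        w, h = 1 / cols, 1 / rows
        return {s: ((i % cols) * w, (i // cols) * h, w, h)
                for i, s in enumerate(sites)}
    # Call for attention: large tile on top, other sites in a bottom strip.
    others = [s for s in sites if s != focus]
    layout = {focus: (0.0, 0.0, 1.0, 0.75 if others else 1.0)}
    for i, s in enumerate(others):
        w = 1 / len(others)
        layout[s] = (i * w, 0.75, w, 0.25)
    return layout
```

A video processor would then scale each received video signal into its tile to produce the processed video signal; that compositing step is left out of the sketch.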
Specification