System and process for adding high frame-rate current speaker data to a low frame-rate video using audio watermarking techniques
Abstract
A system and process for highlighting the current speaker on an ongoing basis in each frame of a low frame-rate video of an event having multiple people in attendance, such as a video teleconference, is presented. In general, this is accomplished by periodically identifying the attendee who is currently speaking, at a rate substantially faster than the video frame rate, and updating each frame of the video to highlight that attendee. More particularly, an A/V source provides video and audio data streams to a client computing device, with current speaker data embedded into the audio stream via audio watermarking techniques. The client device extracts the current speaker data from the audio stream, then renders and displays the video while using the current speaker data to periodically update the displayed frame to highlight the current speaker.
21 Claims
1. A computer-implemented process for facilitating the identification of a current speaker in each frame of a low frame-rate video, comprising using a computer to perform the following process actions:
obtaining audio and video of an event having multiple people in attendance;
transmitting the video of the event at a prescribed frame rate to a client computing device;
tracking the movements of the attendees and recording their positions when each video frame is transmitted and their subsequent positions between the transmission of the video frames;
periodically identifying which of the attendees is currently speaking at a rate significantly faster than the prescribed video frame rate;
periodically generating an indicator which comprises the location of the attendee who is currently speaking as depicted in the last-transmitted video frame regardless of their current position;
embedding the indicators as they are generated into the obtained audio of the event using an audio watermarking technique; and
continuously transmitting the audio of the event with the embedded indicators to the client computing device for use in highlighting a region in the last-transmitted video frame depicting the attendee at the location specified in the last-embedded indicator.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
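The key step in claim 1 is that each indicator reports where the speaker appears in the *last-transmitted* frame, even if that attendee has since moved. The following is a minimal sketch of that bookkeeping. It assumes attendees are tracked as pixel bounding boxes and that a separate speaker detector (for example, a sound-source localizer) reports a current speaker position. `AttendeeTracker`, `Indicator` and the box layout are hypothetical names chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in pixels


@dataclass(frozen=True)
class Indicator:
    frame_index: int  # last-transmitted frame that the location refers to
    box: Box          # speaker's location as depicted in that frame


@dataclass
class AttendeeTracker:
    """Hypothetical server-side tracker illustrating the process of claim 1."""
    frame_index: int = -1
    boxes_at_frame: Dict[str, Box] = field(default_factory=dict)
    current_boxes: Dict[str, Box] = field(default_factory=dict)

    def on_frame_transmitted(self, frame_index: int, boxes: Dict[str, Box]) -> None:
        # Record every attendee's position at the moment a frame is sent.
        self.frame_index = frame_index
        self.boxes_at_frame = dict(boxes)
        self.current_boxes = dict(boxes)

    def on_movement(self, attendee_id: str, box: Box) -> None:
        # Between frames, only the attendee's current position changes.
        self.current_boxes[attendee_id] = box

    def indicator_for(self, px: int, py: int) -> Optional[Indicator]:
        """Map a speaker position detected now to that attendee's box in the
        last-transmitted frame, regardless of where they have moved since."""
        for attendee_id, (x, y, w, h) in self.current_boxes.items():
            if x <= px < x + w and y <= py < y + h:
                box_then = self.boxes_at_frame.get(attendee_id)
                if box_then is None:  # attendee not visible in that frame
                    return None
                return Indicator(self.frame_index, box_then)
        return None


# Usage: attendee B walks to the right after frame 0 is sent, then speaks.
tracker = AttendeeTracker()
tracker.on_frame_transmitted(0, {"A": (10, 20, 50, 80), "B": (200, 20, 50, 80)})
tracker.on_movement("B", (260, 20, 50, 80))
print(tracker.indicator_for(280, 60))  # Indicator(frame_index=0, box=(200, 20, 50, 80))
```

Because the indicator carries B's frame-0 box rather than the detected position, the client can highlight the correct region of the image it is actually displaying.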
9. A system for facilitating the identification of a current speaker in each frame of a low frame-rate video, comprising:
a general purpose computing device;
at least one video camera;
at least one microphone; and
a computer program comprising program modules executable by the computing device, comprising:
a video stream creation module which generates a data stream of video frames at a prescribed frame rate;
a current speaker detection module which periodically identifies the current speaker among the persons depicted in each video frame of the video stream at a rate substantially faster than the video frame rate, and tracks the movements of the persons depicted in each video frame between the generation of said frames so as to equate their current location with their original location when the video frame was generated;
a current speaker data module which generates indicators, each of which specifies the location of a person depicted in a video frame associated with the indicator and whether the person whose location is specified is currently speaking;
an audio data stream creation module which generates a continuous stream of audio data;
an audio watermark encoder module which embeds said indicators as they are generated into the audio data stream using an audio watermarking technique; and
a transmission module which transmits the video stream and indicator-embedded audio stream to a client computing device or transfers the streams to storage for later transmission to the client computing device.
(Dependent claims: 10, 11, 12, 13, 14)
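The claims do not name a particular audio watermarking technique. The sketch below therefore uses the simplest one, least-significant-bit (LSB) embedding in signed 16-bit PCM samples, purely to show how an indicator (location plus speaking flag, as in claim 9) could be serialized, hidden in the audio and recovered. The 9-byte payload layout and all function names are assumptions. LSB marks do not survive lossy codecs or resampling, so a practical system would more likely use a robust scheme such as spread-spectrum watermarking.

```python
import struct
from typing import List, Tuple

# Assumed 9-byte payload: box x, y, width, height (unsigned 16-bit, big-endian)
# followed by a one-byte "currently speaking" flag.
PAYLOAD_FORMAT = ">HHHHB"
PAYLOAD_BYTES = struct.calcsize(PAYLOAD_FORMAT)  # 9


def pack_indicator(x: int, y: int, w: int, h: int, speaking: bool) -> bytes:
    return struct.pack(PAYLOAD_FORMAT, x, y, w, h, 1 if speaking else 0)


def unpack_indicator(payload: bytes) -> Tuple[Tuple[int, int, int, int], bool]:
    x, y, w, h, flag = struct.unpack(PAYLOAD_FORMAT, payload)
    return (x, y, w, h), bool(flag)


def embed_lsb(samples: List[int], payload: bytes, offset: int = 0) -> List[int]:
    """Write each payload bit (MSB first) into the least-significant bit of
    one signed 16-bit PCM sample, starting at `offset`; returns a copy."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if offset + len(bits) > len(samples):
        raise ValueError("audio block too short for payload")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[offset + i] = (out[offset + i] & ~1) | bit
    return out


def extract_lsb(samples: List[int], n_bytes: int, offset: int = 0) -> bytes:
    """Inverse of embed_lsb: read n_bytes back out of the sample LSBs."""
    bits = [s & 1 for s in samples[offset:offset + 8 * n_bytes]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )
```

Each embedded indicator changes any sample by at most one quantization step. At typical audio sample rates, a 72-bit payload also occupies only a tiny slice of a second, which leaves room for many indicators per video frame.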
15. A computer-implemented process for highlighting the current speaker in each frame of a low frame-rate video of an event having multiple people in attendance, comprising using a computer to perform the following process actions:
obtaining the low frame-rate video of the event;
obtaining a continuous audio stream of the event which has embedded therein, via a watermarking technique, periodically generated indicators, each of which comprises the location, in the last-obtained video frame, of the attendee who is currently speaking, wherein said indicators are generated at a rate significantly faster than the video frame rate;
synchronizing the audio and video streams; and
extracting each indicator from the audio stream as it is obtained and highlighting a region in the last-obtained video frame based on the location of the current speaker specified in the indicator under consideration, wherein said highlighting visually distinguishes a current speaker from all other attendees depicted in the last-obtained video frame.
(Dependent claims: 16)
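Claim 15 requires only that the highlighting "visually distinguishes" the current speaker. The patent does not fix a particular visual effect, so the sketch below makes one assumed choice: it dims every pixel outside the speaker's box. Frames are plain grayscale lists of rows to keep the example dependency-free. `highlight_region` and `render_indicator_updates` are hypothetical names.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale rows, pixel values 0-255
Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def highlight_region(frame: Frame, box: Box, dim_factor: float = 0.5) -> Frame:
    """Return a copy of `frame` in which pixels outside `box` are dimmed,
    so the current speaker's region stands out from the other attendees."""
    x, y, w, h = box
    out = []
    for r, row in enumerate(frame):
        out.append([
            v if (x <= c < x + w and y <= r < y + h) else int(v * dim_factor)
            for c, v in enumerate(row)
        ])
    return out


def render_indicator_updates(last_frame: Frame, boxes: List[Box]) -> List[Frame]:
    """Re-render the last-obtained frame once per extracted indicator.
    Each render starts from the unmodified frame, so the highlight moves to
    the new speaker instead of accumulating."""
    return [highlight_region(last_frame, b) for b in boxes]
```

Always redrawing from the untouched last-obtained frame is what allows several speaker changes to be shown while one low frame-rate image stays on screen.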
17. A system for highlighting the current speaker in each frame of a low frame-rate video of an event having multiple people in attendance, comprising:
a general purpose computing device;
a computer program comprising program modules executable by the computing device, comprising:
a video input module which obtains the low frame-rate video of the event;
an audio input module which obtains a continuous audio stream of the event that has embedded therein, via a watermarking technique, periodically generated indicators, each of which comprises the location of an attendee depicted in the last-obtained video frame and indicates whether that attendee is currently speaking, wherein said indicators are generated at a rate significantly faster than the video frame rate;
a synchronizer module which synchronizes the audio and video streams;
an audio watermark detector module which extracts each indicator from the audio stream as it is obtained; and
a speaker highlighting module which highlights a region in the last-obtained video frame that is associated with an attendee whom the last-extracted indicator specifies is currently speaking, based on the location of the attendee specified in that indicator, wherein said highlighting visually distinguishes a current speaker from all other attendees depicted in the last-obtained video frame that are not currently speaking.
(Dependent claims: 18, 19, 20, 21)
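The synchronizer and highlighting modules of claim 17 must pair each extracted indicator with the frame that was on screen when it was generated, and act only on indicators flagged as "speaking". Below is a minimal sketch of that pairing. It assumes that, after synchronization, audio and video share one clock, and that an indicator's time can be taken as its audio sample offset divided by the sample rate. `TimedIndicator` and `highlight_schedule` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


@dataclass(frozen=True)
class TimedIndicator:
    time_s: float   # presentation time, e.g. audio sample offset / sample rate
    box: Box        # attendee location in the referenced video frame
    speaking: bool  # whether that attendee is currently speaking


def last_frame_index(frame_times: List[float], t: float) -> int:
    """Index of the most recent frame presented at or before time t,
    or -1 if no frame has been obtained yet (frame_times must be sorted)."""
    return bisect.bisect_right(frame_times, t) - 1


def highlight_schedule(
    frame_times: List[float], indicators: List[TimedIndicator]
) -> List[Tuple[float, int, Box]]:
    """Pair each 'speaking' indicator with the last-obtained video frame,
    returning (time, frame index, box) tuples in time order."""
    schedule = []
    for ind in sorted(indicators, key=lambda i: i.time_s):
        idx = last_frame_index(frame_times, ind.time_s)
        if idx >= 0 and ind.speaking:
            schedule.append((ind.time_s, idx, ind.box))
    return schedule
```

For example, with frames at 0.0 s, 1.0 s and 2.0 s, a speaking indicator at 1.05 s maps to frame 1. An indicator flagged as not speaking produces no highlight.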
Specification