System and process for adding high frame-rate current speaker data to a low frame-rate video
Abstract
A system and process for highlighting the current speaker on an on-going basis in each frame of a low frame-rate video of an event having multiple people in attendance, such as a video teleconference, is presented. In general, this is accomplished by periodically identifying an attendee that is currently speaking at a rate substantially faster than the video frame rate, and for each frame of the video updating the frame to highlight the attendee currently speaking. More particularly, an audio/visual (A/V) source provides separate video, audio, and current speaker data streams to a client computing device. The client device then uses these data streams to render and display the video and to periodically update the frame being displayed to highlight the current speaker depicted therein.
33 Claims
1. A computer-implemented process for facilitating the identification of a current speaker in each frame of a low frame-rate video, comprising using a computer to perform the following process actions:
obtaining audio and video of an event having multiple people in attendance;
transmitting the video of the event at a prescribed frame rate to a client computing device;
continuously transmitting the audio of the event to the client computing device;
tracking the movements of the attendees and recording their positions when each video frame is transmitted and their subsequent positions between the transmission of the video frames;
periodically identifying which of the attendees is currently speaking at a rate significantly faster than the prescribed video frame rate;
periodically generating an indicator which comprises the location of the attendee who is currently speaking as depicted in the last-transmitted video frame regardless of their current position;
transmitting each of said indicators to the client computing device for use in highlighting a region in the last-transmitted video frame depicting the attendee at the location specified in the last-transmitted indicator.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
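The source-side actions of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class names `SourceTracker` and `SpeakerIndicator`, the attendee labels, and the `(x, y)` pixel representation are all assumptions made for illustration. The point it shows is that an indicator reports the speaker's position as snapshotted when the last frame was sent, not the position currently being tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]  # (x, y) pixel coordinates; an assumed representation


@dataclass
class SpeakerIndicator:
    frame_id: int       # the last-transmitted frame this indicator refers to
    location: Position  # speaker's location as depicted in that frame


@dataclass
class SourceTracker:
    """Hypothetical helper that tracks attendees on the A/V source side."""
    current: Dict[str, Position] = field(default_factory=dict)
    at_last_frame: Dict[str, Position] = field(default_factory=dict)
    last_frame_id: int = -1

    def update_position(self, attendee: str, pos: Position) -> None:
        """Record an attendee's latest tracked position."""
        self.current[attendee] = pos

    def on_frame_transmitted(self, frame_id: int) -> None:
        """Snapshot every attendee's position at the moment a frame is sent."""
        self.last_frame_id = frame_id
        self.at_last_frame = dict(self.current)

    def make_indicator(self, speaker: str) -> Optional[SpeakerIndicator]:
        """Build an indicator from the speaker's position in the last sent frame."""
        if speaker not in self.at_last_frame:
            return None  # speaker is not depicted in the last-transmitted frame
        return SpeakerIndicator(self.last_frame_id, self.at_last_frame[speaker])


tracker = SourceTracker()
tracker.update_position("alice", (100, 80))
tracker.update_position("bob", (300, 90))
tracker.on_frame_transmitted(frame_id=0)
tracker.update_position("alice", (140, 85))  # alice moves between frames
indicator = tracker.make_indicator("alice")
print(indicator)  # SpeakerIndicator(frame_id=0, location=(100, 80))
```

Keeping a separate snapshot per transmitted frame is one simple way to satisfy the "regardless of their current position" wording; other bookkeeping schemes would work equally well.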
11. A system for facilitating the identification of a current speaker in each frame of a low frame-rate video, comprising:
a general purpose computing device;
at least one video camera;
at least one microphone; and
a computer program comprising program modules executable by the computing device, comprising,
  a video stream creation module which generates a data stream of video frames at a prescribed frame rate,
  an audio data stream creation module which generates a continuous stream of audio data;
  a current speaker detection module which,
    periodically identifies the current speaker among the persons depicted in each video frame of the video stream at a rate substantially faster than the video frame rate, and
    tracks the movements of the persons depicted in each video frame between the generation of said frames so as to equate their current location with their original location when the video frame was generated;
  a current speaker data module which generates a data stream comprising current speaker indicators, each of which specifies,
    the location of a person depicted in a video frame associated with the indicator, and
    whether the person whose location is specified is currently speaking or not.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A computer-implemented process for highlighting the current speaker in each frame of a low frame-rate video of an event having multiple people in attendance, comprising using a computer to perform the following process actions:
obtaining the low frame-rate video of the event;
obtaining a continuous audio stream of the event;
obtaining periodically generated indicators, each of which comprises the location of the attendee who is currently speaking in a last-obtained video frame, wherein said indicators are available at a rate significantly faster than the video frame rate;
for each indicator obtained which relates to the last-obtained video frame, highlighting a region in that video frame based on the location of the current speaker specified in the indicator under consideration, wherein said highlighting visually distinguishes a current speaker from all other attendees depicted in the last-obtained video frame.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
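The client-side actions of claim 19 can be sketched as follows. This is an assumption-laden illustration rather than the claimed method: the `ClientDisplay` class, the grayscale list-of-rows frame format, the square region size, and the choice to highlight by dimming everything outside the region are all hypothetical. It shows an indicator being applied only when it relates to the last-obtained frame, and the displayed frame being re-rendered without waiting for new video.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale rows of 0-255 values (assumed format)
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def highlight(frame: Frame, box: Box, dim: float = 0.5) -> Frame:
    """Return a copy of the frame with everything outside the box dimmed."""
    left, top, right, bottom = box
    return [
        [
            px if (left <= x < right and top <= y < bottom) else int(px * dim)
            for x, px in enumerate(row)
        ]
        for y, row in enumerate(frame)
    ]


class ClientDisplay:
    """Hypothetical client that re-renders the last frame on each indicator."""

    def __init__(self, half_size: int = 1) -> None:
        self.half_size = half_size  # half-width of the square highlight region
        self.frame_id: Optional[int] = None
        self.frame: Optional[Frame] = None
        self.shown: Optional[Frame] = None

    def on_frame(self, frame_id: int, frame: Frame) -> None:
        """Store a newly obtained video frame and show it unhighlighted."""
        self.frame_id, self.frame, self.shown = frame_id, frame, frame

    def on_indicator(self, frame_id: int, x: int, y: int) -> bool:
        """Apply an indicator only if it relates to the last-obtained frame."""
        if self.frame is None or frame_id != self.frame_id:
            return False  # stale or early indicator; leave the display as is
        h = self.half_size
        self.shown = highlight(self.frame, (x - h, y - h, x + h + 1, y + h + 1))
        return True


display = ClientDisplay()
display.on_frame(0, [[200] * 4 for _ in range(4)])
applied = display.on_indicator(frame_id=0, x=1, y=1)   # speaker near top-left
ignored = display.on_indicator(frame_id=1, x=2, y=2)   # refers to a frame not yet shown
```

Because each highlight is rendered from the stored, unmodified frame, a later indicator for a different speaker replaces the earlier highlight instead of stacking on top of it.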
28. A system for highlighting the current speaker in each frame of a low frame-rate video of an event having multiple people in attendance, comprising:
a general purpose computing device;
a computer program comprising program modules executable by the computing device, comprising,
  a video input module which obtains the low frame-rate video of the event,
  an audio input module which obtains a continuous audio stream of the event,
  a current speaker data input module which obtains periodically generated indicators, each of which comprises the location of the attendee depicted in the last-obtained video frame and indicates if that attendee is currently speaking or not, wherein said indicators are available at a rate significantly faster than the video frame rate, and
  a highlighting module which highlights a region in the last-obtained video frame that is associated with an attendee that an obtained indicator applicable to the last-obtained video frame specifies is currently speaking, based on the location of the attendee specified in that indicator, wherein said highlighting visually distinguishes a current speaker from all other attendees depicted in the last-obtained video frame that are not currently speaking.
Dependent claims: 29, 30, 31, 32, 33
Specification