Catch-up video buffering
First Claim
1. A method, comprising:
receiving audio-visual (AV) data as part of a video conference call;
detecting a face of a participant of the video conference call using an imaging device associated with a device of the participant;
determining that the participant's face is oriented in a direction toward a display associated with the device of the participant by applying image processing to a first image captured by a camera;
outputting live content from the video conference call to the display at substantially a same time as the first content is received based on determining that the participant's face is oriented in the direction toward the display;
determining, at a first time, that the participant is no longer observing the conference call based on image processing of a second image captured by the camera failing to detect that the participant is facing the display;
determining, at a second time, that the participant is again observing the conference call by determining that the participant is facing the display, based on image processing of a third image captured by the camera, wherein the second time is after the first time;
storing content from the AV data after the first time;
performing speech-recognition processing on an audio portion of the AV data;
identifying one or more sections of the audio portion of the AV data, the one or more sections comprising one or more of:
silences, pauses, spoken filler words, non-lexical utterances, and false starts;
outputting stored content to the display, wherein the outputting occurs after the second time and wherein the stored content is output at an accelerated rate until the stored content reaches the live content at a third time, and wherein the one or more sections are omitted when the stored content is output at the accelerated rate; and
outputting the live content at a normal rate after the third time.
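The claimed method amounts to a small state machine: pass content through live while the viewer is attentive, buffer it while they are away, then drain the buffer at an accelerated rate with the skippable sections (silences, pauses, filler) omitted until playback catches up. A minimal sketch in Python; the segment format, the attention callbacks, and the fixed rate are illustrative assumptions, not the patent's implementation:

```python
from collections import deque

class CatchUpBuffer:
    """Sketch of the claimed catch-up method: buffer AV segments while the
    viewer is away, then replay them at an accelerated rate, omitting
    skippable sections, until playback rejoins the live feed."""

    def __init__(self, accelerated_rate=1.5):
        self.rate = accelerated_rate
        self.buffer = deque()
        self.attentive = True

    def on_attention_lost(self):      # the claim's "first time"
        self.attentive = False

    def on_attention_regained(self):  # the claim's "second time"
        self.attentive = True

    def receive(self, segment):
        """segment: dict with 'duration' (seconds) and a 'skippable' flag
        (silence, pause, filler word, non-lexical utterance, false start)."""
        if self.attentive and not self.buffer:
            return ('live', segment)          # caught up: pass through live
        self.buffer.append(segment)
        if self.attentive:
            return self._drain_one()
        return None                           # viewer away: store only

    def _drain_one(self):
        # omit skippable sections when replaying stored content
        while self.buffer and self.buffer[0]['skippable']:
            self.buffer.popleft()
        if not self.buffer:
            return None
        seg = self.buffer.popleft()
        # accelerated output: same content in less wall-clock time
        return ('catchup', {**seg, 'duration': seg['duration'] / self.rate})
```

Once the deque empties, `receive` falls back to the `'live'` branch, which corresponds to the claim's third time, when output returns to the normal rate.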
Abstract
A system determines if someone watching a live video feed looks or moves away from a display screen and, when their attention returns to the display, provides an accelerated recap of the content they missed. The video component of the feed may be shown as a series of selected still images or clips from the original feed, while audio and/or text captioning is output at an accelerated rate. The rate may be adaptively adjusted to maintain a consistent speed, and superfluous content may be omitted. When the recap catches up to the live feed, output returns to regular speed.
18 Claims
1. A method, comprising:
receiving audio-visual (AV) data as part of a video conference call;
detecting a face of a participant of the video conference call using an imaging device associated with a device of the participant;
determining that the participant's face is oriented in a direction toward a display associated with the device of the participant by applying image processing to a first image captured by a camera;
outputting live content from the video conference call to the display at substantially a same time as the first content is received based on determining that the participant's face is oriented in the direction toward the display;
determining, at a first time, that the participant is no longer observing the conference call based on image processing of a second image captured by the camera failing to detect that the participant is facing the display;
determining, at a second time, that the participant is again observing the conference call by determining that the participant is facing the display, based on image processing of a third image captured by the camera, wherein the second time is after the first time;
storing content from the AV data after the first time;
performing speech-recognition processing on an audio portion of the AV data;
identifying one or more sections of the audio portion of the AV data, the one or more sections comprising one or more of:
silences, pauses, spoken filler words, non-lexical utterances, and false starts;
outputting stored content to the display, wherein the outputting occurs after the second time and wherein the stored content is output at an accelerated rate until the stored content reaches the live content at a third time, and wherein the one or more sections are omitted when the stored content is output at the accelerated rate; and
outputting the live content at a normal rate after the third time.
- View Dependent Claims (2, 3)
4. A computing system, comprising:
at least one processor;
a data buffer; and
at least one memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor to:
receive audio-visual (AV) data;
output live content from the AV data at a first rate;
detect, at a first time, a negative paying attention signature based upon processing of captured sensor data;
determine, at a second time, a positive paying attention signature based upon processing of subsequent captured sensor data, wherein the second time is after the first time;
store content after the first time;
output the stored content after the second time, the stored content being output at an accelerated rate until the stored content coincides with the live content at a third time, wherein the third time is after the second time;
perform speech processing on an audio portion of the AV data; and
determine an adjusted accelerated rate based on the speech processing to maintain a consistent rate of output of the audio portion in terms of words-per-unit-of-time or phonemes-per-unit-of-time.
- View Dependent Claims (5, 6, 7, 8, 9, 10, 11)
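Claim 4's rate adjustment can be illustrated with simple arithmetic: measure the source speech rate from the recognized words, then scale playback so the output stays near a constant words-per-minute. A hedged sketch; the target WPM and the clamp bounds are invented for illustration, and a real system might work in phonemes-per-unit-of-time instead:

```python
def adjusted_rate(word_count, duration_s, target_wpm=210.0,
                  min_rate=1.0, max_rate=3.0):
    """Compute a playback-speed multiplier so that accelerated output
    holds a roughly constant words-per-minute: slow speech is sped up
    more, fast speech less. Bounds keep the result listenable."""
    if duration_s <= 0 or word_count <= 0:
        return min_rate                     # nothing to measure: play normally
    source_wpm = word_count / (duration_s / 60.0)
    rate = target_wpm / source_wpm          # e.g. 105 wpm source, 210 target -> 2x
    return max(min_rate, min(max_rate, rate))
```

Recomputing this per utterance (from the speech-recognition output the claim already requires) is what keeps the perceived pace consistent even as speakers change.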
12. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising program code to configure the computing device to:
receive audio-visual (AV) data;
output live content from the AV data at a first rate;
detect, at a first time, a negative paying attention signature based upon processing of captured sensor data;
determine, at a second time, a positive paying attention signature based upon processing of subsequent captured sensor data, wherein the second time is after the first time;
store content after the first time;
output the stored content after the second time, the stored content being output at an accelerated rate until the stored content coincides with the live content at a third time, wherein the third time is after the second time;
select one or more video frames received after the first time and before the third time based upon one or more of:
relative maxima of motion activity in comparison to a range of antecedent and succedent video frames, motion activity exceeding a first threshold, relative maxima of color histogram changes in comparison to a range of antecedent and succedent video frames, color histogram changes exceeding a second threshold, appearance of a new object, and a failure to detect a previously detected object; and
sequentially output each of the one or more selected frames as a still image or video clip via a display while the stored content is output at the accelerated rate, wherein the stored content outputted at the accelerated rate comprises one or more of an audio portion of the AV data or text related to the audio portion of the AV data.
- View Dependent Claims (13, 14, 15, 16, 17, 18)
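Claim 12's frame selection keys on relative maxima and absolute thresholds of per-frame change scores. A rough sketch, assuming each frame has already been reduced to a single motion-activity or histogram-change score; the window size and threshold values are illustrative, and the object-appearance/disappearance criteria are omitted:

```python
def select_key_frames(activity, window=2, threshold=0.8):
    """Pick frame indices whose change score is a relative maximum over a
    window of antecedent and succedent frames, or exceeds an absolute
    threshold. 'activity' holds one normalized score per frame."""
    selected = []
    for i, score in enumerate(activity):
        lo = max(0, i - window)
        hi = min(len(activity), i + window + 1)
        neighbours = activity[lo:i] + activity[i + 1:hi]
        is_local_max = neighbours and score > max(neighbours)
        if is_local_max or score >= threshold:
            selected.append(i)
    return selected
```

The selected indices would then drive the claim's recap display: each chosen frame is shown as a still image or short clip while the buffered audio (or its captions) plays at the accelerated rate.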
Specification