Systems and methods for queryable graph representations of videos
First Claim
1. A computer-implemented method, comprising:
receiving video data for a first video;
deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
an image frame of a segment of the first video from the video data, and
an audio frame of a segment of the first video from the video data;
performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
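The first steps of the claim (deconstructing a video into context windows, running an analytic on each available frame, and collecting the results into local atomic units) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in chosen for illustration and not taken from the patent: the `ContextWindow` and `LocalAtomicUnit` classes, the string-valued "frames", and the toy analytics that simply split a description into words.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ContextWindow:
    """One segment of a video; either modality may be absent (assumed layout)."""
    index: int
    image_frame: Optional[str] = None  # stand-in for pixel data
    audio_frame: Optional[str] = None  # stand-in for audio samples


@dataclass
class LocalAtomicUnit:
    """Attributes derived from the characteristics found in one context window."""
    window_index: int
    attributes: Dict[str, List[str]] = field(default_factory=dict)


def deconstruct(image_frames: List[Optional[str]],
                audio_frames: List[Optional[str]]) -> List[ContextWindow]:
    """Pair frames by segment position; skip segments with neither modality."""
    windows = []
    for i, (img, aud) in enumerate(zip(image_frames, audio_frames)):
        if img is None and aud is None:
            continue
        windows.append(ContextWindow(i, img, aud))
    return windows


def build_local_atomic_units(windows: List[ContextWindow],
                             image_analytic: Callable[[str], List[str]],
                             audio_analytic: Callable[[str], List[str]]
                             ) -> List[LocalAtomicUnit]:
    """Run each analytic only on windows that contain the matching frame type."""
    units = []
    for w in windows:
        attrs: Dict[str, List[str]] = {}
        if w.image_frame is not None:
            attrs["image"] = image_analytic(w.image_frame)
        if w.audio_frame is not None:
            attrs["audio"] = audio_analytic(w.audio_frame)
        units.append(LocalAtomicUnit(w.index, attrs))
    return units


# Toy analytics (assumptions): treat each frame as a whitespace-separated label string.
def toy_image_analytic(frame: str) -> List[str]:
    return sorted(set(frame.split()))


def toy_audio_analytic(frame: str) -> List[str]:
    return sorted(set(frame.lower().split()))


windows = deconstruct(["dog park", "dog ball", None], ["Bark", None, "music"])
units = build_local_atomic_units(windows, toy_image_analytic, toy_audio_analytic)
```

In this sketch a window that lacks an image frame or an audio frame simply omits the corresponding attribute group, which mirrors the "at least one of" wording of the claim.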
Abstract
In one aspect, the present disclosure relates to a method which, in one embodiment, includes: receiving video data for a first video and deconstructing the video data of the first video into a plurality of context windows; performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video; performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video; generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units; and generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units.
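As a rough illustration of the local graph representation described in the abstract, the sketch below creates one node per local atomic unit and connects two nodes whenever their attribute sets overlap. The overlap rule is only one possible reading of an attribute-based relationship between nodes, and the function name `build_local_graph` and the dictionary layout are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def build_local_graph(units: Dict[int, Set[str]]) -> dict:
    """Build a graph with one node per local atomic unit.

    `units` maps a context-window index to that unit's attribute set.
    An edge joins two nodes that share at least one attribute and is
    labeled with the shared attributes (an assumed relationship rule).
    """
    edges: Dict[Tuple[int, int], Set[str]] = {}
    for a, b in combinations(sorted(units), 2):
        shared = units[a] & units[b]
        if shared:
            edges[(a, b)] = shared
    return {"nodes": dict(units), "edges": edges}


units = {0: {"dog", "park"}, 1: {"dog", "ball"}, 2: {"music"}}
graph = build_local_graph(units)
```

Here windows 0 and 1 are linked through the shared attribute "dog", while window 2 remains an isolated node.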
24 Claims
1. A computer-implemented method, comprising:
receiving video data for a first video;
deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
an image frame of a segment of the first video from the video data, and
an audio frame of a segment of the first video from the video data;
performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
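The final steps of claim 1 (wrapping each local graph in a global atomic unit, collecting those units as nodes of a global graph, and answering a user query against it) might look roughly like the following. This is a sketch under stated assumptions: the dictionary layout, the helper names `make_global_atomic_unit`, `build_global_graph`, and `query`, and the choice of "union of all node attributes" as the derived attributes are all hypothetical and not specified by the claim.

```python
from typing import Dict, List, Set


def make_global_atomic_unit(video_id: str, local_graph: dict) -> dict:
    """Bundle a local graph with attributes derived from it.

    Assumption: the derived attributes are the union of all node attributes.
    """
    derived: Set[str] = set()
    for attrs in local_graph["nodes"].values():
        derived |= attrs
    return {"video_id": video_id, "local_graph": local_graph, "attributes": derived}


def build_global_graph(global_units: List[dict]) -> Dict[str, dict]:
    """One global node per video, keyed by video id (edges omitted in this sketch)."""
    return {unit["video_id"]: unit for unit in global_units}


def query(global_graph: Dict[str, dict], term: str) -> Dict[str, List[int]]:
    """Return, per matching video, the context-window indices that carry `term`."""
    hits: Dict[str, List[int]] = {}
    for video_id, unit in global_graph.items():
        if term not in unit["attributes"]:
            continue  # the derived attributes let whole videos be skipped early
        nodes = unit["local_graph"]["nodes"]
        hits[video_id] = sorted(i for i, attrs in nodes.items() if term in attrs)
    return hits


# Two toy local graphs (window index -> attribute set); the data is illustrative only.
local_a = {"nodes": {0: {"dog", "park"}, 1: {"dog", "ball"}}, "edges": {}}
local_b = {"nodes": {0: {"cat"}, 1: {"dog", "beach"}}, "edges": {}}

global_graph = build_global_graph([
    make_global_atomic_unit("video_a", local_a),
    make_global_atomic_unit("video_b", local_b),
])
dog_hits = query(global_graph, "dog")
cat_hits = query(global_graph, "cat")
```

Keeping the local graph inside each global atomic unit means a query can first filter at the video level and then drill down to individual context windows, which is one plausible motivation for the two-level structure recited in the claim.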
23. A system, comprising:
one or more processors;
a memory device operatively coupled to the one or more processors and storing instructions which cause the system to perform functions that comprise:
receiving video data for a first video;
deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
an image frame of a segment of the first video from the video data, and
an audio frame of a segment of the first video from the video data;
performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
24. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause one or more computing devices to perform functions that comprise:
receiving video data for a first video;
deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
an image frame of a segment of the first video from the video data, and
an audio frame of a segment of the first video from the video data;
performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
Specification