Systems and methods for queryable graph representations of videos

  • US 9,858,340 B1
  • Filed: 04/11/2017
  • Issued: 01/02/2018
  • Est. Priority Date: 04/11/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • receiving video data for a first video;

    deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:

    an image frame of a segment of the first video from the video data, and an audio frame of a segment of the first video from the video data;

    performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video, wherein performing the video analytic function on the image frame comprises utilizing a neural-network based analysis to perform at least one of object detection, object localization, caption generation, and segmentation;

    performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video, wherein performing the video analytic function on the audio frame comprises utilizing a neural-network based analysis to perform at least one of language detection, transcription, speaker diarization, and tonal analysis;

    generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;

    generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;

    generating a global graph representation of a plurality of videos that includes the first video;

    receiving a query of the global graph representation for information associated with content of the plurality of videos; and

    producing, in response to the query and by analyzing the global graph representation, a response including the information associated with the content of the plurality of videos.
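
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`ContextWindow`, `LocalAtomicUnit`, `analyze_image`, `analyze_audio`, `build_graph`, `build_global_graph`, `query`) is hypothetical, the two analysis functions are trivial placeholders standing in for the neural-network based analyses, and the rule that two nodes are connected when they share an attribute is an assumption, since the claim only requires that edges reflect relationships based on the attributes. The claim also does not say how the global graph is formed; here it is assumed to be a graph rebuilt over the nodes of all local graphs.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class ContextWindow:
    video_id: str
    index: int
    image_frame: Optional[str] = None  # stand-in for pixel data
    audio_frame: Optional[str] = None  # stand-in for audio samples


@dataclass(frozen=True)
class LocalAtomicUnit:
    video_id: str
    index: int
    attributes: frozenset


def analyze_image(frame: str) -> set:
    # Placeholder for object detection: each token is treated as a detected label.
    return {f"object:{label}" for label in frame.split()}


def analyze_audio(frame: str) -> set:
    # Placeholder for transcription: each token is treated as a transcribed word.
    return {f"word:{word}" for word in frame.lower().split()}


def to_unit(window: ContextWindow) -> LocalAtomicUnit:
    # Derive one local atomic unit from the characteristics found in a window.
    attrs = set()
    if window.image_frame is not None:
        attrs |= analyze_image(window.image_frame)
    if window.audio_frame is not None:
        attrs |= analyze_audio(window.audio_frame)
    return LocalAtomicUnit(window.video_id, window.index, frozenset(attrs))


def build_graph(units):
    # Nodes are atomic units; an edge joins two nodes that share an attribute
    # (assumed relationship rule) and stores the shared attributes.
    nodes = {(u.video_id, u.index): u for u in units}
    edges = {}
    for a, b in combinations(units, 2):
        shared = a.attributes & b.attributes
        if shared:
            edges[((a.video_id, a.index), (b.video_id, b.index))] = shared
    return {"nodes": nodes, "edges": edges}


def build_global_graph(local_graphs):
    # Assumed construction: pool the nodes of every local graph and rebuild
    # edges, which also links related nodes across different videos.
    all_units = [u for g in local_graphs for u in g["nodes"].values()]
    return build_graph(all_units)


def query(graph, attribute: str):
    # Return the (video_id, window index) of every node carrying the attribute.
    return sorted(
        key for key, unit in graph["nodes"].items() if attribute in unit.attributes
    )


windows_a = [
    ContextWindow("video_a", 0, image_frame="dog ball", audio_frame="Fetch the ball"),
    ContextWindow("video_a", 1, image_frame="dog"),
]
windows_b = [
    ContextWindow("video_b", 0, image_frame="dog park", audio_frame="Good dog"),
]

local_a = build_graph([to_unit(w) for w in windows_a])
local_b = build_graph([to_unit(w) for w in windows_b])
global_graph = build_global_graph([local_a, local_b])
print(query(global_graph, "object:dog"))
```

In this sketch a query is a plain attribute lookup over the global graph's nodes. A real system would more likely run it against a graph database and make use of the edges, but the claim leaves that choice open.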
