Systems and methods for queryable graph representations of videos

US 10,108,709 B1
Filed: 12/01/2017
Issued: 10/23/2018
Est. Priority Date: 04/11/2016
Status: Active Grant

First Claim

1. A computer-implemented method, comprising:

receiving video data for a first video;

deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:

an image frame of a segment of the first video from the video data, and an audio frame of a segment of the first video from the video data;

performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;

performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;

generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;

generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;

generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;

generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;

receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and

producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
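
The local-graph steps of claim 1 (per-window analytics, local atomic units, attribute-based edges) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the names `LocalAtomicUnit`, `build_local_units`, and `build_local_graph` are hypothetical, the analytic functions are toy stand-ins for the neural-network models named in the dependent claims, and the two edge rules (temporal precedence and a shared attribute value) are borrowed from the examples in claim 7 rather than being the only relationships the claim covers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical types: an analytic function maps a context window to attributes,
# e.g. {"person": {"alice"}}; a real system would run trained models here.
Attributes = Dict[str, Set[str]]
Analytic = Callable[[dict], Attributes]


@dataclass
class LocalAtomicUnit:
    """Attributes derived from one context window (hypothetical structure)."""
    index: int
    start_s: float
    attributes: Attributes


def build_local_units(windows: List[dict], image_fns: List[Analytic],
                      audio_fns: List[Analytic]) -> List[LocalAtomicUnit]:
    """Run image analytics on windows with an image, audio analytics on windows with audio."""
    units = []
    for i, win in enumerate(windows):
        fns = ((image_fns if win.get("image") is not None else [])
               + (audio_fns if win.get("audio") is not None else []))
        attrs: Attributes = {}
        for fn in fns:
            for key, values in fn(win).items():
                attrs.setdefault(key, set()).update(values)
        units.append(LocalAtomicUnit(i, win["start_s"], attrs))
    return units


def build_local_graph(units: List[LocalAtomicUnit]) -> Dict[Tuple[int, int], Set[str]]:
    """Return edges as {(node_a, node_b): {relationship labels}}."""
    edges: Dict[Tuple[int, int], Set[str]] = {}
    # Temporal relationship: each window precedes the next one in time.
    ordered = sorted(units, key=lambda u: u.start_s)
    for a, b in zip(ordered, ordered[1:]):
        edges.setdefault((a.index, b.index), set()).add("precedes")
    # Attribute relationship: two windows share a value for the same key
    # (e.g. the same person detected, or the same language spoken).
    for pos, a in enumerate(units):
        for b in units[pos + 1:]:
            for key in a.attributes.keys() & b.attributes.keys():
                if a.attributes[key] & b.attributes[key]:
                    edges.setdefault((a.index, b.index), set()).add("same_" + key)
    return edges


# Toy stand-ins for trained models: the "image" and "audio" fields already
# hold labels so the example runs without any ML dependencies.
def detect_people(win: dict) -> Attributes:
    return {"person": set(win["image"])}


def identify_language(win: dict) -> Attributes:
    return {"language": {win["audio"]}}


windows = [
    {"start_s": 0.0, "image": ["alice"], "audio": "en"},
    {"start_s": 2.0, "image": ["bob"], "audio": "en"},
    {"start_s": 4.0, "image": ["alice", "bob"], "audio": None},
]
units = build_local_units(windows, [detect_people], [identify_language])
edges = build_local_graph(units)
```

With the sample data, windows 0 and 2 are linked by `same_person` even though they are not adjacent in time, while windows 0 and 1 are linked by both `precedes` and `same_language`.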

6 Assignments

0 Petitions

Accused Products

Abstract

In one aspect, the present disclosure relates to a method which, in one embodiment, includes: receiving video data for a first video and deconstructing the video data of the first video into a plurality of context windows; performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video; performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video; generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units; and generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units.
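
The abstract does not say how context windows are cut. One simple possibility, assumed here, is fixed-length, contiguous windowing over already-decoded image frames and PCM audio samples; claims 3 and 4 also allow windows built from discontinuous portions of the video, which this sketch does not cover. `ContextWindow` and `deconstruct` are hypothetical names, and decoding the video container is left out.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ContextWindow:
    """A time span of the video plus the image frames and audio samples inside it."""
    start_s: float
    end_s: float
    image_frames: List[object] = field(default_factory=list)
    audio_samples: List[float] = field(default_factory=list)

    @property
    def has_image(self) -> bool:
        return len(self.image_frames) > 0

    @property
    def has_audio(self) -> bool:
        return len(self.audio_samples) > 0


def deconstruct(frames: Sequence[object], fps: float,
                audio: Sequence[float], sample_rate: int,
                window_s: float) -> List[ContextWindow]:
    """Split decoded video into contiguous, fixed-length context windows.

    Assumes `frames` were decoded at `fps` frames per second and `audio` is mono
    PCM at `sample_rate` Hz. Windows holding neither image nor audio are skipped,
    since each context window must contain at least one of the two.
    """
    if fps <= 0 or sample_rate <= 0 or window_s <= 0:
        raise ValueError("fps, sample_rate and window_s must be positive")
    duration = max(len(frames) / fps, len(audio) / sample_rate)
    windows: List[ContextWindow] = []
    for i in range(math.ceil(duration / window_s)):
        start = i * window_s
        end = min((i + 1) * window_s, duration)
        # Convert the window's time bounds into frame and sample index ranges.
        f0, f1 = round(start * fps), round(end * fps)
        a0, a1 = round(start * sample_rate), round(end * sample_rate)
        win = ContextWindow(start, end, list(frames[f0:f1]), list(audio[a0:a1]))
        if win.has_image or win.has_audio:
            windows.append(win)
    return windows


# 2.5 s of stand-in frames at 10 fps and 2.5 s of silence at 16 kHz:
# yields three windows of 1 s, 1 s and 0.5 s.
windows = deconstruct(list(range(25)), 10, [0.0] * 40_000, 16_000, 1.0)
```

Fixed-length windows keep the sketch simple; a production system might instead cut windows at shot boundaries or speech pauses, which the abstract leaves open.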

Citations

24 Claims

1. A computer-implemented method, comprising:
- receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
  
  generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
  
  receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
  - 2. The method of claim 1, wherein generating the global graph representation of the plurality of videos comprises applying global graph edges connecting the plurality of nodes to each other, the global graph edges representing relationships between the connected nodes based, at least in part, on the attributes of the corresponding global atomic units.
  - 3. The method of claim 1, wherein at least one of the plurality of context windows that comprises an image frame is comprised of:
    - a plurality of image frames from a continuous portion of the first video, or a plurality of image frames from discontinuous portions of the first video.
  - 4. The method of claim 1, wherein at least one of the plurality of context windows that comprises an audio frame is comprised of:
    - an audio frame from a continuous portion of the first video, or a plurality of audio frames from discontinuous portions of the first video.
  - 5. The method of claim 1, wherein performing the video analytic function comprises:
    - converting the image frames to first number vectors and converting the audio frames into separate, second number vectors; and
      
      combining the first number vectors and second number vectors into a model for processing a context window.
  - 6. The method of claim 1, wherein performing the video analytic function on the image frame comprises a neural-network based analysis to determine the content of the image frame, and performing the video analytic function on the audio frame comprises performing a neural-network based analysis to determine the content of the audio frame.
  - 7. The method of claim 1, wherein at least one of the local graph edges representing relationships between one local atomic unit with another local atomic unit corresponds to a relationship wherein:
    - a same person or object is detected in both a context window corresponding to the one local atomic unit and a context window corresponding to the other local atomic unit, a context window corresponding to the one local atomic unit occurs prior in time to a context window corresponding to the other local atomic unit, or the same language is being spoken in a context window corresponding to the one local atomic unit and a context window corresponding to the other local atomic unit.
  - 8. The method of claim 1, wherein receiving the video data comprises simultaneously receiving, from a plurality of media input channels, at least one of image data corresponding to the image frames and audio data corresponding to the audio frames.
  - 9. The method of claim 1, wherein producing the response comprises:
    - updating, in response to determining that the global graph representation cannot answer the query, the global graph representation to include at least one additional characteristic of at least one video of the plurality of videos; and
      
      producing the response by analyzing the updated global graph representation.
  - 10. The method of claim 9, wherein updating the global graph representation comprises:
    - performing, on at least one context window of the plurality of context windows of the first video, at least one additional video analytic function to identify one or more additional characteristics of the at least one context window;
      
      adding, to at least one respective local atomic unit corresponding to the at least one context window, one or more additional attributes corresponding to the identified one or more additional characteristics; and
      
      updating the local graph edges of the local graph representation of the first video, based at least in part on the one or more additional attributes.
  - 11. The method of claim 9, wherein updating the global graph representation comprises:
    - deconstructing the video data of the first video into at least one additional context window;
      
      performing, on the at least one additional context window, the video analytic function to identify one or more characteristics of the at least one additional context window;
      
      generating, for each of the at least one additional context window, a respective local atomic unit comprising attributes corresponding to the identified one or more characteristics of the respective additional context window to form at least one additional local atomic units;
      
      adding, to the local graph representation of the first video, at least one additional node corresponding to the at least one additional local atomic units; and
      
      updating the local graph edges of the local graph representation connecting the plurality of nodes to each other based, at least in part, on the attributes of the at least one additional local atomic units.
  - 12. The method of claim 9, wherein updating the global graph representation comprises:
    - performing, on a characteristic of at least one context window of the plurality of context windows of the first video, at least one additional video analytic function to derive one or more additional characteristics of the at least one context window;
      
      adding, to at least one respective local atomic unit corresponding to the at least one context window, one or more additional attributes corresponding to the identified one or more additional characteristics; and
      
      updating the local graph edges of the local graph representation of the first video, based at least in part on the one or more additional attributes.
  - 13. The method of claim 1, wherein performing the video analytic function comprises performing caption generation based on the image frame.
  - 14. The method of claim 1, wherein performing the video analytic function comprises performing, on the image frame, identification of an object within the image frame.
  - 15. The method of claim 1, wherein performing the video analytic function comprises performing, on the image frame, image classification.
  - 16. The method of claim 15, wherein performing the image classification comprises utilizing a neural-network based analysis.
  - 17. The method of claim 1, wherein performing the video analytic function comprises performing, on the audio frame, at least one of transcription of speech, speaker identification, and analysis of background noise.
  - 18. The method of claim 1, wherein performing the video analytic function comprises performing, on the audio frame, identification of at least one of a language spoken, a speaker, an accent of a speaker, an emotion associated with speech, censorship, and ambient noise.
  - 19. The method of claim 1, wherein performing the video analytic function comprises performing, on the audio frame, identification of one language spoken in one time interval and another language spoken in another, different time interval.
  - 20. The method of claim 1, wherein performing the video analytic function comprises performing, on the audio frame, at least one of analysis of tone in speech and analysis of context of speech.
  - 21. The method of claim 20, wherein performing at least one of analysis of tone in speech and analysis of context in speech comprises utilizing a neural-network based analysis.
  - 22. The method of claim 1, further comprising cleaning or filtering out background noise to generate a cleaner audio frame, using a neural network-based model.
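
Claims 9 and 10 describe updating the graph when it cannot answer a query: run an additional analytic function on existing context windows, add the resulting attributes to the local atomic units, and update the local graph edges. The sketch below shows that flow for a single video's local graph under stated assumptions: "cannot answer" is taken to mean that no local atomic unit carries the queried attribute key, and `LazyLocalGraph`, `windows_with`, and `detect_language` are hypothetical names, the last being a toy stand-in for a real language-identification model.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical type: each additional analytic maps a stored context window to
# a set of attribute values for one attribute key (e.g. "language" -> {"es"}).
AdditionalAnalytic = Callable[[dict], Set[str]]


class LazyLocalGraph:
    """Local graph that runs an extra analytic only when a query needs it."""

    def __init__(self, windows: List[dict], analytics: Dict[str, AdditionalAnalytic]):
        self.windows = windows                      # retained context windows
        self.analytics = analytics                  # attribute key -> analytic function
        self.units: List[Dict[str, Set[str]]] = [{} for _ in windows]
        self.edges: Dict[Tuple[int, int], Set[str]] = {}
        self.analytic_calls = 0                     # shows that work happens lazily

    def _ensure_attribute(self, key: str) -> None:
        """Update the graph if no local atomic unit can answer queries about `key`."""
        if any(key in unit for unit in self.units):
            return
        if key not in self.analytics:
            raise KeyError(f"no analytic available for {key!r}")
        # Add the new attribute to every local atomic unit (claim 10's second step).
        for win, unit in zip(self.windows, self.units):
            unit[key] = self.analytics[key](win)
            self.analytic_calls += 1
        # Update local graph edges based on the new attribute (claim 10's third step).
        for i in range(len(self.units)):
            for j in range(i + 1, len(self.units)):
                if self.units[i][key] & self.units[j][key]:
                    self.edges.setdefault((i, j), set()).add("same_" + key)

    def windows_with(self, key: str, value: str) -> List[int]:
        """Answer 'which context windows have key=value?', updating the graph first if needed."""
        self._ensure_attribute(key)
        return [i for i, unit in enumerate(self.units) if value in unit[key]]


def detect_language(win: dict) -> Set[str]:
    # Toy stand-in for a language-identification model.
    return {"es"} if "hola" in win["speech"] else {"en"}


graph = LazyLocalGraph(
    [{"speech": "hola amigo"}, {"speech": "hello friend"}, {"speech": "hola"}],
    {"language": detect_language},
)
```

The first `windows_with("language", ...)` call triggers three analytic calls (one per window) and adds a `same_language` edge between windows 0 and 2; a second query on the same key reuses the stored attribute and triggers no further calls.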

23. A system, comprising:
- one or more processors;
  
  a memory device operatively coupled to the one or more processors and storing instructions which cause the system to perform functions that comprise:
  
  receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
  
  generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
  
  receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.
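
The global-graph and query steps recited in claims 1, 23, and 24 can be sketched in the same spirit. Assumptions: a global atomic unit wraps a video's local graph and derives video-level attributes as the union of its windows' attributes; global edges link videos that share an attribute value (one reading of claim 2); and a "query" is reduced to a single `key=value` lookup. `GlobalAtomicUnit`, `GlobalGraph`, and `query` are hypothetical names, not an interface described in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Attributes = Dict[str, Set[str]]


@dataclass
class GlobalAtomicUnit:
    """One video: its local graph plus video-level attributes derived from it."""
    video_id: str
    local_units: List[Attributes]                    # nodes of the local graph
    local_edges: Dict[Tuple[int, int], Set[str]] = field(default_factory=dict)
    attributes: Attributes = field(default_factory=dict)

    def summarize(self) -> "GlobalAtomicUnit":
        # Assumed aggregation: a video has every attribute value seen in any window.
        self.attributes = {}
        for unit in self.local_units:
            for key, values in unit.items():
                self.attributes.setdefault(key, set()).update(values)
        return self


class GlobalGraph:
    """Nodes are global atomic units; edges link videos that share attribute values."""

    def __init__(self, videos: List[GlobalAtomicUnit]):
        self.nodes = {v.video_id: v.summarize() for v in videos}
        self.edges: Dict[Tuple[str, str], Set[str]] = {}
        ids = sorted(self.nodes)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                attrs_a, attrs_b = self.nodes[a].attributes, self.nodes[b].attributes
                shared = {f"{k}={v}"
                          for k in attrs_a.keys() & attrs_b.keys()
                          for v in attrs_a[k] & attrs_b[k]}
                if shared:
                    self.edges[(a, b)] = shared

    def query(self, key: str, value: str) -> Dict[str, int]:
        """Return {video_id: number of matching context windows} for key=value."""
        hits: Dict[str, int] = {}
        for vid, node in self.nodes.items():
            if value in node.attributes.get(key, set()):
                hits[vid] = sum(value in unit.get(key, set()) for unit in node.local_units)
        return hits


graph = GlobalGraph([
    GlobalAtomicUnit("v1", [{"person": {"alice"}}, {"person": {"bob"}, "language": {"en"}}]),
    GlobalAtomicUnit("v2", [{"person": {"alice"}, "language": {"fr"}}]),
    GlobalAtomicUnit("v3", [{"person": {"carol"}}]),
])
```

Here `graph.query("person", "alice")` returns `{"v1": 1, "v2": 1}`, and the only global edge is `("v1", "v2")`, labelled `person=alice`.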

24. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause one or more computing devices to perform functions that comprise:
- receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
  
  generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to a global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos;
  
  receiving a query, from a user, of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response for the user, the response including the information associated with the content of the plurality of videos.

Current Assignee
Digital Reasoning Systems, Inc (Smarsh Incorporated)
Original Assignee
Digital Reasoning Systems, Inc (Smarsh Incorporated)
Inventors
Frey, John, Whitaker, James, Russell, Matthew
Primary Examiner(s)
Couso, Jose

Application Number
US15/829,055
Time in Patent Office
326 Days
CPC Class Codes

G06F 16/738   Presentation of query results

G06F 16/743   a collection of video files...

G06F 16/7834   using audio features

G06F 16/7837   using objects detected or r...

G06F 18/29   Graphical models, e.g. Baye...

G06V 10/84   using probabilistic graphic...

G06V 20/42   of sport video content

G06V 20/49   Segmenting video sequences,...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/005   Language recognition

G10L 15/20   Speech recognition techniqu...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 25/30   using neural networks
