Systems and methods for queryable graph representations of videos

US 9,858,340 B1
Filed: 04/11/2017
Issued: 01/02/2018
Est. Priority Date: 04/11/2016
Status: Active Grant



Abstract

In one aspect, the present disclosure relates to a method which, in one embodiment, includes: receiving video data for a first video; deconstructing the video data of the first video into a plurality of context windows; performing, on each context window that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window; performing, on each context window that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window; generating, for each context window, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units; and generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units.
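
The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the class names `ContextWindow` and `LocalAtomicUnit`, the helper functions, and the edge labels are all hypothetical, and the neural-network analytics are replaced by placeholders that read labels already attached to the toy frames.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ContextWindow:
    """A segment of a video holding image frames, audio frames, or both."""
    start: float  # seconds
    end: float    # seconds
    image_frames: list = field(default_factory=list)
    audio_frames: list = field(default_factory=list)


@dataclass
class LocalAtomicUnit:
    """Attributes derived from the characteristics found in one context window."""
    window: ContextWindow
    attributes: dict


def analyze_image(frames):
    # Placeholder for an image analytic (assumption): each toy "frame"
    # is already a set of object labels, so we only merge them.
    return {"objects": set().union(*frames)}


def analyze_audio(frames):
    # Placeholder for an audio analytic (assumption): each toy "frame"
    # is a dict that already carries a detected language code.
    return {"languages": {f["language"] for f in frames}}


def build_local_atomic_units(windows):
    units = []
    for w in windows:
        attrs = {}
        if w.image_frames:
            attrs.update(analyze_image(w.image_frames))
        if w.audio_frames:
            attrs.update(analyze_audio(w.audio_frames))
        units.append(LocalAtomicUnit(w, attrs))
    return units


def build_local_graph(units):
    """Return edges as (i, j, relationship) tuples over unit indices."""
    edges = []
    for i, j in combinations(range(len(units)), 2):
        a, b = units[i], units[j]
        if a.attributes.get("objects", set()) & b.attributes.get("objects", set()):
            edges.append((i, j, "same_object"))
        if a.attributes.get("languages", set()) & b.attributes.get("languages", set()):
            edges.append((i, j, "same_language"))
        if a.window.end <= b.window.start:
            edges.append((i, j, "precedes"))
    return edges


windows = [
    ContextWindow(0.0, 5.0, image_frames=[{"dog"}, {"dog", "ball"}]),
    ContextWindow(5.0, 10.0, image_frames=[{"ball"}],
                  audio_frames=[{"language": "en"}]),
    ContextWindow(10.0, 15.0, audio_frames=[{"language": "en"}]),
]
units = build_local_atomic_units(windows)
local_graph = build_local_graph(units)
```

Here each node is simply the index of a local atomic unit, and an edge is added whenever two units share an object, share a language, or are ordered in time. A real system would presumably use richer attributes and edge types.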


21 Claims

1. A computer-implemented method, comprising:
- receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and
  
  an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video, wherein performing the video analytic function on the image frame comprises utilizing a neural-network based analysis to perform at least one of object detection, object localization, caption generation, and segmentation;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video, wherein performing the video analytic function on the audio frame comprises utilizing a neural-network based analysis to perform at least one of language detection, transcription, speaker diarization, and tonal analysis;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video;
  
  receiving a query of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response including the information associated with the content of the plurality of videos.
  - 2. The method of claim 1, further comprising:
    - generating a global atomic unit comprising the local graph representation and attributes derived from the local graph representation,
      
      wherein the global graph representation includes a first node corresponding to the global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos, and wherein generating the global graph representation of the plurality of videos comprises applying global graph edges connecting the plurality of nodes to each other, the global graph edges representing relationships between the connected nodes based, at least in part, on the attributes of the corresponding global atomic units.
  - 3. The method of claim 1, wherein the query is for information on videos of the plurality of videos wherein a particular object or person is present.
  - 4. The method of claim 1, wherein the query is for information on videos of the plurality of videos wherein a particular language is being spoken.
  - 5. The method of claim 1, wherein at least one of the plurality of context windows that comprises an image frame is comprised of:
    - a plurality of image frames from a continuous portion of the first video, or
      
      a plurality of image frames from discontinuous portions of the first video.
  - 6. The method of claim 1, wherein at least one of the plurality of context windows that comprises an audio frame is comprised of:
    - an audio frame from a continuous portion of the first video, or
      
      a plurality of audio frames from discontinuous portions of the first video.
  - 7. The method of claim 1, wherein the video analytic function comprises performing, on the image frame, at least one of image classification, object identification, and text detection.
  - 8. The method of claim 1, wherein the video analytic function comprises performing, on the audio frame, at least one of speech-to-text transcription, noise analysis, generating a cleaner audio frame, and speaker identification.
  - 9. The method of claim 1, wherein performing the video analytic function comprises:
    - converting the image frames to first number vectors and converting the audio frames into separate, second number vectors; and
      
      combining the first number vectors and second number vectors into a model for processing a context window.
  - 10. The method of claim 1, wherein:
    - performing the video analytic function on the image frame comprises a neural-network based analysis to determine the content of the image frame, and
      
      performing the video analytic function on the audio frame comprises performing a neural-network based analysis to determine the content of the audio frame.
  - 11. The method of claim 1, wherein at least one of the local graph edges representing relationships between one local atomic unit with another local atomic unit corresponds to a relationship wherein:
    - a same person or object is detected in both a context window corresponding to the one local atomic unit and a context window corresponding to the other local atomic unit,
      
      a context window corresponding to the one local atomic unit occurs prior in time to a context window corresponding to the other local atomic unit, or
      
      the same language is being spoken in a context window corresponding to the one local atomic unit and a context window corresponding to the other local atomic unit.
  - 12. The method of claim 1, wherein receiving the video data comprises simultaneously receiving, from a plurality of media input channels, at least one of image data corresponding to the image frames and audio data corresponding to the audio frames.
  - 13. The method of claim 1, wherein producing the response comprises:
    - generating a global atomic unit comprising attributes derived from the local graph representation, wherein the global graph representation includes a first node corresponding to the global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos; and
      
      updating, in response to determining that the global graph representation cannot answer the query, the global graph representation to include at least one additional characteristic of at least one video of the plurality of videos; and
      
      producing the response by analyzing the updated global graph representation.
  - 14. The method of claim 13, wherein updating the global graph representation comprises:
    - performing, on at least one context window of the plurality of context windows of the first video, at least one additional video analytic function to identify one or more additional characteristics of the at least one context window;
      
      adding, to at least one respective local atomic unit corresponding to the at least one context window, one or more additional attributes corresponding to the identified one or more additional characteristics; and
      
      updating the local graph edges of the local graph representation of the first video, based at least in part on the one or more additional attributes.
  - 15. The method of claim 13, wherein the updating the global graph representation comprises:
    - deconstructing the video data of the first video into at least one additional context window;
      
      performing, on the at least one additional context window, the video analytic function to identify one or more characteristics of the at least one additional context window;
      
      generating, for each of the at least one additional context window, a respective local atomic unit comprising attributes corresponding to the identified one or more characteristics of the respective additional context window to form at least one additional local atomic unit;
      
      adding, to the local graph representation of the first video, at least one additional node corresponding to the at least one additional local atomic unit; and
      
      updating the local graph edges of the local graph representation connecting the plurality of nodes to each other based, at least in part, on the attributes of the at least one additional local atomic unit.
  - 16. The method of claim 13, wherein the updating the global graph representation comprises:
    - performing, on a characteristic of at least one context window of the plurality of context windows of the first video, at least one additional video analytic function to derive one or more additional characteristics of the at least one context window;
      
      adding, to at least one respective local atomic unit corresponding to the at least one context window, one or more additional attributes corresponding to the identified one or more additional characteristics; and
      
      updating the local graph edges of the local graph representation of the first video, based at least in part on the one or more additional attributes.
  - 21. The method of claim 1, wherein at least one of the local graph edges representing relationships between one local atomic unit with another local atomic unit corresponds to a relationship wherein:
    - a context window corresponding to the one local atomic unit occurs prior in time to a context window corresponding to the other local atomic unit,
      
      a context window corresponding to the one local atomic unit occurs subsequent in time to a context window corresponding to the other local atomic unit,
      
      a context window corresponding to the one local atomic unit is a sub-portion of a context window corresponding to the other local atomic unit, or
      
      a context window corresponding to the one local atomic unit overlaps a context window corresponding to the other local atomic unit.
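
The four temporal relationships listed in claim 21 can be illustrated by comparing the time spans of two context windows. The sketch below is hypothetical: it assumes each window is a `(start, end)` pair in seconds with `start < end`, returns a single label per pair, and treats identical or containing spans as plain overlap. None of these choices come from the patent text.

```python
def temporal_relation(a, b):
    """Classify how window `a` relates in time to window `b`.

    Both arguments are (start, end) tuples in seconds (an assumption).
    Returns one of "prior", "subsequent", "sub_portion", or "overlaps".
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end <= b_start:
        return "prior"        # a finishes before b begins
    if b_end <= a_start:
        return "subsequent"   # a begins after b finishes
    # From here on the two spans share some time.
    inside = b_start <= a_start and a_end <= b_end
    if inside and (a_start, a_end) != (b_start, b_end):
        return "sub_portion"  # a lies entirely within a longer b
    return "overlaps"


relations = [
    temporal_relation((0, 5), (5, 10)),
    temporal_relation((5, 10), (0, 5)),
    temporal_relation((2, 4), (0, 10)),
    temporal_relation((0, 6), (4, 10)),
]
```

A label produced this way could be stored as the relationship on a local graph edge between the two corresponding local atomic units.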

17. A system, comprising:
- one or more processors;
  
  a memory device operatively coupled to the one or more processors and storing instructions which cause the system to perform functions that comprise:
  
  receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and
  
  an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video, wherein nodes of the global graph representation are derived from respective local graph representations of respective videos of the plurality of videos;
  
  receiving a query of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response including the information associated with the content of the plurality of videos.
  - 18. The system of claim 17, wherein the functions performed by the system further comprise:
    - generating a global atomic unit comprising attributes derived from the local graph representation,
      
      wherein the global graph representation includes a first node corresponding to the global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos, and wherein generating the global graph representation of the plurality of videos comprises applying global graph edges connecting the plurality of nodes to each other, the global graph edges representing relationships between the connected nodes based, at least in part, on the attributes of the corresponding global atomic units.
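
One way to picture the global atomic units, global graph edges, and queries described in claims 2 to 4 and 18 is the sketch below. It is an assumption-laden illustration: `GlobalAtomicUnit`, `summarize_local_units`, `build_global_graph`, and `query_videos` are hypothetical names, attributes are modeled as sets of labels, and a global edge is added whenever two videos share at least one value for the same attribute.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class GlobalAtomicUnit:
    """Per-video summary of attributes taken from that video's local graph."""
    video_id: str
    attributes: dict  # attribute name -> set of values


def summarize_local_units(video_id, local_attributes):
    """Merge per-window attribute dicts into one global atomic unit."""
    merged = {}
    for attrs in local_attributes:
        for key, values in attrs.items():
            merged.setdefault(key, set()).update(values)
    return GlobalAtomicUnit(video_id, merged)


def build_global_graph(units):
    """Return (nodes, edges); edges link videos that share attribute values."""
    nodes = {u.video_id: u for u in units}
    edges = []
    for a, b in combinations(units, 2):
        for key in sorted(a.attributes.keys() & b.attributes.keys()):
            shared = a.attributes[key] & b.attributes[key]
            if shared:
                edges.append((a.video_id, b.video_id, key, shared))
    return nodes, edges


def query_videos(nodes, key, value):
    """Return ids of videos whose global atomic unit has `value` under `key`."""
    return sorted(vid for vid, unit in nodes.items()
                  if value in unit.attributes.get(key, set()))


global_units = [
    summarize_local_units("v1", [{"objects": {"dog"}},
                                 {"objects": {"ball"}, "languages": {"en"}}]),
    summarize_local_units("v2", [{"objects": {"dog", "car"},
                                  "languages": {"fr"}}]),
    summarize_local_units("v3", [{"languages": {"en"}}]),
]
nodes, edges = build_global_graph(global_units)
videos_with_dog = query_videos(nodes, "objects", "dog")
videos_in_english = query_videos(nodes, "languages", "en")
```

The two example queries correspond to the kinds of questions in claims 3 and 4: which videos contain a particular object, and which videos contain a particular spoken language.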

19. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause one or more computing devices to perform functions that comprise:
- receiving video data for a first video;
  
  deconstructing the video data of the first video into a plurality of context windows, wherein each of the context windows comprises at least one of:
  
  an image frame of a segment of the first video from the video data, and
  
  an audio frame of a segment of the first video from the video data;
  
  performing, on each context window of the plurality of context windows that includes an image frame, a video analytic function on the image frame to identify one or more characteristics of the context window that are associated with image-related content of the first video;
  
  performing, on each context window of the plurality of context windows that includes an audio frame, a video analytic function on the audio frame to identify one or more characteristics of the context window that are associated with audio-related content of the first video;
  
  generating, for each of the plurality of context windows, a respective local atomic unit comprising attributes derived from the identified one or more characteristics of the respective context window, to form a plurality of local atomic units;
  
  generating a local graph representation of the first video, comprising a plurality of nodes corresponding to the plurality of local atomic units, wherein generating the local graph representation comprises applying local graph edges connecting the plurality of nodes to each other, wherein the local graph edges represent relationships between the connected nodes based, at least in part, on the attributes of the corresponding local atomic units;
  
  generating a global graph representation of a plurality of videos that includes the first video, wherein one node of the global graph representation is derived from the local graph representation of the first video;
  
  receiving a query of the global graph representation for information associated with content of the plurality of videos; and
  
  producing, in response to the query and by analyzing the global graph representation, a response including the information associated with the content of the plurality of videos.
  - 20. The non-transitory computer-readable medium of claim 19, wherein the functions performed by the one or more computing devices further comprise:
    - generating a global atomic unit comprising attributes derived from the local graph representation,
      
      wherein the global graph representation includes a first node corresponding to the global atomic unit corresponding to the first video and a plurality of second nodes corresponding to respective global atomic units of respective second videos of the plurality of videos, and wherein generating the global graph representation of the plurality of videos comprises applying global graph edges connecting the plurality of nodes to each other, the global graph edges representing relationships between the connected nodes based, at least in part, on the attributes of the corresponding global atomic units.
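
Claims 13 and 14 describe updating the global graph representation when it cannot answer a query, by running an additional video analytic function and adding the resulting attributes. The sketch below shows one possible reading of that behavior. The graph layout, the `answer_query` and `detect_text` names, and the toy `overlay_text` field standing in for a text-detection analytic are all assumptions made for illustration.

```python
def detect_text(window):
    # Stand-in for an additional analytic such as text detection
    # (assumption): the toy window already carries its overlay text.
    return set(window.get("overlay_text", "").lower().split())


def answer_query(graph, key, value, extra_analytics):
    """Answer "which videos have `value` under attribute `key`?".

    graph: {video_id: {"windows": [...], "attributes": {name: set}}}
    extra_analytics: {attribute name: function(window) -> set of values}
    If some video lacks the attribute, the matching analytic is run on
    that video's windows and the graph is updated before answering.
    """
    missing = [node for node in graph.values() if key not in node["attributes"]]
    if missing:
        analytic = extra_analytics.get(key)
        if analytic is None:
            raise KeyError(f"no analytic available for attribute {key!r}")
        for node in missing:
            found = set()
            for window in node["windows"]:
                found |= analytic(window)
            node["attributes"][key] = found
    return sorted(vid for vid, node in graph.items()
                  if value in node["attributes"][key])


graph = {
    "v1": {"windows": [{"overlay_text": "Breaking News"}],
           "attributes": {"objects": {"desk"}}},
    "v2": {"windows": [{"overlay_text": "Weather"}, {}],
           "attributes": {"objects": {"map"}}},
}
result = answer_query(graph, "text", "news", {"text": detect_text})
```

After the call, the graph itself carries the new `text` attribute for every video, so a later query on the same attribute can be answered without re-running the analytic.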


Current Assignee: Digital Reasoning Systems, Inc (Smarsh Incorporated)
Original Assignee: Digital Reasoning Systems, Inc (Smarsh Incorporated)
Inventors: Frey, John; Whitaker, James; Russell, Matthew
Primary Examiner(s): COUSO, JOSE L

Application Number: US15/484,406
Time in Patent Office: 266 Days
CPC Class Codes

G06F 16/738   Presentation of query results

G06F 16/743   a collection of video files...

G06F 16/7834   using audio features

G06F 16/7837   using objects detected or r...

G06F 18/29   Graphical models, e.g. Baye...

G06V 10/84   using probabilistic graphic...

G06V 20/42   of sport video content

G06V 20/49   Segmenting video sequences,...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/005   Language recognition

G10L 15/20   Speech recognition techniqu...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 25/30   using neural networks
