Method and system for embedding visual intelligence

US 9,129,158 B1
Filed: 03/05/2012
Issued: 09/08/2015
Est. Priority Date: 03/05/2012
Status: Active Grant

Abstract

Described is a method and system for embedding unsupervised learning into three critical processing stages of the spatio-temporal visual stream. The system first receives input video comprising input video pixels representing at least one action and at least one object having a location. Microactions are generated from the input image using a set of motion sensitive filters. A relationship between the input video pixels and the microactions is then learned, and a set of spatio-temporal concepts is learned from the microactions. The system then learns to acquire new knowledge from the spatio-temporal concepts using mental imagery processes. Finally, a visual output is presented to a user based on the learned set of spatio-temporal concepts and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
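As an illustration only, the "filtering and max operations in repeating layers" recited in the claims can be sketched as alternating a motion-sensitive filter with a max-pooling step. The sketch below is not the patented implementation: the frame-difference filter, the 2x2 pooling window, and the names `temporal_filter`, `max_pool`, and `microaction_features` are all assumptions chosen for brevity.

```python
from typing import List

Frame = List[List[float]]
Video = List[Frame]


def temporal_filter(video: Video) -> Video:
    """Filtering step (assumed): absolute frame-to-frame difference,
    a crude stand-in for a motion-sensitive filter."""
    return [
        [[abs(b - a) for a, b in zip(row0, row1)] for row0, row1 in zip(f0, f1)]
        for f0, f1 in zip(video, video[1:])
    ]


def max_pool(video: Video, size: int = 2) -> Video:
    """Max step: keep the strongest response in each size x size spatial block."""
    pooled = []
    for frame in video:
        h, w = len(frame), len(frame[0])
        pooled.append([
            [
                max(
                    frame[y][x]
                    for y in range(r, min(r + size, h))
                    for x in range(c, min(c + size, w))
                )
                for c in range(0, w, size)
            ]
            for r in range(0, h, size)
        ])
    return pooled


def microaction_features(video: Video, layers: int = 2) -> Video:
    """Apply the filter-then-max pair `layers` times (the repeating layers)."""
    if len(video) <= layers:
        raise ValueError("need more frames than layers")
    out = video
    for _ in range(layers):
        out = max_pool(temporal_filter(out))
    return out


# Hypothetical 3-frame, 4x4 clip: one bright pixel moving right along the top row.
blank = [0.0, 0.0, 0.0, 0.0]
moving = [
    [[1.0, 0.0, 0.0, 0.0], blank, blank, blank],
    [[0.0, 1.0, 0.0, 0.0], blank, blank, blank],
    [[0.0, 0.0, 1.0, 0.0], blank, blank, blank],
]
print(microaction_features(moving))
```

Each layer shortens the clip by one frame and halves its spatial resolution, so the response becomes progressively less sensitive to exactly where and when the motion occurred.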

42 Citations

22 Claims

1. A system for embedding visual intelligence, the system comprising:
- one or more processors and a memory having instructions such that when the instructions are executed, the one or more processors perform operations of:
  
  receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
  
  processing at least one input query to elicit information regarding the input video;
  
  generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
  
  learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
  
  learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal patterns in an automatic, unsupervised manner using form and structure learning techniques;
  
  learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
  
  and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
  - 2. The system for embedding visual intelligence as set forth in claim 1, wherein the visual output is at least one of a video and a textual description.
  - 3. The system for embedding visual intelligence as set forth in claim 2, further comprising:
    - a spatio-temporal representations module for capturing event-invariant information in the input video using a series of filtering and max operations in repeating layers;
      
      an attention model module for generating video masks to focus attention of the spatio-temporal representations module to specific areas of the input video in order to generate the microactions; and
      
      a concept learning module for stringing together the microactions to compose full actions and learning of a set of relationships between the spatio-temporal patterns through form and structure learning.
  - 4. The system for embedding visual intelligence as set forth in claim 3, further comprising:
    - a visual object recognition module for determining the location of the at least one object in the input video; and
      
      a hypothesis module for generating at least one hypothesis of the at least one action based on known concepts and the at least one object in the input video.
  - 5. The system for embedding visual intelligence as set forth in claim 4, further comprising:
    - a visual inspection module for comparing the at least one hypothesis with the input video;
      
      a validation module for validating the at least one hypothesis using feedback from the visual inspection module; and
      
      an envisionment module for generating envisioned imagery of the at least one hypothesis to reason and gain new knowledge.
  - 6. The system for embedding visual intelligence as set forth in claim 5, further comprising:
    - a knowledgebase module for storing domain knowledge, the set of relationships between the spatio-temporal patterns from the concept learning module, and knowledge generated from reasoning on the envisioned imagery;
      
      a dialog processing module for parsing at least one input text query; and
      
      a symbolic reasoning module for locating answers to the at least one input text query in the knowledgebase module and outputting a textual description of the at least one input text query.
  - 7. The system for embedding visual intelligence as set forth in claim 6, wherein the set of relationships between the spatio-temporal patterns comprises a plurality of nodes, where each node represents a cluster of microactions.
  - 22. A video processing subsystem for a taskable smart camera system to be utilized with the system set forth in claim 1, comprising:
    - a video processor module;
      
      a camera module separate from the video processor module; and
      
      a common interface between the video processor module and the camera module.

8. A computer-implemented method for embedding visual intelligence, comprising acts of:
- receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
  
  processing at least one input query to elicit information regarding the input video;
  
  generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
  
  learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
  
  learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal patterns in an automatic, unsupervised manner using form and structure learning techniques;
  
  learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
  
  and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
  - 9. The method for embedding visual intelligence as set forth in claim 8, wherein the visual output is at least one of a video and a textual description.
  - 10. The method for embedding visual intelligence as set forth in claim 9, further comprising acts of:
    - a spatio-temporal representations module for capturing event-invariant information in the input video using a series of filtering and max operations in repeating layers;
      
      an attention model module for generating video masks to focus attention of the spatio-temporal representations module to specific areas of the input video in order to generate the microactions; and
      
      a concept learning module for stringing together the microactions to compose full actions and learning of a set of relationships between the spatio-temporal patterns through form and structure learning.
  - 11. The method for embedding visual intelligence as set forth in claim 10, further comprising acts of:
    - determining the location of the at least one object in the input video within a visual object recognition module; and
      
      generating at least one hypothesis of the at least one action based on known concepts and the at least one object in the input video within a hypothesis module.
  - 12. The method for embedding visual intelligence as set forth in claim 11, further comprising acts of:
    - comparing the at least one hypothesis with the input video within a visual inspection module;
      
      validating the at least one hypothesis using feedback from the visual inspection module within a validation module; and
      
      generating envisioned imagery of the at least one hypothesis to reason and gain new knowledge within an envisionment module.
  - 13. The method for embedding visual intelligence as set forth in claim 12, further comprising acts of:
    - a knowledgebase module for storing domain knowledge, the set of relationships between the spatio-temporal patterns from the concept learning module, and knowledge generated from reasoning on the envisioned imagery;
      
      a dialog processing module for parsing at least one input text query; and
      
      a symbolic reasoning module for locating answers to the at least one input text query in the knowledgebase module and outputting a textual description of the at least one input text query.
  - 14. The method for embedding visual intelligence as set forth in claim 13, wherein the set of relationships between the spatio-temporal patterns comprises a plurality of nodes, where each node represents a cluster of microactions.

15. A computer program product for embedding visual intelligence, the computer program product comprising:
- computer-readable instruction means stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform operations of:
  
  receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
  
  processing at least one input query to elicit information regarding the input video;
  
  generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
  
  learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
  
  learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal action patterns in an automatic, unsupervised manner using form and structure learning techniques;
  
  learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
  
  and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
  - 16. The computer program product for embedding visual intelligence as set forth in claim 15, wherein the visual output is at least one of a video and a textual description.
  - 17. The computer program product for embedding visual intelligence as set forth in claim 16, further comprising instruction means for causing the processor to perform operations of:
    - a spatio-temporal representations module for capturing event-invariant information in the input video using a series of filtering and max operations in repeating layers;
      
      an attention model module for generating video masks to focus attention of the spatio-temporal representations module to specific areas of the input video in order to generate the microactions; and
      
      a concept learning module for stringing together the microactions to compose full actions and learning of a set of relationships between the spatio-temporal patterns through form and structure learning.
  - 18. The computer program product for embedding visual intelligence as set forth in claim 17, further comprising instruction means for causing the processor to perform operations of:
    - determining the location of the at least one object in the input video within a visual object recognition module; and
      
      generating at least one hypothesis of the at least one action based on known concepts and the at least one object in the input video within a hypothesis module.
  - 19. The computer program product for embedding visual intelligence as set forth in claim 18, further comprising instruction means for causing the processor to perform operations of:
    - comparing the at least one hypothesis with the input video within a visual inspection module;
      
      validating the at least one hypothesis using feedback from the visual inspection module within a validation module; and
      
      generating envisioned imagery of the at least one hypothesis to reason and gain new knowledge within an envisionment module.
  - 20. The computer program product for embedding visual intelligence as set forth in claim 19, further comprising instruction means for causing the processor to perform operations of:
    - a knowledgebase module for storing domain knowledge, the set of relationships between the spatio-temporal patterns from the concept learning module, and knowledge generated from reasoning on the envisioned imagery;
      
      a dialog processing module for parsing at least one input text query; and
      
      a symbolic reasoning module for locating answers to the at least one input text query in the knowledgebase module and outputting a textual description of the at least one input text query.
  - 21. The computer program product for embedding visual intelligence as set forth in claim 20, wherein the set of relationships between the spatio-temporal patterns comprises a plurality of nodes, where each node represents a cluster of microactions.
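
Claims 7, 14, and 21 describe the learned relationships as a plurality of nodes, each node representing a cluster of microactions. A minimal sketch of that data structure follows. It is a simplified stand-in, not the form and structure learning recited in the claims: it assumes the cluster centroids are already known, assigns each microaction feature vector to its nearest centroid, and counts transitions between nodes over time. The names `nearest_node` and `build_graph` and the sample vectors are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_node(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def build_graph(
    microactions: Sequence[Vector], centroids: Sequence[Vector]
) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Label each microaction with a node, then count node-to-node transitions."""
    labels = [nearest_node(v, centroids) for v in microactions]
    edges = Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)
    return labels, dict(edges)


# Assumed centroids for two clusters and a short time-ordered sequence.
centroids = [(0.0, 0.0), (10.0, 10.0)]
sequence = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0), (0.0, 0.0)]
labels, edges = build_graph(sequence, centroids)
print(labels, edges)
```

The transition counts only record temporal order between clusters; they do not establish the causal relationships that the independent claims recite.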

Current Assignee: HRL Laboratories LLC (The Boeing Co.)
Original Assignee: HRL Laboratories LLC (The Boeing Co.)
Inventors: Medasani, Swarup; Chelian, Suhas E.; Cheng, Shinko Y.; Sundareswara, Rashmi N.; Neely, Howard III
Primary Examiner(s): Vu, Kim
Assistant Examiner(s): Bloom, Nathan J
Application Number: US13/412,527
Time in Patent Office: 1,282 Days
Field of Search: None
US Class Current: 1/1

CPC Class Codes:
G06F 18/2178   based on feedback of a supe...
G06F 18/231   Hierarchical techniques, i....
G06F 18/29   Graphical models, e.g. Baye...
G06T 2207/10016   Video; Image sequence
G06V 20/41   Higher-level, semantic clus...
G06V 40/20   Movements or behaviour, e.g...
G06V 40/23   Recognition of whole body m...
