Method and system for embedding visual intelligence
First Claim
1. A system for embedding visual intelligence, the system comprising:
- one or more processors and a memory having instructions such that when the instructions are executed, the one or more processors perform operations of:
receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
processing at least one input query to elicit information regarding the input video;
generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal patterns in an automatic, unsupervised manner using form and structure learning techniques;
learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
1 Assignment
0 Petitions
Abstract
Described is a method and system for embedding unsupervised learning into three critical processing stages of the spatio-temporal visual stream. The system first receives input video comprising input video pixels representing at least one action and at least one object having a location. Microactions are generated from the input image using a set of motion sensitive filters. A relationship between the input video pixels and the microactions is then learned, and a set of spatio-temporal concepts is learned from the microactions. The system then learns to acquire new knowledge from the spatio-temporal concepts using mental imagery processes. Finally, a visual output is presented to a user based on the learned set of spatio-temporal concepts and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
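The claims describe the motion sensitive filters as "derived from a series of filtering and max operations in repeating layers". The following is a minimal sketch of that layering idea only, not the patented implementation. It assumes simple frame differencing as a stand-in for the filtering step and spatial block-wise maximum as the max step; the names `temporal_filter`, `max_pool`, and `motion_responses` are hypothetical and do not appear in the patent.

```python
from typing import List

Frame = List[List[float]]  # 2-D grid of pixel intensities
Video = List[Frame]        # sequence of frames

def temporal_filter(video: Video) -> Video:
    """Filtering step (assumed): absolute difference between consecutive frames.

    The response is non-zero only where intensity changes over time,
    so it acts as a crude motion-sensitive filter.
    """
    return [
        [[abs(b - a) for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(video, video[1:])
    ]

def max_pool(video: Video, size: int = 2) -> Video:
    """Max step (assumed): keep the strongest response in each size x size block."""
    pooled = []
    for frame in video:
        h, w = len(frame), len(frame[0])
        pooled.append([
            [max(frame[y][x]
                 for y in range(top, min(top + size, h))
                 for x in range(left, min(left + size, w)))
             for left in range(0, w, size)]
            for top in range(0, h, size)
        ])
    return pooled

def motion_responses(video: Video, layers: int = 2) -> Video:
    """Apply the filter-then-max pair repeatedly, one pair per layer."""
    out = video
    for _ in range(layers):
        out = max_pool(temporal_filter(out))
    return out
```

Each layer consumes one frame and halves the spatial resolution, so a video needs at least `layers + 1` frames to yield any output. A learned or biologically motivated filter bank could replace the frame difference without changing the layered structure.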
42 Citations
22 Claims
1. A system for embedding visual intelligence, the system comprising:
- one or more processors and a memory having instructions such that when the instructions are executed, the one or more processors perform operations of:
receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
processing at least one input query to elicit information regarding the input video;
generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal patterns in an automatic, unsupervised manner using form and structure learning techniques;
learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
Dependent claims: 2, 3, 4, 5, 6, 7, 22
8. A computer-implemented method for embedding visual intelligence, comprising acts of:
receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
processing at least one input query to elicit information regarding the input video;
generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal patterns in an automatic, unsupervised manner using form and structure learning techniques;
learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product for embedding visual intelligence, the computer program product comprising:
- computer-readable instruction means stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform operations of:
receiving an input video comprising input video pixels representing at least one action and at least one object having a location;
processing at least one input query to elicit information regarding the input video;
generating microactions from the input video using a set of motion sensitive filters derived from a series of filtering and max operations in repeating layers;
learning of a relationship between the input video pixels and the microactions in both unsupervised and supervised manners;
learning, from the microactions, at least one concept, comprising spatio-temporal patterns, and a set of causal relationships between the spatio-temporal action patterns in an automatic, unsupervised manner using form and structure learning techniques;
learning to acquire new knowledge from the spatio-temporal patterns using mental imagery models in an unsupervised manner;
and presenting a visual output to a user based on the learned set of spatio-temporal patterns and the new knowledge to aid the user in visually comprehending the at least one action in the input video.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification