END-TO-END VISUAL RECOGNITION SYSTEM AND METHODS
Abstract
We describe an end-to-end visual recognition system, where “end-to-end” means that the system performs every stage of recognition itself, from constructing “maps” of scenes or “models” of objects from training data to determining the class, identity, location and other inferred parameters from test data. Our visual recognition system can operate on a mobile hand-held device, such as a mobile phone, tablet or other portable device equipped with sensing and computing power. Our system employs a video-based feature descriptor, and we characterize its invariance and discriminative properties. Feature selection and tracking are performed in real time and are used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking.
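The capture-then-rank flow described in the abstract can be illustrated with a minimal sketch. It assumes that each object is represented by one or more stored descriptor templates and that candidates are ordered by the L1 distance to their nearest template. The class name `TemplateClassifier`, its methods, and the choice of distance are hypothetical and are not taken from the specification.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class TemplateClassifier:
    """Hypothetical template store: label -> list of descriptor templates."""

    def __init__(self):
        self.templates = {}

    def train(self, label, descriptor):
        # Capture phase: remember one more template for the chosen object.
        self.templates.setdefault(label, []).append(list(descriptor))

    def rank(self, query):
        # Normal operation: score each label by its closest template,
        # then return the labels ordered from best to worst match.
        scored = [
            (min(l1_distance(query, t) for t in descs), label)
            for label, descs in self.templates.items()
        ]
        return [label for _, label in sorted(scored)]


clf = TemplateClassifier()
clf.train("mug", [1.0, 0.0])
clf.train("book", [0.0, 1.0])
print(clf.rank([0.9, 0.1]))
```

Keeping several templates per label is one simple way to cover different viewpoints captured during training; other scoring rules would fit the same interface.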
37 Claims
1. A visual recognition apparatus for identifying objects captured in a video stream having a captured time period, the apparatus comprising:
an image sensor configured for capturing a video stream;
a computer processor; and
programming for processing said video stream to perform visual recognition by performing steps comprising:
capturing the video stream from said image sensor;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points; and
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
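The temporal aggregation step of claim 1 can be sketched as follows. The sketch reads "frame" as a local reference frame around a tracked feature (claim 17 speaks of "translational, similarity, affine or more general reference frames") and assumes the association step has already produced, for one feature, a list of grayscale patches sampled at corresponding frames. The per-frame statistic (a gradient-orientation histogram), the averaging rule, and the function names are illustrative assumptions, not the claimed method.

```python
from math import atan2, hypot, pi


def orientation_histogram(patch, bins=8):
    """Gradient-orientation histogram of a 2-D grayscale patch (list of rows)."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = atan2(gy, gx) % (2 * pi)
            # The trailing "% bins" guards against angle rounding up to 2*pi.
            hist[int(angle / (2 * pi) * bins) % bins] += magnitude
    return hist


def temporal_descriptor(track_patches, bins=8):
    """Sum the per-frame histograms along one track, then L1-normalize."""
    accumulated = [0.0] * bins
    for patch in track_patches:
        for i, value in enumerate(orientation_histogram(patch, bins)):
            accumulated[i] += value
    total = sum(accumulated)
    return [v / total for v in accumulated] if total > 0 else accumulated


# Two observations of the same horizontal intensity ramp along one track.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(temporal_descriptor([ramp, ramp]))
```

Pooling over time in this way means a single descriptor summarizes how the patch looked across the whole track rather than in any one image.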
12. A visual recognition method for identifying objects captured in a video stream having a captured time period, the method comprising:
capturing the video stream on an electronic device;
enabling the user to select a target object or scene for training;
capturing the video stream from said image sensor;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points; and
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor.
Dependent claims: 13, 14, 15
16. A visual recognition apparatus for identifying objects captured in a video stream having a captured time period, the apparatus comprising:
an image sensor configured for capturing a video stream;
a computer processor; and
programming for processing said video stream to perform visual recognition by performing steps comprising:
capturing the video stream from said image sensor;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points;
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor;
spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive;
exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and
displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.
Dependent claims: 18, 19, 23, 24, 25, 26
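The spatial aggregation step recited in claim 16 can be illustrated with a bag-of-words pooling sketch. This is a swapped-in, commonly used technique chosen for illustration; the claim does not say that the patent uses it. Each per-track descriptor is assigned to its nearest entry in a small codebook and the assignments are counted, so the result does not depend on where in the image, or in what order, the descriptors were found. The names `nearest_word`, `spatial_aggregate`, and the example codebook are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def spatial_aggregate(descriptors, codebook):
    """Normalized histogram of codebook assignments over all descriptors."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


codebook = [[1.0, 0.0], [0.0, 1.0]]
tracks = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(spatial_aggregate(tracks, codebook))
```

Discarding position is what makes this representation insensitive to where the object sits in the field of view; a classification scheme would then compare such histograms rather than individual descriptors.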
17. A visual recognition apparatus for identifying objects captured in a video stream having a captured time period, the apparatus comprising:
an image sensor configured for capturing a video stream;
a computer processor; and
programming for processing said video stream to perform visual recognition by performing steps comprising:
capturing the video stream from said image sensor;
optionally selecting a plurality of features corresponding to translational, similarity, affine or more general reference frames from the video stream for objects in a field of view of the video stream;
performing such a selection at a plurality of scales, and using topological consistency across scale as a criterion for propagating said general reference frames across different scales;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points;
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor;
spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive;
exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and
displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.
Dependent claims: 20, 21, 22
27. A visual recognition method for identifying objects captured in a video stream having a captured time period, the method comprising:
capturing the video stream on an electronic device;
enabling the user to select a target object or scene for training;
capturing the video stream from said image sensor;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points;
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor;
spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive;
exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and
displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.
Dependent claims: 29, 30, 34, 35, 36, 37
28. A visual recognition method for identifying objects captured in a video stream having a captured time period, the method comprising:
capturing the video stream on an electronic device;
enabling the user to select a target object or scene for training;
capturing the video stream from said image sensor;
optionally selecting a plurality of features corresponding to translational, similarity, affine or more general reference frames from the video stream for objects in a field of view of the video stream;
performing such a selection at a plurality of scales, and using topological consistency across scale as a criterion for propagating said general reference frames across different scales;
associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points;
temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor;
spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive;
exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and
displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.
Dependent claims: 31, 32, 33
Specification