Method and apparatus for sparse associative recognition and recall for visual media reasoning
First Claim
1. A system for visual media reasoning, the system comprising:
- one or more processors and a non-transitory memory having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
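The two-stage filtering recited above (a general kernel bank followed by a specialized one, with responses driven sparse) can be sketched with a soft-thresholded convolutional filter bank. This is a minimal illustration, not the patent's implementation: the random kernels, the thresholds, and the max-pool between stages are all assumptions made for the sketch.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def soft_threshold(x, lam):
    """Shrinkage operator: drives small filter responses to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_code(image, kernel_bank, lam):
    """Filter with a bank of kernels and sparsify each response map."""
    return [soft_threshold(conv2d_valid(image, k), lam) for k in kernel_bank]

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# Hypothetical kernel banks: a "general" bank applied first, then a
# "specialized" bank applied to the pooled general-stage responses.
general_kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
specialized_kernels = [rng.standard_normal((3, 3)) for _ in range(4)]

stage1 = sparse_code(image, general_kernels, lam=1.0)
pooled = np.maximum.reduce(stage1)          # simple max-pool across maps
stage2 = sparse_code(pooled, specialized_kernels, lam=1.0)

sparsity = np.mean([np.mean(m == 0) for m in stage2])
print(f"stage-2 maps: {len(stage2)}, fraction of zero coefficients: {sparsity:.2f}")
```

The soft threshold is what makes the codes "sparse": most coefficients are exactly zero, so later stages only see the strongest filter responses.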
Abstract
Described is a system and method for visual media reasoning. An input image is filtered using a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes. Object recognition is performed on the set of sparse codes to generate object and semantic labels for the set of sparse codes. Pattern completion is performed on the object and semantic labels to recall relevant meta-data in the input image. Bi-directional feedback is used to fuse the input data with the relevant meta-data. An annotated image is generated with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
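The pattern-completion step in the abstract can be illustrated with a classical Hopfield-style associative memory, used here only as a stand-in for the spatiotemporal associative memory module (which the record does not specify). The bipolar patterns, label names, and metadata strings below are invented for the sketch.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product learning; self-connections zeroed."""
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0)
    return W

def recall(W, probe, steps=5):
    """Synchronous sign updates until the state stops changing."""
    s = probe.copy()
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

n = 64
# Two orthogonal bipolar "label" patterns standing in for stored
# (object labels -> meta-data) associations.
beach = np.where(np.arange(n) < n // 2, 1, -1)   # hypothetical pattern
city = np.where(np.arange(n) % 2 == 0, 1, -1)    # hypothetical pattern
metadata = {beach.tobytes(): "where: coastline, when: daytime",
            city.tobytes(): "where: urban street, when: evening"}

W = train_hopfield(np.stack([beach, city]))

# A probe with 5 corrupted bits models incomplete labels coming out of
# the recognition stage; recall completes it to the stored pattern.
probe = beach.copy()
probe[:5] *= -1
completed = recall(W, probe)
print(metadata[completed.tobytes()])
```

The point of the sketch is the "completion" behavior: a partial or noisy label vector settles onto the nearest stored pattern, whose associated meta-data can then be recalled, much as the claims describe recalling relevant meta-data from object and semantic labels.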
18 Claims
1. A system for visual media reasoning, the system comprising:
one or more processors and a non-transitory memory having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 2, 3, 4, 5, 6)
7. A computer-implemented method for visual media reasoning, comprising:
an act of causing one or more processors to execute instructions stored on a non-transitory memory such that upon execution, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 8, 9, 10, 11, 12)
13. A computer program product for visual media reasoning, the computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 14, 15, 16, 17, 18)
Specification