Method and apparatus for sparse associative recognition and recall for visual media reasoning
First Claim
1. A system for visual media reasoning, the system comprising:
- one or more processors and a non-transitory memory having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
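The two-stage filtering recited above (a general kernel bank followed by a specialized one, with responses driven sparse) can be sketched with a soft-thresholded convolutional filter bank. This is a minimal illustration, not the patent's implementation: the random kernels, the thresholds, and the max-pool between stages are all assumptions made for the sketch.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def soft_threshold(x, lam):
    """Shrinkage operator: drives small filter responses to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_code(image, kernel_bank, lam):
    """Filter with a bank of kernels and sparsify each response map."""
    return [soft_threshold(conv2d_valid(image, k), lam) for k in kernel_bank]

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# Hypothetical kernel banks: a "general" bank applied first, then a
# "specialized" bank applied to the pooled general-stage responses.
general_kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
specialized_kernels = [rng.standard_normal((3, 3)) for _ in range(4)]

stage1 = sparse_code(image, general_kernels, lam=1.0)
pooled = np.maximum.reduce(stage1)          # simple max-pool across maps
stage2 = sparse_code(pooled, specialized_kernels, lam=1.0)

sparsity = np.mean([np.mean(m == 0) for m in stage2])
print(f"stage-2 maps: {len(stage2)}, fraction of zero coefficients: {sparsity:.2f}")
```

The soft threshold is what makes the codes "sparse": most coefficients are exactly zero, so later stages only see the strongest filter responses.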
Abstract
Described is a system and method for visual media reasoning. An input image is filtered using a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes. Object recognition is performed on the set of sparse codes to generate object and semantic labels for the set of sparse codes. Pattern completion is performed on the object and semantic labels to recall relevant meta-data in the input image. Bi-directional feedback is used to fuse the input data with the relevant meta-data. An annotated image is generated with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
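The pattern-completion step in the abstract can be illustrated with a classical Hopfield-style associative memory, used here only as a stand-in for the spatiotemporal associative memory module (which the record does not specify). The bipolar patterns, label names, and metadata strings below are invented for the sketch.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product learning; self-connections zeroed."""
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0)
    return W

def recall(W, probe, steps=5):
    """Synchronous sign updates until the state stops changing."""
    s = probe.copy()
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

n = 64
# Two orthogonal bipolar "label" patterns standing in for stored
# (object labels -> meta-data) associations.
beach = np.where(np.arange(n) < n // 2, 1, -1)   # hypothetical pattern
city = np.where(np.arange(n) % 2 == 0, 1, -1)    # hypothetical pattern
metadata = {beach.tobytes(): "where: coastline, when: daytime",
            city.tobytes(): "where: urban street, when: evening"}

W = train_hopfield(np.stack([beach, city]))

# A probe with 5 corrupted bits models incomplete labels coming out of
# the recognition stage; recall completes it to the stored pattern.
probe = beach.copy()
probe[:5] *= -1
completed = recall(W, probe)
print(metadata[completed.tobytes()])
```

The point of the sketch is the "completion" behavior: a partial or noisy label vector settles onto the nearest stored pattern, whose associated meta-data can then be recalled, much as the claims describe recalling relevant meta-data from object and semantic labels.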
18 Claims
1. A system for visual media reasoning, the system comprising:
one or more processors and a non-transitory memory having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 2, 3, 4, 5, 6)
7. A computer-implemented method for visual media reasoning, comprising:
an act of causing one or more processors to execute instructions stored on a non-transitory memory such that upon execution, the one or more processors perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 8, 9, 10, 11, 12)
13. A computer program product for visual media reasoning, the computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform operations of:
filtering an input image having input data using a non-linear sparse coding module and a first series of sparse coding filter kernels tuned to represent objects of general categories, followed by a second series of sparse coding filter kernels tuned to represent objects of specialized categories, resulting in a set of sparse codes;
performing object recognition on the set of sparse codes by using a neurally-inspired vision module to generate object and semantic labels for the set of sparse codes;
performing pattern completion on the object and semantic labels by using a spatiotemporal associative memory module to recall relevant meta-data in the input image;
fusing data related to the input image with the relevant meta-data using bi-directional feedback between the non-linear sparse coding module, the neurally-inspired vision module, and the spatiotemporal associative memory module; and
generating an annotated image with information related to who is in the input image, what is in the input image, when the input image was captured, and where the input image was captured.
(Dependent claims: 14, 15, 16, 17, 18)
Specification