Extracting salient features from video using a neurosynaptic system

US 9,355,331 B2
Filed: 09/10/2015
Issued: 05/31/2016
Est. Priority Date: 04/29/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Embodiments of the invention provide a method of visual saliency estimation comprising receiving an input sequence of image frames. Each image frame has one or more channels, and each channel has one or more pixels. The method further comprises, for each channel of each image frame, generating corresponding neural spiking data based on a pixel intensity of each pixel of the channel, generating a corresponding multi-scale data structure based on the corresponding neural spiking data, and extracting a corresponding map of features from the corresponding multi-scale data structure. The multi-scale data structure comprises one or more data layers, wherein each data layer represents a spike representation of pixel intensities of a channel at a corresponding scale. The method further comprises encoding each map of features extracted as neural spikes.
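The abstract's first steps, turning pixel intensities into neural spiking data and then into a multi-scale structure, can be illustrated with a small pure-Python sketch. It assumes a deterministic rate code in which a pixel of 8-bit intensity p fires floor(T * p / 255) evenly spaced spikes over a window of T ticks. The patent does not say which temporal code it uses, so this is only one plausible choice. The helper names (`rate_encode`, `spike_counts`, `subsample_counts`) are hypothetical, and the coarser layer is expressed as averaged spike counts rather than re-encoded spikes to keep the example short.

```python
# Sketch of spike encoding plus one coarser layer, assuming a deterministic
# rate code (an illustrative assumption, not the patent's specified scheme).

def rate_encode(channel, ticks):
    """Convert a 2-D list of 8-bit intensities into binary spike frames.

    Returns spikes[t][y][x] in {0, 1} for t in range(ticks). A pixel of
    intensity p fires floor(ticks * p / 255) spikes in total.
    """
    height, width = len(channel), len(channel[0])
    spikes = []
    for t in range(ticks):
        frame = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                p = channel[y][x]
                # Fire whenever the running count floor(t * p / 255) increases,
                # which spaces the spikes evenly across the window.
                if (t + 1) * p // 255 > t * p // 255:
                    frame[y][x] = 1
        spikes.append(frame)
    return spikes


def spike_counts(spikes):
    """Decode spike frames back into per-pixel spike counts."""
    height, width = len(spikes[0]), len(spikes[0][0])
    return [[sum(frame[y][x] for frame in spikes) for x in range(width)]
            for y in range(height)]


def subsample_counts(counts, factor):
    """Average spike counts over non-overlapping factor x factor blocks
    (a hypothetical stand-in for one coarser layer of the multi-scale structure)."""
    h, w = len(counts) // factor, len(counts[0]) // factor
    return [[sum(counts[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]
```

For example, with T = 10 a pixel of intensity 51 fires twice (at ticks 4 and 9), while a pixel of intensity 255 fires on every tick.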

67 Citations

20 Claims

1. A method comprising:
   - receiving an input sequence of image frames, wherein each image frame comprises at least one pixel channel representing a dimension of the input sequence of image frames; and
   - utilizing one or more neurosynaptic core circuits to estimate visual saliency for the input sequence of image frames, wherein the one or more neurosynaptic core circuits perform operations including:

     for each pixel channel of each image frame:

     - generating a corresponding multi-scale data structure by spatially subsampling corresponding neural spiking data representing pixel intensity of each pixel of the pixel channel at different subsampling scales;
     - generating at least one corresponding saliency map by extracting at least one salient feature from the corresponding multi-scale data structure;
     - normalizing resolution of each corresponding saliency map;
     - applying a Gaussian smoothing operator to each corresponding saliency map to suppress speckles and enhance centers indicating salient features; and
     - merging each saliency map corresponding to each pixel channel into a combined saliency map representing estimated visual saliency for the input sequence of image frames.

2. The method of claim 1, wherein the operations further include:

   for each pixel channel of each image frame:

   - encoding salient features extracted from a corresponding multi-scale data structure as neural spikes.

3. The method of claim 1, wherein each saliency map is a retinotopic map of salient features.

4. The method of claim 1, wherein the input sequence of image frames constitutes a video of one or more frames that are not necessarily related to each other.

5. The method of claim 1, wherein:

   for each pixel channel of each image frame:

   - a corresponding multi-scale data structure comprises multiple data layers;
   - each data layer of the corresponding multi-scale data structure corresponds to a subsampling scale of different subsampling scales; and
   - the corresponding multi-scale data structure represents a distinct multi-scale pyramidal representation of the pixel channel.

6. The method of claim 1, wherein:

   for each pixel channel of each image frame:

   - spatially subsampling corresponding neural spiking data at different subsampling scales comprises determining a convolution of the corresponding neural spiking data by convolving the neural spiking data with a two-dimensional smoothing kernel.

7. The method of claim 1, wherein the operations further include:

   for each pixel channel of each image frame:

   - converting pixel intensity of each pixel of the pixel channel to neural spikes based on a temporal coding scheme and a spatial coding scheme.

8. The method of claim 1, further comprising:

   utilizing the one or more neurosynaptic core circuits to estimate motion saliency for the input sequence of image frames, wherein the one or more neurosynaptic core circuits further perform operations including:

   for each image frame:

   - detecting one or more salient image regions of the image frame by identifying one or more pixel subsets where one or more changes have occurred over time.

9. The method of claim 1, wherein:

   for each pixel channel of each image frame:

   - salient features extracted from a corresponding multi-scale data structure comprise at least one of the following: one or more mathematically defined features, and one or more learned features;
   - the one or more mathematically defined features include at least one of the following: one or more edge extraction operators operating on luminance and color channels, one or more texture extraction operators for extracting high frequency spatial activity, and one or more local averaging operations; and
   - the one or more learned features include at least one feature learned from training data using one or more of the following algorithms: k-means clustering, and input/desired output covariance.
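The visual-saliency chain of claim 1, together with the kernel-based subsampling of claim 6, can be followed step by step in a conventional (non-neurosynaptic) sketch that works on real-valued maps instead of spikes. Every concrete choice here is an assumption rather than the patented implementation: a 3x3 binomial kernel approximating a Gaussian is used both for pyramid smoothing and for the final speckle-suppressing smoothing; each level is subsampled by 2; the salient feature is a center-surround difference between pyramid levels (a common saliency feature, swapped in because the claims leave the extractor open); resolution is normalized by nearest-neighbour upsampling; and channel maps are merged with a pixel-wise mean. All function names are hypothetical.

```python
# Conventional-CPU sketch of the claim 1 / claim 6 data flow on float maps.
# Every concrete choice below is an illustrative assumption, not the patent's.

KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # 3x3 binomial (Gaussian-like), sums to 16


def smooth(img):
    """Convolve with KERNEL, clamping coordinates at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def pyramid(img, levels):
    """Multi-scale structure: each layer is the previous one smoothed, then 2x subsampled."""
    layers = [img]
    for _ in range(levels - 1):
        blurred = smooth(layers[-1])
        layers.append([row[::2] for row in blurred[::2]])
    return layers


def upsample(img, h, w):
    """Nearest-neighbour resize to h x w."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]


def saliency_map(channel, levels=3):
    """Center-surround feature between pyramid level 1 and the coarsest level,
    normalized to input resolution and then smoothed. Requires levels >= 2."""
    layers = pyramid(channel, levels)
    center, coarse = layers[1], layers[-1]
    ch, cw = len(center), len(center[0])
    surround = upsample(coarse, ch, cw)
    feature = [[abs(center[y][x] - surround[y][x]) for x in range(cw)]
               for y in range(ch)]
    full = upsample(feature, len(channel), len(channel[0]))  # normalize resolution
    return smooth(full)  # suppress isolated speckles, keep blob centers


def combined_saliency(channels, levels=3):
    """Merge per-channel saliency maps with a pixel-wise mean."""
    maps = [saliency_map(c, levels) for c in channels]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]
```

A uniform channel produces an all-zero map, while an isolated bright pixel produces a smoothed blob of saliency around its location. Reusing one kernel for both smoothing roles is purely for brevity.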

10. A system comprising a computer processor, a computer-readable hardware storage medium, and program code embodied with the computer-readable hardware storage medium for execution by the computer processor to implement a method comprising:
    - receiving an input sequence of image frames, wherein each image frame comprises at least one pixel channel representing a dimension of the input sequence of image frames; and
    - utilizing one or more neurosynaptic core circuits to estimate visual saliency for the input sequence of image frames, wherein the one or more neurosynaptic core circuits perform operations including:

      for each pixel channel of each image frame:

      - generating a corresponding multi-scale data structure by spatially subsampling corresponding neural spiking data representing pixel intensity of each pixel of the pixel channel at different subsampling scales;
      - generating at least one corresponding saliency map by extracting at least one salient feature from the corresponding multi-scale data structure;
      - normalizing resolution of each corresponding saliency map;
      - applying a Gaussian smoothing operator to each corresponding saliency map to suppress speckles and enhance centers indicating salient features; and
      - merging each saliency map corresponding to each pixel channel into a combined saliency map representing estimated visual saliency for the input sequence of image frames.

11. The system of claim 10, wherein the operations further include:

    for each pixel channel of each image frame:

    - encoding salient features extracted from a corresponding multi-scale data structure as neural spikes.

12. The system of claim 10, wherein each saliency map is a retinotopic map of salient features.

13. The system of claim 10, wherein the input sequence of image frames constitutes a video of one or more frames that are not necessarily related to each other.

14. The system of claim 10, wherein:

    for each pixel channel of each image frame:

    - a corresponding multi-scale data structure comprises multiple data layers;
    - each data layer of the corresponding multi-scale data structure corresponds to a subsampling scale of different subsampling scales; and
    - the corresponding multi-scale data structure represents a distinct multi-scale pyramidal representation of the pixel channel.

15. The system of claim 10, wherein:

    for each pixel channel of each image frame:

    - spatially subsampling corresponding neural spiking data at different subsampling scales comprises determining a convolution of the corresponding neural spiking data by convolving the neural spiking data with a two-dimensional smoothing kernel.

16. The system of claim 10, wherein the operations further include:

    for each pixel channel of each image frame:

    - converting pixel intensity of each pixel of the pixel channel to neural spikes based on a temporal coding scheme and a spatial coding scheme.

17. The system of claim 10, further comprising:

    utilizing the one or more neurosynaptic core circuits to estimate motion saliency for the input sequence of image frames, wherein the one or more neurosynaptic core circuits further perform operations including:

    for each image frame:

    - detecting one or more salient image regions of the image frame by identifying one or more pixel subsets where one or more changes have occurred over time.

18. The system of claim 10, wherein:

    for each pixel channel of each image frame:

    - salient features extracted from a corresponding multi-scale data structure comprise at least one of the following: one or more mathematically defined features, and one or more learned features;
    - the one or more mathematically defined features include at least one of the following: one or more edge extraction operators operating on luminance and color channels, one or more texture extraction operators for extracting high frequency spatial activity, and one or more local averaging operations; and
    - the one or more learned features include at least one feature learned from training data using one or more of the following algorithms: k-means clustering, and input/desired output covariance.
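Claims 8 and 17 describe motion saliency as finding pixel subsets that changed over time. One simple way to sketch that on a conventional CPU is frame differencing: threshold the absolute intensity difference between two consecutive frames, then group the changed pixels into 4-connected regions and report each region's bounding box. The threshold value, the connectivity rule, and the function names are illustrative assumptions; the claims do not specify how changes are detected or grouped.

```python
from collections import deque

# Frame-differencing sketch of motion saliency (assumed technique, not the
# patent's specified method): changed pixels are grouped into regions.


def changed_mask(prev, curr, threshold):
    """Mark pixels whose absolute intensity change exceeds threshold."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def salient_regions(prev, curr, threshold=10):
    """Group changed pixels into 4-connected regions.

    Returns inclusive bounding boxes (top, left, bottom, right) in raster-scan
    order of each region's first pixel.
    """
    mask = changed_mask(prev, curr, threshold)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the connected changed pixels.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Changes at or below the threshold are ignored, so small sensor noise does not create spurious regions; identical frames yield an empty list.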

19. A computer program product comprising a computer-readable hardware storage medium having program code embodied therewith, the program code being executable by a computer to implement a method comprising:
    - receiving an input sequence of image frames, wherein each image frame comprises at least one pixel channel representing a dimension of the input sequence of image frames; and
    - utilizing one or more neurosynaptic core circuits to estimate visual saliency for the input sequence of image frames, wherein the one or more neurosynaptic core circuits perform operations including:

      for each pixel channel of each image frame:

      - generating a corresponding multi-scale data structure by spatially subsampling corresponding neural spiking data representing pixel intensity of each pixel of the pixel channel at different subsampling scales;
      - generating at least one corresponding saliency map by extracting at least one salient feature from the corresponding multi-scale data structure;
      - normalizing resolution of each corresponding saliency map;
      - applying a Gaussian smoothing operator to each corresponding saliency map to suppress speckles and enhance centers indicating salient features; and
      - merging each saliency map corresponding to each pixel channel into a combined saliency map representing estimated visual saliency for the input sequence of image frames.

20. The computer program product of claim 19, wherein:

    for each pixel channel of each image frame:

    - a corresponding multi-scale data structure comprises multiple data layers;
    - each data layer of the corresponding multi-scale data structure corresponds to a subsampling scale of different subsampling scales; and
    - the corresponding multi-scale data structure represents a distinct multi-scale pyramidal representation of the pixel channel.
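Claim 20 (like claims 5 and 14) describes the multi-scale data structure as a set of data layers, one per subsampling scale, that together form a pyramidal representation of a channel. A minimal container for that idea is sketched here. The class name `MultiScaleStructure`, the block-averaging (box filter) subsampling, and the power-of-two scales in the usage example are all assumptions; the layers hold real values such as intensities or spike counts rather than spike trains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultiScaleStructure:
    """Hypothetical container: one data layer per subsampling scale."""
    scales: List[int]
    layers: List[List[List[float]]] = field(default_factory=list)

    @classmethod
    def build(cls, channel, scales):
        """Create one layer per scale by averaging non-overlapping s x s blocks.

        Trailing rows/columns that do not fill a whole block are dropped.
        """
        structure = cls(scales=list(scales))
        for s in structure.scales:
            h, w = len(channel) // s, len(channel[0]) // s
            layer = [[sum(channel[y * s + dy][x * s + dx]
                          for dy in range(s) for dx in range(s)) / (s * s)
                      for x in range(w)]
                     for y in range(h)]
            structure.layers.append(layer)
        return structure

    def layer_at(self, scale):
        """Return the data layer for a given subsampling scale."""
        return self.layers[self.scales.index(scale)]
```

For a 4x4 channel built with scales [1, 2, 4], the layers are 4x4, 2x2 and 1x1, and each coarser layer holds the block means of the finer input.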

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Andreopoulos, Alexander; Esser, Steven K.; Modha, Dharmendra S.
Primary Examiner(s): Carter, Aaron W
Application Number: US14/850,046
Publication Number: US 20160004931A1
Time in Patent Office: 264 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
- G06N 3/049: Temporal neural networks, e...
- G06N 3/063: using electronic means
- G06V 10/44: Local feature extraction by...
- G06V 10/462: Salient features, e.g. scal...
- G06V 10/464: using a plurality of salien...
- G06V 10/56: relating to colour
- G06V 20/46: Extracting features or char...
- G06V 30/194: References adjustable by an...
