Image processing with recurrent attention

US 10,223,617 B1
Filed: 06/04/2015
Issued: 03/05/2019
Est. Priority Date: 06/06/2014
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
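
The steps listed in the abstract can be read as a loop over the image sequence. The following is a minimal sketch in plain Python, not the patented implementation: the weights are random and untrained, the glimpse is a single fixed-size crop, the first location is assumed to be the image center, and every name (`crop`, `linear`, `rand_matrix`, `run_attention`) is hypothetical. The same image is presented at every step.

```python
import math
import random

def crop(image, loc, size):
    """Extract a size x size glimpse centered at loc=(row, col); zero-pad at borders."""
    h, w = len(image), len(image[0])
    r0, c0 = loc[0] - size // 2, loc[1] - size // 2
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(c0, c0 + size)]
            for r in range(r0, r0 + size)]

def linear(weights, vec):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def rand_matrix(rng, rows, cols):
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def run_attention(image, n_steps, n_actions, glimpse=3, hidden=8, seed=0):
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    w_in = rand_matrix(rng, hidden, glimpse * glimpse + 2)  # glimpse representation -> state
    w_rec = rand_matrix(rng, hidden, hidden)                # recurrent weights
    w_loc = rand_matrix(rng, 2, hidden)                     # location head
    w_act = rand_matrix(rng, n_actions, hidden)             # action head
    state = [0.0] * hidden
    loc = (h // 2, w // 2)  # assumed starting location for the initial image
    actions = []
    for _ in range(n_steps):
        # 1. Extract a glimpse at the current location.
        patch = crop(image, loc, glimpse)
        # 2. Glimpse representation: flattened pixels plus normalized location.
        rep = [p for row in patch for p in row] + [loc[0] / h, loc[1] / w]
        # 3. Update the recurrent internal state.
        pre = [a + b for a, b in zip(linear(w_in, rep), linear(w_rec, state))]
        state = [math.tanh(v) for v in pre]
        # 4. Location head: output in (-1, 1), mapped to pixel coordinates.
        ly, lx = (math.tanh(v) for v in linear(w_loc, state))
        loc = (min(h - 1, int((ly + 1) / 2 * h)), min(w - 1, int((lx + 1) / 2 * w)))
        # 5. Action head: one score per action; pick the highest-scoring one.
        scores = linear(w_act, state)
        actions.append(max(range(n_actions), key=scores.__getitem__))
    return actions, loc
```

Calling `run_attention(image, n_steps=4, n_actions=3)` on a small list-of-lists image returns one selected action index per step together with the location chosen for the next step.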

17 Claims

1. A method for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the method comprises, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
- wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein selecting the action comprises selecting a highest-scoring possible action according to the action scores.
  - 3. The method of claim 1, wherein extracting the glimpse from the first image comprises:
    - extracting a plurality of patches from the first image, each patch being centered at the location in the first image; and
    - combining the patches to generate the glimpse.
  - 4. The method of claim 3, wherein each of the plurality of patches has a distinct size, and wherein each of the plurality of patches has a distinct resolution.
  - 5. The method of claim 4, wherein combining the patches to generate the glimpse comprises:
    - re-scaling the patches so that each patch has a same size; and
    - concatenating the re-scaled patches to generate the glimpse.
  - 6. The method of claim 1, wherein generating a glimpse representation of the extracted glimpse comprises:
    - processing the extracted glimpse and the location in the first image using a glimpse neural network to generate the glimpse representation.
  - 7. The method of claim 6, wherein processing the extracted glimpse and the location in the first image using a glimpse neural network to generate the glimpse representation comprises:
    - processing the extracted glimpse using one or more first neural network layers to generate an initial representation of the extracted glimpse;
    - processing the location in the first image using one or more second neural network layers to generate an initial representation of the location in the first image; and
    - processing the initial representation of the extracted glimpse and the initial representation of the location in the first image using one or more third neural network layers to generate the glimpse representation.
  - 8. The method of claim 1, wherein processing the new internal state to generate an output of the location neural network comprises:
    - processing the new internal state using the location neural network to generate a distribution parameter; and
    - stochastically selecting a location from a distribution of possible locations that is parameterized by the distribution parameter.
  - 9. The method of claim 1, wherein the recurrent neural network is a long short term memory (LSTM) neural network.
  - 10. The method of claim 1, wherein the image sequence further comprises one or more second images, and wherein the method further comprises for each second image:
    - determining a location in the second image;
    - extracting a glimpse from the second image using the location in the second image;
    - generating a glimpse representation of the extracted glimpse from the second image;
    - processing the glimpse representation of the extracted glimpse from the second image using the recurrent neural network to update a current internal state of the recurrent neural network for the second image to generate a new internal state for the second image;
    - processing the new internal state to select a location in a next image in the image sequence after the second image; and
    - refraining from selecting an action from the predetermined set of possible actions for the second image.
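
Claims 3 to 5 describe a glimpse built from several patches that share a center but differ in size and resolution, are re-scaled to a common size, and are then concatenated. A minimal sketch of one way to do this follows; it is an illustration under stated assumptions, not the patented implementation. The doubling of patch sizes, the average-pool re-scaling, the zero padding at the borders, and the names `extract_patch`, `downscale`, and `extract_glimpse` are all assumptions introduced here.

```python
def extract_patch(image, center, size):
    """Square patch of side `size` centered at `center`; zero-pads out-of-bounds pixels."""
    h, w = len(image), len(image[0])
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    return [[float(image[r][c]) if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(c0, c0 + size)]
            for r in range(r0, r0 + size)]

def downscale(patch, factor):
    """Average-pool a square patch by an integer factor (assumed re-scaling method)."""
    n = len(patch) // factor
    area = factor * factor
    return [[sum(patch[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / area
             for j in range(n)]
            for i in range(n)]

def extract_glimpse(image, center, base_size=4, n_scales=3):
    """Return n_scales patches, each re-scaled to base_size x base_size.

    Patch k covers base_size * 2**k pixels per side, so larger patches are
    stored at coarser resolution. The patches are concatenated as a list
    (one entry per scale).
    """
    glimpse = []
    for k in range(n_scales):
        factor = 2 ** k
        patch = extract_patch(image, center, base_size * factor)
        glimpse.append(downscale(patch, factor))
    return glimpse
```

With the defaults, `extract_glimpse(image, (8, 8))` on a 16 x 16 image yields three 4 x 4 patches covering 4, 8, and 16 pixels per side around the same center.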

11. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the operations comprise, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
- wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. The system of claim 11, wherein selecting the action comprises selecting a highest-scoring possible action according to the action scores.
  - 13. The system of claim 11, wherein extracting the glimpse from the first image comprises:
    - extracting a plurality of patches from the first image, each patch being centered at the location in the first image; and
    - combining the patches to generate the glimpse.
  - 14. The system of claim 11, wherein generating a glimpse representation of the extracted glimpse comprises:
    - processing the extracted glimpse and the location in the first image using a glimpse neural network to generate the glimpse representation.
  - 15. The system of claim 14, wherein processing the extracted glimpse and the location in the first image using a glimpse neural network to generate the glimpse representation comprises:
    - processing the extracted glimpse using one or more first neural network layers to generate an initial representation of the extracted glimpse;
    - processing the location in the first image using one or more second neural network layers to generate an initial representation of the location in the first image; and
    - processing the initial representation of the extracted glimpse and the initial representation of the location in the first image using one or more third neural network layers to generate the glimpse representation.
  - 16. The system of claim 11, wherein processing the new internal state to generate an output of the location neural network comprises:
    - processing the new internal state using the location neural network to generate a distribution parameter; and
    - stochastically selecting a location from a distribution of possible locations that is parameterized by the distribution parameter.
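
Claims 8 and 16 have the location neural network emit a distribution parameter from which a location is then sampled. The sketch below shows one possible reading, offered as an assumption rather than as the claimed method: the parameter is taken to be the mean of an isotropic Gaussian with a fixed standard deviation, coordinates are normalized to [-1, 1], and the single-layer network and the names `location_mean` and `sample_location` are hypothetical.

```python
import math
import random

def location_mean(state, weights):
    """Hypothetical location network: one linear layer with tanh.

    Maps the recurrent internal state to a mean (row, col) in [-1, 1].
    """
    return tuple(math.tanh(sum(w * s for w, s in zip(row, state)))
                 for row in weights)

def sample_location(mean, std, rng):
    """Draw a location from a Gaussian centered at `mean`, clipped to [-1, 1]."""
    return tuple(max(-1.0, min(1.0, rng.gauss(m, std))) for m in mean)

# Usage: a 3-dimensional state and a 2 x 3 weight matrix (both made up).
rng = random.Random(0)
state = [0.5, -0.2, 0.1]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mean = location_mean(state, weights)
location = sample_location(mean, std=0.1, rng=rng)
```

Sampling instead of taking the mean directly keeps the choice of location stochastic; setting `std` to zero reduces the sketch to a deterministic policy.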

17. A computer program product encoded on one or more non-transitory computer storage media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the operations comprise, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
- wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
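
The action-selection step in the independent claims scores every object category from the new internal state and then picks an action from those scores. A minimal sketch follows, assuming a single linear layer followed by a softmax as the action network and the highest-scoring rule from claims 2 and 12; the names `action_scores` and `select_action`, the weights, and the category labels are illustrative only.

```python
import math

def action_scores(state, weights, bias):
    """Hypothetical action network: linear layer plus softmax.

    Returns one score per possible action; the scores sum to 1 and can be
    read as likelihoods that the image shows an object of that category.
    """
    logits = [sum(w * s for w, s in zip(row, state)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(scores, categories):
    """Pick the category whose action score is highest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best]

# Usage with made-up weights and three made-up categories.
categories = ["cat", "dog", "bird"]
state = [1.0, 0.0]
weights = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
bias = [0.0, 0.0, 0.0]
scores = action_scores(state, weights, bias)
chosen = select_action(scores, categories)
```

Here the second row of `weights` produces the largest logit, so `chosen` is `"dog"`.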

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: DeepMind Technologies Limited (Alphabet Inc.)
Inventors: Mnih, Volodymyr; Kavukcuoglu, Koray
Primary Examiner(s): Le, Vu
Assistant Examiner(s): Rivera-Martinez, Guillermo M
Application Number: US14/731,348
Time in Patent Office: 1,370 Days

CPC Class Codes

G06F 18/2431   Multiple classes

G06V 10/25   Determination of region of ...

G06V 10/44   Local feature extraction by...

G06V 10/82   using neural networks

G06V 20/80   Recognising image objects c...

G06V 30/194   References adjustable by an...

G06V 30/413   Classification of content, ...

G06V 40/20   Movements or behaviour, e.g...
