Image processing with recurrent attention
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
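The glimpse-extraction step can be illustrated with a minimal sketch. The source does not say how a location is encoded or how large a glimpse is, so everything here is an assumption made for illustration: the hypothetical helper `extract_glimpse`, the normalized (x, y) coordinates in [-1, 1], the square patch, and the zero padding outside the image.

```python
def extract_glimpse(image, location, size):
    """Return a size x size patch of `image` centred on `location`.

    `image` is a list of rows of pixel intensities. `location` is an (x, y)
    pair in [-1, 1], where (-1, -1) is the top-left corner and (1, 1) is the
    bottom-right corner (an assumed convention, not taken from the source).
    Positions that fall outside the image are filled with 0.
    """
    height, width = len(image), len(image[0])
    # Map the normalized coordinates to the pixel index of the patch centre.
    cx = round((location[0] + 1.0) / 2.0 * (width - 1))
    cy = round((location[1] + 1.0) / 2.0 * (height - 1))
    half = size // 2
    patch = []
    for row in range(cy - half, cy - half + size):
        patch_row = []
        for col in range(cx - half, cx - half + size):
            inside = 0 <= row < height and 0 <= col < width
            patch_row.append(image[row][col] if inside else 0)
        patch.append(patch_row)
    return patch


# A 5 x 5 test image whose pixel value encodes its position (row * 5 + col).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
centre_patch = extract_glimpse(image, (0.0, 0.0), 3)
corner_patch = extract_glimpse(image, (-1.0, -1.0), 3)
print(centre_patch)  # [[6, 7, 8], [11, 12, 13], [16, 17, 18]]
print(corner_patch)  # [[0, 0, 0], [0, 0, 1], [0, 5, 6]]
```

A glimpse centred on a corner reaches past the image border, which is why the sketch pads with zeros; other padding choices would work equally well.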
17 Claims
1. A method for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the method comprises, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the operations comprise, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
Dependent claims: 12, 13, 14, 15, 16.
17. A computer program product encoded on one or more non-transitory computer storage media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for processing an image sequence, wherein the image sequence comprises a plurality of first images, wherein each of the plurality of first images are the same, and wherein the operations comprise, for each first image:
- determining a location in the first image, comprising:
  - determining the location based on an output of a location neural network for the first image if the first image is after an initial first image in the image sequence;
- extracting a glimpse from the first image using the location;
- updating a current internal state of a recurrent neural network using the glimpse extracted from the first image to generate a new internal state, comprising:
  - generating a glimpse representation of the extracted glimpse, and
  - processing the glimpse representation using the recurrent neural network to update the current internal state of the recurrent neural network to generate a new internal state;
- processing, using the location neural network, the new internal state of the recurrent neural network generated using the glimpse extracted from the first image to generate an output of the location neural network for a next image in the image sequence after the first image;
- selecting an action from a predetermined set of possible actions, wherein each possible action in the predetermined set of possible actions defines a respective object category, including:
  - processing, using an action neural network, the new internal state of the recurrent neural network to generate an action neural network output comprising a respective action score for each of the possible actions, wherein for each of the possible actions, the respective action score for the possible action represents a likelihood that the first image includes an image of an object belonging to the respective object category defined by the possible action, and
  - selecting the action based on the action neural network output;
wherein the location neural network, the recurrent neural network, and the action neural network have been trained by an end-to-end optimization procedure.
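The per-image steps recited in the independent claims can be read as a loop over the sequence. The sketch below is one possible reading under stated assumptions, not the patented implementation. The layer sizes, the single tanh recurrent layer, the softmax scores, the glimpse representation built by concatenating pixels with the location, and the image centre used as the location for the initial image are all hypothetical choices. The weights are random, whereas the claims require networks trained end to end, so the selected categories are meaningless.

```python
import math
import random

GLIMPSE, HIDDEN, CATEGORIES = 3, 8, 4  # hypothetical sizes


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def crop(image, location, size=GLIMPSE):
    """Flattened size x size patch around a location in [-1, 1]^2, zero-padded."""
    h, w = len(image), len(image[0])
    cx = round((location[0] + 1) / 2 * (w - 1))
    cy = round((location[1] + 1) / 2 * (h - 1))
    top, left = cy - size // 2, cx - size // 2
    return [image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
            for r in range(top, top + size)
            for c in range(left, left + size)]


def run(images, seed=0):
    rng = random.Random(seed)
    # Untrained stand-ins for the three networks (random weights, assumption).
    w_glimpse = rand_matrix(HIDDEN, GLIMPSE * GLIMPSE + 2, rng)
    w_state = rand_matrix(HIDDEN, HIDDEN, rng)
    w_location = rand_matrix(2, HIDDEN, rng)
    w_action = rand_matrix(CATEGORIES, HIDDEN, rng)

    state = [0.0] * HIDDEN
    location_output = None
    results = []
    for index, image in enumerate(images):
        # The initial image has no location-network output yet, so a default
        # location is used (image centre, an assumption).
        location = (0.0, 0.0) if index == 0 else location_output
        glimpse = crop(image, location)
        # Glimpse representation: pixels and location concatenated (assumed).
        representation = glimpse + list(location)
        # Recurrent update: new state from the representation and old state.
        state = [math.tanh(a + b) for a, b in
                 zip(matvec(w_glimpse, representation), matvec(w_state, state))]
        # Location network: output used to pick the location in the next image.
        location_output = tuple(math.tanh(v) for v in matvec(w_location, state))
        # Action network: one score per object category, then pick the best.
        logits = matvec(w_action, state)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
        results.append((scores.index(max(scores)), scores))
    return results


# The claims describe a sequence made of the same image repeated.
image = [[(r + c) / 8.0 for c in range(5)] for r in range(5)]
results = run([image] * 4)
for step, (action, scores) in enumerate(results):
    print(step, action, [round(s, 3) for s in scores])
```

Because the same image is presented at every step, the only thing that changes between iterations is where the glimpse is taken and what the recurrent state has accumulated so far.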
Specification