Automatically segmenting images based on natural language phrases
First Claim
1. A computer-readable storage medium having instructions stored thereon for segmenting an image that includes a plurality of pixels, which, when executed by a processor of a computing device cause the computing device to perform actions comprising:
receiving an ordered set of tokens that references a first region of the image;
generating an image map that represents a correspondence between each of a plurality of image features and a corresponding portion of the plurality of pixels;
generating a set of token data elements, wherein each of the token data elements represents semantic features of a corresponding token of the set of tokens;
iteratively updating a segmentation map that represents whether each of the plurality of pixels is included in the first region of the image, wherein each of a plurality of iterative updates of the segmentation map is based on a previous version of the segmentation map and a combination of the image map and one of the token data elements that is based on an order of the set of tokens; and
generating a segmented image based on the image and the segmentation map.
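The steps recited in claim 1 can be illustrated with a minimal sketch. It is only a toy under stated assumptions, not the patented implementation: the per-pixel dot product used as the "combination" of image map and token data element, the additive update rule, the sigmoid threshold, and all names (`update_segmentation`, `segment`, the toy feature and token vectors) are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_segmentation(seg_map, image_map, token_vec):
    """One iterative update of the segmentation map.

    The new value at each pixel depends on the previous map value and on a
    combination of the image map and one token vector. The combination used
    here (a dot product) is an assumption made for this sketch.
    """
    new_map = []
    for r, row in enumerate(image_map):
        new_row = []
        for c, features in enumerate(row):
            match = sum(f * t for f, t in zip(features, token_vec))
            new_row.append(seg_map[r][c] + match)
        new_map.append(new_row)
    return new_map


def segment(image, image_map, token_vecs, threshold=0.5):
    """Run one update per token, in token order, then mask the image."""
    height, width = len(image), len(image[0])
    seg_map = [[0.0] * width for _ in range(height)]
    for token_vec in token_vecs:  # the order of the tokens drives the updates
        seg_map = update_segmentation(seg_map, image_map, token_vec)
    # The thresholded map says whether each pixel belongs to the region.
    mask = [[sigmoid(v) > threshold for v in row] for row in seg_map]
    # Segmented image: keep pixels inside the region, zero out the rest.
    segmented = [
        [px if keep else 0 for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    return mask, segmented


# Toy 2x2 grayscale image with a 2-dimensional feature vector per pixel.
image = [[10, 20], [30, 40]]
image_map = [[[1.0, 0.0], [0.0, 1.0]],
             [[0.0, 1.0], [1.0, 0.0]]]
# Two hypothetical token vectors, in phrase order.
token_vecs = [[2.0, -2.0], [1.0, -1.0]]

mask, segmented = segment(image, image_map, token_vecs)
```

In this toy input the two token vectors agree with the first feature channel, so only the pixels whose features activate that channel remain in the segmented image.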
Abstract
The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.
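The two-network arrangement described at the end of the abstract can be sketched roughly as follows. This is an assumption-laden illustration: a plain tanh recurrence stands in for the long short-term memory network, a simple additive update stands in for the convolutional multimodal RNN, and the function names (`encode_tokens`, `run_segmentation_rnn`) and the `decay` parameter are hypothetical.

```python
import math


def encode_tokens(token_vecs, decay=0.5):
    """Update a semantic encoding token by token, in order.

    A tanh recurrence is used here as a stand-in for the LSTM named in the
    abstract. Returns the encoding after each token.
    """
    hidden = [0.0] * len(token_vecs[0])
    states = []
    for vec in token_vecs:
        hidden = [math.tanh(decay * h + x) for h, x in zip(hidden, vec)]
        states.append(hidden)
    return states


def run_segmentation_rnn(image_map, states):
    """Update the segmentation map once per semantic encoding.

    Each update adds the agreement (dot product, an assumption) between a
    pixel's image features and the current semantic encoding to the
    previous map value.
    """
    height, width = len(image_map), len(image_map[0])
    seg_map = [[0.0] * width for _ in range(height)]
    for state in states:
        seg_map = [
            [
                seg_map[r][c]
                + sum(f * s for f, s in zip(image_map[r][c], state))
                for c in range(width)
            ]
            for r in range(height)
        ]
    return seg_map


image_map = [[[1.0, 0.0], [0.0, 1.0]],
             [[0.0, 1.0], [1.0, 0.0]]]
states = encode_tokens([[1.0, -1.0], [1.0, -1.0]])
seg_map = run_segmentation_rnn(image_map, states)
```

Because the encoding carries state from earlier tokens, swapping the order of two different tokens changes the final encoding, which is the property the abstract attributes to the separate RNN.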
20 Claims
1. A computer-readable storage medium having instructions stored thereon for segmenting an image that includes a plurality of pixels, which, when executed by a processor of a computing device cause the computing device to perform actions comprising:
receiving an ordered set of tokens that references a first region of the image;
generating an image map that represents a correspondence between each of a plurality of image features and a corresponding portion of the plurality of pixels;
generating a set of token data elements, wherein each of the token data elements represents semantic features of a corresponding token of the set of tokens;
iteratively updating a segmentation map that represents whether each of the plurality of pixels is included in the first region of the image, wherein each of a plurality of iterative updates of the segmentation map is based on a previous version of the segmentation map and a combination of the image map and one of the token data elements that is based on an order of the set of tokens; and
generating a segmented image based on the image and the segmentation map.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for segmenting an image, comprising:
receiving the image, wherein the image includes a plurality of pixels;
generating an n-gram based on a natural language phrase that references an object depicted within a first region of the image, wherein the n-gram includes an ordered set of tokens;
generating an image data structure that encodes a mapping between each of a plurality of image features and a corresponding portion of the plurality of pixels, wherein the plurality of image features are identified within the image based on an image feature identification model;
generating a set of token data structures based on a natural language model, wherein each of the token data structures encodes semantic features of a corresponding token of the set of tokens;
iteratively generating a segmentation map based on a first recurrent neural network (RNN) and a plurality of iteratively generated combinations of the image data structure and portions of the set of token data structures, wherein the first RNN propagates the segmentation map during the iterative generation of the segmentation data structure and the segmentation map identifies a subset of the plurality of pixels that are included in the first region of the image; and
segmenting the image based on the iteratively generated segmentation map.
Dependent claims: 9, 10, 11, 12, 13
14. A computing system for segmenting an image based on an n-gram that references a first region of the image, wherein the image includes a plurality of pixels and the n-gram includes an ordered set of tokens, the system comprising:
a processor device; and
a computer-readable storage medium, coupled with the processor device, having instructions stored thereon, which, when executed by the processor device, perform actions comprising:
steps for identifying a plurality of image features within the image based on an image feature identification model;
steps for encoding a mapping between each of the plurality of image features and a corresponding portion of the plurality of pixels in an image data structure;
steps for identifying semantic features for each token in the set of tokens based on a natural language model;
steps for encoding the semantic features of each token in the set of tokens as a set of token data structures;
steps for iteratively updating a segmentation map based on the segmentation map and an ordered set of combinations of the image data structure and the set of token data structures based on an order of the set of tokens; and
steps for providing a segmented image based on the image and the segmentation map.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification