Automatically Segmenting Images Based On Natural Language Phrases
Abstract
The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.
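The abstract's last two steps (a segmentation map that identifies which pixels belong to the referenced region, and a segmented image generated from that map) can be illustrated with a minimal sketch. It assumes the map holds one score per pixel and that a fixed threshold of 0.5 decides membership; the threshold, the grayscale pixel type, and the function names `mask_from_map` and `apply_mask` are illustrative assumptions, not details taken from the patent.

```python
from typing import List

Pixel = int  # assumption: grayscale intensity, chosen only to keep the example small


def mask_from_map(seg_map: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Mark a pixel as inside the referenced region when its score exceeds the threshold."""
    return [[score > threshold for score in row] for row in seg_map]


def apply_mask(image: List[List[Pixel]], mask: List[List[bool]],
               background: Pixel = 0) -> List[List[Pixel]]:
    """Keep pixels inside the region and replace all others with a background value."""
    return [
        [px if keep else background for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A 2x2 toy image and a hypothetical, already-computed segmentation map.
image = [[10, 20], [30, 40]]
seg_map = [[0.9, 0.2], [0.6, 0.1]]

mask = mask_from_map(seg_map)
segmented = apply_mask(image, mask)
```

Here the left column scores above the threshold, so only those two pixels survive in `segmented`.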
20 Claims
1. A computer-readable storage medium having instructions stored thereon for segmenting an image, which, when executed by a processor of a computing device cause the computing device to perform actions comprising:
receiving a phrase that references a first region of the image, wherein the phrase includes a set of tokens;
generating a plurality of token data elements based on the set of tokens, wherein each of the plurality of token data elements indicates a semantic feature of a corresponding token of the set of tokens;
generating a plurality of iterative updates of a segmentation map of the image based on an order of the set of tokens, wherein each of a plurality of iterative updates of the segmentation map is based on the semantic feature indicated by the corresponding token data element; and
segmenting the first region of the image based on the iteratively updated segmentation map.
Dependent claims: 2, 3, 4, 5, 6, 7.
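The token-ordered updating recited in claim 1 can be sketched as a loop that revises the map once per token. This is a toy under stated assumptions: the lookup table `TOKEN_VECTORS`, the per-pixel feature vectors, the `carry` weight, and the tanh update rule are all hypothetical stand-ins, not the trained models the patent describes.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical token data elements: one small semantic vector per token.
TOKEN_VECTORS: Dict[str, Vector] = {
    "the": [0.0, 0.0],
    "red": [1.0, 0.0],
    "ball": [0.0, 1.0],
}


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def update_map(seg_map: List[List[float]], pixel_features: List[List[Vector]],
               token_vec: Vector, carry: float = 0.5) -> List[List[float]]:
    """One update: blend the previous score with how well the pixel matches the token."""
    return [
        [math.tanh(carry * prev + dot(feat, token_vec))
         for prev, feat in zip(map_row, feat_row)]
        for map_row, feat_row in zip(seg_map, pixel_features)
    ]


def segment_by_phrase(phrase: str, pixel_features: List[List[Vector]]) -> List[List[List[float]]]:
    """Return the segmentation map after each token, visiting tokens in phrase order."""
    seg_map = [[0.0 for _ in row] for row in pixel_features]
    history = []
    for token in phrase.lower().split():
        seg_map = update_map(seg_map, pixel_features, TOKEN_VECTORS[token])
        history.append(seg_map)
    return history


# A 1x2 toy image: the left pixel matches both "red" and "ball", the right matches neither.
pixel_features = [[[1.0, 1.0], [-1.0, -1.0]]]
history = segment_by_phrase("the red ball", pixel_features)
final_map = history[-1]
```

Each entry of `history` is one iterative update, so a three-token phrase yields three successive maps; the left pixel's score grows with every matching token while the right pixel's score falls.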
8. A method for segmenting an image, the method comprising:
receiving an image that includes a plurality of pixels;
receiving an n-gram that includes an ordered set of tokens based on a phrase that references an object depicted within a first region of the image;
generating an image data structure that encodes a mapping between each of a plurality of image features corresponding to the image and a corresponding portion of the plurality of pixels;
generating a set of token data structures;
employing a first recurrent neural network (RNN) to iteratively generate a segmentation map; and
segmenting the image based on the iteratively generated segmentation map.
Dependent claims: 9, 10, 11, 12, 13.
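Claim 8's image data structure, which maps each image feature to a corresponding portion of the pixels, can be pictured with a small sketch. It assumes non-overlapping square blocks and a two-number feature (mean and maximum intensity) per block; the block size, the feature choice, and the names `FeatureCell` and `build_image_structure` are hypothetical, and a real system would presumably derive features from a convolutional network, as the abstract indicates.

```python
from typing import List, NamedTuple, Tuple


class FeatureCell(NamedTuple):
    feature: List[float]           # toy feature vector: [mean intensity, max intensity]
    pixels: List[Tuple[int, int]]  # (row, col) coordinates this feature corresponds to


def build_image_structure(image: List[List[int]], block: int = 2) -> List[FeatureCell]:
    """Split the image into blocks and record one feature plus its pixel portion per block."""
    height, width = len(image), len(image[0])
    cells = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            coords = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            values = [image[r][c] for r, c in coords]
            feature = [sum(values) / len(values), float(max(values))]
            cells.append(FeatureCell(feature, coords))
    return cells


# A 2x4 toy image yields two 2x2 blocks, each with its own feature and pixel list.
image = [[0, 0, 8, 8],
         [0, 4, 8, 8]]
cells = build_image_structure(image)
```

Keeping the pixel coordinates alongside each feature is what lets a later score computed on a feature be projected back onto the pixels it covers.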
14. An image segmentation system for segmenting an image, the system comprising:
a processor device; and
a computer-readable non-transitory storage medium, coupled with the processor device, having instructions stored thereon, which, when executed by the processor device, perform actions comprising:
receiving an image feature data structure that encodes image features corresponding to an image;
employing a first recurrent neural network (RNN) to generate an n-gram feature data structure that encodes n-gram features corresponding to an ordered set of tokens included in a natural language phrase that references a portion of the image;
employing a second RNN to iteratively update a current state of a segmentation map based on the image feature data structure and the n-gram feature data structure, wherein the second RNN propagates a current state of the segmentation map;
generating a segmented image based on the iteratively updated current state of the segmentation map, wherein the segmented image indicates the portion of the image referenced by the natural language phrase.
Dependent claims: 15, 16, 17, 18, 19, 20.
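The two-RNN arrangement of claim 14 can be sketched as two chained recurrent passes: the first turns the ordered token vectors into n-gram features, and the second carries a segmentation state forward while consuming those features. The scalar recurrence weight, the tanh cells, and the function names below are assumptions made for illustration; the abstract mentions a long short-term memory network and a convolutional multimodal RNN, neither of which is reproduced here.

```python
import math
from typing import List

Vector = List[float]


def encode_ngram(token_vectors: List[Vector], recur: float = 0.5) -> List[Vector]:
    """First recurrent pass: h_t = tanh(recur * h_{t-1} + x_t), one state per token."""
    hidden = [0.0] * len(token_vectors[0])
    states = []
    for vec in token_vectors:
        hidden = [math.tanh(recur * h + x) for h, x in zip(hidden, vec)]
        states.append(hidden)
    return states


def update_segmentation(image_features: List[Vector], ngram_states: List[Vector],
                        recur: float = 0.5) -> List[float]:
    """Second recurrent pass: one score per image location, propagated across steps."""
    seg_state = [0.0] * len(image_features)
    for state in ngram_states:
        seg_state = [
            math.tanh(recur * s + sum(f * h for f, h in zip(feat, state)))
            for s, feat in zip(seg_state, image_features)
        ]
    return seg_state


# Hypothetical vectors for a two-token phrase and two flattened image locations.
token_vectors = [[1.0, 0.0], [0.0, 1.0]]
image_features = [[1.0, 1.0], [-1.0, -1.0]]

states = encode_ngram(token_vectors)
scores = update_segmentation(image_features, states)
```

Because the first pass carries its hidden state forward, reversing the token order produces a different final n-gram state, which is one simple way to see why the claim stresses an ordered set of tokens.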
Specification