Deep reinforcement learning-based captioning with embedding reward
Abstract
An image captioning system and method is provided for generating a caption for an image. The image captioning system utilizes a policy network and a value network to generate the caption. The policy network serves as a local guidance and the value network serves as a global and lookahead guidance.
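One way to picture the local versus lookahead guidance described in the abstract is a weighted blend of the two scores. The following is a minimal sketch, not the patented implementation: the function name `select_next_word`, the blending weight `lam`, and the toy scores are all assumptions made for illustration.

```python
import math


def select_next_word(policy_probs, value_scores, lam=0.4):
    """Return the candidate word with the best blended score.

    policy_probs: dict mapping word -> next-word probability
                  (stand-in for a policy network's output; hypothetical).
    value_scores: dict mapping word -> lookahead score
                  (stand-in for a value network's output; hypothetical).
    lam:          assumed weight on the policy term, between 0 and 1.
    """
    def blended(word):
        # Local guidance: log-probability of the word under the policy.
        local = math.log(policy_probs[word])
        # Global guidance: estimated quality of the caption that follows.
        lookahead = value_scores[word]
        return lam * local + (1.0 - lam) * lookahead

    return max(policy_probs, key=blended)


# Toy numbers, invented for the example.
policy_probs = {"dog": 0.5, "cat": 0.3, "frisbee": 0.2}
value_scores = {"dog": 0.1, "cat": 0.2, "frisbee": 0.9}

greedy_choice = select_next_word(policy_probs, value_scores, lam=1.0)
blended_choice = select_next_word(policy_probs, value_scores, lam=0.4)
```

With `lam=1.0` only the policy term counts, so the most probable word is chosen; lowering `lam` lets a word with a strong lookahead score overtake it.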
20 Claims
1. A method comprising:
- extracting, by an image captioning system, an image feature from an image;
- analyzing, by a policy network of the image captioning system, the image feature to compute a probability of a next word to be generated for a caption describing the image feature, the probability comprising a list of options for the next word and a policy network score for each possible option in the list of options;
- ranking, by the policy network of the image captioning system, the list of options for the next word of the caption based on the policy network score for each possible option in the list of options;
- analyzing, by a value network of the image captioning system, the image feature and the probability of the next word generated by the policy network to generate a value network score for each possible option in the list of options;
- ranking, by the value network, the list of options for the next word of the caption based on the value network score; and
- selecting, by the image captioning system, a next word for the caption based on the ranking of the list of options by the policy network and the ranking of the list of options by the value network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 18
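The ranking and selecting steps of claim 1 can be sketched as follows. The claim states that selection is based on both rankings but does not say how they are combined, so the rank-sum rule and the tie-break below are assumptions, as are the helper names `rank` and `select_by_rank` and the toy scores.

```python
def rank(scores):
    """Map each option to its rank (0 = best) by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {word: position for position, word in enumerate(ordered)}


def select_by_rank(policy_scores, value_scores):
    """Pick the option with the lowest combined rank.

    Assumption: the two rankings are combined by summing ranks, and ties
    are broken in favor of the better policy rank.
    """
    policy_rank = rank(policy_scores)
    value_rank = rank(value_scores)
    return min(
        policy_rank,
        key=lambda w: (policy_rank[w] + value_rank[w], policy_rank[w]),
    )


# Toy scores for three candidate next words (invented for the example).
policy_scores = {"dog": 0.40, "cat": 0.35, "frisbee": 0.25}
value_scores = {"dog": 0.1, "cat": 0.8, "frisbee": 0.5}

chosen = select_by_rank(policy_scores, value_scores)
```

Here "dog" ranks first under the policy scores but last under the value scores, so "cat", which ranks second and first respectively, has the lowest rank sum and is selected.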
12. An image captioning system comprising:
- one or more processors; and
- a computer-readable medium coupled with the processor, the computer-readable medium comprising instructions stored thereon that are executable by the one or more processors to cause the imaging captioning system to perform operations comprising;
  - extracting an image feature from an image;
  - analyzing, by a policy network of the image captioning system, the image feature to compute a probability of a next word to be generated for a caption describing the image feature, the probability comprising a list of options for the next word and a policy network score for each possible option in the list of options;
  - ranking, by the policy network of the image captioning system, the list of options for the next word of the caption based on the policy network score for each possible option in the list of options;
  - analyzing, by a value network of the image captioning system, the image feature and the probability of the next word generated by the policy network to generate a value network score for each possible option in the list of options;
  - ranking, by the value network, the list of options for the next word of the caption based on the value network score; and
  - selecting, by the image captioning system, a next word for the caption based on the ranking of the list of options by the policy network and the ranking of the list of options by the value network.
Dependent claims: 13, 14, 15, 16, 17, 19
20. A non-transitory computer-readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations comprising:
- extracting an image feature from an image;
- analyzing, by a policy network of an image captioning system, the image feature to compute a probability of a next word to be generated for a caption describing the image feature, the probability comprising a list of options for the next word and a policy network score for each possible option in the list of options;
- ranking, by the policy network of the image captioning system, the list of options for the next word of the caption based on the policy network score for each possible option in the list of options;
- analyzing, by a value network of the image captioning system, the image feature and the probability of the next word generated by the policy network to generate a value network score for each possible option in the list of options;
- ranking, by the value network, the list of options for the next word of the caption based on the value network score; and
- selecting, by the image captioning system, a next word for the caption based on the ranking of the list of options by the policy network and the ranking of the list of options by the value network.
Specification