Instance-level image retrieval with a region proposal network
First Claim
1. An image processing device comprising:
an electronic processor; and
a non-transitory storage medium operatively connected with the electronic processor and storing instructions readable and executable by the electronic processor to perform a method for detecting an object in an input image by operations including:
generating an input image vector representing the input image by applying to the input image an image processing network including applying a convolutional neural network (CNN) to the input image to generate an input image CNN response map, defining regions of the input image CNN response map by applying a region proposal network (RPN) to the input image CNN response map, generating a region vector representing each region of the input image CNN response map, and sum-aggregating the region vectors representing the regions of the input image CNN response map;
for a reference image that is different from the input image and that depicts the object, generating a reference image vector representing the reference image by applying to the reference image the same image processing network as applied to the input image including applying the CNN to the reference image to generate a reference image CNN response map, defining regions of the reference image CNN response map by applying the RPN to the reference image CNN response map, generating a region vector representing each region of the reference image CNN response map, and sum-aggregating the region vectors representing the regions of the reference image CNN response map;
computing a similarity metric between the input image vector and the reference image vector; and
detecting the object in the input image if the similarity metric satisfies a detection criterion.
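The operations recited in the claim can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the CNN response map is passed in as a plain nested list indexed [channel][row][column], the region boxes stand in for the output of a trained RPN, each region vector is formed by per-channel max pooling (suggested by the R-MAC wording of the abstract), and the L2 normalization, cosine similarity, and threshold value are choices made here for illustration. All function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def region_vector(response_map, box):
    """Max-pool every channel over the half-open box (y0, x0, y1, x1).

    Max pooling and the normalization are assumptions; the claim only
    requires "a region vector representing each region".
    """
    y0, x0, y1, x1 = box
    pooled = [max(channel[y][x] for y in range(y0, y1) for x in range(x0, x1))
              for channel in response_map]
    return l2_normalize(pooled)

def image_vector(response_map, boxes):
    """Sum-aggregate the region vectors, then normalize the sum (assumed)."""
    total = [0.0] * len(response_map)
    for box in boxes:  # boxes are a stand-in for RPN proposals
        for c, value in enumerate(region_vector(response_map, box)):
            total[c] += value
    return l2_normalize(total)

def similarity(vec_a, vec_b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(a * b for a, b in zip(vec_a, vec_b))

def detect(input_vec, reference_vec, threshold=0.8):
    """Detection criterion assumed here: similarity at or above a threshold."""
    return similarity(input_vec, reference_vec) >= threshold

# Toy 2-channel, 2x2 response maps and two hypothetical region boxes.
boxes = [(0, 0, 1, 2), (0, 0, 2, 2)]
input_map = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
reference_map = [[[2.0, 0.0], [0.0, 4.0]], [[0.0, 6.0], [2.0, 0.0]]]

input_vec = image_vector(input_map, boxes)
reference_vec = image_vector(reference_map, boxes)
print(detect(input_vec, reference_vec))
```

Because both images pass through the same functions with the same parameters, the two vectors are directly comparable, which mirrors the claim's requirement that the reference image be processed by "the same image processing network as applied to the input image".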
Abstract
In a method for detecting an object in an input image, an input image vector representing the input image is generated by performing regional maximum activations of convolutions (R-MAC) using a convolutional neural network (CNN) applied to the input image and using regions for the R-MAC defined by applying a region proposal network (RPN) to the output of the CNN applied to the input image. Likewise, a reference image vector representing a reference image depicting the object is generated by performing the R-MAC using the CNN applied to the reference image and using regions for the R-MAC defined by applying the RPN to the output of the CNN applied to the reference image. A similarity metric between the input image vector and the reference image vector is computed, and the object is detected as present in the input image if the similarity metric satisfies a detection criterion.
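The abstract names R-MAC without giving a formula. One way to write the computation down is shown here; the notation is assumed for illustration and does not appear in the source. Let X be the CNN response map with C channels, let the set of regions returned by the RPN be written as a calligraphic R, and let tau be a detection threshold. The L2 normalization of each region vector and the use of cosine similarity are likewise assumptions rather than requirements stated in the abstract.

```latex
% Max activation of channel c inside region R (the "MAC" of that region)
f_{R,c} = \max_{(y,x) \in R} X_{c,y,x}, \qquad c = 1, \dots, C

% Image vector: sum of the (assumed L2-normalized) region vectors
v = \sum_{R \in \mathcal{R}} \frac{f_R}{\lVert f_R \rVert_2}

% Similarity between input and reference vectors, and the detection test
s = \frac{\langle v_{\mathrm{in}}, v_{\mathrm{ref}} \rangle}
         {\lVert v_{\mathrm{in}} \rVert_2 \, \lVert v_{\mathrm{ref}} \rVert_2},
\qquad \text{object detected if } s \ge \tau
```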
14 Claims
1. An image processing device as set out in full under First Claim above. Claims 2 through 13 depend on claim 1.
14. A method, performed by one or more processors, comprising:
generating an input image vector representing an input image by applying to the input image an image processing network including applying a convolutional neural network (CNN) to the input image to generate an input image CNN response map, defining regions of the input image CNN response map by applying a region proposal network (RPN) to the input image CNN response map, generating a region vector representing each region of the input image CNN response map, and sum-aggregating the region vectors representing the regions of the input image CNN response map;
for a reference image that is different from the input image and that depicts the object, generating a reference image vector representing the reference image by applying to the reference image the same image processing network as applied to the input image including applying the CNN to the reference image to generate a reference image CNN response map, defining regions of the reference image CNN response map by applying the RPN to the reference image CNN response map, generating a region vector representing each region of the reference image CNN response map, and sum-aggregating the region vectors representing the regions of the reference image CNN response map;
computing a similarity metric between the input image vector and the reference image vector; and
detecting the object in the input image if the similarity metric satisfies a detection criterion.
Specification