Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
First Claim
1. A method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
receiving the image and storing it in computer memory;
sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
pooling each of the regions of interest to create a corresponding pooled region of interest;
normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest;
concatenating the normalized regions of interest with one another to create a concatenated region of interest;
dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
storing the bounding box and the confidence score in the computer memory in association with the image.
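The feature-fusion steps recited above (pool, normalize, concatenate, dimensionally reduce) can be sketched in plain Python. This is an illustrative sketch only, not the patented implementation: the claim does not name specific operations, so 2x2 max pooling, per-location L2 normalization, and a 1x1 convolution as the dimensional reduction are all assumptions, as are the function names and the nested-list `[channel][row][column]` layout.

```python
import math

def max_pool_2x2(fmap):
    """Halve the spatial size of a [C][H][W] map (H and W assumed even)."""
    return [
        [
            [max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
             for c in range(0, len(ch[0]), 2)]
            for r in range(0, len(ch), 2)
        ]
        for ch in fmap
    ]

def l2_normalize(fmap, eps=1e-12):
    """Scale the channel vector at each spatial location to unit L2 norm
    (assumed normalization; it puts maps from different layers on one scale)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for r in range(height):
        for c in range(width):
            norm = math.sqrt(sum(fmap[k][r][c] ** 2 for k in range(channels))) + eps
            for k in range(channels):
                out[k][r][c] = fmap[k][r][c] / norm
    return out

def concatenate(fmaps):
    """Stack maps of equal spatial size along the channel axis."""
    return [channel for fmap in fmaps for channel in fmap]

def conv_1x1(fmap, weights):
    """Mix channels at every location; weights has shape [C_out][C_in].
    Choosing C_out < C_in gives the (assumed) dimensional reduction."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[k][r][c] for k, w in enumerate(row)) for c in range(width)]
         for r in range(height)]
        for row in weights
    ]

def fuse(shallow, deep, weights):
    """Pool the shallow map down to the deep map's size, then normalize,
    concatenate, and reduce."""
    pooled = max_pool_2x2(shallow)
    normalized = [l2_normalize(pooled), l2_normalize(deep)]
    return conv_1x1(concatenate(normalized), weights)
```

For example, calling `fuse` with a one-channel 4x4 map, a one-channel 2x2 map, and weights `[[0.5, 0.5]]` yields a single 2x2 channel. In a trained network the weights would be learned rather than fixed by hand.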
Abstract
Methods of detecting an object in an image using a convolutional-neural-network-based architecture that processes multiple feature maps of differing scales, taken from different convolution layers of the network, to create a regional-proposal bounding box. The bounding box is projected back to the feature maps of the individual convolution layers to obtain a set of regions of interest. These regions of interest are then processed to create a confidence score representing the confidence that the object detected in the bounding box is the desired object. Together, these steps let the method use deep features encoded in both the global and the local representations of object regions, which makes detection more robust. Software for executing the disclosed methods within an object-detection system is also disclosed.
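Projecting a bounding box back onto feature maps of different scales, and pooling each resulting region of interest to a common size, might look like the following. This is a minimal sketch under stated assumptions: the abstract does not say how the projection is computed, so dividing image coordinates by each layer's stride is an assumption, and `project_box`, `roi_max_pool`, the strides, and the output grid size are hypothetical names and values chosen for illustration.

```python
import math

def project_box(box, stride):
    """Map an image-space box (x1, y1, x2, y2) onto a feature map whose cells
    each cover `stride` image pixels (assumed projection rule)."""
    x1, y1, x2, y2 = box
    return (math.floor(x1 / stride), math.floor(y1 / stride),
            math.ceil(x2 / stride), math.ceil(y2 / stride))

def roi_max_pool(channel, roi, out_size):
    """Max-pool one [H][W] channel inside `roi` into an out_size x out_size
    grid, so regions taken from different scales end up the same shape."""
    height, width = len(channel), len(channel[0])
    x1, y1, x2, y2 = roi
    # Clamp the region to the map and keep it at least one cell wide and tall.
    x1 = min(max(x1, 0), width - 1)
    y1 = min(max(y1, 0), height - 1)
    x2 = min(max(x2, x1 + 1), width)
    y2 = min(max(y2, y1 + 1), height)
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        r0 = y1 + (i * roi_h) // out_size
        r1 = y1 + ((i + 1) * roi_h + out_size - 1) // out_size
        row = []
        for j in range(out_size):
            c0 = x1 + (j * roi_w) // out_size
            c1 = x1 + ((j + 1) * roi_w + out_size - 1) // out_size
            row.append(max(channel[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

With a box of `(32, 32, 96, 96)`, a stride-16 map gives the region `(2, 2, 6, 6)` and a stride-8 map gives `(4, 4, 12, 12)`. Pooling both with the same `out_size` produces equally sized grids, which is what allows the regions to be normalized and concatenated afterwards.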
20 Claims
11. A computer-readable storage medium containing computer-executable instructions that, when executed by a computing system, perform a method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
receiving the image and storing it in computer memory;
sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
pooling each of the regions of interest to create a corresponding pooled region of interest;
normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest;
concatenating the normalized regions of interest with one another to create a concatenated region of interest;
dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
storing the bounding box and the confidence score in the computer memory in association with the image.
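The control flow that ties the two stages together (gate each proposal on its objectness score, score the surviving regions, and store the results with the image) can be sketched as follows. This is an assumption-laden illustration rather than the claimed software: `Proposal`, `DetectionStore`, `detect`, the `score_region` callable standing in for the second set of fully connected layers, and the default threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    box: tuple          # (x1, y1, x2, y2) in image coordinates
    objectness: float   # score from the first (proposal) stage

@dataclass
class DetectionStore:
    """In-memory store keeping detections in association with an image id."""
    records: dict = field(default_factory=dict)

    def add(self, image_id, box, confidence):
        self.records.setdefault(image_id, []).append(
            {"box": box, "confidence": confidence})

def detect(image_id, proposals, score_region, store, threshold=0.5):
    """Run the second stage only for proposals whose objectness score
    exceeds the threshold, then store each box with its confidence score."""
    for proposal in proposals:
        if proposal.objectness > threshold:  # strictly "exceeds"
            confidence = score_region(proposal.box)
            store.add(image_id, proposal.box, confidence)
    return store.records.get(image_id, [])
```

Passing the second stage in as a callable keeps the gating and storage logic independent of how the confidence score is actually computed.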
Specification