Technologies for improved object detection accuracy with multi-scale representation and training
First Claim
1. A computing device for object detection, the computing device comprising a data manager, a multi-layer convolution network, and a multi-scale region proposal network, wherein:
the multi-scale region proposal network includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size;
the data manager is to input an input image into the multi-layer convolution network;
the multi-layer convolution network is to generate a convolution map in response to an input of the input image;
the data manager is further to input the convolution map into the multi-scale region proposal network; and
the multi-scale region proposal network is to generate a plurality of region proposals in response to an input of the convolution map, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
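The data flow recited in this claim can be illustrated with a toy sketch. It is not the patented implementation: the single-channel map, the field sizes (1, 3, 5), the window-mean feature, the sigmoid score, and all names such as `RegionProposal` and `multi_scale_rpn` are assumptions made for illustration. The sketch only shows several region proposal network layers, each with a different receptive field size, reading the same convolution map and each emitting proposals that carry a classification vector and a regression vector.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

ConvMap = List[List[float]]  # assumption: single-channel map, for brevity


@dataclass
class RegionProposal:
    layer_index: int              # RPN layer that produced this proposal
    center: Tuple[int, int]       # (row, col) of the window center on the map
    classification: List[float]   # [p_object, p_background]
    regression: List[float]       # [d_row, d_col, d_height, d_width]


def window_mean(conv_map: ConvMap, top: int, left: int, size: int) -> float:
    """Average the size x size window whose top-left corner is (top, left)."""
    total = 0.0
    for r in range(top, top + size):
        for c in range(left, left + size):
            total += conv_map[r][c]
    return total / (size * size)


def rpn_layer(conv_map: ConvMap, layer_index: int, field_size: int) -> List[RegionProposal]:
    """Slide a field_size x field_size window over the map (stride 1, no padding)."""
    rows, cols = len(conv_map), len(conv_map[0])
    proposals = []
    for top in range(rows - field_size + 1):
        for left in range(cols - field_size + 1):
            feature = window_mean(conv_map, top, left, field_size)
            # Placeholder objectness score; a trained layer would use learned weights.
            p_object = 1.0 / (1.0 + math.exp(-feature))
            proposals.append(RegionProposal(
                layer_index=layer_index,
                center=(top + field_size // 2, left + field_size // 2),
                classification=[p_object, 1.0 - p_object],
                # Placeholder: this untrained toy layer predicts no box offset.
                regression=[0.0, 0.0, 0.0, 0.0],
            ))
    return proposals


def multi_scale_rpn(conv_map: ConvMap, field_sizes: Sequence[int] = (1, 3, 5)) -> List[RegionProposal]:
    """Run one RPN layer per receptive field size over the same convolution map."""
    if len(set(field_sizes)) != len(field_sizes):
        raise ValueError("each RPN layer must have a different receptive field size")
    proposals = []
    for layer_index, field_size in enumerate(field_sizes):
        proposals.extend(rpn_layer(conv_map, layer_index, field_size))
    return proposals
```

Every proposal records the layer that produced it, which mirrors the claim language that each region proposal is output by a corresponding region proposal network layer.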
Abstract
Technologies for multi-scale object detection include a computing device including a multi-layer convolution network and a multi-scale region proposal network (RPN). The multi-layer convolution network generates a convolution map based on an input image. The multi-scale RPN includes multiple RPN layers, each with a different receptive field size. Each RPN layer generates region proposals based on the convolution map. The computing device may include a multi-scale object classifier that includes multiple region of interest (ROI) pooling layers and multiple associated fully connected (FC) layers. Each ROI pooling layer has a different output size, and each FC layer may be trained for an object scale based on the output size of the associated ROI pooling layer. Each ROI pooling layer may generate pooled ROIs based on the region proposals and each FC layer may generate object classification vectors based on the pooled ROIs. Other embodiments are described and claimed.
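The multi-scale object classifier described in the abstract can be sketched in the same hedged way. The example below assumes a single-channel convolution map, max pooling inside each ROI bin, and caller-supplied weights; the function names (`roi_max_pool`, `fully_connected`, `multi_scale_classify`) and the `(out_size, weights, biases)` branch layout are hypothetical. It shows only the structural idea: several ROI pooling layers with different output sizes, each feeding its own fully connected layer.

```python
from typing import List, Sequence, Tuple

ConvMap = List[List[float]]          # assumption: single-channel map
Roi = Tuple[int, int, int, int]      # (top, left, height, width) on the map
Branch = Tuple[int, List[List[float]], List[float]]  # (out_size, weights, biases)


def roi_max_pool(conv_map: ConvMap, roi: Roi, out_size: int) -> List[List[float]]:
    """Max-pool the ROI down (or up) to a fixed out_size x out_size grid."""
    top, left, height, width = roi
    pooled = []
    for i in range(out_size):
        # Row range of bin i; every bin covers at least one cell.
        r0 = top + (i * height) // out_size
        r1 = max(r0 + 1, top + ((i + 1) * height) // out_size)
        row = []
        for j in range(out_size):
            c0 = left + (j * width) // out_size
            c1 = max(c0 + 1, left + ((j + 1) * width) // out_size)
            row.append(max(conv_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def fully_connected(pooled: List[List[float]],
                    weights: List[List[float]],
                    biases: List[float]) -> List[float]:
    """Flatten the pooled ROI and apply one dense layer (raw scores, no softmax)."""
    flat = [value for row in pooled for value in row]
    return [sum(w * x for w, x in zip(weight_row, flat)) + bias
            for weight_row, bias in zip(weights, biases)]


def multi_scale_classify(conv_map: ConvMap, roi: Roi,
                         branches: Sequence[Branch]) -> List[List[float]]:
    """Run the ROI through every (ROI pooling layer, FC layer) pair."""
    outputs = []
    for out_size, weights, biases in branches:
        pooled = roi_max_pool(conv_map, roi, out_size)
        outputs.append(fully_connected(pooled, weights, biases))
    return outputs
```

In this sketch each branch's weight rows must have `out_size * out_size` entries, which is why each FC layer is tied to the output size of its ROI pooling layer. How the branch outputs are combined, and how each FC layer is trained for its object scale, is left to the specification.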
25 Claims
1. A computing device for object detection, the computing device comprising a data manager, a multi-layer convolution network, and a multi-scale region proposal network, wherein:
the multi-scale region proposal network includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size;
the data manager is to input an input image into the multi-layer convolution network;
the multi-layer convolution network is to generate a convolution map in response to an input of the input image;
the data manager is further to input the convolution map into the multi-scale region proposal network; and
the multi-scale region proposal network is to generate a plurality of region proposals in response to an input of the convolution map, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
Dependent claims: 2-13
14. A method for object detection, the method comprising:
inputting, by a computing device, an input image into a multi-layer convolution network;
executing, by the computing device, the multi-layer convolution network in response to inputting the input image to generate a convolution map;
inputting, by the computing device, the convolution map into a multi-scale region proposal network that includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size; and
executing, by the computing device, the multi-scale region proposal network in response to inputting the convolution map to generate a plurality of region proposals, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
Dependent claims: 15-19
20. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
input an input image into a multi-layer convolution network;
execute the multi-layer convolution network in response to inputting the input image to generate a convolution map;
input the convolution map into a multi-scale region proposal network that includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size; and
execute the multi-scale region proposal network in response to inputting the convolution map to generate a plurality of region proposals, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
Dependent claims: 21-25
Specification