Technologies for improved object detection accuracy with multi-scale representation and training

US 10,262,237 B2
Filed: 12/08/2016
Issued: 04/16/2019
Est. Priority Date: 12/08/2016
Status: Active Grant

Abstract

Technologies for multi-scale object detection include a computing device including a multi-layer convolution network and a multi-scale region proposal network (RPN). The multi-layer convolution network generates a convolution map based on an input image. The multi-scale RPN includes multiple RPN layers, each with a different receptive field size. Each RPN layer generates region proposals based on the convolution map. The computing device may include a multi-scale object classifier that includes multiple region of interest (ROI) pooling layers and multiple associated fully connected (FC) layers. Each ROI pooling layer has a different output size, and each FC layer may be trained for an object scale based on the output size of the associated ROI pooling layer. Each ROI pooling layer may generate pooled ROIs based on the region proposals and each FC layer may generate object classification vectors based on the pooled ROIs. Other embodiments are described and claimed.
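
The data flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a single-channel convolution map, and it replaces the learned RPN weights with a plain window mean. The receptive field sizes 1, 3, and 5 follow claim 3. The names `window_mean`, `rpn_layer`, and `multi_scale_rpn` are hypothetical, as are the toy score and offset values.

```python
from typing import Dict, List, Sequence

ConvMap = List[List[float]]  # single-channel H x W convolution map, kept small for brevity


def window_mean(conv_map: ConvMap, row: int, col: int, size: int) -> float:
    """Mean over the size x size window centred at (row, col), zero-padded at the borders."""
    half = size // 2
    height, width = len(conv_map), len(conv_map[0])
    total = 0.0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < height and 0 <= c < width:
                total += conv_map[r][c]
    return total / (size * size)


def rpn_layer(conv_map: ConvMap, size: int) -> List[Dict[str, object]]:
    """One RPN layer: emit a classification and a regression vector per map position."""
    proposals: List[Dict[str, object]] = []
    for row in range(len(conv_map)):
        for col in range(len(conv_map[0])):
            activation = window_mean(conv_map, row, col, size)
            proposals.append({
                "receptive_field": size,
                # Toy stand-in for a learned 2-way (background, object) score.
                "classification": [1.0 - activation, activation],
                # Toy stand-in for a learned box offset: position and window size.
                "regression": [float(col), float(row), float(size), float(size)],
            })
    return proposals


def multi_scale_rpn(conv_map: ConvMap, sizes: Sequence[int] = (1, 3, 5)) -> List[Dict[str, object]]:
    """Run one RPN layer per receptive field size and collect all proposals."""
    proposals: List[Dict[str, object]] = []
    for size in sizes:
        proposals.extend(rpn_layer(conv_map, size))
    return proposals


conv_map = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
proposals = multi_scale_rpn(conv_map)
```

Each entry in `proposals` records which layer produced it, mirroring the claim language that each region proposal is output by a corresponding region proposal network layer.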

25 Claims

1. A computing device for object detection, the computing device comprising a data manager, a multi-layer convolution network, and a multi-scale region proposal network, wherein:
- the multi-scale region proposal network includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size;
  
  the data manager is to input an input image into the multi-layer convolution network;
  
  the multi-layer convolution network is to generate a convolution map in response to an input of the input image;
  
  the data manager is further to input the convolution map into the multi-scale region proposal network; and
  
  the multi-scale region proposal network is to generate a plurality of region proposals in response to an input of the convolution map, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
  - 2. The computing device of claim 1, wherein the multi-scale region proposal network comprises three region proposal network layers.
  - 3. The computing device of claim 2, wherein the first region proposal network layer has a receptive field size of one pixel square, the second region proposal network layer has a receptive field size of three pixels square, and the third region proposal network layer has a receptive field size of five pixels square.
  - 4. The computing device of claim 1, wherein to input the convolution map into the multi-scale region proposal network comprises to input a convolution map from a different convolution layer of the multi-layer convolution network into each of the plurality of region proposal network layers.
  - 5. The computing device of claim 4, wherein:
    - the multi-layer convolution network comprises thirteen convolution layers;
      
      the multi-scale region proposal network comprises three region proposal network layers; and
      
      to input the convolution map into the plurality of region proposal network layers comprises to input a convolution map from the seventh convolution layer into the first region proposal network layer, input a convolution map from the tenth convolution layer into the second region proposal network layer, and input a convolution map from the thirteenth convolution layer into the third region proposal network layer.
  - 6. The computing device of claim 1, further comprising a multi-scale object classifier, wherein:
    - the multi-scale object classifier includes a plurality of region of interest pooling layers and a plurality of fully connected layers, wherein each region of interest pooling layer has a different output size and wherein each region of interest pooling layer is associated with a fully connected layer;
      
      the data manager is further to input the convolution map and the plurality of region proposals into the multi-scale object classifier; and
      
      the multi-scale object classifier is to generate a plurality of object classification vectors in response to an input of the convolution map and the plurality of region proposals, wherein each object classification vector corresponds to a region proposal.
  - 7. The computing device of claim 6, wherein each fully connected layer is trained for an object scale based on the output size of the associated region of interest pooling layer.
  - 8. The computing device of claim 7, wherein the multi-scale object classifier comprises a first fully connected layer trained for image dimensions smaller than 128 pixels and a second fully connected layer trained for image dimensions greater than or equal to 128 pixels.
  - 9. The computing device of claim 6, wherein:
    - a first region of interest pooling layer of the multi-scale object classifier is to generate a first pooled region of interest that corresponds to a first region proposal of the plurality of region proposals;
      
      the multi-scale object classifier is to input the first pooled region of interest into the fully connected layer associated with the first region of interest pooling layer; and
      
      the fully connected layer is to generate a first object classification vector that corresponds to the first region proposal in response to an input of the first pooled region of interest.
  - 10. The computing device of claim 9, wherein:
    - the multi-scale object classifier is further to select the first region of interest pooling layer based on a proposed object size of the first region proposal and the output size of the first region of interest pooling layer; and
      
      to generate the first pooled region of interest comprises to generate the first pooled region of interest in response to selection of the first region of interest pooling layer.
  - 11. The computing device of claim 10, wherein:
    - the multi-scale object classifier further comprises a trainable selection network; and
      
      to select the first region of interest pooling layer comprises to select the first region of interest pooling layer with the trainable selection network.
  - 12. The computing device of claim 1, wherein:
    - the multi-layer convolution network comprises a plurality of convolution layers; and
      
      the multi-layer convolution network is further to concatenate a plurality of convolution maps to generate a concatenated convolution map, wherein each of the convolution maps is generated by a different convolution layer of the plurality of convolution layers.
  - 13. The computing device of claim 12, wherein the multi-layer convolution network is further to input the concatenated convolution map to a convolution layer with a kernel size of one square.
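
Claims 6 through 10 describe routing each region proposal to a region of interest pooling layer chosen by proposed object size. The sketch below is illustrative only and makes several assumptions: the 128-pixel boundary from claim 8 is applied to the larger side of the proposal, the pooled output sizes of 2 and 4 are hypothetical, and a fixed threshold stands in for the trainable selection network of claim 11. The function names are also hypothetical.

```python
from typing import List, Tuple

ConvMap = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in conv-map cells, upper bounds exclusive

SCALE_THRESHOLD = 128  # pixels, from claim 8
SMALL_OUTPUT, LARGE_OUTPUT = 2, 4  # hypothetical pooled output sizes


def select_output_size(proposed_width: int, proposed_height: int) -> int:
    """Choose the ROI pooling branch from the proposed object size (assumed: larger side)."""
    if max(proposed_width, proposed_height) < SCALE_THRESHOLD:
        return SMALL_OUTPUT
    return LARGE_OUTPUT


def roi_max_pool(conv_map: ConvMap, box: Box, output_size: int) -> ConvMap:
    """Max-pool the boxed region of conv_map into an output_size x output_size grid."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    pooled: ConvMap = []
    for i in range(output_size):
        # Integer bin edges; every bin covers at least one cell.
        r_start = y0 + (i * height) // output_size
        r_end = max(y0 + ((i + 1) * height) // output_size, r_start + 1)
        pooled_row: List[float] = []
        for j in range(output_size):
            c_start = x0 + (j * width) // output_size
            c_end = max(x0 + ((j + 1) * width) // output_size, c_start + 1)
            pooled_row.append(max(
                conv_map[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(pooled_row)
    return pooled


conv_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
small_roi = roi_max_pool(conv_map, (0, 0, 4, 4), select_output_size(64, 40))
large_roi = roi_max_pool(conv_map, (0, 0, 4, 4), select_output_size(200, 90))
```

In this sketch the pooled grid would then be passed to the fully connected layer associated with the selected pooling layer, which is omitted here.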

14. A method for object detection, the method comprising:
- inputting, by a computing device, an input image into a multi-layer convolution network;
  
  executing, by the computing device, the multi-layer convolution network in response to inputting the input image to generate a convolution map;
  
  inputting, by the computing device, the convolution map into a multi-scale region proposal network that includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size; and
  
  executing, by the computing device, the multi-scale region proposal network in response to inputting the convolution map to generate a plurality of region proposals, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
  - 15. The method of claim 14, wherein inputting the convolution map into the multi-scale region proposal network comprises inputting a convolution map from a different convolution layer of the multi-layer convolution network into each of the plurality of region proposal network layers.
  - 16. The method of claim 14, further comprising:
    - inputting, by the computing device, the convolution map and the plurality of region proposals into a multi-scale object classifier that includes a plurality of region of interest pooling layers, wherein each region of interest pooling layer has a different output size and wherein each region of interest pooling layer is associated with a fully connected layer; and
      
      executing, by the computing device, the multi-scale object classifier in response to inputting the convolution map and the plurality of region proposals to generate a plurality of object classification vectors, wherein each object classification vector corresponds to a region proposal.
  - 17. The method of claim 16, wherein each fully connected layer is trained for an object scale based on the output size of the associated region of interest pooling layer.
  - 18. The method of claim 16, wherein executing the multi-scale object classifier comprises:
    - executing a first region of interest pooling layer to generate a first pooled region of interest corresponding to a first region proposal of the plurality of region proposals;
      
      inputting the first pooled region of interest into the fully connected layer associated with the first region of interest pooling layer; and
      
      executing the fully connected layer in response to inputting the first pooled region of interest to generate a first object classification vector corresponding to the first region proposal.
  - 19. The method of claim 14, wherein executing the multi-layer convolution network comprises:
    - executing a multi-layer convolution network that includes a plurality of convolution layers; and
      
      concatenating a plurality of convolution maps to generate a concatenated convolution map, wherein each of the convolution maps is generated by a different convolution layer of the plurality of convolution layers.
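
The concatenation step of claims 12 and 19, followed by the one-by-one convolution of claim 13, can be sketched as below. This is a minimal illustration under stated assumptions: the maps are assumed to share the same height and width already (the claims do not say how differing sizes are reconciled), the weights are made-up values rather than trained ones, and `concatenate_maps` and `conv1x1` are hypothetical names.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def concatenate_maps(maps: List[FeatureMap]) -> FeatureMap:
    """Stack convolution maps along the channel axis; spatial sizes must match."""
    height, width = len(maps[0][0]), len(maps[0][0][0])
    for feature_map in maps:
        for channel in feature_map:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("maps must share height and width before concatenation")
    return [channel for feature_map in maps for channel in feature_map]


def conv1x1(feature_map: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel weighted sum of input channels."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    output: FeatureMap = []
    for channel_weights in weights:
        if len(channel_weights) != len(feature_map):
            raise ValueError("one weight per input channel is required")
        output.append([
            [
                sum(w * feature_map[k][r][c] for k, w in enumerate(channel_weights))
                for c in range(width)
            ]
            for r in range(height)
        ])
    return output


map_a: FeatureMap = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel
map_b: FeatureMap = [[[10.0, 20.0], [30.0, 40.0]], [[0.0, 0.0], [0.0, 1.0]]]  # two channels
concatenated = concatenate_maps([map_a, map_b])
fused = conv1x1(concatenated, [[1.0, 0.5, 0.0]])  # illustrative weights, not trained values
```

The one-by-one convolution mixes channels at each pixel without looking at neighbouring pixels, so it reduces the channel count of the concatenated map while leaving its spatial size unchanged.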

20. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
- input an input image into a multi-layer convolution network;
  
  execute the multi-layer convolution network in response to inputting the input image to generate a convolution map;
  
  input the convolution map into a multi-scale region proposal network that includes a plurality of region proposal network layers, wherein each region proposal network layer has a different receptive field size; and
  
  execute the multi-scale region proposal network in response to inputting the convolution map to generate a plurality of region proposals, wherein each region proposal is output by a corresponding region proposal network layer, and wherein each region proposal includes a classification vector and a regression vector.
  - 21. The one or more computer-readable storage media of claim 20, wherein to input the convolution map into the multi-scale region proposal network comprises to input a convolution map from a different convolution layer of the multi-layer convolution network into each of the plurality of region proposal network layers.
  - 22. The one or more computer-readable storage media of claim 20, further comprising a plurality of instructions that in response to being executed cause the computing device to:
    - input the convolution map and the plurality of region proposals into a multi-scale object classifier that includes a plurality of region of interest pooling layers, wherein each region of interest pooling layer has a different output size and wherein each region of interest pooling layer is associated with a fully connected layer; and
      
      execute the multi-scale object classifier in response to inputting the convolution map and the plurality of region proposals to generate a plurality of object classification vectors, wherein each object classification vector corresponds to a region proposal.
  - 23. The one or more computer-readable storage media of claim 22, wherein each fully connected layer is trained for an object scale based on the output size of the associated region of interest pooling layer.
  - 24. The one or more computer-readable storage media of claim 22, wherein to execute the multi-scale object classifier comprises to:
    - execute a first region of interest pooling layer to generate a first pooled region of interest corresponding to a first region proposal of the plurality of region proposals;
      
      input the first pooled region of interest into the fully connected layer associated with the first region of interest pooling layer; and
      
      execute the fully connected layer in response to inputting the first pooled region of interest to generate a first object classification vector corresponding to the first region proposal.
  - 25. The one or more computer-readable storage media of claim 20, wherein to execute the multi-layer convolution network comprises to:
    - execute a multi-layer convolution network that includes a plurality of convolution layers; and
      
      concatenate a plurality of convolution maps to generate a concatenated convolution map, wherein each of the convolution maps is generated by a different convolution layer of the plurality of convolution layers.
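
Claims 4, 15, and 21 feed each region proposal network layer from a different convolution layer, and claim 5 names the seventh, tenth, and thirteenth of thirteen layers. A minimal sketch of that tapping scheme follows. The convolution layers are replaced by a toy function that adds 1 to every value so the tap depth is visible in the output; `run_with_taps`, `toy_layer`, and `TAPS` are hypothetical names.

```python
from typing import Callable, Dict, List

Layer = Callable[[List[float]], List[float]]

# Tap points from claim 5: 1-based convolution layer index -> RPN layer index.
TAPS: Dict[int, int] = {7: 1, 10: 2, 13: 3}


def run_with_taps(layers: List[Layer], image: List[float],
                  taps: Dict[int, int] = TAPS) -> Dict[int, List[float]]:
    """Run the layers in order and return {RPN layer index: tapped convolution map}."""
    tapped: Dict[int, List[float]] = {}
    activation = image
    for index, layer in enumerate(layers, start=1):
        activation = layer(activation)
        if index in taps:
            tapped[taps[index]] = activation
    return tapped


def toy_layer(values: List[float]) -> List[float]:
    """Toy stand-in for a convolution layer: adds 1 to every value."""
    return [v + 1.0 for v in values]


layers: List[Layer] = [toy_layer] * 13
tapped = run_with_taps(layers, [0.0, 0.5])
```

Here `tapped[1]`, `tapped[2]`, and `tapped[3]` hold the maps that would be passed to the first, second, and third region proposal network layers respectively.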

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Roh, Byungseok; Kim, Kye-Hyeon; Hong, Sanghoon; Park, Minje; Cheon, Yeongjae
Primary Examiner(s): Strege, John B

Application Number: US15/372,953
Publication Number: US 20180165551A1
Time in Patent Office: 859 Days
CPC Class Codes

G06F 18/24   Classification techniques

G06N 3/045   Combinations of networks

G06V 10/25   Determination of region of ...

G06V 10/454   Integrating the filters int...

G06V 10/82   using neural networks
