Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network

US 20180096457A1
Filed: 09/08/2017
Published: 04/05/2018
Est. Priority Date: 09/08/2016
Status: Active Grant

Abstract

Methods of detecting an object in an image using a convolutional-neural-network-based architecture that processes multiple feature maps of differing scales, taken from differing convolution layers within a convolutional network, to create a regional-proposal bounding box. The bounding box is projected back to the feature maps of the individual convolution layers to obtain a set of regions of interest. These regions of interest are then processed to create a confidence score representing the confidence that the object detected in the bounding box is the desired object. These processes allow the method to use deep features encoded in both the global and the local representations of object regions, so that it deals robustly with the challenges of object detection. Software for executing the disclosed methods within an object-detection system is also disclosed.
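
The multi-scale fusion the abstract describes (pool, normalize, concatenate, reduce) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The function names, the `[channel][row][col]` list layout, the fixed 2x2 max-pooling window, and the choice of a per-location L2 norm across channels are all assumptions made for illustration; the claims only require pooling, normalization "relative to one another", concatenation, and dimensional reduction.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][col]


def max_pool_2x2(fmap: FeatureMap) -> FeatureMap:
    """Halve the spatial size by taking the max over non-overlapping 2x2 windows."""
    pooled = []
    for channel in fmap:
        rows = len(channel) // 2
        cols = len(channel[0]) // 2
        pooled.append([
            [max(channel[2 * r][2 * c], channel[2 * r][2 * c + 1],
                 channel[2 * r + 1][2 * c], channel[2 * r + 1][2 * c + 1])
             for c in range(cols)]
            for r in range(rows)
        ])
    return pooled


def l2_normalize(fmap: FeatureMap, eps: float = 1e-12) -> FeatureMap:
    """Scale the channel vector at every spatial location to (about) unit L2 norm."""
    channels, rows, cols = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]
    for r in range(rows):
        for c in range(cols):
            norm = math.sqrt(sum(fmap[k][r][c] ** 2 for k in range(channels))) + eps
            for k in range(channels):
                out[k][r][c] = fmap[k][r][c] / norm
    return out


def concatenate(fmaps: List[FeatureMap]) -> FeatureMap:
    """Stack feature maps of equal spatial size along the channel axis."""
    return [channel for fmap in fmaps for channel in fmap]


def conv_1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Reduce channels: each output channel is a weighted sum of the input channels."""
    rows, cols = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[k][r][c] for k, w in enumerate(w_row)) for c in range(cols)]
         for r in range(rows)]
        for w_row in weights
    ]


def fuse(shallow: FeatureMap, deep: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Pool the shallow map to the deep map's size, normalize both, concatenate, reduce."""
    pooled = max_pool_2x2(shallow)
    normalized = [l2_normalize(pooled), l2_normalize(deep)]
    return conv_1x1(concatenate(normalized), weights)
```

Normalizing before concatenation keeps a layer with large activations from dominating the fused map. In a trained network the `weights` of the 1x1 convolution would be learned; here they are passed in by the caller.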

20 Claims

1. A method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
- receiving the image and storing it in computer memory;
  
  sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
  
  pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
  
  normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
  
  concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
  
  dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
  
  processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
  
  if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
  
  pooling each of the regions of interest to create a corresponding pooled region of interest;
  
  normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest;
  
  concatenating the normalized regions of interest with one another to create a concatenated region of interest;
  
  dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
  
  processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
  
  storing the bounding box and the confidence score in the computer memory in association with the image.
  - 2. The method according to claim 1, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.
  - 3. The method according to claim 1, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.
  - 4. The method according to claim 1, wherein the desired classification is a human face.
  - 5. The method according to claim 1, further comprising annotating the image to include a visual depiction of the bounding box and the confidence score.
  - 6. The method according to claim 1, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.
  - 7. The method according to claim 1, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.
  - 8. The method according to claim 1, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.
  - 9. The method according to claim 1, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.
  - 10. The method according to claim 1, further comprising displaying to a user, on an electronic display, the image, a visual depiction of the bounding box overlaid on the image, and the confidence score displayed in association with the bounding box.
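
The projection and region-of-interest pooling steps of claim 1 can be sketched roughly as follows. This is an illustrative sketch, not the patent's implementation. It assumes each feature map has a known integer `stride` (the number of image pixels covered by one feature-map cell) and that each region is max-pooled into a fixed `out_size` x `out_size` grid; `project_box` and `roi_max_pool` are hypothetical names.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
Roi = Tuple[int, int, int, int]          # (x1, y1, x2, y2) in feature-map cells, end-exclusive


def project_box(box: Box, stride: int, height: int, width: int) -> Roi:
    """Map an image-space box onto a feature map whose cells each span `stride` pixels."""
    x1, y1, x2, y2 = box
    fx1 = min(max(math.floor(x1 / stride), 0), width - 1)
    fy1 = min(max(math.floor(y1 / stride), 0), height - 1)
    # Keep at least one cell in each direction and stay inside the map.
    fx2 = min(max(math.ceil(x2 / stride), fx1 + 1), width)
    fy2 = min(max(math.ceil(y2 / stride), fy1 + 1), height)
    return fx1, fy1, fx2, fy2


def roi_max_pool(channel: List[List[float]], roi: Roi, out_size: int) -> List[List[float]]:
    """Max-pool one channel of a region of interest into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + math.ceil((i + 1) * h / out_size), r0 + 1)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + math.ceil((j + 1) * w / out_size), c0 + 1)
            row.append(max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # One proposal box, projected onto two single-channel maps of differing scales.
    box = (4.0, 4.0, 12.0, 12.0)
    map_a = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # stride 2
    map_b = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # stride 4
    for fmap, stride in ((map_a, 2), (map_b, 4)):
        roi = project_box(box, stride, len(fmap), len(fmap[0]))
        print(stride, roi, roi_max_pool(fmap, roi, out_size=2))
```

Pooling every region to the same grid size is what allows regions taken from maps of differing scales to be normalized and concatenated in the subsequent steps.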

11. A computer-readable storage medium containing computer-executable instructions that, when executed by a computing system, perform a method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
- receiving the image and storing it in computer memory;
  
  sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
  
  pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
  
  normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
  
  concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
  
  dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
  
  processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
  
  if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
  
  pooling each of the regions of interest to create a corresponding pooled region of interest;
  
  normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest;
  
  concatenating the normalized regions of interest with one another to create a concatenated region of interest;
  
  dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
  
  processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
  
  storing the bounding box and the confidence score in the computer memory in association with the image.
  - 12. The computer-readable storage medium according to claim 11, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.
  - 13. The computer-readable storage medium according to claim 11, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.
  - 14. The computer-readable storage medium according to claim 11, wherein the desired classification is a human face.
  - 15. The computer-readable storage medium according to claim 11, further comprising annotating the image to include a visual depiction of the bounding box and the confidence score.
  - 16. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.
  - 17. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.
  - 18. The computer-readable storage medium according to claim 11, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.
  - 19. The computer-readable storage medium according to claim 11, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.
  - 20. The computer-readable storage medium according to claim 11, further comprising displaying to a user, on an electronic display, the image, a visual depiction of the bounding box overlaid on the image, and the confidence score displayed in association with the bounding box.
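
The final steps recited in the independent claims (gating on the objectness score, scoring with a softmax as in claims 3 and 13, and storing the result in association with the image) might look like the sketch below. Everything here is a hypothetical illustration: `Detection`, `DetectionStore`, `score_proposals`, the 0.5 default threshold, and the convention that index 1 of the logits is the desired class are assumptions, and the logits stand in for the output of the second set of fully connected layers.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Detection:
    box: Box
    confidence: float


@dataclass
class DetectionStore:
    """In-memory store that keeps detections keyed by an image identifier."""
    records: Dict[str, List[Detection]] = field(default_factory=dict)

    def add(self, image_id: str, detection: Detection) -> None:
        self.records.setdefault(image_id, []).append(detection)


def score_proposals(
    image_id: str,
    proposals: Iterable[Tuple[Box, float, Sequence[float]]],
    store: DetectionStore,
    objectness_threshold: float = 0.5,  # assumed value; the claims say only "predetermined"
) -> None:
    """Keep proposals whose objectness exceeds the threshold and store box + confidence."""
    for box, objectness, logits in proposals:
        if objectness <= objectness_threshold:
            continue  # the claim proceeds only if the score exceeds the threshold
        confidence = softmax(logits)[1]  # assumption: index 1 is the desired class
        store.add(image_id, Detection(box, confidence))
```

Keying the store by image identifier is one simple way to hold each bounding box and confidence score "in association with the image"; a real system could equally write them to image metadata or a database.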

Current Assignee: Carnegie Mellon University
Original Assignee: Carnegie Mellon University
Inventors: Marios Savvides, Khoa Luu, Yutong Zheng, Chenchen Zhu
Granted Patent: US 10,354,362 B2
CPC Class Codes

G06F 16/50   of still image data

G06F 18/21326   involving optimisations, e....

G06F 18/214   Generating training pattern...

G06F 18/24   Classification techniques

G06F 18/24137   Distances to cluster centroïds

G06T 2207/20084   Artificial neural networks ...

G06T 2210/12   Bounding box

G06T 3/4046   using neural networks

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks
