Object detection and classification in images

US 9,858,496 B2
Filed: 01/20/2016
Issued: 01/02/2018
Est. Priority Date: 01/20/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.

69 Citations

17 Claims

1. A method comprising:
- receiving an input image;
- generating a convolutional feature map;
- identifying, by a first type of neural network, a candidate object in the input image;
- determining, by a second type of neural network, a category of the candidate object; and
- assigning a confidence score to the category of the candidate object, wherein the first type of neural network comprises a translation invariant component configured to:
  - classify an anchor based on overlap with a ground-truth box; and
  - predict a shift and a scale of the anchor.

2. A method as claim 1 recites, wherein the convolutional feature map is generated by a Zeiler and Fergus model or a Simonyan and Zisserman model deep convolutional neural network.

3. A method as claim 1 recites, further comprising training the convolutional feature map, the first type of neural network, and the second type of neural network using at least one of:
- stochastic gradient descent; or
- back-propagation.
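
Claim 1 recites predicting "a shift and a scale of the anchor" without fixing a formula. As a minimal sketch, assuming the center-and-size parameterization commonly used with region proposal networks (shift expressed as a fraction of the anchor size, scale expressed as a log ratio), the prediction could be encoded and applied as follows. The names `Box`, `encode_shift_and_scale`, and `apply_shift_and_scale` are hypothetical and do not come from the patent.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box described by its center and size (hypothetical type)."""
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


def encode_shift_and_scale(anchor: Box, target: Box) -> Tuple[float, float, float, float]:
    """Express `target` relative to `anchor` as (dx, dy, dw, dh).

    Assumed parameterization: shifts are normalized by the anchor size,
    scales are log ratios of the sizes.
    """
    dx = (target.cx - anchor.cx) / anchor.w
    dy = (target.cy - anchor.cy) / anchor.h
    dw = math.log(target.w / anchor.w)
    dh = math.log(target.h / anchor.h)
    return dx, dy, dw, dh


def apply_shift_and_scale(anchor: Box, dx: float, dy: float, dw: float, dh: float) -> Box:
    """Move and resize `anchor` by a predicted shift (dx, dy) and scale (dw, dh)."""
    return Box(
        cx=anchor.cx + dx * anchor.w,
        cy=anchor.cy + dy * anchor.h,
        w=anchor.w * math.exp(dw),
        h=anchor.h * math.exp(dh),
    )


if __name__ == "__main__":
    anchor = Box(cx=50.0, cy=50.0, w=100.0, h=100.0)
    ground_truth = Box(cx=60.0, cy=40.0, w=200.0, h=50.0)
    deltas = encode_shift_and_scale(anchor, ground_truth)
    # Applying the encoded deltas to the anchor recovers the ground-truth box
    # up to floating-point rounding.
    print(apply_shift_and_scale(anchor, *deltas))
```

Because the deltas are relative to the anchor rather than to absolute image coordinates, the same predictor can in principle be reused at every position of the feature map, which is one way to read "translation invariant" in the claim.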

4. A method comprising:
- receiving an input image;
- generating a convolutional feature map;
- identifying, by a first type of neural network, a candidate object in the input image, wherein the identifying the candidate object in the input image comprises:
  - generating one or more anchors at a point of the input image;
  - determining an overlap of individual ones of the one or more anchors to a ground-truth box;
  - assigning a label to each anchor of the one or more anchors based at least in part on the overlap;
  - assigning a score to the label based at least in part on the overlap; and
  - identifying the candidate object at the point based at least in part on the score;
- determining, by a second type of neural network, a category of the candidate object, wherein the first type of neural network and the second type of neural network share at least one algorithm; and
- assigning a confidence score to the category of the candidate object.

5. A method as claim 4 recites, wherein the identifying the candidate object in the input image further comprises:
- identifying an anchor corresponding to a highest score, the highest score corresponding to a percentage of the overlap;
- shifting the anchor corresponding to the highest score to better define the candidate object; and
- scaling the anchor corresponding to the highest score to better define the candidate object.

6. A method as claim 4 recites, wherein the generating the one or more anchors at the point of the input image comprises generating a set of anchor boxes, the set of anchor boxes having three scales and three aspect ratios.

7. A method as claim 4 recites, wherein the label is positive when the overlap exceeds a threshold level.
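
The anchor steps recited in claims 4 through 7 can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the overlap is taken to be intersection-over-union, the score is taken to be the overlap itself (claim 5 ties the highest score to a percentage of the overlap), and the scale values (128, 256, 512 pixels), the aspect ratios (0.5, 1, 2), and the 0.7 threshold are placeholder values that the claims do not specify. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A box as (x_min, y_min, x_max, y_max) in pixels.
BoxXYXY = Tuple[float, float, float, float]


def generate_anchors(
    cx: float,
    cy: float,
    scales: Sequence[float] = (128.0, 256.0, 512.0),  # assumed values
    ratios: Sequence[float] = (0.5, 1.0, 2.0),        # assumed height/width ratios
) -> List[BoxXYXY]:
    """Generate len(scales) * len(ratios) anchors centered on one point."""
    anchors = []
    for scale in scales:
        area = scale * scale
        for ratio in ratios:
            w = (area / ratio) ** 0.5  # keep the area fixed, vary the shape
            h = w * ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection-over-union of two boxes (assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(
    anchors: Sequence[BoxXYXY], ground_truth: BoxXYXY, threshold: float = 0.7
) -> List[Tuple[bool, float]]:
    """Return (label, score) per anchor: positive when overlap exceeds threshold."""
    labeled = []
    for anchor in anchors:
        overlap = iou(anchor, ground_truth)
        labeled.append((overlap > threshold, overlap))
    return labeled


def best_anchor(labeled: Sequence[Tuple[bool, float]]) -> int:
    """Index of the anchor with the highest score."""
    return max(range(len(labeled)), key=lambda i: labeled[i][1])


if __name__ == "__main__":
    anchors = generate_anchors(300.0, 300.0)
    ground_truth = (172.0, 172.0, 428.0, 428.0)  # hypothetical ground-truth box
    labeled = label_anchors(anchors, ground_truth)
    print("best anchor:", anchors[best_anchor(labeled)])
```

With three scales and three aspect ratios, each point yields nine anchors, matching the anchor-box set described in claim 6.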

8. A system comprising:
- a processor; and
- a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising:
  - an initial processing module configured to input an image and generate a convolutional feature map;
  - an object proposal module configured to generate a proposal corresponding to a candidate object in the image, and further comprising a translation invariant component configured to:
    - classify an anchor based on overlap with a ground-truth box; and
    - predict a shift and a scale of the anchor; and
  - a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer.

9. A system as claim 8 recites, wherein the proposal classifier module is further configured to assign a confidence score to the classification.

10. A system as claim 8 recites, wherein the object proposal module is further configured to:
- generate one or more anchors at a point of the image;
- determine an overlap of each anchor of the one or more anchors to a ground-truth box;
- assign a label to each anchor of the one or more anchors based at least in part on the overlap;
- assign a score to the label based at least in part on the overlap;
- select an anchor with a highest score; and
- generate the proposal based at least in part on the highest score.

11. A system comprising:
- a processor; and
- a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising:
  - an initial processing module configured to input an image and generate a convolutional feature map;
  - an object proposal module configured to generate a proposal corresponding to a candidate object in the image, wherein the object proposal module is further configured to:
    - identify an anchor corresponding to a highest score, the highest score corresponding to a percentage of the overlap;
    - shift the anchor corresponding to the highest score to better define the candidate object; or
    - scale the anchor corresponding to the highest score to better define the candidate object; and
  - a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer.

12. A system comprising:
- a processor;
- a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising:
  - an initial processing module configured to input an image and generate a convolutional feature map;
  - an object proposal module configured to generate a proposal corresponding to a candidate object in the image; and
  - a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer; and
- a machine learning module configured to:
  - train one or more parameters of the initial processing module and the object proposal module to generate one or more proposals on a training image; and
  - train one or more parameters of the proposal classifier module to assign a category to each of the one or more proposals on the training image.

13. A system as claim 12 recites, wherein the machine learning module is further configured to train the one or more parameters of the initial processing module, the object proposal module, and the proposal classifier module using one or more of:
- stochastic gradient descent; or
- back-propagation.
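
Claim 13 names stochastic gradient descent and back-propagation as training techniques but does not describe a training loop. The sketch below shows the two techniques together on a deliberately tiny stand-in: a single sigmoid unit that learns to separate low-overlap from high-overlap anchors. It is not the patent's network; the model, the learning rate, the epoch count, and the sample data are all assumptions chosen only to keep the example self-contained.

```python
import math
import random
from typing import Iterable, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(
    samples: Iterable[Tuple[float, int]],
    lr: float = 0.5,      # assumed learning rate
    epochs: int = 200,    # assumed number of passes over the data
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit p = sigmoid(w * x + b) with per-sample gradient steps.

    Each sample is (x, y), where y is 0 or 1. The gradient of the
    cross-entropy loss is propagated back through the sigmoid by the
    chain rule, and the parameters are updated after every sample,
    which is the "stochastic" part of stochastic gradient descent.
    """
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)              # visit samples in random order
        for x, y in data:
            p = sigmoid(w * x + b)     # forward pass
            grad_z = p - y             # dLoss/dz for cross-entropy + sigmoid
            w -= lr * grad_z * x       # back-propagate: dz/dw = x
            b -= lr * grad_z           # back-propagate: dz/db = 1
    return w, b


if __name__ == "__main__":
    # Hypothetical training data: (overlap with ground truth, positive label).
    data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (0.95, 1)]
    w, b = train_sgd(data)
    print("P(positive | overlap=0.9) =", sigmoid(w * 0.9 + b))
```

In a multi-layer network the same chain-rule step is repeated layer by layer, so that gradients from the proposal and classification losses reach any parameters the modules share.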

14. A non-transitory computer readable storage medium having instructions stored thereon, the instructions when executed by a computing device cause the computing device to:
- receive an input image;
- generate a convolutional feature map;
- generate one or more anchors at a point of the input image;
- determine an overlap of individual ones of the one or more anchors to a ground-truth box;
- assign a label to each anchor of the one or more anchors based at least in part on the overlap;
- assign a score to the label based at least in part on the overlap;
- identify, by a neural network, a candidate object in the input image, the candidate object at the point based at least in part on the score;
- determine, by a proposal classifier sharing an algorithm with the neural network, a category of the candidate object; and
- assign, by the proposal classifier, a confidence score to the category of the candidate object.

15. A non-transitory computer readable storage medium as claim 14 recites, wherein the neural network is a region proposal network and the proposal classifier is a component of a fast region based convolutional neural network.

16. A non-transitory computer readable storage medium as claim 14 recites, further comprising instructions to train the convolution feature map, the neural network, and the proposal classifier using at least one of:
- stochastic gradient descent; or
- back-propagation.

17. A non-transitory computer readable storage medium having instructions stored thereon, the instructions when executed on a computing device cause the computing device to:
- receive an input image;
- generate a convolutional feature map;
- identify, by a neural network, a candidate object in the input image;
- determine, by a proposal classifier sharing an algorithm with the neural network, a category of the candidate object; and
- assign, by the proposal classifier, a confidence score to the category of the candidate object, wherein the neural network comprises a translation invariant component configured to:
  - classify an anchor based on overlap with a ground-truth box; and
  - predict a shift and a scale of the anchor.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Sun, Jian; Girshick, Ross; Ren, Shaoqing; He, Kaiming
Primary Examiner(s): Allison, Andrae S
Application Number: US15/001,417
Publication Number: US 20170206431A1
Time in Patent Office: 713 Days
Field of Search: None

CPC Class Codes
- G06F 16/5838: using colour
- G06F 16/951: Indexing; Web crawling tech...
- G06F 16/9538: Presentation of query results
- G06F 18/24: Classification techniques
- G06N 3/045: Combinations of networks
- G06N 3/084: Backpropagation, e.g. using...
- G06V 10/25: Determination of region of ...
- G06V 10/454: Integrating the filters int...
- G06V 30/248: involving plural approaches...
