Training a neural network to detect objects in images

US 9,373,057 B1
Filed: 10/30/2014
Issued: 06/21/2016
Est. Priority Date: 11/01/2013
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to detect objects in images. One of the methods includes receiving a training image and object location data for the training image; providing the training image to a neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image; determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and training the neural network on the training image using the optimal set of assignments.

20 Claims

1. A method for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the method comprising:
- receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
  
  providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
  
  determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
  
  training the neural network on the training image using the optimal set of assignments.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein determining the optimal set of assignments comprises performing a bipartite matching between the object locations and the candidate bounding boxes to select the optimal set of assignments.
  - 3. The method of claim 2, wherein performing the bipartite matching comprises:
    - selecting as the optimal set of assignments a set of assignments that minimizes a loss function that includes a localization loss term and a confidence loss term.
  - 4. The method of claim 3, wherein the location loss term for a particular set of assignments is based on, for each of the object locations, a distance in the training image between the object location and a candidate bounding box assigned to the object location by the particular set of assignments.
  - 5. The method of claim 4, wherein the location loss term F_loc for the particular set of assignments x satisfies:
  - 6. The method of claim 3, wherein the confidence loss term for a particular set of assignments is based on, for each candidate bounding box that is assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations.
  - 7. The method of claim 6, wherein the confidence loss term for the particular set of assignments is further based on, for each candidate bounding box that is not assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, wherein the second target confidence score is lower than the first target confidence score.
  - 8. The method of claim 7, wherein the confidence loss F_conf for the particular set of assignments x satisfies:
  - 9. The method of claim 1, wherein the neural network is a deep convolutional neural network.
  - 10. The method of claim 1, wherein the neural network is a deep neural network that comprises an output layer and one or more hidden layers, and wherein training the neural network comprises:
    - training the output layer by minimizing a loss function given the optimal set of assignments; and
      
      training the hidden layers through backpropagation.
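
The matching described in claims 2 through 8 can be illustrated with a small sketch. The formulas referenced by claims 5 and 8 are not reproduced in this copy, so the loss below is an assumption rather than the claimed text: a squared-distance location term plus a log-loss confidence term with target scores of 1 for assigned boxes and 0 for unassigned boxes, weighted by a hypothetical `alpha`. The search is brute force over permutations for readability; a practical implementation would more likely use a dedicated bipartite matching solver. All function and parameter names are illustrative.

```python
import itertools
import math


def location_loss(box, truth):
    """Half the squared L2 distance between a candidate box and an object location."""
    return 0.5 * sum((b - t) ** 2 for b, t in zip(box, truth))


def total_loss(assignment, boxes, scores, truths, alpha=1.0):
    """Assumed loss: alpha * location term + confidence term.

    assignment[j] is the index of the candidate box assigned to object j.
    Scores must lie strictly between 0 and 1 so the logarithms are defined.
    """
    assigned = set(assignment)
    loc = sum(location_loss(boxes[i], truths[j]) for j, i in enumerate(assignment))
    conf = 0.0
    for i, c in enumerate(scores):
        # Assumed targets: 1 for assigned boxes, 0 for unassigned boxes.
        conf -= math.log(c) if i in assigned else math.log(1.0 - c)
    return alpha * loc + conf


def optimal_assignment(boxes, scores, truths, alpha=1.0):
    """Brute-force search that gives each object location its own candidate box."""
    candidates = itertools.permutations(range(len(boxes)), len(truths))
    return min(candidates, key=lambda a: total_loss(a, boxes, scores, truths, alpha))


# Boxes are (x_min, y_min, x_max, y_max) in normalized image coordinates.
boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.0, 0.6, 0.3, 1.0)]
scores = [0.8, 0.7, 0.2]
truths = [(0.5, 0.5, 1.0, 1.0), (0.1, 0.1, 0.5, 0.5)]
best = optimal_assignment(boxes, scores, truths)  # object 0 -> box 1, object 1 -> box 0
```

Because the confidence term is part of the matching cost, a low-scoring box such as the third one is penalized when assigned, so the match depends on both geometry and confidence.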

11. A system for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
  
  providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
  
  determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
  
  training the neural network on the training image using the optimal set of assignments.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19):
  - 12. The system of claim 11, wherein determining the optimal set of assignments comprises performing a bipartite matching between the object locations and the candidate bounding boxes to select the optimal set of assignments.
  - 13. The system of claim 12, wherein performing the bipartite matching comprises:
    - selecting as the optimal set of assignments a set of assignments that minimizes a loss function that includes a localization loss term and a confidence loss term.
  - 14. The system of claim 13, wherein the location loss term for a particular set of assignments is based on, for each of the object locations, a distance in the training image between the object location and a candidate bounding box assigned to the object location by the particular set of assignments.
  - 15. The system of claim 14, wherein the location loss term F_loc for the particular set of assignments x satisfies:
  - 16. The system of claim 13, wherein the confidence loss term for a particular set of assignments is based on, for each candidate bounding box that is assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations.
  - 17. The system of claim 16, wherein the confidence loss term for the particular set of assignments is further based on, for each candidate bounding box that is not assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, wherein the second target confidence score is lower than the first target confidence score.
  - 18. The system of claim 17, wherein the confidence loss F_conf for the particular set of assignments x satisfies:
  - 19. The system of claim 11, wherein the neural network is a deep neural network that comprises an output layer and one or more hidden layers, and wherein training the neural network comprises:
    - training the output layer by minimizing a loss function given the optimal set of assignments; and
      
      training the hidden layers through backpropagation.
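
Training the output layer given a fixed set of assignments, as in claim 19, amounts to computing the gradient of the loss with respect to the network outputs; those gradients are what backpropagation would then carry into the hidden layers. The sketch below assumes a particular loss, which is not taken from the claim text: `alpha` times half the squared distance between each assigned box and its object location, plus `-log(c)` for assigned boxes and `-log(1 - c)` for unassigned ones. The function name, `alpha`, and the data layout are hypothetical.

```python
def output_gradients(assignment, boxes, scores, truths, alpha=1.0):
    """Gradients of the assumed loss w.r.t. box coordinates and confidence scores.

    assignment[j] is the index of the candidate box assigned to object j.
    Scores must lie strictly between 0 and 1.
    """
    box_grads = [[0.0] * len(b) for b in boxes]
    score_grads = []
    matched = {i: j for j, i in enumerate(assignment)}
    for i, c in enumerate(scores):
        if i in matched:
            truth = truths[matched[i]]
            # d/dl of alpha * 0.5 * ||l - g||^2 is alpha * (l - g).
            box_grads[i] = [alpha * (b - t) for b, t in zip(boxes[i], truth)]
            # d/dc of -log(c) is -1 / c.
            score_grads.append(-1.0 / c)
        else:
            # Unassigned boxes get no location gradient;
            # d/dc of -log(1 - c) is 1 / (1 - c).
            score_grads.append(1.0 / (1.0 - c))
    return box_grads, score_grads


# One object location, two candidate boxes; box 0 is assigned to the object.
boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9)]
scores = [0.8, 0.5]
truths = [(0.1, 0.1, 0.5, 0.5)]
box_grads, score_grads = output_gradients((0,), boxes, scores, truths)
```

A gradient descent step would move the assigned box toward its object location and raise its score, while lowering the score of the unassigned box.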

20. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the operations comprising:
- receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
  
  providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
  
  determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
  
  training the neural network on the training image using the optimal set of assignments.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Erhan, Dumitru; Szegedy, Christian; Anguelov, Dragomir
Primary Examiner(s): Kassa, Yosef
Application Number: US14/528,815
Time in Patent Office: 600 Days
Field of Search: 382/156, 382/157, 382/158, 382/159, 382/278, 706/15, 706/16
US Class Current: 1/1
CPC Class Codes:
G06F 18/214   Generating training pattern...
G06N 3/045   Combinations of networks
G06N 3/084   Backpropagation, e.g. using...
G06V 10/255   Detecting or recognising po...
G06V 10/454   Integrating the filters int...
