NEURAL NETWORK FOR OBJECT DETECTION IN IMAGES

US 20180121762A1
Filed: 11/01/2016
Published: 05/03/2018
Est. Priority Date: 11/01/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Systems, devices, media, and methods are presented for identifying and categorically labeling objects within a set of images. The systems and methods receive an image depicting an object of interest, detect at least a portion of the object of interest within the image using a multilayer object model, determine context information, and identify the object of interest included in two or more bounding boxes.

86 Citations

20 Claims

1. A device implemented method for image recognition, the method comprising:
- accessing, using one or more processors of the device coupled to a memory of the device, an image depicting an object of interest and a background within a field of view;
  
  generating, by the one or more processors configured by a multilayer object model, a set of bounding boxes within the image;
  
  detecting, by the one or more processors using a set of detection layers of the multilayer object model, at least a portion of the object of interest within the image in two or more bounding boxes;
  
  determining, by the one or more processors, context information by passing a layer output of a second detection layer to a first detection layer and incorporating the layer output of the second detection layer into the layer output of the first detection layer; and
  
  based on detecting the portion of the object of interest and determining the context information, identifying, by the one or more processors, the object of interest from the portion of the object of interest included within the two or more bounding boxes, using a set of image representation layers of the multilayer object model.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein generating the set of bounding boxes further comprises:
    - identifying a set of coordinates within the image, the set of coordinates including an indication of one or more boundaries for the image;
      
      determining a set of sizes and a set of aspect ratios for the set of bounding boxes;
      
      determining a distribution of bounding boxes to encompass each coordinate of the set of coordinates in at least one bounding box of the set of bounding boxes; and
      
      generating the set of bounding boxes to distribute the set of bounding boxes uniformly over the image, wherein each bounding box of the set of bounding boxes is generated with a size included in the set of sizes and an aspect ratio included in the set of aspect ratios.
  - 3. The method of claim 2, wherein the set of bounding boxes includes at least one bounding box having a first size and a first aspect ratio and at least one bounding box having a second size and a second aspect ratio, and wherein the first size is distinct from the second size and the first aspect ratio is distinct from the second aspect ratio.
  - 4. The method of claim 1, wherein detecting the portion of the object of interest using the set of detection layers of the multilayer object model further comprises:
    - detecting part of the portion of the object of interest in a first bounding box of the two or more bounding boxes using a first detection layer, the first detection layer associated with a first scale corresponding to the first bounding box; and
      
      detecting part of the portion of the object of interest in a second bounding box of the two or more bounding boxes using a second detection layer, the second detection layer associated with a second scale corresponding to the second bounding box.
  - 5. The method of claim 4, wherein the first detection layer generates a first confidence score and a first set of coordinates for the part of the object of interest depicted within the first bounding box and the second detection layer generates a second confidence score and a second set of coordinates for the part of the object of interest depicted within the second bounding box.
  - 6. The method of claim 1, wherein the layer output of the second detection layer is passed to the first detection layer using a deconvolution layer of the multilayer object model.
  - 7. The method of claim 1 further comprising training the multilayer object model by:
    - accessing a set of training images, each training image depicting a known object of interest;
      
      identifying a set of bounding boxes within the set of training images, each bounding box having a set of coordinates identifying a location within a training image, a resolution, and a label;
      
      determining the resolution of a bounding box exceeds a specified box resolution;
      
      rescaling the resolution of the bounding box to match the specified box resolution by identifying a center point of the bounding box and cropping portions of the bounding box outside of the specified box resolution with respect to the center point;
      
      initializing one or more model parameters; and
      
      iteratively adjusting the one or more model parameters until a change in averaged loss function values resulting from iterations of the one or more model parameters falls below a change threshold.
  - 8. The method of claim 1, further comprising training the multilayer object model by:
    - accessing a set of training images, each training image depicting a known object of interest; and
      
      detecting the known objects of interest within the set of training images using the multilayer object model, the detection performed with one or more layers of the multilayer object model set at a first resolution, and one or more layers of the multilayer object model set at a second resolution.
  - 9. The method of claim 8, wherein detecting the known objects of interest further comprises:
    - iteratively adjusting one or more model parameters until a change in averaged loss function values falls below a change threshold, the averaged loss function obtained for two or more instances of a training image of the set of training images, each of the two or more instances of the training image having distinct resolutions.
  - 10. The method of claim 1, further comprising training the set of detection layers of the multilayer object model by:
    - accessing a set of training images, each training image depicting a known object of interest and containing at least one bounding box comprising a tuple indicating coordinates of the bounding box within a training image and a classification label for the known object of interest; and
      
      iteratively initializing one or more image representation layers of the set of image representation layers and one or more detection layers of the set of detection layers and adjusting one or more parameters of the one or more image representation layers and the one or more detection layers until a change in averaged loss function values resulting from iterations of the one or more model parameters falls below a change threshold.
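Claims 2 and 3 describe tiling the image with bounding boxes of several sizes and aspect ratios so that every image coordinate falls inside at least one box. The sketch below is one possible reading of that step, assuming boxes that keep an area of size² while setting width/height to the aspect ratio, box centers on a regular grid spaced by a hypothetical `stride`, and clipping to the image boundary. The claims specify none of these details, and `generate_anchor_boxes` and `covers_every_pixel` are illustrative names, not part of the patent.

```python
import math


def generate_anchor_boxes(image_w, image_h, sizes, aspect_ratios, stride):
    """Distribute boxes uniformly over the image (hypothetical helper).

    Every grid cell gets one box per (size, aspect ratio) pair, so the set
    mixes distinct sizes and distinct aspect ratios. Boxes are returned as
    (x_min, y_min, x_max, y_max) tuples clipped to the image.
    """
    boxes = []
    cols = math.ceil(image_w / stride)
    rows = math.ceil(image_h / stride)
    for row in range(rows):
        for col in range(cols):
            # Center of the grid cell, so centers are spread evenly over the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                for ratio in aspect_ratios:
                    # Area stays size**2 while width / height equals the ratio.
                    w = size * math.sqrt(ratio)
                    h = size / math.sqrt(ratio)
                    # Clip the box to the image boundaries.
                    x0 = max(0.0, cx - w / 2)
                    y0 = max(0.0, cy - h / 2)
                    x1 = min(float(image_w), cx + w / 2)
                    y1 = min(float(image_h), cy + h / 2)
                    boxes.append((x0, y0, x1, y1))
    return boxes


def covers_every_pixel(boxes, image_w, image_h):
    """Return True if every pixel center lies inside at least one box."""
    for y in range(image_h):
        for x in range(image_w):
            px, py = x + 0.5, y + 0.5
            if not any(x0 <= px <= x1 and y0 <= py <= y1 for x0, y0, x1, y1 in boxes):
                return False
    return True


boxes = generate_anchor_boxes(8, 8, sizes=[4, 8], aspect_ratios=[1.0, 2.0], stride=4)
print(len(boxes), covers_every_pixel(boxes, 8, 8))
```

For an 8 x 8 image with a stride of 4, the grid has 4 cells and each cell gets 4 box shapes, giving 16 boxes; the coverage check passes because the 4-pixel square box at each cell center already spans its whole cell.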

11. A system comprising:
- one or more processors; and
  
  a processor-readable storage device coupled to the one or more processors, the processor-readable storage device storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  accessing an image depicting an object of interest and a background within a field of view;
  
  generating, by one or more processors configured by a multilayer object model, a set of bounding boxes within the image;
  
  detecting, using a set of detection layers of the multilayer object model, at least a portion of the object of interest within the image in two or more bounding boxes;
  
  determining context information by passing a layer output of a second detection layer to a first detection layer and incorporating the layer output of the second detection layer into the layer output of the first detection layer; and
  
  based on detecting the portion of the object of interest and determining the context information, identifying the object of interest from the portions of the object of interest included within the two or more bounding boxes, using a set of image representation layers of the multilayer object model.
- Dependent claims (12, 13, 14, 15):
  - 12. The system of claim 11, wherein generating the set of bounding boxes further comprises:
    - identifying a set of coordinates within the image, the set of coordinates including an indication of one or more boundaries for the image;
      
      determining a set of sizes and a set of aspect ratios for the set of bounding boxes;
      
      determining a distribution of bounding boxes to encompass each coordinate of the set of coordinates in at least one bounding box of the set of bounding boxes; and
      
      generating the set of bounding boxes to distribute the set of bounding boxes uniformly over the image, wherein each bounding box of the set of bounding boxes is generated with a size included in the set of sizes and an aspect ratio included in the set of aspect ratios.
  - 13. The system of claim 11, wherein detecting the portion of the object of interest using the set of detection layers of the multilayer object model further comprises:
    - detecting part of the portion of the object of interest in a first bounding box of the two or more bounding boxes using a first detection layer, the first detection layer associated with a first scale corresponding to the first bounding box; and
      
      detecting part of the portion of the object of interest in a second bounding box of the two or more bounding boxes using the second detection layer, the second detection layer associated with a second scale corresponding to the second bounding box.
  - 14. The system of claim 11, wherein the layer output of the second detection layer is passed to the first detection layer using a deconvolution layer of the multilayer object model.
  - 15. The system of claim 11 further comprising training the multilayer object model by:
    - accessing a set of training images, each training image depicting a known object of interest;
      
      identifying a set of bounding boxes within the set of training images, each bounding box having a set of coordinates identifying a location within a training image, a resolution, and a label;
      
      determining the resolution of a bounding box exceeds a specified box resolution;
      
      rescaling the resolution of the bounding box to match the specified box resolution by identifying a center point of the bounding box and cropping portions of the bounding box outside of the specified box resolution with respect to the center point;
      
      initializing one or more model parameters; and
      
      iteratively adjusting the one or more model parameters until a change in averaged loss function values resulting from iterations of the one or more model parameters falls below a change threshold.
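Claim 14 states that the output of the second detection layer reaches the first detection layer through a deconvolution layer, and claim 11 says that output is incorporated into the first layer's output. A minimal single-channel sketch follows, assuming the second layer is the coarser one at half the spatial resolution, a stride-2 transposed convolution with a 2 x 2 kernel, and element-wise addition as the way of "incorporating". A trained model would learn the kernel weights and might concatenate channels instead of adding; `transposed_conv2d_stride2` and `fuse_context` are hypothetical names.

```python
def transposed_conv2d_stride2(feature, kernel):
    """Stride-2 transposed convolution, 2x2 kernel, one channel, no padding.

    Because the kernel size equals the stride, output patches do not overlap:
    each input value v at (i, j) writes v * kernel into the 2x2 output block
    starting at (2i, 2j), doubling the spatial size.
    """
    rows, cols = len(feature), len(feature[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


def fuse_context(first_layer_out, second_layer_out, kernel):
    """Upsample the coarser second-layer output and add it to the first-layer output."""
    upsampled = transposed_conv2d_stride2(second_layer_out, kernel)
    if len(upsampled) != len(first_layer_out) or len(upsampled[0]) != len(first_layer_out[0]):
        raise ValueError("upsampled second-layer output must match the first layer's size")
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first_layer_out, upsampled)
    ]


first = [[1.0] * 4 for _ in range(4)]   # finer 4x4 map from the first detection layer
second = [[1.0, 2.0], [3.0, 4.0]]       # coarser 2x2 map from the second detection layer
kernel = [[1.0, 1.0], [1.0, 1.0]]       # all-ones kernel: nearest-neighbour upsampling
fused = fuse_context(first, second, kernel)
```

With an all-ones kernel the transposed convolution reduces to nearest-neighbour upsampling, so each coarse value lands on a 2 x 2 block of the finer map and the fused result is easy to check by hand.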

16. A processor-readable storage device storing processor-executable instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
- accessing an image depicting an object of interest and a background within a field of view;
  
  generating, by a multilayer object model implemented using the one or more processors, a set of bounding boxes within the image;
  
  detecting, using a set of detection layers of the multilayer object model, at least a portion of the object of interest within the image in two or more bounding boxes;
  
  determining context information by passing a layer output of a second detection layer to a first detection layer and incorporating the layer output of the second detection layer into the layer output of the first detection layer; and
  
  based on detecting the portion of the object of interest and determining the context information, identifying the object of interest from the portions of the object of interest included within the two or more bounding boxes, using a set of image representation layers of the multilayer object model.
- Dependent claims (17, 18, 19, 20):
  - 17. The processor-readable storage device of claim 16, wherein generating the set of bounding boxes further comprises:
    - identifying a set of coordinates within the image, the set of coordinates including an indication of one or more boundaries for the image;
      
      determining a set of sizes and a set of aspect ratios for the set of bounding boxes;
      
      determining a distribution of bounding boxes to encompass each coordinate of the set of coordinates in at least one bounding box of the set of bounding boxes; and
      
      generating the set of bounding boxes to distribute the set of bounding boxes uniformly over the image, wherein each bounding box of the set of bounding boxes is generated with a size included in the set of sizes and an aspect ratio included in the set of aspect ratios.
  - 18. The processor-readable storage device of claim 16, wherein detecting the portion of the object of interest using the set of detection layers of the multilayer object model further comprises:
    - detecting part of the portion of the object of interest in a first bounding box of the two or more bounding boxes using the first detection layer, the first detection layer associated with a first scale corresponding to the first bounding box; and
      
      detecting part of the portion of the object of interest in a second bounding box of the two or more bounding boxes using the second detection layer, the second detection layer associated with a second scale corresponding to the second bounding box.
  - 19. The processor-readable storage device of claim 16, wherein the layer output of the second detection layer is passed to the first detection layer using a deconvolution layer of the multilayer object model.
  - 20. The processor-readable storage device of claim 16 further comprising training the multilayer object model by:
    - accessing a set of training images, each training image depicting a known object of interest;
      
      identifying a set of bounding boxes within the set of training images, each bounding box having a set of coordinates identifying a location within a training image, a resolution, and a label;
      
      determining the resolution of a bounding box exceeds a specified box resolution;
      
      rescaling the resolution of the bounding box to match the specified box resolution by identifying a center point of the bounding box and cropping portions of the bounding box outside of the specified box resolution with respect to the center point;
      
      initializing one or more model parameters; and
      
      iteratively adjusting the one or more model parameters until a change in averaged loss function values resulting from iterations of the one or more model parameters falls below a change threshold.
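Claim 20 (like claims 7 and 15) crops an oversized training box to a specified resolution about its center point, then adjusts model parameters until the change in averaged loss values falls below a threshold. The sketch below assumes boxes given as `(x_min, y_min, x_max, y_max)` tuples, crops only the dimensions that exceed the target, and reads "averaged loss function values" as means over consecutive, non-overlapping windows of iterations. `window`, `change_threshold`, `max_iters`, and the `step` callable are hypothetical parameters, not values taken from the patent.

```python
def center_crop_box(box, target_w, target_h):
    """Crop an (x_min, y_min, x_max, y_max) box about its center to at most target_w x target_h.

    Dimensions that already fit within the target are left unchanged.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w <= target_w and h <= target_h:
        return box  # resolution does not exceed the specified box resolution
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # center point of the box
    new_w, new_h = min(w, target_w), min(h, target_h)
    return (cx - new_w / 2, cy - new_h / 2, cx + new_w / 2, cy + new_h / 2)


def train_until_converged(step, window=5, change_threshold=1e-3, max_iters=10_000):
    """Call step() until consecutive window-averaged losses differ by less than the threshold.

    step is a hypothetical callable that performs one parameter update and
    returns its loss. Returns (iterations_run, last_window_average).
    """
    losses = []
    prev_avg = None
    for it in range(1, max_iters + 1):
        losses.append(step())
        if it % window == 0:
            avg = sum(losses[-window:]) / window
            if prev_avg is not None and abs(prev_avg - avg) < change_threshold:
                return it, avg
            prev_avg = avg
    return max_iters, prev_avg  # safety cap reached without meeting the criterion


cropped = center_crop_box((0, 0, 100, 60), 64, 64)
decay = (0.5 ** k for k in range(1000))  # toy loss that halves on every update
iters, avg = train_until_converged(lambda: next(decay), window=5, change_threshold=1e-3)
```

Here the 100 x 60 box is cropped horizontally to 64 pixels about its center, giving `(18.0, 0.0, 82.0, 60.0)`, while its 60-pixel height is kept. The toy loss stops training after 20 iterations: the fourth window average differs from the third by about 3.7e-4, the first change below the 1e-3 threshold.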

Current Assignee
Snap, Inc.
Original Assignee
Snap, Inc.
Inventors
Han, Wei; Yang, Jianchao; Zhang, Ning; Li, Jia

Granted Patent

US 10,346,723 B2

CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/24   Classification techniques

G06F 18/2413   based on distances to train...

G06N 3/045   Combinations of networks

G06N 3/084   Backpropagation, e.g. using...

G06T 3/40   Scaling of whole images or ...

G06V 10/25   Determination of region of ...

G06V 10/42   Global feature extraction b...

G06V 10/44   Local feature extraction by...

G06V 10/454   Integrating the filters int...

G06V 10/462   Salient features, e.g. scal...

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks

G06V 30/194   References adjustable by an...
