Compact models for object recognition

US 10,706,267 B2
Filed: 01/12/2018
Issued: 07/07/2020
Est. Priority Date: 01/12/2018
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Methods, systems, and devices for object recognition are described. Generally, the described techniques provide for a compact and efficient convolutional neural network (CNN) model for facial recognition. The proposed techniques relate to a light model with a set of layers of convolution and one fully connected layer for feature representation. A new building block for each convolution layer is proposed. A maximum feature map (MFM) operation may be employed to reduce channels (e.g., by combining two or more channels via maximum feature selection within the channels). Depth-wise separable convolution may be employed for computation reduction (e.g., reduction of convolution computation). Batch normalization may be applied to normalize the output of the convolution layers and the fully connected layer (e.g., to prevent overfitting). The described techniques provide a compact and efficient CNN model which can be used for efficient and effective face recognition.
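The abstract describes the MFM operation only in words. The following is a minimal sketch in pure Python, assuming channel i is paired with channel i + C/2 (a common MFM convention; claim 6 only requires that channels be divided into pairs) and that tensors are nested lists shaped [channels][height][width]. The function name `max_feature_map` is illustrative, not taken from the patent.

```python
# Max feature map (MFM) sketch. Assumptions: channel i is paired with
# channel i + C/2, and tensors are nested lists shaped [channels][height][width].
from typing import List

Channel = List[List[float]]


def max_feature_map(channels: List[Channel]) -> List[Channel]:
    """Halve the channel count by an element-wise max over channel pairs."""
    if len(channels) % 2 != 0:
        raise ValueError("pairwise MFM needs an even number of channels")
    half = len(channels) // 2
    output = []
    for i in range(half):
        first, second = channels[i], channels[i + half]
        # Keep, at every pixel, the larger response of the two paired channels.
        output.append([
            [max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return output


x = [
    [[1, 5], [3, 0]],  # channel 0
    [[2, 2], [2, 2]],  # channel 1
    [[4, 1], [0, 7]],  # channel 2, paired with channel 0
    [[0, 3], [9, 1]],  # channel 3, paired with channel 1
]
print(max_feature_map(x))  # [[[4, 5], [3, 7]], [[2, 3], [9, 2]]]
```

Pairing adjacent channels (0 with 1, 2 with 3) would satisfy the pairing in claim 6 equally well. The choice only changes which channels compete at each pixel. Either way, the output has half as many channels as the input.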

18 Citations

17 Claims

1. An apparatus for object recognition, comprising:
- a processor;
  
  memory in electronic communication with the processor; and
  
  instructions stored in the memory and executable by the processor to cause the apparatus to:
  
  obtain a two-dimensional array of pixels representing an image;
  
  apply a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
  
  perform a set of processing operations on the plurality of input channels, the set of processing operations comprising:
  
  applying a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
  
  dividing the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
  
  performing a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
  
  applying a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
  
  applying the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
  
  dividing the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
  
  performing the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
  
  applying a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
  
  recognize an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The apparatus of claim 1, wherein the set of processing operations further comprises:
    - applying one or more batch normalization operations, each batch normalization operation adjusting a mean of one or more channels, a variance of one or more channels, or both.
  - 3. The apparatus of claim 1, wherein the set of processing operations is performed a plurality of times, and wherein the plurality of input channels of a second iteration of the set of processing operations is based at least in part on the first plurality of output channels of a first iteration of the set of processing operations.
  - 4. The apparatus of claim 3, wherein the instructions are further executable by the processor to cause the apparatus to:
    - feed the first plurality of output channels of a last iteration of the set of processing operations to a fully connected layer to generate a plurality of connected-layer output channels;
      
      divide the plurality of connected-layer output channels into third channel groups, wherein each connected-layer output channel of the plurality of connected-layer output channels is associated with a single channel group of the third channel groups; and
      
      perform the feature selection operation for each third channel group to generate a plurality of final channels, wherein each final channel is associated with a respective channel group of the third channel groups and wherein the object in the image is recognized based at least in part on the plurality of final channels.
  - 5. The apparatus of claim 4, wherein the instructions are further executable by the processor to cause the apparatus to:
    - feed the first plurality of output channels of the last iteration of the set of processing operations to a second fully connected layer to generate a second plurality of connected-layer output channels;
      
      divide the second plurality of connected-layer output channels into fourth channel groups, wherein each connected-layer output channel of the second plurality of connected-layer output channels is associated with a single channel group of the fourth channel groups;
      
      perform the feature selection operation for each fourth channel group to generate a second plurality of final channels, wherein each final channel of the second plurality of final channels is associated with a respective channel group of the fourth channel groups; and
      
      detect a feature of the object in the image based at least in part on the second plurality of final channels.
  - 6. The apparatus of claim 1, wherein the instructions to divide the second plurality of input channels into first channel groups are executable by the processor to cause the apparatus to:
    - divide the second plurality of input channels into pairs such that a number of the first plurality of intermediate channels is half a number of the second plurality of input channels.
  - 7. The apparatus of claim 1, wherein the instructions to apply the first convolutional operation to the two-dimensional array of pixels are executable by the processor to cause the apparatus to:
    - apply a plurality of convolution kernels to the two-dimensional array of pixels to generate a plurality of initial input channels, each convolution kernel having a same size;
      
      apply a batch normalization operation to the plurality of initial input channels to generate a plurality of normalized input channels, wherein the batch normalization operation adjusts a mean of one or more initial input channels, a variance of one or more initial input channels, or both;
      
      divide the plurality of normalized input channels into initial channel groups, wherein each normalized input channel of the plurality of normalized input channels is associated with a single channel group of the initial channel groups; and
      
      perform the feature selection operation for each initial channel group to generate the plurality of input channels, wherein each input channel is associated with a respective initial channel group.
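
The third convolutional operation in claim 1 (a first operation applied to each intermediate channel, followed by a second operation applied across the resulting feature maps) has the shape of the depth-wise separable convolution mentioned in the abstract. The following is a minimal sketch, assuming stride 1, no padding, one k x k kernel per channel for the depthwise step, and a 1x1 convolution for the cross-channel step. All function names and weights are hypothetical.

```python
# Depthwise separable convolution sketch. Assumptions: "valid" padding,
# stride 1, one kernel per input channel (depthwise step), and a 1x1 weight
# matrix for the cross-channel (pointwise) step. Tensors are nested lists
# shaped [channels][height][width].
from typing import List

Channel = List[List[float]]


def conv2d_single(channel: Channel, kernel: Channel) -> Channel:
    """Valid 2D cross-correlation of one channel with one kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def depthwise(channels: List[Channel], kernels: List[Channel]) -> List[Channel]:
    """First operation: each channel is filtered by its own kernel."""
    return [conv2d_single(ch, k) for ch, k in zip(channels, kernels)]


def pointwise(maps: List[Channel], weights: List[List[float]]) -> List[Channel]:
    """Second operation: 1x1 convolution mixing the feature maps.

    weights[o][m] is the contribution of feature map m to output channel o.
    """
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [[sum(wt * maps[m][r][c] for m, wt in enumerate(row))
          for c in range(w)] for r in range(h)]
        for row in weights
    ]


def depthwise_separable(channels: List[Channel], dw_kernels: List[Channel],
                        pw_weights: List[List[float]]) -> List[Channel]:
    return pointwise(depthwise(channels, dw_kernels), pw_weights)


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]   # channel 1
dw = [[[1, 0], [0, 1]],   # kernel for channel 0
      [[1, 1], [1, 1]]]   # kernel for channel 1
pw = [[1, 1],    # output 0 = map 0 + map 1
      [1, -1]]   # output 1 = map 0 - map 1
print(depthwise_separable(x, dw, pw))  # [[[10, 12], [16, 18]], [[2, 4], [8, 10]]]
```

For a k x k kernel, C input channels, C' output channels, and an H x W output, the depthwise step costs k·k·C·H·W multiplications and the pointwise step C·C'·H·W, compared with k·k·C·C'·H·W for a standard convolution. This split is the usual source of the computation reduction the abstract refers to.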

8. A method for object recognition at a device, comprising:
- obtaining a two-dimensional array of pixels representing an image;
  
  applying a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
  
  performing a set of processing operations on the plurality of input channels, the set of processing operations comprising:
  
  applying a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
  
  dividing the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
  
  performing a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
  
  applying a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
  
  applying the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
  
  dividing the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
  
  performing the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
  
  applying a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
  
  recognizing an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
- Dependent Claims (9, 10, 11, 12, 13)
  - 9. The method of claim 8, wherein performing the set of processing operations further comprises:
    - applying one or more batch normalization operations, each batch normalization operation adjusting a mean of one or more channels, a variance of one or more channels, or both.
  - 10. The method of claim 8, wherein the set of processing operations is performed a plurality of times, and wherein the plurality of input channels of a second iteration of the set of processing operations is based at least in part on the first plurality of output channels of a first iteration of the set of processing operations.
  - 11. The method of claim 10, further comprising:
    - feeding the first plurality of output channels of a last iteration of the set of processing operations to a fully connected layer to generate a plurality of connected-layer output channels;
      
      dividing the plurality of connected-layer output channels into third channel groups, wherein each connected-layer output channel of the plurality of connected-layer output channels is associated with a single channel group of the third channel groups; and
      
      performing the feature selection operation for each third channel group to generate a plurality of final channels, wherein each final channel is associated with a respective channel group of the third channel groups and wherein the object in the image is recognized based at least in part on the plurality of final channels.
  - 12. The method of claim 8, wherein dividing the second plurality of input channels into first channel groups comprises:
    - dividing the second plurality of input channels into pairs such that a number of the first plurality of intermediate channels is half a number of the second plurality of input channels.
  - 13. The method of claim 8, wherein applying the first convolutional operation to the two-dimensional array of pixels comprises:
    - applying a plurality of convolution kernels to the two-dimensional array of pixels to generate a plurality of initial input channels, each convolution kernel having a same size;
      
      applying a batch normalization operation to the plurality of initial input channels to generate a plurality of normalized input channels, wherein the batch normalization operation adjusts a mean of one or more initial input channels, a variance of one or more initial input channels, or both;
      
      dividing the plurality of normalized input channels into initial channel groups, wherein each normalized input channel of the plurality of normalized input channels is associated with a single channel group of the initial channel groups; and
      
      performing the feature selection operation for each initial channel group to generate the plurality of input channels, wherein each input channel is associated with a respective initial channel group.
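
Claims 9 and 13 describe batch normalization only as adjusting the mean and/or variance of channels. The following is a minimal training-time sketch, assuming per-channel statistics computed over the batch and all spatial positions, a biased variance estimate, and the usual learnable scale and shift parameters, here named `gamma` and `beta` (illustrative names; the claims do not mention them).

```python
# Training-time batch normalization sketch. Assumptions: input shaped
# [batch][channels][height][width] as nested lists, per-channel statistics
# over batch and spatial positions, biased variance, scale gamma, shift beta.
import math
from typing import List, Optional, Tuple

Tensor4 = List[List[List[List[float]]]]


def channel_stats(x: Tensor4, c: int) -> Tuple[float, float]:
    """Mean and (biased) variance of channel c over all samples and pixels."""
    values = [v for sample in x for row in sample[c] for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def batch_norm(x: Tensor4, gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None, eps: float = 1e-5) -> Tensor4:
    """Normalize each channel to zero mean and unit variance, then scale and shift."""
    num_channels = len(x[0])
    gamma = gamma if gamma is not None else [1.0] * num_channels
    beta = beta if beta is not None else [0.0] * num_channels
    stats = [channel_stats(x, c) for c in range(num_channels)]
    return [
        [
            [[gamma[c] * (v - stats[c][0]) / math.sqrt(stats[c][1] + eps) + beta[c]
              for v in row]
             for row in sample[c]]
            for c in range(num_channels)
        ]
        for sample in x
    ]


# Two samples, one channel, 1x2 pixels: channel mean 4.0, variance 5.0.
batch = [[[[1.0, 3.0]]], [[[5.0, 7.0]]]]
print(batch_norm(batch))
```

At inference time, implementations typically replace the batch statistics with running averages collected during training. That detail is omitted here.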

14. A non-transitory computer-readable medium storing code for object recognition, the code comprising instructions executable by a processor to:
- obtain a two-dimensional array of pixels representing an image;
  
  apply a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
  
  perform a set of processing operations on the plurality of input channels, the set of processing operations comprising instructions executable by the processor to:
  
  apply a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
  
  divide the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
  
  perform a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
  
  apply a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
  
  apply the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
  
  divide the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
  
  perform the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
  
  apply a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
  
  recognize an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
- Dependent Claims (15, 16, 17)
  - 15. The non-transitory computer-readable medium of claim 14, wherein the set of processing operations is performed a plurality of times, and wherein the plurality of input channels of a second iteration of the set of processing operations is based at least in part on the first plurality of output channels of a first iteration of the set of processing operations.
  - 16. The non-transitory computer-readable medium of claim 14, wherein the instructions to divide the second plurality of input channels into first channel groups are executable by the processor to:
    - divide the second plurality of input channels into pairs such that a number of the first plurality of intermediate channels is half a number of the second plurality of input channels.
  - 17. The non-transitory computer-readable medium of claim 14, wherein the instructions to apply the first convolutional operation to the two-dimensional array of pixels are executable by the processor to:
    - apply a plurality of convolution kernels to the two-dimensional array of pixels to generate a plurality of initial input channels, each convolution kernel having a same size;
      
      apply a batch normalization operation to the plurality of initial input channels to generate a plurality of normalized input channels, wherein the batch normalization operation adjusts a mean of one or more initial input channels, a variance of one or more initial input channels, or both;
      
      divide the plurality of normalized input channels into initial channel groups, wherein each normalized input channel of the plurality of normalized input channels is associated with a single channel group of the initial channel groups; and
      
      perform the feature selection operation for each initial channel group to generate the plurality of input channels, wherein each input channel is associated with a respective initial channel group.
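
Claim 14, like claims 1 and 8, chains several steps into one set of processing operations. The sketch below wires those steps together under assumptions the claims leave open. First, the second convolutional operation is taken to be a 1x1 convolution, reused with the same weights because the claim refers to "the second convolutional operation". Second, feature selection is a pairwise max feature map. Third, the third convolution is depthwise separable with valid padding and stride 1. Finally, the pooling function is 2x2 max pooling with stride 2. All names (`processing_set`, `conv1x1`, and so on) are hypothetical.

```python
# Hypothetical sketch of one pass of the claimed processing set. Assumptions
# (not from the patent): the second convolution is a 1x1 convolution reused
# with the same weights, feature selection is a pairwise max feature map,
# the third convolution is depthwise separable (valid padding, stride 1),
# and pooling is 2x2 max pooling with stride 2.
from typing import List, Tuple

Channel = List[List[float]]
Tensor = List[Channel]  # [channels][height][width]


def conv1x1(x: Tensor, weights: List[List[float]]) -> Tensor:
    """weights[o][i]: contribution of input channel i to output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt * x[i][r][c] for i, wt in enumerate(row))
              for c in range(w)] for r in range(h)] for row in weights]


def mfm(x: Tensor) -> Tensor:
    """Pairwise max feature map: channel i competes with channel i + C/2."""
    half = len(x) // 2
    return [[[max(a, b) for a, b in zip(ra, rb)]
             for ra, rb in zip(x[i], x[i + half])] for i in range(half)]


def depthwise(x: Tensor, kernels: Tensor) -> Tensor:
    """Valid, stride-1 filtering of each channel with its own kernel."""
    out = []
    for ch, k in zip(x, kernels):
        kh, kw = len(k), len(k[0])
        out.append([[sum(ch[r + i][c + j] * k[i][j]
                         for i in range(kh) for j in range(kw))
                     for c in range(len(ch[0]) - kw + 1)]
                    for r in range(len(ch) - kh + 1)])
    return out


def max_pool2x2(x: Tensor) -> Tensor:
    """2x2 max pooling, stride 2; an odd trailing row or column is dropped."""
    return [[[max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
              for c in range(0, len(ch[0]) - 1, 2)]
             for r in range(0, len(ch) - 1, 2)] for ch in x]


def processing_set(x: Tensor, w_second: List[List[float]], dw_kernels: Tensor,
                   w_pointwise: List[List[float]]) -> Tuple[Tensor, Tensor]:
    """Return (first output channels, second output channels)."""
    intermediate = mfm(conv1x1(x, w_second))              # 2nd conv + selection
    first_out = conv1x1(depthwise(intermediate, dw_kernels), w_pointwise)  # 3rd conv
    second_intermediate = mfm(conv1x1(first_out, w_second))  # 2nd conv again
    second_out = max_pool2x2(second_intermediate)         # pooling
    return first_out, second_out


x = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
first, second = processing_set(x, [[1.0], [-1.0]], [[[0, 0], [0, 1]]], [[1.0]])
print(first)   # [[[6.0, 7.0, 8.0], [10.0, 11.0, 12.0], [14.0, 15.0, 16.0]]]
print(second)  # [[[11.0]]]
```

Reusing `w_second` on `first_out` works only because the pointwise step returns as many channels as the block received. A real model would more likely give each stage its own weights.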

Current Assignee
Qualcomm, Inc.
Original Assignee
Qualcomm, Inc.
Inventors
Wang, Lei; Bi, Ning; Qi, Yingyong
Primary Examiner(s)
Niu, Feng

Application Number

US15/869,342
Publication Number

US 20190220653A1
Time in Patent Office

907 Days
CPC Class Codes

G06F 18/24143   Distances to neighbourhood ...

G06V 10/82   using neural networks

G06V 30/19173   Classification techniques

G06V 40/168   Feature extraction; Face re...

G06V 40/172   Classification, e.g. identi...
