Compact models for object recognition
First Claim
1. An apparatus for object recognition, comprising:
a processor;
memory in electronic communication with the processor; and
instructions stored in the memory and executable by the processor to cause the apparatus to:
obtain a two-dimensional array of pixels representing an image;
apply a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
perform a set of processing operations on the plurality of input channels, the set of processing operations comprising:
applying a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
dividing the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
performing a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
applying a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
applying the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
dividing the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
performing the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
applying a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
recognize an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
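The channel-grouping and feature-selection steps recited above can be read as the maximum feature map (MFM) operation named in the abstract. The following is a minimal sketch under stated assumptions: groups are formed from consecutive channels, the group size is two, and the selection rule is an element-wise maximum. The claim does not fix any of these choices, and the name `max_feature_map` and the nested-list channel layout are illustrative only.

```python
from typing import List

Channel = List[List[float]]  # one channel as rows of pixel values (assumed layout)


def max_feature_map(channels: List[Channel], group_size: int = 2) -> List[Channel]:
    """Reduce channels by taking an element-wise maximum within each group.

    Assumption: consecutive channels form a group, so every input channel
    belongs to exactly one group and each group yields one output channel.
    """
    if group_size < 1 or len(channels) % group_size != 0:
        raise ValueError("channel count must be a multiple of group_size")

    outputs: List[Channel] = []
    for start in range(0, len(channels), group_size):
        group = channels[start:start + group_size]
        height, width = len(group[0]), len(group[0][0])
        # For each pixel position, keep the largest response in the group.
        merged = [
            [max(ch[r][c] for ch in group) for c in range(width)]
            for r in range(height)
        ]
        outputs.append(merged)
    return outputs


# Four hypothetical 2x2 channels collapse to two intermediate channels.
example = [
    [[1.0, 5.0], [2.0, 0.0]],
    [[3.0, 4.0], [1.0, 7.0]],
    [[0.0, 0.0], [9.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
reduced = max_feature_map(example)
```

With a group size of two, the number of intermediate channels is half the number of input channels, which is the channel reduction the abstract attributes to MFM.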
Abstract
Methods, systems, and devices for object recognition are described. Generally, the described techniques provide for a compact and efficient convolutional neural network (CNN) model for facial recognition. The proposed techniques relate to a light model with a set of convolution layers and one fully connected layer for feature representation. A new building block for each convolution layer is proposed. A maximum feature map (MFM) operation may be employed to reduce the number of channels (e.g., by combining two or more channels via maximum feature selection within the channels). Depth-wise separable convolution may be employed for computation reduction (e.g., reduction of convolution computation). Batch normalization may be applied to normalize the output of the convolution layers and the fully connected layer (e.g., to prevent overfitting). The described techniques provide a compact and efficient CNN model which can be used for efficient and effective face recognition.
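To illustrate the computation reduction attributed to depth-wise separable convolution, the sketch below compares multiply-accumulate (MAC) counts for a standard convolution and for a depth-wise step followed by a 1x1 point-wise step. The layer dimensions are hypothetical and do not come from the patent; stride 1 and an output of the same spatial size as the input are assumed.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a standard k x k convolution over an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k depth-wise step plus a 1x1 point-wise step."""
    depthwise = h * w * c_in * k * k   # one spatial kernel per input channel
    pointwise = h * w * c_in * c_out   # 1x1 mixing across the feature maps
    return depthwise + pointwise


# Hypothetical layer: 32x32 output, 64 input channels, 128 output channels, 3x3 kernel.
h, w, c_in, c_out, k = 32, 32, 64, 128, 3
standard = standard_conv_macs(h, w, c_in, c_out, k)
separable = depthwise_separable_macs(h, w, c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2 under these assumptions
```

For this assumed layer the separable form needs roughly 12% of the MACs of the standard form, and the ratio approaches 1/k² as the number of output channels grows.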
18 Citations
17 Claims
1. An apparatus for object recognition, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for object recognition at a device, comprising:
obtaining a two-dimensional array of pixels representing an image;
applying a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
performing a set of processing operations on the plurality of input channels, the set of processing operations comprising:
applying a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
dividing the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
performing a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
applying a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
applying the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
dividing the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
performing the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
applying a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
recognizing an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
Dependent claims: 9, 10, 11, 12, 13
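The third convolutional operation recited in the claims (a first operation applied to each intermediate channel, then a second operation applied across the resulting feature maps) can be read as the depth-wise separable convolution named in the abstract. The sketch below is one possible rendering under stated assumptions: no padding, stride 1, one spatial kernel per channel, and a 1x1 point-wise mix. The function names, kernel values, and weights are hypothetical.

```python
from typing import List

Channel = List[List[float]]


def conv2d_valid(image: Channel, kernel: Channel) -> Channel:
    """Single-channel 2-D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def depthwise_separable(
    channels: List[Channel],
    depthwise_kernels: List[Channel],
    pointwise_weights: List[List[float]],
) -> List[Channel]:
    """Depth-wise step per channel, then a 1x1 point-wise step across maps."""
    # First operation: each channel is filtered by its own spatial kernel.
    feature_maps = [conv2d_valid(ch, k)
                    for ch, k in zip(channels, depthwise_kernels)]
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Second operation: each output channel is a weighted sum of all maps.
    return [
        [
            [sum(wt * fm[r][c] for wt, fm in zip(weights, feature_maps))
             for c in range(w)]
            for r in range(h)
        ]
        for weights in pointwise_weights
    ]


# Two hypothetical 3x3 channels, 2x2 kernels of ones, two output channels.
channels = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
ones = [[1.0, 1.0], [1.0, 1.0]]
outputs = depthwise_separable(channels, [ones, ones], [[1.0, 1.0], [1.0, -1.0]])
```

Splitting the spatial filtering from the cross-channel mixing is what lets the two steps use far fewer multiplications than a single full convolution over all channels.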
14. A non-transitory computer-readable medium storing code for object recognition, the code comprising instructions executable by a processor to:
obtain a two-dimensional array of pixels representing an image;
apply a first convolutional operation to the two-dimensional array of pixels to generate a plurality of input channels;
perform a set of processing operations on the plurality of input channels, the set of processing operations comprising instructions executable by the processor to:
apply a second convolutional operation to the plurality of input channels to generate a second plurality of input channels;
divide the second plurality of input channels into first channel groups, wherein each input channel of the second plurality of input channels is associated with a single first channel group;
perform a feature selection operation for each first channel group to generate a first plurality of intermediate channels, wherein each intermediate channel is associated with a respective channel group of the first channel groups; and
apply a third convolutional operation to the first plurality of intermediate channels, wherein the third convolutional operation comprises a first operation applied to each intermediate channel to generate a plurality of feature maps followed by a second operation applied across the plurality of feature maps to generate a first plurality of output channels;
apply the second convolutional operation to the plurality of output channels to generate a third plurality of input channels;
divide the third plurality of input channels into second channel groups, wherein each input channel of the third plurality of input channels is associated with a single second channel group;
perform the feature selection operation for each second channel group to generate a second plurality of intermediate channels, wherein each intermediate channel of the second plurality of intermediate channels is associated with a respective channel group of the second channel groups; and
apply a pooling function to the second plurality of intermediate channels to generate a second plurality of output channels; and
recognize an object in the image based at least in part on the first plurality of output channels and the second plurality of output channels.
Dependent claims: 15, 16, 17
Specification