DEEP NEURAL NETWORK ARCHITECTURE FOR IMAGE SEGMENTATION
First Claim
1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising:
identifying at least a portion of the camera-captured image;
applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage;
pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
performing, at a third stage, at least one convolution of an output of the second stage;
performing, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage;
applying a second convolutional neural network to the output of the fifth stage; and
classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
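The staged pipeline recited above can be sketched at the shape level, independent of any framework. In this sketch the channel counts, the 2x pooling factor, the unit-stride deconvolution, and the 10-category output are illustrative assumptions, not values taken from the claim:

```python
# Framework-free sketch of the five-stage pipeline in claim 1, tracing only
# tensor shapes (channels, height, width). All channel counts, the pooling
# factor, and the 10-category output are illustrative assumptions.

def conv(shape, out_channels):
    # A "same"-padded, unit-stride convolution changes only the channel count.
    c, h, w = shape
    return (out_channels, h, w)

def pool(shape, factor=2):
    # Pooling subregion representations downsamples the spatial dimensions.
    c, h, w = shape
    return (c, h // factor, w // factor)

def deconv(shape, out_channels):
    # Modeled here as a unit-stride transposed convolution, which preserves
    # spatial size so the result can be concatenated with the stage-3 output.
    c, h, w = shape
    return (out_channels, h, w)

def concat(a, b):
    # Concatenation joins feature maps along the channel axis.
    assert a[1:] == b[1:], "spatial dimensions must agree before concatenation"
    return (a[0] + b[0], a[1], a[2])

def pipeline(image_shape=(3, 64, 64)):
    s1 = conv(image_shape, out_channels=32)   # first stage: first CNN
    s2 = pool(s1)                             # second stage: pool subregions
    s3 = conv(s2, out_channels=64)            # third stage: convolve stage-2 output
    s4 = deconv(s2, out_channels=64)          # fourth stage: deconvolve stage-2 output
    s5 = concat(s4, s3)                       # fifth stage: concatenate stages 4 and 3
    return conv(s5, out_channels=10)          # second CNN over 10 assumed categories
```

For a 3-channel 64x64 input this traces (3, 64, 64) → (32, 64, 64) → (32, 32, 32), with the concatenated fifth-stage map at (128, 32, 32); classification then reads off the final per-category output.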
Abstract
An apparatus and method for encoding objects in a camera-captured image with a deep neural network pipeline including multiple convolutional neural networks or convolutional layers. After identifying at least a portion of the camera-captured image, a first convolutional layer is applied to the at least the portion of the camera-captured image and multiple subregion representations are pooled from the output of the first convolutional layer. One or more additional convolutions are performed. At least one deconvolution is performed and concatenated with the output of one or more convolutions. One or more final convolutions are performed. The at least the portion of the camera-captured image is classified as an object category in response to an output of the one or more final convolutions.
36 Citations
20 Claims
1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising:
identifying at least a portion of the camera-captured image;
applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage;
pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
performing, at a third stage, at least one convolution of an output of the second stage;
performing, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage;
applying a second convolutional neural network to the output of the fifth stage; and
classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
19. An apparatus comprising:
a first convolution module configured to apply a first convolution to at least a portion of a camera-captured image;
a second convolution module configured to pool a plurality of subregion representations from an output of the first convolution module;
a first deconvolution module configured to perform at least one deconvolution from the output of the first convolution module;
a second deconvolution module configured to perform at least one deconvolution from the output of the second convolution module; and
a third convolution module configured to apply a second convolution of an output of the first deconvolution module concatenated with an output of the second deconvolution module, wherein the at least the portion of the camera-captured image is classified in response to an output of the third convolution module.
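The apparatus of claim 19 differs from claim 1 in having two deconvolution branches, one fed by each convolution module, whose outputs are concatenated before the third convolution. A shape-level sketch of that topology, with all channel counts and 2x factors as illustrative assumptions:

```python
# Shape-level sketch of the claim-19 apparatus: two deconvolution modules,
# one from each convolution module, brought to a common resolution and
# concatenated before a third, classifying convolution. Channel counts and
# the 2x pooling/upsampling factors are illustrative assumptions.

def conv_module(shape, out_channels, pool_factor=1):
    # A "same"-padded convolution followed by optional spatial pooling.
    c, h, w = shape
    return (out_channels, h // pool_factor, w // pool_factor)

def deconv_module(shape, out_channels, up_factor=1):
    # A transposed convolution; up_factor > 1 upsamples spatially.
    c, h, w = shape
    return (out_channels, h * up_factor, w * up_factor)

def apparatus(image_shape=(3, 64, 64)):
    m1 = conv_module(image_shape, 32)            # first convolution module
    m2 = conv_module(m1, 32, pool_factor=2)      # second module pools subregions
    d1 = deconv_module(m1, 16)                   # first deconvolution module (from m1)
    d2 = deconv_module(m2, 16, up_factor=2)      # second deconvolution module (from m2)
    assert d1[1:] == d2[1:]                      # spatial sizes align for concatenation
    cat = (d1[0] + d2[0], d1[1], d1[2])          # channel-wise concatenation
    return conv_module(cat, 10)                  # third convolution module -> classification
```

Because the second deconvolution upsamples by the same factor the second convolution module pooled by, both branches meet at the input resolution, which is what makes the channel-wise concatenation before the third convolution module well-defined.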
20. A non-transitory computer readable medium including instructions that, when executed by a processor, are configured to:
identify at least a portion of an image;
apply a first convolutional neural network to the at least the portion of the image at a first stage;
pool, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
perform, at a third stage, at least one convolution of an output of the second stage;
perform, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenate, at a fifth stage, the output of the fourth stage and the output of the third stage;
apply a second convolutional neural network to the output of the fifth stage; and
classify the at least the portion of the image as an object category in response to an output of the second convolutional neural network.
Specification