DEEP NEURAL NETWORK ARCHITECTURE FOR IMAGE SEGMENTATION
First Claim
1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising:
identifying at least a portion of the camera-captured image;
applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage;
pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
performing, at a third stage, at least one convolution of an output of the second stage;
performing, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage;
applying a second convolutional neural network to the output of the fifth stage; and
classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
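The staged pipeline recited above can be sketched at the shape level, independent of any framework. In this sketch the channel counts, the 2x pooling factor, the unit-stride deconvolution, and the 10-category output are illustrative assumptions, not values taken from the claim:

```python
# Framework-free sketch of the five-stage pipeline in claim 1, tracing only
# tensor shapes (channels, height, width). All channel counts, the pooling
# factor, and the 10-category output are illustrative assumptions.

def conv(shape, out_channels):
    # A "same"-padded, unit-stride convolution changes only the channel count.
    c, h, w = shape
    return (out_channels, h, w)

def pool(shape, factor=2):
    # Pooling subregion representations downsamples the spatial dimensions.
    c, h, w = shape
    return (c, h // factor, w // factor)

def deconv(shape, out_channels):
    # Modeled here as a unit-stride transposed convolution, which preserves
    # spatial size so the result can be concatenated with the stage-3 output.
    c, h, w = shape
    return (out_channels, h, w)

def concat(a, b):
    # Concatenation joins feature maps along the channel axis.
    assert a[1:] == b[1:], "spatial dimensions must agree before concatenation"
    return (a[0] + b[0], a[1], a[2])

def pipeline(image_shape=(3, 64, 64)):
    s1 = conv(image_shape, out_channels=32)   # first stage: first CNN
    s2 = pool(s1)                             # second stage: pool subregions
    s3 = conv(s2, out_channels=64)            # third stage: convolve stage-2 output
    s4 = deconv(s2, out_channels=64)          # fourth stage: deconvolve stage-2 output
    s5 = concat(s4, s3)                       # fifth stage: concatenate stages 4 and 3
    return conv(s5, out_channels=10)          # second CNN over 10 assumed categories
```

For a 3-channel 64x64 input this traces (3, 64, 64) → (32, 64, 64) → (32, 32, 32), with the concatenated fifth-stage map at (128, 32, 32); classification then reads off the final per-category output.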
Abstract
An apparatus and method for encoding objects in a camera-captured image with a deep neural network pipeline including multiple convolutional neural networks or convolutional layers. After identifying at least a portion of the camera-captured image, a first convolutional layer is applied to the at least the portion of the camera-captured image and multiple subregion representations are pooled from the output of the first convolutional layer. One or more additional convolutions are performed. At least one deconvolution is performed and concatenated with the output of one or more convolutions. One or more final convolutions are performed. The at least the portion of the camera-captured image is classified as an object category in response to an output of the one or more final convolutions.
36 Citations
20 Claims
1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising:
identifying at least a portion of the camera-captured image;
applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage;
pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
performing, at a third stage, at least one convolution of an output of the second stage;
performing, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage;
applying a second convolutional neural network to the output of the fifth stage; and
classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
19. An apparatus comprising:
a first convolution module configured to apply a first convolution to at least a portion of a camera-captured image;
a second convolution module configured to pool a plurality of subregion representations from an output of the first convolution module;
a first deconvolution module configured to perform at least one deconvolution from the output of the first convolution module;
a second deconvolution module configured to perform at least one deconvolution from the output of the second convolution module; and
a third convolution module configured to apply a second convolution of an output of the first deconvolution module concatenated with an output of the second deconvolution module, wherein the at least the portion of the camera-captured image is classified in response to an output of the third convolution module.
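The apparatus of claim 19 differs from claim 1 in having two deconvolution branches, one fed by each convolution module, whose outputs are concatenated before the third convolution. A shape-level sketch of that topology, with all channel counts and 2x factors as illustrative assumptions:

```python
# Shape-level sketch of the claim-19 apparatus: two deconvolution modules,
# one from each convolution module, brought to a common resolution and
# concatenated before a third, classifying convolution. Channel counts and
# the 2x pooling/upsampling factors are illustrative assumptions.

def conv_module(shape, out_channels, pool_factor=1):
    # A "same"-padded convolution followed by optional spatial pooling.
    c, h, w = shape
    return (out_channels, h // pool_factor, w // pool_factor)

def deconv_module(shape, out_channels, up_factor=1):
    # A transposed convolution; up_factor > 1 upsamples spatially.
    c, h, w = shape
    return (out_channels, h * up_factor, w * up_factor)

def apparatus(image_shape=(3, 64, 64)):
    m1 = conv_module(image_shape, 32)            # first convolution module
    m2 = conv_module(m1, 32, pool_factor=2)      # second module pools subregions
    d1 = deconv_module(m1, 16)                   # first deconvolution module (from m1)
    d2 = deconv_module(m2, 16, up_factor=2)      # second deconvolution module (from m2)
    assert d1[1:] == d2[1:]                      # spatial sizes align for concatenation
    cat = (d1[0] + d2[0], d1[1], d1[2])          # channel-wise concatenation
    return conv_module(cat, 10)                  # third convolution module -> classification
```

Because the second deconvolution upsamples by the same factor the second convolution module pooled by, both branches meet at the input resolution, which is what makes the channel-wise concatenation before the third convolution module well-defined.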
20. A non-transitory computer readable medium including instructions that, when executed by a processor, are configured to:
identify at least a portion of an image;
apply a first convolutional neural network to the at least the portion of the image at a first stage;
pool, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage;
perform, at a third stage, at least one convolution of an output of the second stage;
perform, at a fourth stage, at least one deconvolution from the output of the first stage or the output of the second stage;
concatenate, at a fifth stage, the output of the fourth stage and the output of the third stage;
apply a second convolutional neural network to the output of the fifth stage; and
classify the at least the portion of the image as an object category in response to an output of the second convolutional neural network.
Specification