Joint depth estimation and semantic segmentation from a single image

US 10,019,657 B2
Filed: 05/28/2015
Issued: 07/10/2018
Est. Priority Date: 05/28/2015
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Joint depth estimation and semantic labeling techniques usable for processing of a single image are described. In one or more implementations, global semantic and depth layouts are estimated of a scene of the image through machine learning by the one or more computing devices. Local semantic and depth layouts are also estimated for respective ones of a plurality of segments of the scene of the image through machine learning by the one or more computing devices. The estimated global semantic and depth layouts are merged with the local semantic and depth layouts by the one or more computing devices to semantically label and assign a depth value to individual pixels in the image.

20 Claims

1. A method of performing joint depth estimation and semantic labeling of an image by one or more computing devices, the method comprising:
- estimating global semantic and depth layouts of a scene of the image through machine learning by the one or more computing devices;
  
  estimating local semantic and depth layouts for respective ones of a plurality of segments of the scene of the image through machine learning by the one or more computing devices, the local depth layouts including relative depth values for an individual pixel in the image representing a depth of the individual pixel in relation to other pixels; and
  
  merging the estimated global semantic and depth layouts with the estimated local semantic and depth layouts by the one or more computing devices to semantically label and assign a depth value to the individual pixels in the image.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20
  - 2. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image.
  - 3. A method as described in claim 2, wherein the selecting is performed using a plurality of the global templates in combination to perform the estimating of the global semantic and depth layout of the scene of the image.
  - 4. A method as described in claim 2, further comprising generating the plurality of global templates by using a kernel k-means with a distance from a semantic label ground truth and depth ground truth that are associated with each training image in a dataset.
  - 5. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts is performed through the machine learning by learning a model that directly predicts global semantic and depth layouts of the scene such that each pixel in the image has a corresponding semantic label and depth value.
  - 6. A method as described in claim 1, wherein the estimated global depth layout of the scene assigns a respective absolute distance to a plurality of pixels in the image.
  - 7. A method as described in claim 1, wherein the estimating of the local semantic and depth layout is performed as a template classification problem by selecting one or more of a plurality of local templates having corresponding local semantic and depth layouts as corresponding to the image.
  - 8. A method as described in claim 1, wherein the machine learning is performed using a convolutional neural network (CNN) or support vector machine (SVM).
  - 9. A method as described in claim 1, wherein the merging is performed to generate the depth map using absolute distance values estimated by the global depth layout and relative depth values estimated by the local depth layout.
  - 10. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts, the estimating of the local semantic and depth layouts, and the merging are performed to jointly calculate the semantic values and depth labels to the pixels of the image.
  - 11. A method as described in claim 1, wherein the merging includes smoothing the semantic labels and depth values assigned to individual pixels in the image.
  - 20. A method as described in claim 1, wherein the merging is performed to generate a semantically labeled image by combining the semantic predictions of the local semantic layouts with the global semantic layouts.
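
The merging recited in claims 9 and 20 can be illustrated with a minimal NumPy sketch. It is not taken from the specification: the function name `merge_layouts`, the array shapes, the choice to anchor each segment at the mean of the global absolute depth, and the additive combination of class scores are all assumptions made for illustration.

```python
import numpy as np

def merge_layouts(global_depth, global_probs, segments,
                  local_rel_depth, local_probs):
    """Hypothetical merge of global and local layouts.

    global_depth:    (H, W) absolute depth from the global estimate
    global_probs:    (H, W, C) per-pixel class scores from the global estimate
    segments:        (H, W) integer segment id per pixel, in [0, S)
    local_rel_depth: (H, W) depth of each pixel relative to its segment
    local_probs:     (S, C) class scores predicted for each segment
    """
    n_seg = local_probs.shape[0]
    flat = segments.ravel()
    # Mean global (absolute) depth per segment anchors the local estimate.
    counts = np.bincount(flat, minlength=n_seg)
    sums = np.bincount(flat, weights=global_depth.ravel(), minlength=n_seg)
    seg_mean = sums / np.maximum(counts, 1)
    # Absolute anchor plus relative offset gives the per-pixel depth value.
    depth = seg_mean[segments] + local_rel_depth
    # Assumption: global and local class scores are simply added.
    labels = (global_probs + local_probs[segments]).argmax(axis=2)
    return labels, depth
```

In this sketch the global layout supplies the scale of the scene and the local layout supplies detail within each segment. A smoothing pass such as the one in claim 11 would run on the returned arrays and is not shown.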

12. A system comprising:
- one or more computing devices implemented at least partially in hardware, the one or more computing devices configured to perform operations comprising:
  
  estimating global semantic and depth layouts of a scene of an image through machine learning;
  
  decomposing the image into a plurality of segments;
  
  guiding a prediction of local semantic and depth layout of individual ones of the plurality of segments using the estimated global semantic and depth layouts of the scene, the local depth layout including relative depth values for an individual pixel in the image representing a depth of the individual pixel in relation to other pixels; and
  
  jointly forming a semantically-labeled version of the image in which the individual pixels are assigned a semantic label and a depth map of the image in which the individual pixels are assigned a depth value.
- Dependent claims: 13, 14, 15
  - 13. A system as described in claim 12, wherein the decomposing is performed by maintaining semantic region boundaries.
  - 14. A system as described in claim 13, wherein the maintaining is performed to consider information from appearance, semantic edges, or spatial information.
  - 15. A system as described in claim 12, wherein:
    - the estimating of the global semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image; and
      
      the prediction of the local semantic and depth layout of the individual ones of the plurality of segments includes estimating the local semantic and depth layout through machine learning.
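
Claim 15 relies on a plurality of global templates, and claim 4 describes generating such templates with kernel k-means over the semantic label and depth ground truth of the training images. The following is a rough sketch of that idea only. The pairwise distance (label disagreement plus mean absolute log-depth difference), the exponential kernel, the parameters `lam` and `sigma`, and the cyclic initial assignment are assumptions; the claims do not specify them.

```python
import numpy as np

def layout_kernel(labels, depths, lam=1.0, sigma=1.0):
    """Kernel between training layouts (illustrative distance, not from the patent).

    labels: (n, H, W) integer semantic ground truth
    depths: (n, H, W) positive depth ground truth
    """
    n = labels.shape[0]
    L = labels.reshape(n, -1)
    D = np.log(depths.reshape(n, -1))
    # Fraction of pixels whose semantic labels disagree, for every pair.
    label_dist = (L[:, None, :] != L[None, :, :]).mean(axis=2)
    # Mean absolute log-depth difference, for every pair.
    depth_dist = np.abs(D[:, None, :] - D[None, :, :]).mean(axis=2)
    return np.exp(-(label_dist + lam * depth_dist) / sigma)

def kernel_kmeans(K, k, n_iter=50):
    """Cluster n items given only their (n, n) kernel matrix K."""
    n = K.shape[0]
    assign = np.arange(n) % k  # assumption: deterministic cyclic start
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = assign == c
            m = members.sum()
            if m == 0:
                continue  # an empty cluster stays unreachable
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/m) sum_j K_ij + (1/m^2) sum_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_assign = dist.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign
```

Each resulting cluster could then be summarized by one representative semantic and depth layout to serve as a template. How that representative is chosen is left open here.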

16. A system comprising:
- a global determination module implemented at least partially in hardware, the global determination module for estimating global semantic and depth layouts of a scene of an image through machine learning;
  
  a local determination module implemented at least partially in hardware, the local determination module for estimating local semantic and depth layouts for respective ones of a plurality of segments of the scene of the image through machine learning, the local depth layouts including relative depth values for an individual pixel in the image representing a depth of the individual pixel in relation to other pixels; and
  
  a merge calculation module implemented at least partially in hardware, the merge calculation module for merging the estimated global semantic and depth layouts with the local semantic and depth layouts to semantically label and assign a depth value to the individual pixels in the image.
- Dependent claims: 17, 18, 19
  - 17. A system as described in claim 16, wherein the global determination module is configured to estimate the global semantic and depth layouts as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image.
  - 18. A system as described in claim 16, wherein the global determination module is configured to estimate the global semantic and depth layouts through the machine learning by learning a model that directly predicts the global semantic and depth layouts of the scene such that each pixel in the image has a corresponding semantic label and depth value.
  - 19. A system as described in claim 16, wherein the estimating of the local semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of local templates having corresponding local semantic and depth layouts as corresponding to the image.
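
The template classification in claims 17 and 19 (and the use of several templates in combination, as in claim 3) might look roughly like the sketch below. It assumes a classifier has already produced one score per template. The function name `estimate_global_layout`, the softmax weighting, and the per-pixel weighted vote are illustrative choices, not details from the patent.

```python
import numpy as np

def estimate_global_layout(scores, template_depths, template_labels, top_k=3):
    """Combine the best-scoring templates into one layout (hypothetical).

    scores:          (K,) classifier score per template
    template_depths: (K, H, W) depth layout of each template
    template_labels: (K, H, W) integer semantic layout of each template
    """
    # Indices of the top_k highest-scoring templates.
    top = np.argsort(scores)[::-1][:top_k]
    # Softmax over the selected scores gives combination weights.
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    # Depth layout: weighted average of the selected template depths.
    depth = np.tensordot(w, template_depths[top], axes=1)
    # Semantic layout: weighted per-pixel vote over template labels.
    n_classes = int(template_labels.max()) + 1
    votes = np.zeros(template_labels.shape[1:] + (n_classes,))
    for weight, idx in zip(w, top):
        votes += weight * np.eye(n_classes)[template_labels[idx]]
    labels = votes.argmax(axis=-1)
    return labels, depth
```

With `top_k=1` this reduces to picking the single best template. The same pattern could apply to local templates per segment.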

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Lin, Zhe; Cohen, Scott D.; Wang, Peng; Shen, Xiaohui; Price, Brian L.
Primary Examiner(s): Park, Chan
Assistant Examiner(s): Rice, Elisa M
Application Number: US14/724,660
Publication Number: US 20160350930A1
Time in Patent Office: 1,139 Days
CPC Class Codes

G06F 18/24137   Distances to cluster centroïds

G06N 20/10   using kernel methods, e.g. ...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06N 7/01   Probabilistic graphical mod...

G06T 2207/20081   Training; Learning

G06T 7/50   Depth or shape recovery

G06V 10/454   Integrating the filters int...

G06V 10/82   using neural networks

G06V 20/10   Terrestrial scenes scenes u...

G06V 20/70   Labelling scene content, e....

G06V 30/19173   Classification techniques
