Joint Depth Estimation and Semantic Segmentation from a Single Image

US 20160350930A1
Filed: 05/28/2015
Published: 12/01/2016
Est. Priority Date: 05/28/2015
Status: Active Grant

Abstract

Joint depth estimation and semantic labeling techniques usable for processing a single image are described. In one or more implementations, global semantic and depth layouts of a scene of the image are estimated through machine learning by one or more computing devices. Local semantic and depth layouts are also estimated through machine learning for respective ones of a plurality of segments of the scene of the image. The estimated global semantic and depth layouts are then merged with the local semantic and depth layouts to semantically label and assign a depth value to individual pixels in the image.
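
The merge step described above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `merge_layouts`, the flat-list image representation, and the combination rule (anchor each segment at its mean global depth, add zero-centered local offsets, and sum global and local label scores) are all assumptions chosen for illustration. The split between absolute global depth and relative local depth follows dependent claims 6, 8, and 10.

```python
def merge_layouts(global_depth, global_scores, segments, local_offsets, local_scores):
    """Hypothetical merge of a global layout with per-segment local layouts.

    global_depth  : absolute depth per pixel (coarse global estimate)
    global_scores : per-pixel dict mapping label -> score from the global estimate
    segments      : list of segments, each a list of pixel indices
    local_offsets : per-segment relative depth offsets, one per pixel in the segment
    local_scores  : per-segment dict mapping label -> score from the local estimate
    """
    n = len(global_depth)
    depth = [0.0] * n
    label = [None] * n
    for seg, offsets, scores in zip(segments, local_offsets, local_scores):
        # Absolute anchor: mean global depth over the segment (assumption).
        anchor = sum(global_depth[p] for p in seg) / len(seg)
        # Re-center the local offsets so they only carry relative structure.
        mean_off = sum(offsets) / len(offsets)
        # Combine local evidence with the segment-averaged global evidence.
        combined = {}
        for name, local_score in scores.items():
            global_score = sum(global_scores[p].get(name, 0.0) for p in seg) / len(seg)
            combined[name] = local_score + global_score
        best = max(combined, key=combined.get)
        for p, off in zip(seg, offsets):
            depth[p] = anchor + (off - mean_off)
            label[p] = best
    return label, depth


# Toy 4-pixel image split into two segments (all values are made up).
global_depth = [10.0, 10.0, 2.0, 2.0]
global_scores = [
    {"sky": 0.6, "ground": 0.4},
    {"sky": 0.6, "ground": 0.4},
    {"sky": 0.3, "ground": 0.7},
    {"sky": 0.3, "ground": 0.7},
]
segments = [[0, 1], [2, 3]]
local_offsets = [[-0.5, 0.5], [0.2, -0.2]]
local_scores = [{"sky": 0.9, "ground": 0.1}, {"sky": 0.2, "ground": 0.8}]

labels, depths = merge_layouts(
    global_depth, global_scores, segments, local_offsets, local_scores
)
```

In this toy input every pixel ends up with both a semantic label and a depth value, which is the output the abstract describes. A real system would likely replace the simple score sum with a learned or probabilistic combination.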

20 Claims

1. A method of performing joint depth estimation and semantic labeling of an image by one or more computing devices, the method comprising:
- estimating global semantic and depth layouts of a scene of the image through machine learning by the one or more computing devices;
  
  estimating local semantic and depth layouts for respective ones of a plurality of segments of the scene of the image through machine learning by the one or more computing devices; and
  
  merging the estimated global semantic and depth layouts with the estimated local semantic and depth layouts by the one or more computing devices to semantically label and assign a depth value to individual pixels in the image.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image.
  - 3. A method as described in claim 2, wherein the selecting is performed using a plurality of the global templates in combination to perform the estimating of the global semantic and depth layout of the scene of the image.
  - 4. A method as described in claim 2, further comprising generating the plurality of global templates by using a kernel k-means with a distance from a semantic label ground truth and depth ground truth that are associated with each training image in a dataset.
  - 5. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts is performed through the machine learning by learning a model that directly predicts global semantic and depth layouts of the scene such that each pixel in the image has a corresponding semantic label and depth value.
  - 6. A method as described in claim 1, wherein the estimated global depth layout of the scene assigns a respective absolute distance to a plurality of pixels in the image.
  - 7. A method as described in claim 1, wherein the estimating of the local semantic and depth layout is performed as a template classification problem by selecting one or more of a plurality of local templates having corresponding local semantic and depth layouts as corresponding to the image.
  - 8. A method as described in claim 1, wherein the estimated local depth layout of the scene assigns a respective relative distance to pixels in a respective said segment in the image, one to another.
  - 9. A method as described in claim 1, wherein the machine learning is performed using a convolutional neural network (CNN) or support vector machine (SVM).
  - 10. A method as described in claim 1, wherein the merging is performed to generate the depth map using absolute distance values estimated by the global depth layout and relative depth values estimated by the local depth layout.
  - 11. A method as described in claim 1, wherein the estimating of the global semantic and depth layouts, the estimating of the local semantic and depth layouts, and the merging are performed to jointly calculate the semantic labels and depth values for the pixels of the image.
  - 12. A method as described in claim 1, wherein the merging includes smoothing the semantic labels and depth values assigned to individual pixels in the image.
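
Claim 4 recites generating the global templates with kernel k-means over a distance built from semantic-label and depth ground truth. The sketch below shows one way that clustering could look. The specific distance (label disagreement plus mean log-depth difference), the exponential kernel, the farthest-point initialization, and every function name are assumptions made for illustration; the claim does not specify them.

```python
import math


def layout_distance(a, b, depth_weight=1.0):
    """Hypothetical distance between two ground-truth layouts.

    Each layout is (labels, depths): two flat lists of equal length.
    """
    labels_a, depths_a = a
    labels_b, depths_b = b
    n = len(labels_a)
    label_term = sum(x != y for x, y in zip(labels_a, labels_b)) / n
    depth_term = sum(abs(math.log(x) - math.log(y))
                     for x, y in zip(depths_a, depths_b)) / n
    return label_term + depth_weight * depth_term


def kernel_matrix(layouts, sigma=1.0):
    # Exponential kernel over the layout distance (an assumed choice).
    return [[math.exp(-layout_distance(a, b) / sigma) for b in layouts]
            for a in layouts]


def farthest_point_init(K, k):
    """Seed clusters with mutually distant points, then assign to nearest seed."""
    n = len(K)

    def d(i, s):  # squared feature-space distance between points i and s
        return K[i][i] + K[s][s] - 2.0 * K[i][s]

    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(d(i, s) for s in seeds)))
    return [min(range(k), key=lambda c: d(i, seeds[c])) for i in range(n)]


def kernel_kmeans(K, k, iters=20):
    """Kernel k-means on a precomputed kernel matrix K; returns cluster ids."""
    n = len(K)
    assign = farthest_point_init(K, k)
    for _ in range(iters):
        clusters = [[i for i in range(n) if assign[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in clusters]

        def dist(i, c):  # squared distance from point i to the mean of cluster c
            m = clusters[c]
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]

        new_assign = [min((c for c in range(k) if clusters[c]),
                          key=lambda c: dist(i, c))
                      for i in range(n)]
        if new_assign == assign:
            break
        assign = new_assign
    return assign


# Toy ground-truth layouts: two outdoor scenes and two indoor scenes.
layouts = [
    (["sky", "sky", "ground", "ground"], [50.0, 50.0, 5.0, 5.0]),
    (["sky", "sky", "ground", "ground"], [50.0, 48.0, 5.0, 5.0]),
    (["wall", "wall", "floor", "floor"], [3.0, 3.0, 1.0, 1.0]),
    (["wall", "wall", "floor", "floor"], [3.0, 3.2, 1.0, 1.0]),
]
assignments = kernel_kmeans(kernel_matrix(layouts), k=2)
```

Each resulting cluster would then yield one global template, for example by taking a representative member's label and depth layout. How the template is derived from a cluster is likewise an assumption here.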

13. A system comprising:
- one or more computing devices implemented at least partially in hardware, the one or more computing devices configured to perform operations comprising:
  
  estimating global semantic and depth layouts of a scene of an image through machine learning;
  
  decomposing the image into a plurality of segments;
  
  guiding a prediction of local semantic and depth layout of individual ones of the plurality of segments using the estimated global semantic and depth layouts of the scene; and
  
  jointly forming a semantically-labeled version of the image in which individual pixels are assigned a semantic label and a depth map of the image in which individual pixels are assigned a depth value.
- Dependent claims (14, 15, 16):
  - 14. A system as described in claim 13, wherein the decomposing is performed by maintaining semantic region boundaries.
  - 15. A system as described in claim 14, wherein the maintaining is performed to consider information from appearance, semantic edges, or spatial information.
  - 16. A system as described in claim 13, wherein:
    - the estimating of the global semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image; and
      
      the prediction of the local semantic and depth layout of the individual ones of the plurality of segments includes estimating the local semantic and depth layout through machine learning.

17. A system comprising:
- a global determination module implemented at least partially in hardware, the global determination module configured to estimate global semantic and depth layouts of a scene of an image through machine learning;
  
  a local determination module implemented at least partially in hardware, the local determination module configured to estimate local semantic and depth layouts for respective ones of a plurality of segments of the scene of the image through machine learning; and
  
  a merge calculation module configured to merge the estimated global semantic and depth layouts with the local semantic and depth layouts to semantically label and assign a depth value to individual pixels in the image.
- Dependent claims (18, 19, 20):
  - 18. A system as described in claim 17, wherein the global determination module is configured to estimate the global semantic and depth layouts as a template classification problem by selecting one or more of a plurality of global templates having corresponding global semantic and depth layouts as corresponding to the scene of the image.
  - 19. A system as described in claim 17, wherein the global determination module is configured to estimate the global semantic and depth layouts through the machine learning by learning a model that directly predicts the global semantic and depth layouts of the scene such that each pixel in the image has a corresponding semantic label and depth value.
  - 20. A system as described in claim 17, wherein the estimating of the local semantic and depth layouts is performed as a template classification problem by selecting one or more of a plurality of local templates having corresponding local semantic and depth layouts as corresponding to the image.
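
Several claims (2, 3, 7, 16, 18, and 20) frame layout estimation as template classification, and claim 3 allows a plurality of templates to be used in combination. The sketch below shows one plausible reading: take the top-scoring templates from a classifier and blend them. The function name `combine_templates`, the score-weighted depth average, and the weighted label vote are assumptions for illustration, and the classifier scores are supplied by hand rather than produced by a CNN or SVM.

```python
def combine_templates(scores, templates, top_k=2):
    """Blend the top_k highest-scoring templates into one layout (assumed rule).

    scores    : classifier score per template (higher means a better match)
    templates : list of (labels, depths) layouts, all the same length
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in ranked)
    n = len(templates[0][0])
    # Depth: score-weighted average over the selected templates.
    depths = [sum(scores[i] * templates[i][1][p] for i in ranked) / total
              for p in range(n)]
    # Label: score-weighted vote per pixel over the selected templates.
    labels = []
    for p in range(n):
        votes = {}
        for i in ranked:
            name = templates[i][0][p]
            votes[name] = votes.get(name, 0.0) + scores[i]
        labels.append(max(votes, key=votes.get))
    return labels, depths


# Three toy templates over a 2-pixel layout, with made-up classifier scores.
templates = [
    (["sky", "ground"], [40.0, 4.0]),
    (["sky", "ground"], [60.0, 6.0]),
    (["wall", "floor"], [3.0, 1.0]),
]
scores = [0.6, 0.3, 0.1]
blend_labels, blend_depths = combine_templates(scores, templates, top_k=2)
```

Setting `top_k=1` reduces this to selecting a single best template, which corresponds to the simpler case in claim 2.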

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Lin, Zhe; Cohen, Scott D.; Wang, Peng; Shen, Xiaohui; Price, Brian L.

Granted Patent: US 10,019,657 B2
US Class Current: 1/1
CPC Class Codes

G06F 18/24137   Distances to cluster centroïds

G06N 20/10   using kernel methods, e.g. ...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06N 7/01   Probabilistic graphical mod...

G06T 2207/20081   Training; Learning

G06T 7/50   Depth or shape recovery

G06V 10/454   Integrating the filters int...

G06V 10/82   using neural networks

G06V 20/10   Terrestrial scenes scenes u...

G06V 20/70   Labelling scene content, e....

G06V 30/19173   Classification techniques
