Automatic 2D-to-stereoscopic video conversion
First Claim
1. A method for generating a stereoscopic view from a 2D image of a scene, comprising using a computing device to perform steps for:
receiving a set of arbitrary images having known per-pixel depth information for one or more features in each of the arbitrary images;
receiving one or more input images;
for each of the input images, evaluating the set of arbitrary images to find one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each of the input images using the per-pixel depth information for features of the corresponding candidate images; and
for each input image, synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth information.
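As an illustration only (not part of the claim language), the candidate-retrieval step above might be sketched as follows. The claim does not specify how image similarity is measured, so a global color histogram is used here as a stand-in descriptor; `descriptor` and `find_candidates` are hypothetical names.

```python
import numpy as np

def descriptor(img):
    """Global color histogram as a stand-in image descriptor (hypothetical
    choice; the claim only requires some measure of image similarity)."""
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(4, 4, 4), range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def find_candidates(input_img, database, k=2):
    """Return the k (image, depth) pairs from the database whose images
    are most similar to input_img under the descriptor distance."""
    d = descriptor(input_img)
    dists = [np.linalg.norm(d - descriptor(img)) for img, _depth in database]
    order = np.argsort(dists)[:k]
    return [database[i] for i in order]
```

A real system would use richer features (e.g. GIST or SIFT-based descriptors), but the retrieval structure — rank the depth-annotated database by similarity to the input frame, keep the top k — is the same.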
Abstract
In general, a “Stereoscopic Video Converter” (SVC) provides various techniques for automatically converting arbitrary 2D video sequences into perceptually plausible stereoscopic or “3D” versions while optionally generating dense depth maps for every frame of the video sequence. In particular, the automated 2D-to-3D conversion process first automatically estimates scene depth for each frame of an input video sequence via a label transfer process that matches features extracted from those frames with features from a database of images and videos having known ground truth depths. The estimated depth distributions for all image frames of the input video sequence are then used by the SVC for automatically generating a “right view” of a corresponding stereoscopic image for each frame (assuming that each original input frame represents the “left view” of the stereoscopic image).
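The right-view generation described in the abstract can be illustrated with a minimal depth-image-based-rendering sketch, assuming the original frame is the left view. This is not the SVC's actual rendering method: the disparity model and the row-wise hole filling below are simplifying assumptions for illustration.

```python
import numpy as np

def synthesize_right_view(left, depth, max_disparity=8):
    """Forward-warp the left view horizontally using per-pixel depth.
    Nearer pixels (smaller depth) get larger disparity and shift further
    left in the right view; later writes overwrite earlier ones. Holes
    from disocclusion are filled from the last valid pixel on the row."""
    h, w = depth.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    disparity = np.round(max_disparity / (depth + 1e-6)).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
                filled[y, xr] = True
        last = left[y, 0]
        for x in range(w):  # simple hole filling along the row
            if filled[y, x]:
                last = right[y, x]
            else:
                right[y, x] = last
    return right
```

Pairing each original frame (left view) with its synthesized right view yields the stereoscopic frame for that point in the sequence.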
20 Claims
1. A method for generating a stereoscopic view from a 2D image of a scene, comprising using a computing device to perform steps for:
receiving a set of arbitrary images having known per-pixel depth information for one or more features in each of the arbitrary images;
receiving one or more input images;
for each of the input images, evaluating the set of arbitrary images to find one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each of the input images using the per-pixel depth information for features of the corresponding candidate images; and
for each input image, synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth information. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A computer-implemented process for generating a perceptually plausible 3D video sequence from a 2D video sequence of a scene, comprising:
using a computer to perform process actions for:
receiving a set of arbitrary images having known per-pixel depth information;
receiving a 2D video sequence of a scene comprising two or more sequential input images;
evaluating the set of arbitrary images to identify one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each input image from the per-pixel depth information of the corresponding candidate images by warping each candidate image to each corresponding input image, and using corresponding per-pixel depth information of the warped candidate images to infer the estimated per-pixel depth for each input image;
synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth; and
constructing a perceptually plausible 3D video sequence corresponding to the 2D video sequence from each input image and each corresponding synthesized view. - View Dependent Claims (11, 12, 13, 14, 15)
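The warp-and-infer step of claim 10 can be sketched as follows, under two loud assumptions: the warp is reduced to a brute-force integer translation (a stand-in for the dense warping, e.g. SIFT flow, a real system would use), and the fusion rule is a per-pixel median, which the claim does not mandate. `warp_to` and `estimate_depth` are hypothetical names.

```python
import numpy as np

def warp_to(candidate, target, max_shift=3):
    """Toy translational warp: find the integer (dy, dx) shift of the
    candidate image minimizing SSD against the target, and apply the same
    shift to both the candidate image and its depth map."""
    img, depth = candidate
    best = (np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            ssd = np.sum((shifted.astype(float) - target) ** 2)
            if ssd < best[0]:
                best = (ssd, dy, dx)
    _, dy, dx = best
    return (np.roll(img, (dy, dx), axis=(0, 1)),
            np.roll(depth, (dy, dx), axis=(0, 1)))

def estimate_depth(input_img, warped_candidates):
    """Fuse the depth maps of already-warped candidates into a per-pixel
    depth estimate for input_img (median as a hypothetical fusion rule)."""
    stack = np.stack([depth for _img, depth in warped_candidates])
    return np.median(stack, axis=0)
```

Because the warp aligns each candidate's depth map to the input frame's geometry, depths fused across candidates refer to the same pixel locations.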
16. A computer-readable storage device having computer executable instructions stored therein for generating perceptually plausible stereoscopic views of a 2D scene, said instructions comprising:
receiving a set of arbitrary images having known per-pixel depth information;
receiving a 2D video sequence of a scene comprising two or more sequential input images;
evaluating the set of arbitrary images to identify one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each input image from the per-pixel depth information of the corresponding candidate images by warping each candidate image to each corresponding input image, and using corresponding per-pixel depth information of the warped candidate images to infer the estimated per-pixel depth for each input image;
synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth; and
constructing a perceptually plausible 3D version of the 2D video sequence from each input image and each corresponding synthesized view. - View Dependent Claims (17, 18, 19, 20)
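The final construction step — pairing each input frame with its synthesized view into a 3D sequence — could be sketched as below. Side-by-side packing is an illustrative container choice only; the claims do not mandate a particular stereo format, and `build_sbs_video` is a hypothetical name.

```python
import numpy as np

def build_sbs_video(left_frames, right_views):
    """Pack each (left, right) pair into one side-by-side stereo frame,
    a common 3D video layout (illustrative choice, not claim language)."""
    return [np.concatenate([left, right], axis=1)
            for left, right in zip(left_frames, right_views)]
```

Other layouts (top-bottom, frame-sequential, anaglyph) would fit the same per-frame pairing structure.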
Specification