Automatic 2D-to-stereoscopic video conversion
First Claim
1. A method for generating a stereoscopic view from a 2D image of a scene, comprising using a computing device to perform steps for:
receiving a set of arbitrary images having known per-pixel depth information for one or more features in each of the arbitrary images;
receiving one or more input images;
for each of the input images, evaluating the set of arbitrary images to find one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each of the input images using the per-pixel depth information for features of the corresponding candidate images; and
for each input image, synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth information.
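As an illustration only (not part of the claim language), the candidate-retrieval step above might be sketched as follows. The claim does not specify how image similarity is measured, so a global color histogram is used here as a stand-in descriptor; `descriptor` and `find_candidates` are hypothetical names.

```python
import numpy as np

def descriptor(img):
    """Global color histogram as a stand-in image descriptor (hypothetical
    choice; the claim only requires some measure of image similarity)."""
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(4, 4, 4), range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def find_candidates(input_img, database, k=2):
    """Return the k (image, depth) pairs from the database whose images
    are most similar to input_img under the descriptor distance."""
    d = descriptor(input_img)
    dists = [np.linalg.norm(d - descriptor(img)) for img, _depth in database]
    order = np.argsort(dists)[:k]
    return [database[i] for i in order]
```

A real system would use richer features (e.g. GIST or SIFT-based descriptors), but the retrieval structure — rank the depth-annotated database by similarity to the input frame, keep the top k — is the same.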
Abstract
In general, a “Stereoscopic Video Converter” (SVC) provides various techniques for automatically converting arbitrary 2D video sequences into perceptually plausible stereoscopic or “3D” versions while optionally generating dense depth maps for every frame of the video sequence. In particular, the automated 2D-to-3D conversion process first automatically estimates scene depth for each frame of an input video sequence via a label transfer process that matches features extracted from those frames with features from a database of images and videos having known ground truth depths. The estimated depth distributions for all image frames of the input video sequence are then used by the SVC for automatically generating a “right view” of a corresponding stereoscopic image for each frame (assuming that each original input frame represents the “left view” of the stereoscopic image).
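The right-view generation described in the abstract can be illustrated with a minimal depth-image-based-rendering sketch, assuming the original frame is the left view. This is not the SVC's actual rendering method: the disparity model and the row-wise hole filling below are simplifying assumptions for illustration.

```python
import numpy as np

def synthesize_right_view(left, depth, max_disparity=8):
    """Forward-warp the left view horizontally using per-pixel depth.
    Nearer pixels (smaller depth) get larger disparity and shift further
    left in the right view; later writes overwrite earlier ones. Holes
    from disocclusion are filled from the last valid pixel on the row."""
    h, w = depth.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    disparity = np.round(max_disparity / (depth + 1e-6)).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
                filled[y, xr] = True
        last = left[y, 0]
        for x in range(w):  # simple hole filling along the row
            if filled[y, x]:
                last = right[y, x]
            else:
                right[y, x] = last
    return right
```

Pairing each original frame (left view) with its synthesized right view yields the stereoscopic frame for that point in the sequence.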
20 Claims
1. A method for generating a stereoscopic view from a 2D image of a scene, comprising using a computing device to perform steps for:
receiving a set of arbitrary images having known per-pixel depth information for one or more features in each of the arbitrary images;
receiving one or more input images;
for each of the input images, evaluating the set of arbitrary images to find one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each of the input images using the per-pixel depth information for features of the corresponding candidate images; and
for each input image, synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth information. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A computer-implemented process for generating a perceptually plausible 3D video sequence from a 2D video sequence of a scene, comprising:
using a computer to perform process actions for:
receiving a set of arbitrary images having known per-pixel depth information;
receiving a 2D video sequence of a scene comprising two or more sequential input images;
evaluating the set of arbitrary images to identify one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each input image from the per-pixel depth information of the corresponding candidate images by warping each candidate image to each corresponding input image, and using corresponding per-pixel depth information of the warped candidate images to infer the estimated per-pixel depth for each input image;
synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth; and
constructing a perceptually plausible 3D video sequence corresponding to the 2D video sequence from each input image and each corresponding synthesized view. - View Dependent Claims (11, 12, 13, 14, 15)
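The warp-and-infer step of claim 10 can be sketched as follows, under two loud assumptions: the warp is reduced to a brute-force integer translation (a stand-in for the dense warping, e.g. SIFT flow, a real system would use), and the fusion rule is a per-pixel median, which the claim does not mandate. `warp_to` and `estimate_depth` are hypothetical names.

```python
import numpy as np

def warp_to(candidate, target, max_shift=3):
    """Toy translational warp: find the integer (dy, dx) shift of the
    candidate image minimizing SSD against the target, and apply the same
    shift to both the candidate image and its depth map."""
    img, depth = candidate
    best = (np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            ssd = np.sum((shifted.astype(float) - target) ** 2)
            if ssd < best[0]:
                best = (ssd, dy, dx)
    _, dy, dx = best
    return (np.roll(img, (dy, dx), axis=(0, 1)),
            np.roll(depth, (dy, dx), axis=(0, 1)))

def estimate_depth(input_img, warped_candidates):
    """Fuse the depth maps of already-warped candidates into a per-pixel
    depth estimate for input_img (median as a hypothetical fusion rule)."""
    stack = np.stack([depth for _img, depth in warped_candidates])
    return np.median(stack, axis=0)
```

Because the warp aligns each candidate's depth map to the input frame's geometry, depths fused across candidates refer to the same pixel locations.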
16. A computer-readable storage device having computer executable instructions stored therein for generating perceptually plausible stereoscopic views of a 2D scene, said instructions comprising:
receiving a set of arbitrary images having known per-pixel depth information;
receiving a 2D video sequence of a scene comprising two or more sequential input images;
evaluating the set of arbitrary images to identify one or more candidate images that are similar to each corresponding input image;
estimating a per-pixel depth for each input image from the per-pixel depth information of the corresponding candidate images by warping each candidate image to each corresponding input image, and using corresponding per-pixel depth information of the warped candidate images to infer the estimated per-pixel depth for each input image;
synthesizing at least one new view corresponding to each input image from a combination of the input image and the corresponding estimated per-pixel depth; and
constructing a perceptually plausible 3D version of the 2D video sequence from each input image and each corresponding synthesized view. - View Dependent Claims (17, 18, 19, 20)
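The final construction step — pairing each input frame with its synthesized view into a 3D sequence — could be sketched as below. Side-by-side packing is an illustrative container choice only; the claims do not mandate a particular stereo format, and `build_sbs_video` is a hypothetical name.

```python
import numpy as np

def build_sbs_video(left_frames, right_views):
    """Pack each (left, right) pair into one side-by-side stereo frame,
    a common 3D video layout (illustrative choice, not claim language)."""
    return [np.concatenate([left, right], axis=1)
            for left, right in zip(left_frames, right_views)]
```

Other layouts (top-bottom, frame-sequential, anaglyph) would fit the same per-frame pairing structure.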
Specification