Interactive viewpoint video system and process
Abstract
A system and process for generating, and then rendering and displaying, an interactive viewpoint video in which a user can watch a dynamic scene while manipulating (freezing, slowing down, or reversing) time and changing the viewpoint at will. In general, the interactive viewpoint video is generated using a small number of cameras to capture multiple video streams. A multi-view 3D reconstruction and matting technique is employed to create a layered representation of the video frames that enables both efficient compression and interactive playback of the captured dynamic scene, while at the same time allowing for real-time rendering.
37 Claims
1. A computer-implemented process for generating an interactive viewpoint video, comprising using a computer to perform the following process actions:
inputting a plurality of synchronized video streams each depicting a portion of the same scene and calibration data defining geometric and photometric parameters associated with each video stream; and
for each group of contemporaneous frames from the synchronized video streams,
generating a 3D reconstruction of the scene,
using the reconstruction to compute a disparity map for each frame in the group of contemporaneous frames, and
for each frame in the group of contemporaneous frames,
identifying areas of significant depth discontinuities based on its disparity map,
generating a main layer comprising pixel information associated with areas in a frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas having depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold, to produce a layered representation for the frame under consideration.
View Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
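To make the layering action of claim 1 concrete, here is a minimal sketch of splitting one frame into a main layer and a boundary layer from its disparity map. The function name `split_layers`, the 4-neighbour discontinuity test, and the fixed `jump_thresh` value are illustrative assumptions, not the patent's actual method; the claim only requires that foreground pixels at depth discontinuities exceeding a prescribed threshold go to the boundary layer, with everything else, including the background side of those discontinuities, kept in the main layer.

```python
import numpy as np

def split_layers(color, disparity, jump_thresh=4.0):
    """Split one frame into main and boundary layers (illustrative sketch).

    color:      H x W x 3 uint8 array, the frame's pixels
    disparity:  H x W array, per-pixel disparity from the 3D reconstruction
    Returns (main_rgba, boundary_rgba): the image with binary alpha masks.
    """
    d = np.asarray(disparity, dtype=float)

    # A pixel is "foreground at a discontinuity" when its disparity exceeds
    # a 4-neighbour's by more than the prescribed threshold (larger
    # disparity means closer to the camera).
    fg = np.zeros(d.shape, dtype=bool)
    fg[:, 1:]  |= (d[:, 1:]  - d[:, :-1]) > jump_thresh   # vs. left neighbour
    fg[:, :-1] |= (d[:, :-1] - d[:, 1:])  > jump_thresh   # vs. right neighbour
    fg[1:, :]  |= (d[1:, :]  - d[:-1, :]) > jump_thresh   # vs. upper neighbour
    fg[:-1, :] |= (d[:-1, :] - d[1:, :])  > jump_thresh   # vs. lower neighbour

    # Boundary layer: foreground pixels at depth discontinuities.
    # Main layer: everything else, including the background side of
    # those discontinuities.
    alpha = lambda m: (m.astype(np.uint8) * 255)[..., None]
    boundary = np.concatenate([color, alpha(fg)], axis=-1)
    main = np.concatenate([color, alpha(~fg)], axis=-1)
    return main, boundary
```

Note that only the foreground side of a jump leaves the main layer, which is what lets the main layers of neighbouring views fill in disocclusions when the viewpoint moves.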
14. A system for generating an interactive viewpoint video, comprising:
a video capture sub-system comprising,
a plurality of video cameras for capturing multiple video streams,
synchronization equipment for synchronizing the video streams to create a sequence of groups of contemporaneously captured video frames each depicting a portion of the same scene,
one or more general purpose computing devices;
a first computer program having program modules executable by at least one of said one or more general purpose computing devices, said modules comprising,
a camera calibration module for computing geometric and photometric parameters associated with each video stream; and
a second computer program having program modules executable by at least one of said one or more general purpose computing devices, said modules comprising,
a 3D reconstruction module which generates a 3D reconstruction of the scene depicted in each group of contemporaneous frames from the synchronized video streams, and which uses the reconstruction to compute a disparity map for each frame in the group of contemporaneous frames,
a matting module which, for each frame in each group of contemporaneous frames, identifies areas of significant depth discontinuities based on the frame's disparity map,
a layered representation module which, for each frame in each group of contemporaneous frames, generates a main layer comprising pixel information associated with areas in a frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from pixels in areas having depth discontinuities exceeding the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold, to produce a layered representation for the frame under consideration.
View Dependent Claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A computer-implemented process for rendering an interactive viewpoint video from data comprising layered representations of video frames generated from sequential groups of contemporaneously captured video frames each depicting a portion of the same scene, and comprising calibration data comprising geometric parameters associated with the capture of each video frame, said process comprising using a computer to perform the following process actions for each frame of the interactive viewpoint video to be rendered:
identifying a current user-specified viewpoint;
identifying the frame or frames from a group of contemporaneously captured frames corresponding with a current temporal portion of the video being rendered that are needed to render the scene depicted therein from the identified viewpoint;
inputting the layered representations of the identified video frame or frames, wherein the layered representation of each input frame comprises a main layer comprising pixel information associated with areas in the frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas of depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold; and
rendering the frame of the interactive viewpoint video from the viewpoint currently specified by the user using the inputted layered frame representations.
View Dependent Claims: 29, 30, 31, 32, 33, 34, 35
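The rendering-side actions of claim 28 can be sketched in two pieces: selecting which captured frames are needed for the current viewpoint, and re-compositing a frame's two layers. The k-nearest-camera selection, the inverse-distance blend weights, and the helper names below are assumptions for illustration only; the claim requires only that the needed frames be identified and the output rendered from their layered representations.

```python
import numpy as np

def pick_frames(camera_positions, viewpoint, k=2):
    """Identify which captured frames are needed for the requested
    viewpoint: here, the k nearest cameras, with inverse-distance
    blend weights (an illustrative heuristic, not the patented method)."""
    cams = np.asarray(camera_positions, dtype=float)
    dists = np.linalg.norm(cams - np.asarray(viewpoint, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)   # closer cameras dominate
    return nearest, weights / weights.sum()

def composite_layers(main_rgba, boundary_rgba):
    """Re-composite one frame: boundary (foreground) layer over main layer."""
    a = boundary_rgba[..., 3:4].astype(float) / 255.0
    return (boundary_rgba[..., :3].astype(float) * a
            + main_rgba[..., :3].astype(float) * (1.0 - a))
```

A renderer along these lines would warp each selected frame's composited layers to the user's viewpoint and blend them by the returned weights; the warping itself depends on the geometric calibration data and is omitted here.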
36. A system for rendering and displaying an interactive viewpoint video using data comprising layered representations of video frames generated from sequential groups of contemporaneously captured video frames each depicting a portion of the same scene, and comprising calibration data defining geometric parameters associated with the capture of each video frame, said system comprising:
a user interface sub-system for inputting user viewpoint selections and displaying rendered interactive viewpoint video frames to the user, comprising,
an input device employed by the user to input viewpoint selections,
a display device for displaying the rendered interactive viewpoint video frames to the user;
a general purpose computing device;
a computer program having program modules executable by the general purpose computing device, said modules comprising,
a selective decoding module which decodes specified data associated with the layered representations of video frames for each frame of the interactive viewpoint video to be rendered and displayed, wherein the layered representation of each video frame comprises a main layer comprising pixel information associated with areas in the frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas of depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold,
a rendering module which, for each frame of the interactive viewpoint video being rendered and displayed, identifies the current user-selected viewpoint; specifies to the selective decoding module which frame or frames from a group of contemporaneously captured frames corresponding with a current temporal portion of the video being rendered and displayed are needed to render the scene depicted therein from the identified viewpoint; obtains the decoded frame data from the selective decoding module; and renders the frame of the interactive viewpoint video from the viewpoint currently selected by the user using the decoded frame data.
View Dependent Claims: 37
Specification