Interactive viewpoint video system and process
Abstract
A system and process for generating, and then rendering and displaying, an interactive viewpoint video in which a user can watch a dynamic scene while manipulating (freezing, slowing down, or reversing) time and changing the viewpoint at will. In general, the interactive viewpoint video is generated using a small number of cameras to capture multiple video streams. A multi-view 3D reconstruction and matting technique is employed to create a layered representation of the video frames that enables both efficient compression and interactive playback of the captured dynamic scene, while at the same time allowing for real-time rendering.
37 Claims
1. A computer-implemented process for generating an interactive viewpoint video, comprising using a computer to perform the following process actions:
inputting a plurality of synchronized video streams each depicting a portion of the same scene and calibration data defining geometric and photometric parameters associated with each video stream; and
for each group of contemporaneous frames from the synchronized video streams,
generating a 3D reconstruction of the scene,
using the reconstruction to compute a disparity map for each frame in the group of contemporaneous frames, and
for each frame in the group of contemporaneous frames,
identifying areas of significant depth discontinuities based on its disparity map,
generating a main layer comprising pixel information associated with areas in a frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas having depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold, to produce a layered representation for the frame under consideration.
View Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
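To make the layering action of claim 1 concrete, here is a minimal sketch of splitting one frame into a main layer and a boundary layer from its disparity map. The function name `split_layers`, the 4-neighbour discontinuity test, and the fixed `jump_thresh` value are illustrative assumptions, not the patent's actual method; the claim only requires that foreground pixels at depth discontinuities exceeding a prescribed threshold go to the boundary layer, with everything else, including the background side of those discontinuities, kept in the main layer.

```python
import numpy as np

def split_layers(color, disparity, jump_thresh=4.0):
    """Split one frame into main and boundary layers (illustrative sketch).

    color:      H x W x 3 uint8 array, the frame's pixels
    disparity:  H x W array, per-pixel disparity from the 3D reconstruction
    Returns (main_rgba, boundary_rgba): the image with binary alpha masks.
    """
    d = np.asarray(disparity, dtype=float)

    # A pixel is "foreground at a discontinuity" when its disparity exceeds
    # a 4-neighbour's by more than the prescribed threshold (larger
    # disparity means closer to the camera).
    fg = np.zeros(d.shape, dtype=bool)
    fg[:, 1:]  |= (d[:, 1:]  - d[:, :-1]) > jump_thresh   # vs. left neighbour
    fg[:, :-1] |= (d[:, :-1] - d[:, 1:])  > jump_thresh   # vs. right neighbour
    fg[1:, :]  |= (d[1:, :]  - d[:-1, :]) > jump_thresh   # vs. upper neighbour
    fg[:-1, :] |= (d[:-1, :] - d[1:, :])  > jump_thresh   # vs. lower neighbour

    # Boundary layer: foreground pixels at depth discontinuities.
    # Main layer: everything else, including the background side of
    # those discontinuities.
    alpha = lambda m: (m.astype(np.uint8) * 255)[..., None]
    boundary = np.concatenate([color, alpha(fg)], axis=-1)
    main = np.concatenate([color, alpha(~fg)], axis=-1)
    return main, boundary
```

Note that only the foreground side of a jump leaves the main layer, which is what lets the main layers of neighbouring views fill in disocclusions when the viewpoint moves.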
14. A system for generating an interactive viewpoint video, comprising:
a video capture sub-system comprising,
a plurality of video cameras for capturing multiple video streams,
synchronization equipment for synchronizing the video streams to create a sequence of groups of contemporaneously captured video frames each depicting a portion of the same scene,
one or more general purpose computing devices;
a first computer program having program modules executable by at least one of said one or more general purpose computing devices, said modules comprising,
a camera calibration module for computing geometric and photometric parameters associated with each video stream; and
a second computer program having program modules executable by at least one of said one or more general purpose computing devices, said modules comprising,
a 3D reconstruction module which generates a 3D reconstruction of the scene depicted in each group of contemporaneous frames from the synchronized video streams, and which uses the reconstruction to compute a disparity map for each frame in the group of contemporaneous frames,
a matting module which, for each frame in each group of contemporaneous frames, identifies areas of significant depth discontinuities based on the frame's disparity map,
a layered representation module which, for each frame in each group of contemporaneous frames, generates a main layer comprising pixel information associated with areas in a frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from pixels in areas having depth discontinuities exceeding the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold, to produce a layered representation for the frame under consideration.
View Dependent Claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A computer-implemented process for rendering an interactive viewpoint video from data comprising layered representations of video frames generated from sequential groups of contemporaneously captured video frames each depicting a portion of the same scene, and comprising calibration data comprising geometric parameters associated with the capture of each video frame, said process comprising using a computer to perform the following process actions for each frame of the interactive viewpoint video to be rendered:
identifying a current user-specified viewpoint;
identifying the frame or frames from a group of contemporaneously captured frames corresponding with a current temporal portion of the video being rendered that are needed to render the scene depicted therein from the identified viewpoint;
inputting the layered representations of the identified video frame or frames, wherein the layered representation of each input frame comprises a main layer comprising pixel information associated with areas in the frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas of depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold; and
rendering the frame of the interactive viewpoint video from the viewpoint currently specified by the user using the inputted layered frame representations.
View Dependent Claims: 29, 30, 31, 32, 33, 34, 35
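The rendering-side actions of claim 28 can be sketched in two pieces: selecting which captured frames are needed for the current viewpoint, and re-compositing a frame's two layers. The k-nearest-camera selection, the inverse-distance blend weights, and the helper names below are assumptions for illustration only; the claim requires only that the needed frames be identified and the output rendered from their layered representations.

```python
import numpy as np

def pick_frames(camera_positions, viewpoint, k=2):
    """Identify which captured frames are needed for the requested
    viewpoint: here, the k nearest cameras, with inverse-distance
    blend weights (an illustrative heuristic, not the patented method)."""
    cams = np.asarray(camera_positions, dtype=float)
    dists = np.linalg.norm(cams - np.asarray(viewpoint, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)   # closer cameras dominate
    return nearest, weights / weights.sum()

def composite_layers(main_rgba, boundary_rgba):
    """Re-composite one frame: boundary (foreground) layer over main layer."""
    a = boundary_rgba[..., 3:4].astype(float) / 255.0
    return (boundary_rgba[..., :3].astype(float) * a
            + main_rgba[..., :3].astype(float) * (1.0 - a))
```

A renderer along these lines would warp each selected frame's composited layers to the user's viewpoint and blend them by the returned weights; the warping itself depends on the geometric calibration data and is omitted here.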
36. A system for rendering and displaying an interactive viewpoint video using data comprising layered representations of video frames generated from sequential groups of contemporaneously captured video frames each depicting a portion of the same scene, and comprising calibration data defining geometric parameters associated with the capture of each video frame, said system comprising:
a user interface sub-system for inputting user viewpoint selections and displaying rendered interactive viewpoint video frames to the user, comprising,
an input device employed by the user to input viewpoint selections,
a display device for displaying the rendered interactive viewpoint video frames to the user;
a general purpose computing device;
a computer program having program modules executable by the general purpose computing device, said modules comprising,
a selective decoding module which decodes specified data associated with the layered representations of video frames for each frame of the interactive viewpoint video to be rendered and displayed, wherein the layered representation of each video frame comprises a main layer comprising pixel information associated with areas in the frame that do not exhibit depth discontinuities exceeding a prescribed threshold and background pixel information from areas of depth discontinuities above the threshold, and a boundary layer comprising foreground pixel information associated with areas having depth discontinuities that exceed the threshold,
a rendering module which, for each frame of the interactive viewpoint video being rendered and displayed, identifies the current user-selected viewpoint; specifies to the selective decoding module which frame or frames from a group of contemporaneously captured frames corresponding with a current temporal portion of the video being rendered and displayed are needed to render the scene depicted therein from the identified viewpoint; obtains the decoded frame data from the selective decoding module; and renders the frame of the interactive viewpoint video from the viewpoint currently selected by the user using the decoded frame data.
View Dependent Claims: 37
Specification