Automatic Scene Modeling for the 3D Camera and 3D Video
First Claim
1. A method for automatically segmenting a sequence of two-dimensional digital images into a navigable 3D model, said method including:
a) capturing image sequences and defining nearer matte layers and/or depth maps based on proportionately greater lateral motion;
b) generating a wireframe surface for background and foreground objects from the raw video data which has been captured and processed in step (a);
c) giving depth to foreground objects using either: silhouettes from different perspectives, center spines that protrude depthwise in proportion to the width up and down the object, or motion parallax information if available;
d) texture mapping the raw video onto the wireframe;
e) filling in occluded areas behind foreground objects, both on the background and on sides that are out of view, by stretching image edges into the center of blank spots; and
f) sharpening surface images on nearer objects and blurring more distant images to create more depth perception, using either existing video software development kits or by writing image processing code that implements widely-known convolution masks, thereby automatically segmenting an image sequence into a 3D model.
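Step (f) can be done with the widely-known convolution masks the claim mentions. The sketch below is a minimal illustration, not the patent's implementation: it assumes a single-channel image and a per-pixel depth map normalized to [0, 1] (both hypothetical inputs), applies a 3x3 sharpen mask where depth is near and a 3x3 box blur where it is far, and leaves the one-pixel border unfiltered for simplicity.

```python
import numpy as np

def convolve3x3(img, kernel):
    """Apply a 3x3 mask to a 2D image (border pixels left unfiltered).
    Kernels here are symmetric, so correlation equals convolution."""
    out = img.astype(float).copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.sum(img[y - 1:y + 2, x - 1:x + 2] * kernel)
    return out

SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], float)
BOX_BLUR = np.full((3, 3), 1.0 / 9.0)

def depth_cue(img, depth, near_threshold=0.5):
    """Sharpen pixels nearer than the threshold, blur the rest,
    strengthening the depth-of-field cue described in step (f)."""
    sharp = convolve3x3(img, SHARPEN)
    soft = convolve3x3(img, BOX_BLUR)
    return np.where(depth < near_threshold, sharp, soft)
```

In practice the per-pixel blend would use the depth maps produced in step (a); the threshold value is an arbitrary choice for the sketch.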
Abstract
Single-camera image processing methods are disclosed for 3D navigation within ordinary moving video. Along with color and brightness, XYZ coordinates can be defined for every pixel. The resulting geometric models can be used to obtain measurements from digital images, as an alternative to on-site surveying and equipment such as laser range-finders. Motion parallax is used to separate foreground objects from the background. This provides a convenient method for placing video elements within different backgrounds, for product placement, and for merging video elements with computer-aided design (CAD) models and point clouds from other sources. If home users can save video fly-throughs or specific 3D elements from video, this method provides an opportunity for proactive, branded media sharing. When this image processing is used with a videoconferencing camera, the user's movements can automatically control the viewpoint, creating 3D hologram effects on ordinary televisions and computer screens.
68 Claims
1. A method for automatically segmenting a sequence of two-dimensional digital images into a navigable 3D model, said method including:
a) capturing image sequences and defining nearer matte layers and/or depth maps based on proportionately greater lateral motion;
b) generating a wireframe surface for background and foreground objects from the raw video data which has been captured and processed in step (a);
c) giving depth to foreground objects using either: silhouettes from different perspectives, center spines that protrude depthwise in proportion to the width up and down the object, or motion parallax information if available;
d) texture mapping the raw video onto the wireframe;
e) filling in occluded areas behind foreground objects, both on the background and on sides that are out of view, by stretching image edges into the center of blank spots; and
f) sharpening surface images on nearer objects and blurring more distant images to create more depth perception, using either existing video software development kits or by writing image processing code that implements widely-known convolution masks, thereby automatically segmenting an image sequence into a 3D model.
- View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
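Step (a)'s principle is that, under lateral camera motion, nearer points show proportionately greater apparent shift. A minimal sketch of that idea, assuming a per-pixel lateral shift map has already been estimated from consecutive frames (e.g. by block matching); the function names and the median-based threshold are illustrative, not taken from the patent:

```python
import numpy as np

def parallax_depth(shift, eps=1e-6):
    """With a laterally translating camera, apparent pixel shift falls off
    with distance, so relative depth can be taken as 1 / |shift|."""
    return 1.0 / np.maximum(np.abs(shift), eps)

def foreground_matte(shift, factor=2.0):
    """Matte out pixels moving proportionately faster than the median shift,
    i.e. the nearer (foreground) layer of step (a)."""
    speed = np.abs(shift)
    return speed > factor * np.median(speed)
```

The resulting depth values are relative; converting them to real units is the subject of claim 2 below.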
2. The method for taking non-contact measurements of objects and features in a scene based on unit measures of 3D models generated from digital images, for engineering, industrial and other applications, whereby:
a) once the X, Y and Z coordinates have been defined for points or features, routine mathematics can be used to count or calculate distances and other measures;
b) if measures, data merging or calibrating are needed in a particular scale, users can indicate as few as one length for a visible reference object in a software interface, and XYZ coordinates can be converted to those units; and
c) an interface can allow the user to indicate where measurements are needed, and can show the resulting distances, volumes, or other measures.
- View Dependent Claims (14, 63, 64, 65, 66, 67, 68)
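The "routine mathematics" of claim 2 reduces to two small steps: a single known reference length fixes the model-to-real scale factor (claim 2(b)), after which distances follow from the Euclidean formula (claim 2(a)). A pure-Python sketch with hypothetical function names:

```python
def calibrate_scale(model_length, real_length):
    """One visible reference object of known real size fixes the factor
    that converts model units to real-world units (claim 2(b))."""
    return real_length / model_length

def measure_distance(p, q, scale=1.0):
    """Euclidean distance between two XYZ points, reported in calibrated
    units (claim 2(a))."""
    return scale * sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

For example, if a reference object spans 4.0 model units but is known to be 2.0 m long, every measured model distance is halved to obtain metres.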
3. The method for controlling navigation and viewpoint in 3D video, 3D computer games, object movies, 3D objects and panoramic VR scenes with simple body movement and gestures using a web cam to detect foreground motion of the user, which is then transmitted like mouse or keyboard inputs to control the viewpoint or to navigate.
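Claim 3 turns detected webcam foreground motion into viewpoint input, analogous to a mouse. A minimal sketch under simplifying assumptions not stated in the claim: a static background frame, foreground found by simple frame differencing, and a hypothetical sensitivity constant; the sign flip produces the counter-moving parallax that creates the hologram effect described in the abstract.

```python
import numpy as np

def foreground_centroid(frame, background, threshold=30):
    """Centroid (x, y) of pixels that differ from a static background frame;
    a crude stand-in for webcam foreground-motion detection."""
    mask = np.abs(frame.astype(int) - background.astype(int)) > threshold
    ys, xs = np.nonzero(mask)
    return None if xs.size == 0 else (xs.mean(), ys.mean())

def motion_to_viewpoint(prev_c, curr_c, sensitivity=0.1):
    """Map centroid motion to pan/tilt deltas, mimicking mouse input; the
    negated sign makes the scene counter-move like a hologram."""
    return (-sensitivity * (curr_c[0] - prev_c[0]),
            -sensitivity * (curr_c[1] - prev_c[1]))
```

A real pipeline would smooth the centroid over several frames to suppress jitter before feeding it to the renderer.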
39. A method for automatically segmenting a two-dimensional image sequence into a 3D model, said method including:
a) a video device used to capture images having two-dimensional coordinates in a digital environment; and
b) a processor configured to receive, convert and process the two-dimensional images that are detected and captured from said video capturing device;
said system generating a point cloud having 3D coordinates from said two-dimensional images, defining edges from the point cloud to generate a wireframe having 3D coordinates, and adding a wiremesh to the wireframe to subsequently texture map the image from the video capturing device onto the wiremesh to display said 3D model on a screen.
- View Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
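Claim 39's pipeline runs point cloud → wireframe → wiremesh → texture map. For the common case where the recovered points lie on a regular image grid (one depth sample per pixel), the wireframe step reduces to a fixed triangulation; each vertex keeps its image (x, y) so the raw video can later be texture mapped onto it. A minimal sketch, not the patent's algorithm:

```python
def grid_to_mesh(depth):
    """Triangulate a regular grid of depth samples: each vertex is
    (x, y, z) with (x, y) doubling as texture coordinates, and each
    grid cell becomes two triangles indexing into the vertex list."""
    h, w = len(depth), len(depth[0])
    verts = [(x, y, depth[y][x]) for y in range(h) for x in range(w)]
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            faces.append((i, i + 1, i + w))          # upper-left triangle
            faces.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return verts, faces
```

An irregular point cloud would instead need a general triangulation (e.g. Delaunay) to define the edges, which is why the claim separates edge definition from mesh generation.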
Specification