VIRTUAL 3D METHODS, SYSTEMS AND SOFTWARE
First Claim
1. A video communication method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
- capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face;
generating a data representation, representative of the captured images;
reconstructing a synthetic view of the second user, based on the representation; and
displaying the synthetic view to the first user on a display screen used by the first user;
the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
Abstract
Methods, systems and computer program products (“software”) enable a virtual three-dimensional visual experience (referred to herein as “V3D”) videoconferencing and other applications, and capturing, processing and displaying of images and image streams.
89 Citations
168 Claims
-
1. A video communication method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
-
capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face; generating a data representation, representative of the captured images; reconstructing a synthetic view of the second user, based on the representation; and displaying the synthetic view to the first user on a display screen used by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user. - View Dependent Claims (2, 3, 4, 5, 6, 26, 27, 30, 31, 34, 75, 76, 79, 89, 90, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119)
-
-
7. A video communication method that enables a user to view a remote scene in a manner that gives the user a visual impression of being present with respect to the remote scene, the method comprising:
-
capturing images of the remote scene, the capturing comprising utilizing at least two cameras each having a view of the remote scene; executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; reconstructing a synthetic view of the remote scene, based on the representation; and displaying the synthetic view to the user on a display screen used by the user; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the user is provided an immersive visual experience of the remote scene.
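The feature correspondence function recited in this claim — detecting common features between corresponding images and measuring their relative distance in image space to generate disparity values — can be sketched with a minimal block-matching routine. This is an illustrative stand-in for the claimed function, not the patented implementation; the function name, block size, and sum-of-absolute-differences cost are all assumptions:

```python
import numpy as np

def disparity_map(left, right, max_disp=16, block=5):
    """Illustrative feature-correspondence sketch: for each pixel of the
    left image, find the horizontal shift (disparity) that minimizes the
    sum of absolute differences (SAD) over a small block in the right
    image of a rectified stereo pair."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()   # SAD matching cost
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d                     # winner-take-all disparity
    return disp
```

On a synthetic pair where the right image is the left image shifted by a known amount, interior pixels recover that shift as their disparity value.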
-
-
8. A method of facilitating self-portraiture of a user utilizing a handheld device to take the self-portrait, the handheld device having a display screen for displaying images to the user, the method comprising:
-
providing at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self-portrait setup time during which the user is setting up the self-portrait; capturing images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the user, based on the generated data representation and the generated tracking information; displaying to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback. - View Dependent Claims (9)
-
-
10. A method of facilitating composition of a photograph of a scene, by a user utilizing a handheld device to take the photograph, the handheld device having a display screen on a first side for displaying images to the user, and at least one camera on a second, opposite side of the handheld device, for capturing images, the method comprising:
-
capturing images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and displaying to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback. - View Dependent Claims (11, 12, 23, 24, 25, 28, 29, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88)
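The recited correspondence between the synthetic view's scale and perspective and the user's viewpoint can be illustrated by treating the display as a window: the tracked eye position and a scene point together determine where on the screen plane the point should appear. A hypothetical sketch (the coordinate convention and all names are assumptions, not taken from the patent):

```python
def window_projection(eye, point, screen_z=0.0):
    """Illustrative 'display as a window' projection: intersect the ray
    from the tracked eye position to a scene point with the screen plane
    (z = screen_z). 'eye' and 'point' are (x, y, z) in device coordinates,
    with the eye in front of the screen (z > 0) and the scene point
    behind it (z < 0); returns the on-screen (x, y) location."""
    ex, ey, ez = eye
    px, py, pz = point
    t = (screen_z - ez) / (pz - ez)          # ray parameter at the screen plane
    return (ex + t * (px - ex), ey + t * (py - ey))
```

As the tracked eye moves, the same scene point lands on a different screen position, which is what gives the reconstructed view its viewpoint-dependent scale and perspective.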
-
-
13. A method of displaying images to a user utilizing a binocular stereo head-mounted display (HMD), the method comprising:
-
capturing at least two image streams using at least one camera attached or mounted on or proximate to an external portion or surface of the HMD, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD. - View Dependent Claims (15, 16, 17, 18, 19)
-
-
14. A method of capturing and displaying image content on a binocular stereo head-mounted display (HMD), the method comprising:
-
capturing at least two image streams using at least one camera, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to a user via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
20. A method of generating an image data stream for use by a control system of an autonomous vehicle, the method comprising:
-
capturing images of a scene around at least a portion of the vehicle, the capturing comprising utilizing at least one camera having a view of the scene; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculating corresponding depth information based on the disparity values; and generating from the images and corresponding depth information an image data stream for use by the control system. - View Dependent Claims (21, 22)
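The step of calculating depth information from disparity values follows the standard rectified-stereo relation Z = f·B/d (focal length times baseline, divided by disparity). A hedged one-function sketch with illustrative parameter names; the claim does not specify this formula, but it is the conventional conversion:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity (in pixels) to depth (in meters) for a
    rectified stereo pair: Z = f * B / d. Depth is inversely
    proportional to disparity, so nearer objects have larger disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```

For example, with a 1000 px focal length and a 10 cm baseline, a 50 px disparity corresponds to a point about 2 m away.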
-
-
92. A program product for use with a digital processing system, for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising at least one camera having a view of the second user's face, a display screen for use by the first user, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the second user, utilizing the at least one camera; generate a data representation, representative of the captured images; reconstruct a synthetic view of the second user, based on the representation; and display the synthetic view to the first user on the display screen for use by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
-
-
94. A program product for use with a digital processing system, for enabling a first user to view a remote scene with the visual impression of being present with respect to the remote scene, the digital processing system comprising at least two cameras, each having a view of the remote scene, a display screen for use by the first user, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the remote scene, utilizing the at least two cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; reconstruct a synthetic view of the remote scene, based on the representation; and display the synthetic view to the first user on the display screen; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the first user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the first user is provided an immersive visual experience of the remote scene.
-
-
96. A program product for use with a handheld digital processing device, for facilitating self-portraiture of a user utilizing the handheld device to take the self portrait, the handheld device having a digital processor, a display screen for displaying images to the user, and at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self portrait setup time during which the user is setting up the self portrait, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processor cause the digital processor to:
-
capture images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the user, based on the generated data representation and the generated tracking information; and
display to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
-
-
98. A program product for use with a handheld digital processing device, for facilitating composition of a photograph of a scene by a user utilizing the handheld device to take the photograph, the handheld device having a digital processor, a display screen on a first side for displaying images to the user, and at least one camera on a second, opposite side of the handheld device, for capturing images, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processor cause the digital processor to:
-
capture images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and display to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback.
-
-
100. A program product for enabling display of images to a user utilizing a binocular stereo head-mounted display (HMD), the HMD having at least one camera attached or mounted on or proximate to an external portion or surface of the HMD, the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture at least two image streams using the at least one camera, the captured image streams containing images of a scene; generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
102. A program product for enabling display of captured image content to a user utilizing a binocular stereo head-mounted display (HMD), the captured image content comprising at least two image streams captured or generated by at least one camera, the captured image streams containing images of a scene, and the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to a user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
104. A program product for enabling the generation of an image data stream for use by a control system of an autonomous vehicle, the vehicle having at least one camera with a view of a scene around at least a portion of the vehicle and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene around at least a portion of the vehicle, using the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculate corresponding depth information based on the disparity values; and generate from the images and corresponding depth information an image data stream for use by the control system.
-
-
106. A digital processing system for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising:
-
at least one camera having a view of the second user's face; a display screen for use by the first user; and a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: capture images of the second user, utilizing the at least one camera; generate a data representation, representative of the captured images; reconstruct a synthetic view of the second user, based on the representation; and display the synthetic view to the first user on the display screen for use by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
-
-
108. A digital processing system for enabling a first user to view a remote scene with the visual impression of being present with respect to the remote scene, the digital processing system comprising:
-
at least two cameras, each having a view of the remote scene; a display screen for use by the first user; and a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: capture images of the remote scene, utilizing the at least two cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; reconstruct a synthetic view of the remote scene, based on the representation; and
display the synthetic view to the first user on the display screen; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the first user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the first user is provided an immersive visual experience of the remote scene.
-
-
110. A system operable in a handheld digital processing device, for facilitating self-portraiture of a user utilizing the handheld device to take the self portrait, the system comprising:
-
a digital processor; a display screen for displaying images to the user; and at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self portrait setup time during which the user is setting up the self portrait; the system being operable to: capture images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the user, based on the generated data representation and the generated tracking information; and display to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
-
-
112. A system operable in a handheld digital processing device, for facilitating composition of a photograph of a scene by a user utilizing the handheld device to take the photograph, the system comprising:
-
a digital processor; a display screen on a first side of the handheld device for displaying images to the user; and at least one camera on a second, opposite side of the handheld device, for capturing images; the system being operable to: capture images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and display to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback.
-
-
114. A system for enabling display of images to a user utilizing a binocular stereo head-mounted display (HMD), the system comprising:
-
at least one camera attached or mounted on or proximate to an external portion or surface of the HMD; and a digital processing resource comprising at least one digital processor; the system being operable to: capture at least two image streams using the at least one camera, the captured image streams containing images of a scene; generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
116. A program product for enabling display of captured image content to a user utilizing a binocular stereo head-mounted display (HMD), the captured image content comprising at least two image streams captured or generated by at least one camera, the captured image streams containing images of a scene, and the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to a user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
118. An image processing system for enabling the generation of an image data stream for use by a control system of an autonomous vehicle, the image processing system comprising:
-
at least one camera with a view of a scene around at least a portion of the vehicle; and a digital processing resource comprising at least one digital processor; the system being operable to: capture images of the scene around at least a portion of the vehicle, using the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculate corresponding depth information based on the disparity values; and generate from the images and corresponding depth information an image data stream for use by the control system.
-
-
120. A video capture and processing method, comprising:
-
capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis; and executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes. - View Dependent Claims (121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135)
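The multi-level disparity histogram recited above can be sketched as follows: each downsampled level produces a winner-take-all disparity vote per pixel, and votes from every level are rescaled to full resolution and accumulated into a per-pixel histogram over disparity values. This is a heavily simplified, hypothetical sketch — the per-pixel matcher here is a crude single-sample stand-in for the patented FDDE pattern matching, and all names are assumptions:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 averaging (a lower-frequency version)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def disparity_votes(left, right, max_disp):
    """One level's pass: per-pixel winner-take-all disparity by
    absolute-difference matching along the row (a stand-in matcher)."""
    h, w = left.shape
    votes = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            costs = [abs(float(left[y, x]) - float(right[y, x - d]))
                     for d in range(min(max_disp, x) + 1)]
            votes[y, x] = int(np.argmin(costs))
    return votes

def multilevel_disparity_histogram(left, right, max_disp=8, levels=3):
    """Accumulate votes from successively downsampled levels into a
    per-pixel histogram; the tallest bin indicates the most probable
    disparity for that pixel."""
    h, w = left.shape
    hist = np.zeros((h, w, max_disp + 1), dtype=np.int64)
    L, R = left.astype(np.float64), right.astype(np.float64)
    for level in range(levels):
        scale = 2 ** level
        v = disparity_votes(L, R, max_disp // scale)
        for y in range(v.shape[0]):
            for x in range(v.shape[1]):
                d = min(int(v[y, x]) * scale, max_disp)   # rescale vote to full res
                hist[y * scale:(y + 1) * scale, x * scale:(x + 1) * scale, d] += 1
        L, R = downsample(L), downsample(R)
    return hist
```

On a simple ramp image shifted by two pixels, the histogram's tallest bin at an interior pixel is disparity 2, because multiple levels agree on that value.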
-
-
136. A video capture and processing method, comprising:
-
capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair; executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, the feature correspondence function further comprising: generating a disparity solution based on the disparity values; applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain. - View Dependent Claims (137, 138, 139, 140, 141, 142)
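The injective constraint recited here — no co-domain element referenced more than once — can be illustrated with a small checker over a row-wise disparity solution. A hypothetical sketch (function name and return convention are assumptions; the patent's correction step is not implemented, only the violation detection):

```python
def injective_violations(disparity):
    """For each row of a disparity solution, map each left-image pixel x
    (the domain) to its matched right-image pixel x - d(x) (the
    co-domain), and flag every left-image pixel involved in a mapping
    where a co-domain pixel is referenced more than once. Returns one
    set of offending x positions per row."""
    violations = []
    for row in disparity:
        targets = {}
        for x, d in enumerate(row):
            targets.setdefault(x - d, []).append(x)   # co-domain element x - d
        bad = {x for xs in targets.values() if len(xs) > 1 for x in xs}
        violations.append(bad)
    return violations
```

A disparity row like [0, 0, 1] maps pixels 1 and 2 onto the same right-image pixel, violating injectivity; a strictly one-to-one mapping yields an empty set.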
-
-
143. A video capture method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
-
capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; estimating a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstructing a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result. - View Dependent Claims (144, 145, 146, 155)
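The final two steps of the location estimating recited above can be sketched as follows. The claim leaves the representative-position calculation and the threshold unspecified, so the mean reduction, the 0.3 valid-fraction, and the function signature here are illustrative assumptions, not the patented recipe.

```python
import numpy as np

def head_position(points3d, valid_mask, rect, min_valid_fraction=0.3):
    """Sketch of the claimed 3D head-position step.

    points3d:   HxWx3 array of per-pixel 3D points (e.g. from disparity).
    valid_mask: HxW boolean array marking pixels with valid depth.
    rect:       (x0, y0, x1, y1) best-fit head rectangle from the 2D
                facial feature detector.
    Returns a representative 3D head position, or None when too few of
    the rectangle's pixels carry valid 3D data (no valid result signaled).
    """
    x0, y0, x1, y1 = rect
    region_valid = valid_mask[y0:y1, x0:x1]
    max_possible = region_valid.size                  # max possible 3D points
    n_valid = int(region_valid.sum())                 # valid 3D points found
    if max_possible == 0 or n_valid / max_possible <= min_valid_fraction:
        return None                                   # below selected threshold
    pts = points3d[y0:y1, x0:x1][region_valid]
    return pts.mean(axis=0)  # a median would be more outlier-robust
```

A tracker would typically fall back to the previous frame's position whenever `None` is returned.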
-
-
147. A video capture and processing method comprising:
-
capturing images of a scene, the capturing comprising utilizing at least three cameras having a view of the scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera; executing a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; and
further comprising: utilizing an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified. - View Dependent Claims (148, 149, 150, 151)
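Mapping a pixel into a URUD space means inverting the lens's polynomial distortion while applying no stereo rectification. A minimal sketch, assuming a standard two-coefficient radial model and fixed-point inversion (the patent does not specify the polynomial's form or the inversion method, and all names here are illustrative):

```python
def undistort_point(xd, yd, k1, k2, cx, cy, iters=8):
    """Map a distorted pixel into URUD (unrectified, undistorted) space.

    Inverts the radial model x_d = x_u * (1 + k1*r^2 + k2*r^4), with
    (cx, cy) the distortion center, by fixed-point iteration. The image
    remains unrectified: only the polynomial distortion is removed.
    """
    xu, yu = xd - cx, yd - cy          # initial guess: the distorted offset
    xdo, ydo = xu, yu
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xdo / f, ydo / f      # refine the undistorted estimate
    return xu + cx, yu + cy
```

With both stereo axes expressed in this one URUD space, disparity votes from each axis can be accumulated for the same physical pixel without first committing to either axis's rectification.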
-
-
152. A video capture and processing method comprising:
-
capturing images of a scene, the capturing comprising utilizing at least one camera having a view of the scene; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generating a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure. - View Dependent Claims (153, 154)
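The per-pixel histogram idea can be sketched for a single output pixel. On the GPU each output-pixel thread would hold its private histogram in fast local storage (e.g. registers or shared memory); plain Python stands in for one such thread here, and the vote source and bin scheme are assumptions for illustration.

```python
import numpy as np

def best_disparity(votes, max_disp):
    """Sketch of the claimed disparity-histogram step for one pixel.

    votes: candidate disparity values for this pixel, e.g. matches
    gathered from neighboring pixels or multiple matching passes.
    A private histogram accumulates the votes; the fullest bin is taken
    as the relatively most probable disparity.
    """
    hist = np.zeros(max_disp + 1, dtype=np.int32)  # the "private" histogram
    for d in votes:
        if 0 <= d <= max_disp:                     # discard out-of-range votes
            hist[d] += 1
    return int(hist.argmax()), hist

d, _ = best_disparity([3, 3, 4, 3, 2, 9], max_disp=8)
print(d)  # 3 -- disparity 3 received the most votes
```

Keeping the histogram private to each thread avoids atomic contention on shared memory, which is the point of the "physically proximate" storage element in the claim.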
-
-
156. A program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene, utilizing the at least first and second cameras; and execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes. - View Dependent Claims (157)
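Combining FDDE votes across pyramid levels can be sketched as follows. The 2x-per-level downsampling, the equal weighting, and the bin-spreading scheme are assumptions for illustration; the claim only requires that the downsampled levels contribute a set of levels of histogram votes.

```python
import numpy as np

def multilevel_votes(hist_per_level):
    """Sketch of merging FDDE histogram votes from several pyramid levels.

    hist_per_level: list of 1-D vote histograms; level 0 is full
    resolution, and each subsequent level is assumed computed on a
    2x-downsampled image, so each of its disparity bins covers twice the
    range. Each coarse bin's votes are spread over the fine bins it
    covers, so coarse levels contribute wide, low-frequency support.
    """
    n = len(hist_per_level[0])
    total = np.zeros(n, dtype=np.float64)
    for level, hist in enumerate(hist_per_level):
        scale = 2 ** level
        fine = np.repeat(np.asarray(hist, dtype=np.float64), scale)[:n]
        if len(fine) < n:
            fine = np.pad(fine, (0, n - len(fine)))   # coarse level covers less range
        total += fine
    return total
```

The coarse levels stabilize the estimate in low-texture regions where full-resolution matching alone produces flat, ambiguous histograms.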
-
-
158. A program product for use with a digital processing system, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
capture images of the scene, utilizing the at least first and second cameras; and
execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: generating a disparity solution based on the disparity values; and applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain. - View Dependent Claims (159)
-
-
160. A program product for use with a digital processing system, for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising at least one camera having a view of the second user's face, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the second user, utilizing the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; estimate a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstruct a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the 3D location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result.
-
-
161. A program product for use with a digital processing system, for enabling capture and processing of images of a scene, the digital processing system comprising (i) at least three cameras having a view of the scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera, and (ii) a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: capture images of the scene, utilizing the at least three cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; and
utilize an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified. - View Dependent Claims (162)
-
-
163. A program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least one camera having a view of a scene, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene, utilizing the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generate a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
-
-
164. A video capture and processing system, the system comprising:
-
at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis; and a digital processor operable to receive image data from the cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least first and second cameras; and execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing, utilizing the processor, a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing, utilizing the processor, a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes.
-
-
165. A video capture and processing system, the system comprising:
-
at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair, and a digital processor operable to receive image data from the cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least first and second cameras; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, the feature correspondence function further comprising: generating, utilizing the processor, a disparity solution based on the disparity values; applying, utilizing the processor, an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain.
-
-
166. A video capture system that enables a first user to view a second user with direct virtual eye contact with the second user, the system comprising:
-
at least one camera having a view of the second user's face; and a digital processor operable to receive image data from the at least one camera and process the received image data; the system being operable to: capture images of the second user, utilizing the at least one camera; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; estimate, utilizing the processor, a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstruct, utilizing the processor, a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result.
-
-
167. A video capture and processing system, the system comprising:
-
at least three cameras having a view of a scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera; and a digital processor operable to receive image data from the at least three cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least three cameras; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; and
further comprising: utilization, by the processor, of an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified.
-
-
168. A video capture and processing system, the system comprising:
-
at least one camera having a view of a scene; and a digital processor operable to receive image data from the at least one camera and process the received image data; the system being operable to: capture images of the scene, utilizing the at least one camera; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing, utilizing the processor, a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
-
Specification