VIRTUAL 3D METHODS, SYSTEMS AND SOFTWARE
First Claim
1. A video communication method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
- capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face;
generating a data representation, representative of the captured images;
reconstructing a synthetic view of the second user, based on the representation; and
displaying the synthetic view to the first user on a display screen used by the first user;
the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
Abstract
Methods, systems and computer program products (“software”) enable a virtual three-dimensional visual experience (referred to herein as “V3D”) videoconferencing and other applications, and capturing, processing and displaying of images and image streams.
89 Citations
168 Claims
-
1. A video communication method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
-
capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face; generating a data representation, representative of the captured images; reconstructing a synthetic view of the second user, based on the representation; and displaying the synthetic view to the first user on a display screen used by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user. - View Dependent Claims (2, 3, 4, 5, 6, 26, 27, 30, 31, 34, 75, 76, 79, 89, 90, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119)
-
-
7. A video communication method that enables a user to view a remote scene in a manner that gives the user a visual impression of being present with respect to the remote scene, the method comprising:
-
capturing images of the remote scene, the capturing comprising utilizing at least two cameras each having a view of the remote scene; executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; reconstructing a synthetic view of the remote scene, based on the representation; and displaying the synthetic view to the user on a display screen used by the user; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the user is provided an immersive visual experience of the remote scene.
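The feature correspondence function recited in this claim — detecting common features between corresponding images and measuring their relative distance in image space to generate disparity values — can be sketched with a minimal block-matching routine. This is an illustrative stand-in for the claimed function, not the patented implementation; the function name, block size, and sum-of-absolute-differences cost are all assumptions:

```python
import numpy as np

def disparity_map(left, right, max_disp=16, block=5):
    """Illustrative feature-correspondence sketch: for each pixel of the
    left image, find the horizontal shift (disparity) that minimizes the
    sum of absolute differences (SAD) over a small block in the right
    image of a rectified stereo pair."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()   # SAD matching cost
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d                     # winner-take-all disparity
    return disp
```

On a synthetic pair where the right image is the left image shifted by a known amount, interior pixels recover that shift as their disparity value.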
-
-
8. A method of facilitating self-portraiture of a user utilizing a handheld device to take the self-portrait, the handheld device having a display screen for displaying images to the user, the method comprising:
-
providing at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self-portrait setup time during which the user is setting up the self-portrait; capturing images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the user, based on the generated data representation and the generated tracking information; displaying to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback. - View Dependent Claims (9)
-
-
10. A method of facilitating composition of a photograph of a scene, by a user utilizing a handheld device to take the photograph, the handheld device having a display screen on a first side for displaying images to the user, and at least one camera on a second, opposite side of the handheld device, for capturing images, the method comprising:
-
capturing images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimating a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generating a data representation, representative of the captured images; reconstructing a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and displaying to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback. - View Dependent Claims (11, 12, 23, 24, 25, 28, 29, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88)
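The recited correspondence between the synthetic view's scale and perspective and the user's viewpoint can be illustrated by treating the display as a window: the tracked eye position and a scene point together determine where on the screen plane the point should appear. A hypothetical sketch (the coordinate convention and all names are assumptions, not taken from the patent):

```python
def window_projection(eye, point, screen_z=0.0):
    """Illustrative 'display as a window' projection: intersect the ray
    from the tracked eye position to a scene point with the screen plane
    (z = screen_z). 'eye' and 'point' are (x, y, z) in device coordinates,
    with the eye in front of the screen (z > 0) and the scene point
    behind it (z < 0); returns the on-screen (x, y) location."""
    ex, ey, ez = eye
    px, py, pz = point
    t = (screen_z - ez) / (pz - ez)          # ray parameter at the screen plane
    return (ex + t * (px - ex), ey + t * (py - ey))
```

As the tracked eye moves, the same scene point lands on a different screen position, which is what gives the reconstructed view its viewpoint-dependent scale and perspective.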
-
-
13. A method of displaying images to a user utilizing a binocular stereo head-mounted display (HMD), the method comprising:
-
capturing at least two image streams using at least one camera attached or mounted on or proximate to an external portion or surface of the HMD, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD. - View Dependent Claims (15, 16, 17, 18, 19)
-
-
14. A method of capturing and displaying image content on a binocular stereo head-mounted display (HMD), the method comprising:
-
capturing at least two image streams using at least one camera, the captured image streams containing images of a scene; generating a data representation, representative of captured images contained in the captured image streams; reconstructing two synthetic views, based on the representation; and displaying the synthetic views to a user via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
20. A method of generating an image data stream for use by a control system of an autonomous vehicle, the method comprising:
-
capturing images of a scene around at least a portion of the vehicle, the capturing comprising utilizing at least one camera having a view of the scene; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculating corresponding depth information based on the disparity values; and generating from the images and corresponding depth information an image data stream for use by the control system. - View Dependent Claims (21, 22)
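The step of calculating depth information from disparity values follows the standard rectified-stereo relation Z = f·B/d (focal length times baseline, divided by disparity). A hedged one-function sketch with illustrative parameter names; the claim does not specify this formula, but it is the conventional conversion:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity (in pixels) to depth (in meters) for a
    rectified stereo pair: Z = f * B / d. Depth is inversely
    proportional to disparity, so nearer objects have larger disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```

For example, with a 1000 px focal length and a 10 cm baseline, a 50 px disparity corresponds to a point about 2 m away.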
-
-
92. A program product for use with a digital processing system, for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising at least one camera having a view of the second user's face, a display screen for use by the first user, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the second user, utilizing the at least one camera; generate a data representation, representative of the captured images; reconstruct a synthetic view of the second user, based on the representation; and display the synthetic view to the first user on the display screen for use by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
-
-
94. A program product for use with a digital processing system, for enabling a first user to view a remote scene with the visual impression of being present with respect to the remote scene, the digital processing system comprising at least two cameras, each having a view of the remote scene, a display screen for use by the first user, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the remote scene, utilizing the at least two cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; reconstruct a synthetic view of the remote scene, based on the representation; and display the synthetic view to the first user on the display screen; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the first user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the first user is provided an immersive visual experience of the remote scene.
-
-
96. A program product for use with a handheld digital processing device, for facilitating self-portraiture of a user utilizing the handheld device to take the self portrait, the handheld device having a digital processor, a display screen for displaying images to the user, and at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self portrait setup time during which the user is setting up the self portrait, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processor cause the digital processor to:
-
capture images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the user, based on the generated data representation and the generated tracking information; and
display to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
-
-
98. A program product for use with a handheld digital processing device, for facilitating composition of a photograph of a scene by a user utilizing the handheld device to take the photograph, the handheld device having a digital processor, a display screen on a first side for displaying images to the user, and at least one camera on a second, opposite side of the handheld device, for capturing images, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processor cause the digital processor to:
-
capture images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and display to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback.
-
-
100. A program product for enabling display of images to a user utilizing a binocular stereo head-mounted display (HMD), the HMD having at least one camera attached or mounted on or proximate to an external portion or surface of the HMD, the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture at least two image streams using the at least one camera, the captured image streams containing images of a scene; generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
102. A program product for enabling display of captured image content to a user utilizing a binocular stereo head-mounted display (HMD), the captured image content comprising at least two image streams captured or generated by at least one camera, the captured image streams containing images of a scene, and the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to a user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
104. A program product for enabling the generation of an image data stream for use by a control system of an autonomous vehicle, the vehicle having at least one camera with a view of a scene around at least a portion of the vehicle and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene around at least a portion of the vehicle, using the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculate corresponding depth information based on the disparity values; and generate from the images and corresponding depth information an image data stream for use by the control system.
-
-
106. A digital processing system for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising:
-
at least one camera having a view of the second user's face; a display screen for use by the first user; and a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: capture images of the second user, utilizing the at least one camera; generate a data representation, representative of the captured images; reconstruct a synthetic view of the second user, based on the representation; and display the synthetic view to the first user on the display screen for use by the first user; the capturing, generating, reconstructing and displaying being executed such that the first user can have direct virtual eye contact with the second user through the first user's display screen, by the reconstructing and displaying of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, even if no camera has a direct eye contact gaze vector to the second user.
-
-
108. A digital processing system for enabling a first user to view a remote scene with the visual impression of being present with respect to the remote scene, the digital processing system comprising:
-
at least two cameras, each having a view of the remote scene; a display screen for use by the first user; and a digital processing resource comprising at least one digital processor, the digital processing resource being operable to: capture images of the remote scene, utilizing the at least two cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; reconstruct a synthetic view of the remote scene, based on the representation; and
display the synthetic view to the first user on the display screen; the capturing, detecting, generating, reconstructing and displaying being executed such that: (a) the first user is provided the visual impression of looking through his display screen as a physical window to the remote scene, and (b) the first user is provided an immersive visual experience of the remote scene.
-
-
110. A system operable in a handheld digital processing device, for facilitating self-portraiture of a user utilizing the handheld device to take the self portrait, the system comprising:
-
a digital processor; a display screen for displaying images to the user; and at least one camera around the periphery of the display screen, the at least one camera having a view of the user's face at a self portrait setup time during which the user is setting up the self portrait; the system being operable to: capture images of the user during the setup time, utilizing the at least one camera around the periphery of the display screen; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the user, based on the generated data representation and the generated tracking information; and display to the user, on the display screen during the setup time, the synthetic view of the user; thereby enabling the user, while setting up the self-portrait, to selectively orient or position his gaze or head, or the handheld device and its camera, with realtime visual feedback.
-
-
112. A system operable in a handheld digital processing device, for facilitating composition of a photograph of a scene by a user utilizing the handheld device to take the photograph, the system comprising:
-
a digital processor; a display screen on a first side of the handheld device for displaying images to the user; and at least one camera on a second, opposite side of the handheld device, for capturing images; the system being operable to: capture images of the scene, utilizing the at least one camera, at a photograph setup time during which the user is setting up the photograph; estimate a location of the user's head or eyes relative to the handheld device during the setup time, thereby generating tracking information; generate a data representation, representative of the captured images; reconstruct a synthetic view of the scene, based on the generated data representation and the generated tracking information, the synthetic view being reconstructed such that the scale and perspective of the synthetic view has a selected correspondence to the user's viewpoint relative to the handheld device and the scene; and display to the user, on the display screen during the setup time, the synthetic view of the scene; thereby enabling the user, while setting up the photograph, to frame the scene to be photographed, with selected scale and perspective within the display frame, with realtime visual feedback.
-
-
114. A system for enabling display of images to a user utilizing a binocular stereo head-mounted display (HMD), the system comprising:
-
at least one camera attached or mounted on or proximate to an external portion or surface of the HMD; and a digital processing resource comprising at least one digital processor; the system being operable to: capture at least two image streams using the at least one camera, the captured image streams containing images of a scene; generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to the user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
116. A program product for enabling display of captured image content to a user utilizing a binocular stereo head-mounted display (HMD), the captured image content comprising at least two image streams captured or generated by at least one camera, the captured image streams containing images of a scene, and the HMD having, or being in communication with, a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
generate a data representation, representative of captured images contained in the captured image streams; reconstruct two synthetic views, based on the representation; and display the synthetic views to a user, via the HMD; the reconstructing and displaying being executed such that each of the synthetic views has a respective view origin corresponding to a respective virtual camera location, wherein the respective view origins are positioned such that the respective virtual camera locations coincide with respective locations of the user's left and right eyes, so as to provide the user with a substantially natural visual experience of the perspective, binocular stereo and occlusion aspects of the scene, substantially as if the user were directly viewing the scene without an HMD.
-
-
118. An image processing system for enabling the generation of an image data stream for use by a control system of an autonomous vehicle, the image processing system comprising:
-
at least one camera with a view of a scene around at least a portion of the vehicle; and a digital processing resource comprising at least one digital processor; the system being operable to: capture images of the scene around at least a portion of the vehicle, using the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; calculate corresponding depth information based on the disparity values; and generate from the images and corresponding depth information an image data stream for use by the control system.
-
-
120. A video capture and processing method, comprising:
-
capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis; and executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes. - View Dependent Claims (121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135)
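The multi-level disparity histogram recited above can be sketched as follows: each downsampled level produces a winner-take-all disparity vote per pixel, and votes from every level are rescaled to full resolution and accumulated into a per-pixel histogram over disparity values. This is a heavily simplified, hypothetical sketch — the per-pixel matcher here is a crude single-sample stand-in for the patented FDDE pattern matching, and all names are assumptions:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 averaging (a lower-frequency version)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def disparity_votes(left, right, max_disp):
    """One level's pass: per-pixel winner-take-all disparity by
    absolute-difference matching along the row (a stand-in matcher)."""
    h, w = left.shape
    votes = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            costs = [abs(float(left[y, x]) - float(right[y, x - d]))
                     for d in range(min(max_disp, x) + 1)]
            votes[y, x] = int(np.argmin(costs))
    return votes

def multilevel_disparity_histogram(left, right, max_disp=8, levels=3):
    """Accumulate votes from successively downsampled levels into a
    per-pixel histogram; the tallest bin indicates the most probable
    disparity for that pixel."""
    h, w = left.shape
    hist = np.zeros((h, w, max_disp + 1), dtype=np.int64)
    L, R = left.astype(np.float64), right.astype(np.float64)
    for level in range(levels):
        scale = 2 ** level
        v = disparity_votes(L, R, max_disp // scale)
        for y in range(v.shape[0]):
            for x in range(v.shape[1]):
                d = min(int(v[y, x]) * scale, max_disp)   # rescale vote to full res
                hist[y * scale:(y + 1) * scale, x * scale:(x + 1) * scale, d] += 1
        L, R = downsample(L), downsample(R)
    return hist
```

On a simple ramp image shifted by two pixels, the histogram's tallest bin at an interior pixel is disparity 2, because multiple levels agree on that value.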
-
-
136. A video capture and processing method, comprising:
-
capturing images of a scene, the capturing comprising utilizing at least first and second cameras having a view of the scene, the cameras being arranged along an axis to configure a stereo camera pair; executing a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, the feature correspondence function further comprising: generating a disparity solution based on the disparity values; applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain. - View Dependent Claims (137, 138, 139, 140, 141, 142)
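The injective constraint recited here — no co-domain element referenced more than once — can be illustrated with a small checker over a row-wise disparity solution. A hypothetical sketch (function name and return convention are assumptions; the patent's correction step is not implemented, only the violation detection):

```python
def injective_violations(disparity):
    """For each row of a disparity solution, map each left-image pixel x
    (the domain) to its matched right-image pixel x - d(x) (the
    co-domain), and flag every left-image pixel involved in a mapping
    where a co-domain pixel is referenced more than once. Returns one
    set of offending x positions per row."""
    violations = []
    for row in disparity:
        targets = {}
        for x, d in enumerate(row):
            targets.setdefault(x - d, []).append(x)   # co-domain element x - d
        bad = {x for xs in targets.values() if len(xs) > 1 for x in xs}
        violations.append(bad)
    return violations
```

A disparity row like [0, 0, 1] maps pixels 1 and 2 onto the same right-image pixel, violating injectivity; a strictly one-to-one mapping yields an empty set.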
-
-
143. A video capture method that enables a first user to view a second user with direct virtual eye contact with the second user, the method comprising:
-
capturing images of the second user, the capturing comprising utilizing at least one camera having a view of the second user's face; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; estimating a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstructing a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result. - View Dependent Claims (144, 145, 146, 155)
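The final two steps of the location estimating recited above can be sketched as follows. The claim leaves the representative-position calculation and the threshold unspecified, so the mean reduction, the 0.3 valid-fraction, and the function signature here are illustrative assumptions, not the patented recipe.

```python
import numpy as np

def head_position(points3d, valid_mask, rect, min_valid_fraction=0.3):
    """Sketch of the claimed 3D head-position step.

    points3d:   HxWx3 array of per-pixel 3D points (e.g. from disparity).
    valid_mask: HxW boolean array marking pixels with valid depth.
    rect:       (x0, y0, x1, y1) best-fit head rectangle from the 2D
                facial feature detector.
    Returns a representative 3D head position, or None when too few of
    the rectangle's pixels carry valid 3D data (no valid result signaled).
    """
    x0, y0, x1, y1 = rect
    region_valid = valid_mask[y0:y1, x0:x1]
    max_possible = region_valid.size                  # max possible 3D points
    n_valid = int(region_valid.sum())                 # valid 3D points found
    if max_possible == 0 or n_valid / max_possible <= min_valid_fraction:
        return None                                   # below selected threshold
    pts = points3d[y0:y1, x0:x1][region_valid]
    return pts.mean(axis=0)  # a median would be more outlier-robust
```

A tracker would typically fall back to the previous frame's position whenever `None` is returned.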
-
-
147. A video capture and processing method comprising:
-
capturing images of a scene, the capturing comprising utilizing at least three cameras having a view of the scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera; executing a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generating a data representation, representative of the captured images and the corresponding disparity values; and
further comprising: utilizing an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified. - View Dependent Claims (148, 149, 150, 151)
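Mapping a pixel into a URUD space means inverting the lens's polynomial distortion while applying no stereo rectification. A minimal sketch, assuming a standard two-coefficient radial model and fixed-point inversion (the patent does not specify the polynomial's form or the inversion method, and all names here are illustrative):

```python
def undistort_point(xd, yd, k1, k2, cx, cy, iters=8):
    """Map a distorted pixel into URUD (unrectified, undistorted) space.

    Inverts the radial model x_d = x_u * (1 + k1*r^2 + k2*r^4), with
    (cx, cy) the distortion center, by fixed-point iteration. The image
    remains unrectified: only the polynomial distortion is removed.
    """
    xu, yu = xd - cx, yd - cy          # initial guess: the distorted offset
    xdo, ydo = xu, yu
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xdo / f, ydo / f      # refine the undistorted estimate
    return xu + cx, yu + cy
```

With both stereo axes expressed in this one URUD space, disparity votes from each axis can be accumulated for the same physical pixel without first committing to either axis's rectification.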
-
-
152. A video capture and processing method comprising:
-
capturing images of a scene, the capturing comprising utilizing at least one camera having a view of the scene; executing a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generating a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure. - View Dependent Claims (153, 154)
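The per-pixel histogram idea can be sketched for a single output pixel. On the GPU each output-pixel thread would hold its private histogram in fast local storage (e.g. registers or shared memory); plain Python stands in for one such thread here, and the vote source and bin scheme are assumptions for illustration.

```python
import numpy as np

def best_disparity(votes, max_disp):
    """Sketch of the claimed disparity-histogram step for one pixel.

    votes: candidate disparity values for this pixel, e.g. matches
    gathered from neighboring pixels or multiple matching passes.
    A private histogram accumulates the votes; the fullest bin is taken
    as the relatively most probable disparity.
    """
    hist = np.zeros(max_disp + 1, dtype=np.int32)  # the "private" histogram
    for d in votes:
        if 0 <= d <= max_disp:                     # discard out-of-range votes
            hist[d] += 1
    return int(hist.argmax()), hist

d, _ = best_disparity([3, 3, 4, 3, 2, 9], max_disp=8)
print(d)  # 3 -- disparity 3 received the most votes
```

Keeping the histogram private to each thread avoids atomic contention on shared memory, which is the point of the "physically proximate" storage element in the claim.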
-
-
156. A program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene, utilizing the at least first and second cameras; and execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes. - View Dependent Claims (157)
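Combining FDDE votes across pyramid levels can be sketched as follows. The 2x-per-level downsampling, the equal weighting, and the bin-spreading scheme are assumptions for illustration; the claim only requires that the downsampled levels contribute a set of levels of histogram votes.

```python
import numpy as np

def multilevel_votes(hist_per_level):
    """Sketch of merging FDDE histogram votes from several pyramid levels.

    hist_per_level: list of 1-D vote histograms; level 0 is full
    resolution, and each subsequent level is assumed computed on a
    2x-downsampled image, so each of its disparity bins covers twice the
    range. Each coarse bin's votes are spread over the fine bins it
    covers, so coarse levels contribute wide, low-frequency support.
    """
    n = len(hist_per_level[0])
    total = np.zeros(n, dtype=np.float64)
    for level, hist in enumerate(hist_per_level):
        scale = 2 ** level
        fine = np.repeat(np.asarray(hist, dtype=np.float64), scale)[:n]
        if len(fine) < n:
            fine = np.pad(fine, (0, n - len(fine)))   # coarse level covers less range
        total += fine
    return total
```

The coarse levels stabilize the estimate in low-texture regions where full-resolution matching alone produces flat, ambiguous histograms.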
-
-
158. A program product for use with a digital processing system, the digital processing system comprising at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
capture images of the scene, utilizing the at least first and second cameras; and
execute a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: generating a disparity solution based on the disparity values; and applying an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain. - View Dependent Claims (159)
-
-
160. A program product for use with a digital processing system, for enabling a first user to view a second user with direct virtual eye contact with the second user, the digital processing system comprising at least one camera having a view of the second user's face, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the second user, utilizing the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; estimate a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstruct a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the 3D location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result.
-
-
161. A program product for use with a digital processing system, for enabling capture and processing of images of a scene, the digital processing system comprising (i) at least three cameras having a view of the scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera, and (ii) a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to: capture images of the scene, utilizing the at least three cameras; execute a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate a data representation, representative of the captured images and the corresponding disparity values; and
utilize an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified. - View Dependent Claims (162)
-
-
163. A program product for use with a digital processing system, for enabling image capture and processing, the digital processing system comprising at least one camera having a view of a scene, and a digital processing resource comprising at least one digital processor, the program product comprising digital processor-executable program instructions stored on a non-transitory digital processor-readable medium, which when executed in the digital processing resource cause the digital processing resource to:
-
capture images of the scene, utilizing the at least one camera; execute a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generate a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
-
-
164. A video capture and processing system, the system comprising:
-
at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair having a camera pair axis; and a digital processor operable to receive image data from the cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least first and second cameras; and execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, wherein the feature correspondence function comprises: constructing, utilizing the processor, a multi-level disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel, the constructing of a multi-level disparity histogram comprising: executing, utilizing the processor, a Fast Dense Disparity Estimate (FDDE) image pattern matching operation on successively lower-frequency downsampled versions of the input stereo images, the successively lower-frequency downsampled versions constituting a set of levels of FDDE histogram votes.
-
-
165. A video capture and processing system, the system comprising:
-
at least first and second cameras having a view of a scene, the cameras being arranged along an axis to configure a stereo camera pair, and a digital processor operable to receive image data from the cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least first and second cameras; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the respective cameras and measuring a relative distance in image space between the common features, to generate disparity values, the feature correspondence function further comprising: generating, utilizing the processor, a disparity solution based on the disparity values; applying, utilizing the processor, an injective constraint to the disparity solution based on domain and co-domain, wherein the domain comprises pixels for a given image captured by the first camera and the co-domain comprises pixels for a corresponding image captured by the second camera, to enable correction of error in the disparity solution in response to violation of the injective constraint, wherein the injective constraint is that no element in the co-domain is referenced more than once by elements in the domain.
-
-
166. A video capture system that enables a first user to view a second user with direct virtual eye contact with the second user, the system comprising:
-
at least one camera having a view of the second user's face; and a digital processor operable to receive image data from the at least one camera and process the received image data; the system being operable to: capture images of the second user, utilizing the at least one camera; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; estimate, utilizing the processor, a three-dimensional (3D) location of the first user's head, face or eyes, thereby generating tracking information; and reconstruct, utilizing the processor, a synthetic view of the second user, based on the representation, to enable a display to the first user of a synthetic view of the second user in which the second user appears to be gazing directly at the first user, wherein the reconstructing of a synthetic view of the second user comprises reconstructing the synthetic view based on the generated data representation and the generated tracking information; and
wherein the location estimating comprises: passing a captured image of the first user, the captured image including the first user's head and face, to a two-dimensional (2D) facial feature detector that utilizes the image to generate a first estimate of head and eye location and a rotation angle of the face relative to an image plane; utilizing an estimated center-of-face position, face rotation angle, and head depth range based on the first estimate, to determine a best-fit rectangle that includes the head; extracting from the best-fit rectangle all 3D points that lie within the best-fit rectangle, and calculating therefrom a representative 3D head position; and if the number of valid 3D points extracted from the best-fit rectangle exceeds a selected threshold in relation to the maximum number of possible 3D points in the region, then signaling a valid 3D head position result.
-
-
167. A video capture and processing system, the system comprising:
-
at least three cameras having a view of a scene, the cameras being arranged in a substantially “L”-shaped configuration wherein a first pair of cameras is disposed along a first axis and a second pair of cameras is disposed along a second axis intersecting with, but angularly displaced from, the first axis, wherein the first and second pairs of cameras share a common camera at or near the intersection of the first and second axes, so that the first and second pairs of cameras represent respective first and second independent stereo axes that share a common camera; and a digital processor operable to receive image data from the at least three cameras and process the received image data; the system being operable to: capture images of the scene, utilizing the at least three cameras; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least three cameras and measuring a relative distance in image space between the common features, to generate disparity values; generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; and
further comprising: utilization, by the processor, of an unrectified, undistorted (URUD) image space to integrate disparity data for pixels between the first and second stereo axes, thereby to combine disparity data from the first and second axes, wherein the URUD space is an image space in which polynomial lens distortion has been removed from the image data but the captured image remains unrectified.
-
-
168. A video capture and processing system, the system comprising:
-
at least one camera having a view of a scene; and a digital processor operable to receive image data from the at least one camera and process the received image data; the system being operable to: capture images of the scene, utilizing the at least one camera; execute, utilizing the processor, a feature correspondence function by detecting common features between corresponding images captured by the at least one camera and measuring a relative distance in image space between the common features, to generate disparity values; and generate, utilizing the processor, a data representation, representative of the captured images and the corresponding disparity values; wherein the feature correspondence function utilizes a disparity histogram-based method of integrating data and determining correspondence, the disparity histogram-based method comprising: constructing, utilizing the processor, a disparity histogram indicating the relative probability of a given disparity value being correct for a given pixel; and optimizing generation of disparity values on a GPU computing structure, the optimizing comprising: generating, in the GPU computing structure, a plurality of output pixel threads; for each output pixel thread, maintaining a private disparity histogram in a storage element associated with the GPU computing structure and physically proximate to the computation units of the GPU computing structure.
-
Specification