Method and apparatus for estimating scene structure and ego-motion from multiple images of a scene using correlation
Abstract
A system that estimates both the ego-motion of a camera through a scene and the structure of the scene by analyzing a batch of images of the scene obtained by the camera employs a correlation-based, iterative, multi-resolution algorithm. The system defines a global ego-motion constraint to refine estimates of inter-frame camera rotation and translation. It also uses local window-based correlation to refine the current estimate of scene structure. The batch of images is divided into a reference image and a group of inspection images. Each inspection image in the batch of images is aligned to the reference image by a warping transformation. The correlation is determined by analyzing respective Gaussian/Laplacian decompositions of the reference image and warped inspection images. The ego-motion constraint includes both rotation and translation parameters. These parameters are determined by globally correlating surfaces in the respective inspection images to the reference image. Scene structure is determined on a pixel-by-pixel basis by correlating multiple pixels in a support region among all of the images. The correlation surfaces are modeled as quadratic or other parametric surfaces to allow easy recognition and rejection of outliers and to simplify computation of incremental refinements for ego-motion and structure. The system can employ information from other sensors to provide an initial estimate of ego-motion and/or scene structure. The system operates using images captured by either single-camera rigs or multiple-camera rigs.
125 Citations
26 Claims
-
1. A method for estimating both three-dimensional (3D) scene structure and ego-motion from a batch of images of the scene obtained by a camera as it moves through the scene, the method comprising the steps of:
-
defining a reference image and a plurality of inspection images in the batch of images;
providing an initial estimate of the ego-motion and the scene structure for the batch of images;
responsive to the initial estimate of ego-motion and scene structure, globally correlating each of the inspection images to the reference image to define a global ego-motion constraint for all of the inspection images relative to the reference image;
refining the initial estimate of ego-motion based on the global ego-motion constraint;
responsive to the initial estimate of ego-motion and scene structure, locally correlating each of the inspection images to the reference image to define a plurality of local structure constraints for all of the inspection images relative to the reference image;
responsive to the plurality of local structure constraints, refining the initial estimate of scene structure in respective regions of the reference image corresponding to the plurality of local structure constraints.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
forming a wavelet decomposition of the reference image and each of the warped inspection images to provide a plurality of resolution levels for each of the reference image and the warped inspection images; and
correlating each resolution level of each inspection image to a corresponding resolution level of the reference image, using correlation results of lower resolution levels to guide the correlating of higher resolution levels.
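The coarse-to-fine guidance described in the two steps above can be illustrated with a minimal sketch. It makes several assumptions that are not in the claims: 2x2 block averaging stands in for the wavelet decomposition, the alignment is a pure integer image shift rather than a full warp, and mean squared difference stands in for the correlation measure. The names `downsample`, `build_pyramid`, `best_shift` and `coarse_to_fine_shift` are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (a stand-in for one pyramid level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def build_pyramid(img, levels):
    """Return a list of images; index 0 is the finest level."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def best_shift(ref, insp, center, radius):
    """Search integer (dy, dx) shifts around `center`; keep the lowest mean squared difference."""
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(insp, (dy, dx), axis=(0, 1))  # circular shift (assumption)
            err = np.mean((shifted - ref) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def coarse_to_fine_shift(ref, insp, levels=3, radius=2):
    """Estimate the shift aligning `insp` to `ref`, coarsest level first."""
    ref_pyr = build_pyramid(ref, levels)
    insp_pyr = build_pyramid(insp, levels)
    shift = (0, 0)
    for level in range(levels - 1, -1, -1):
        shift = best_shift(ref_pyr[level], insp_pyr[level], shift, radius)
        if level > 0:
            # The lower-resolution result seeds the search at the next finer level.
            shift = (2 * shift[0], 2 * shift[1])
    return shift
```

Because each level only searches a small neighborhood around the doubled coarser estimate, a displacement larger than `radius` pixels at full resolution can still be recovered with a small search window.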
-
-
4. A method according to claim 2, wherein the step of defining the local structure constraint includes the steps of:
-
forming a wavelet decomposition of the reference image and each of the warped inspection images to provide a plurality of corresponding resolution levels for each of the reference image and the warped inspection images;
selecting a point in the reference image;
defining a window of points around the selected point; and
correlating, in each resolution level of each inspection image, a window of points corresponding to the defined window of points to a respective window of points in the corresponding resolution level of the reference image, wherein the correlation results of lower resolution levels are used to guide the correlating of higher resolution levels.
-
-
5. A method according to claim 1, wherein the estimate of ego-motion includes estimates of rotation and translation and wherein:
-
the step of globally correlating each of the inspection images to the reference image includes the steps of:
for each inspection image of the plurality of inspection images, determining a correlation surface defining the correlation between the inspection image and the reference image to provide a respective plurality of correlation surfaces;
fitting each correlation surface of the plurality of correlation surfaces to a respective parametric surface to provide a respective plurality of parameterized correlation surfaces;
classifying each parameterized correlation surface as a good correlation surface or as a bad correlation surface; and
summing the good correlation surfaces to the relative exclusion of the bad correlation surfaces to provide a cumulative correlation surface; and
the step of refining the initial estimate of ego-motion includes the steps of:
assigning fixed values for the estimates of translation and scene structure;
calculating a differential adjustment to the rotation estimate;
assigning fixed values for the estimates of rotation and scene structure; and
calculating a differential adjustment to the translation estimate.
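The alternating refinement in claim 5 (hold translation fixed and adjust rotation, then hold rotation fixed and adjust translation) can be illustrated on a toy problem. The sketch below is not the claimed method: it assumes a 2D rigid alignment of known point pairs in place of image correlation, and the names `rot` and `refine_alternating` are hypothetical.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def refine_alternating(p, q, theta, t, iters=20):
    """Fit q ~ R(theta) p + t for point arrays p, q of shape (N, 2).

    Each pass computes a differential rotation adjustment with the translation
    held fixed, then a differential translation adjustment with the rotation held fixed.
    """
    t = np.asarray(t, dtype=float)
    for _ in range(iters):
        # Translation fixed: one Gauss-Newton step on theta.
        r = q - (p @ rot(theta).T + t)        # residuals, shape (N, 2)
        J = p @ rot(theta + np.pi / 2).T      # d(R p)/d(theta) = R(theta + 90 deg) p
        theta = theta + np.sum(J * r) / np.sum(J * J)
        # Rotation fixed: the least-squares translation adjustment is the mean residual.
        r = q - (p @ rot(theta).T + t)
        t = t + r.mean(axis=0)
    return theta, t
```

Splitting the update this way keeps each sub-problem small: the rotation step is a one-parameter fit and the translation step has a closed-form solution.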
-
-
6. A method according to claim 5, wherein the step of classifying each correlation surface as a good correlation surface or as a bad correlation surface includes the steps of:
-
determining whether each parameterized correlation surface corresponds to an elliptic paraboloid;
designating the parameterized correlation surfaces that correspond to elliptic paraboloids as good correlation surfaces and the quadratic correlation surfaces that do not correspond to elliptic paraboloids as bad correlation surfaces.
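The fit, classify and sum steps of claims 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions: each correlation surface is sampled on a square grid of odd size centered on the origin, the parametric model is the quadratic z = a x^2 + b x y + c y^2 + d x + e y + f, and a surface counts as an elliptic paraboloid when 4ac - b^2 > 0. The function names and the tolerance `tol` are hypothetical.

```python
import numpy as np

def fit_quadratic(surface):
    """Least-squares fit of z = a x^2 + b x y + c y^2 + d x + e y + f.

    Assumes `surface` is a square array of odd size, sampled on integer
    offsets centered at zero. Returns the coefficients [a, b, c, d, e, f].
    """
    half = surface.shape[0] // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x = xs.ravel().astype(float)
    y = ys.ravel().astype(float)
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(surface, dtype=float).ravel(), rcond=None)
    return coeffs

def is_elliptic_paraboloid(coeffs, tol=1e-9):
    """The quadratic part is definite (elliptic) exactly when 4ac - b^2 > 0."""
    a, b, c = coeffs[:3]
    return 4.0 * a * c - b * b > tol

def sum_good_surfaces(surfaces):
    """Sum the coefficient vectors of the surfaces classified as good.

    Adding coefficient vectors is equivalent to adding the fitted quadratics.
    """
    total = np.zeros(6)
    for surface in surfaces:
        coeffs = fit_quadratic(surface)
        if is_elliptic_paraboloid(coeffs):
            total += coeffs
    return total
```

The test 4ac - b^2 > 0 accepts both bowls and domes; whether a good surface has a minimum or a maximum depends on the correlation measure in use, which the claims do not fix. Saddles and degenerate ridges fail the test and are left out of the cumulative surface.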
-
-
7. A method according to claim 1, wherein the step of providing an initial estimate of the ego-motion and scene structure uses information provided by sensing modalities that are independent of the camera.
-
8. A method according to claim 1, wherein the batch of images is provided by a single camera.
-
9. A method according to claim 1, wherein the batch of images are stereo images provided by two cameras having a fixed separation.
-
10. A method according to claim 1, wherein the step of providing an initial estimate of ego-motion and scene structure obtains the initial estimate of scene structure by preparing a depth map using the reference image and the inspection image that is the stereo image corresponding to the reference image.
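Claim 10 does not say how the depth map is prepared from the stereo pair. One common approach, assumed here purely for illustration, is rectified stereo, where depth follows from disparity as Z = f B / d with focal length f in pixels and baseline B. The function name and parameters are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) for a rectified stereo pair.

    Pixels with non-positive disparity have no valid match and are set to infinity.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

A depth map produced this way could serve as the initial structure estimate that the later correlation steps refine.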
-
11. Apparatus for estimating both three-dimensional (3D) scene structure and ego-motion from a batch of images of the scene comprising:
-
at least one camera which obtains a batch of images including a reference image and a plurality of inspection images as the at least one camera moves through the scene;
means for providing an initial estimate of the ego-motion and the scene structure for the batch of images;
a correlation processor, responsive to the initial estimate of ego-motion and scene structure,
1) to globally correlate each of the inspection images to the reference image, the correlation processor defining a global ego-motion constraint for all of the inspection images relative to the reference image and
2) to locally correlate each of the inspection images to the reference image to define a plurality of local structure constraints for all of the inspection images relative to the reference image;
a processor, coupled to the correlation processor to define a differential ego-motion estimate from the global ego-motion constraint and to define a differential structure estimate from the plurality of local structure constraints;
a plurality of adders which add the differential ego-motion estimate to the initial ego-motion estimate to provide a refined ego-motion estimate and which add the differential structure estimate to the initial structure estimate to provide a refined structure estimate.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
a pyramid processor which performs a wavelet decomposition of the reference image and each of the warped inspection images to provide a plurality of corresponding resolution levels for each of the reference image and the warped inspection images;
wherein, the correlation processor correlates each resolution level of each inspection image to a corresponding resolution level of the reference image.
-
-
14. Apparatus according to claim 13, wherein the correlation processor, for each point in the reference image, defines a window of points around the selected point, correlates, in each resolution level of each inspection image, a window of points corresponding to the defined window of points to a respective window of points in the corresponding resolution level of the reference image.
-
15. Apparatus according to claim 11, further comprising additional sensing modalities that provide information regarding one of scene structure and ego-motion, and the apparatus further includes a processor which processes the information provided by the additional sensing modalities to provide the initial estimates of ego-motion and scene structure.
-
16. Apparatus according to claim 15, wherein the additional sensing modalities are selected from a group consisting essentially of a light amplification for detection and ranging (LADAR) system, an inertial navigation system and an odometry system.
-
17. Apparatus according to claim 11, wherein the at least one camera consists of a single camera.
-
18. Apparatus according to claim 11, wherein the at least one camera consists of two cameras having a fixed separation and the batch of images are corresponding stereo images.
-
19. Apparatus according to claim 18, further comprising a processor which receives the reference image and corresponding stereo image from the two cameras and processes the two images to generate a depth map of the scene wherein the depth map is provided as the initial estimate of scene structure.
-
20. An article of manufacture comprising a carrier containing computer program instructions, the computer program instructions controlling a general purpose computer to estimate both three-dimensional (3D) scene structure and ego-motion from a batch of images of the scene obtained by a camera as it moves through the scene, the computer program instructions causing the computer to perform the steps of:
-
defining a reference image and a plurality of inspection images in the batch of images;
providing an initial estimate of the ego-motion and the scene structure for the batch of images;
responsive to the initial estimate of ego-motion and scene structure, globally correlating each of the inspection images to the reference image to define a global ego-motion constraint for all of the inspection images relative to the reference image;
refining the initial estimate of ego-motion based on the global ego-motion constraint;
responsive to the initial estimate of ego-motion and scene structure, locally correlating each of the inspection images to the reference image to define a plurality of local structure constraints for all of the inspection images relative to the reference image;
responsive to the plurality of local structure constraints, refining the initial estimate of scene structure in respective regions of the reference image corresponding to the plurality of local structure constraints.
Dependent claims: 21, 22, 23
-
-
24. A method of generating a depth map of a scene from a sequence of images of the scene obtained by a camera moving through the scene, the method comprising the steps of:
-
a) selecting a first batch of images from the sequence of images;
b) processing the first batch of images to generate estimates of ego-motion of the camera and structure of the scene;
c) projecting the estimated structure of the first batch of images into a world coordinate system;
d) selecting a further batch of images from the sequence of images, the further batch of images having a reference image that is included in a previous batch of images;
e) using the estimated ego-motion for the previous batch of images, mapping the structure estimate for the previous batch of images into a coordinate system defined by the reference image of the further batch of images;
f) processing the further batch of images to generate further estimates of ego-motion of the camera and structure of the scene;
g) projecting the further estimated structure into the world coordinate system to be combined with the previously projected estimated structure;
h) repeating steps d) through g) until a last batch of images in the sequence of images has been processed; and
i) providing the combined projected estimated structure as the depth map.
Dependent claims: 25, 26
defining a reference image and a plurality of inspection images in the further batch of images;
receiving the mapped structure estimate as an initial estimate of the scene structure for the further batch of images;
responsive to the initial estimate of scene structure, globally correlating each of the inspection images to the reference image to define a global ego-motion constraint for all of the inspection images relative to the reference image;
refining the initial estimate of ego-motion based on the global ego-motion constraint;
responsive to the initial estimate of ego-motion and scene structure, locally correlating each of the inspection images to the reference image to define a plurality of local structure constraints for all of the inspection images relative to the reference image;
responsive to the plurality of local structure constraints, refining the initial estimate of scene structure in respective regions of the reference image corresponding to the plurality of local structure constraints.
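The coordinate handling in steps c), e) and g) of claim 24 can be sketched with a pinhole camera model. The sketch assumes an intrinsic matrix `K` and a pose convention X_cam = R X_world + t, neither of which appears in the claims; all function names are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Turn a depth map into camera-frame 3D points, shape (H*W, 3)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pix                                   # rays with unit z
    return (rays * depth.ravel()).T

def camera_to_world(points_cam, R, t):
    """Invert X_cam = R X_world + t for row-vector points: X_world = R^T (X_cam - t)."""
    return (points_cam - t) @ R

def world_to_camera(points_world, R, t):
    """Apply X_cam = R X_world + t to row-vector points."""
    return points_world @ R.T + t

def merge_batches(batches, K):
    """Project each batch's depth map into the world frame and stack the points.

    `batches` is a list of (depth, R, t) tuples, one per processed batch.
    """
    clouds = [camera_to_world(backproject(depth, K), R, t) for depth, R, t in batches]
    return np.vstack(clouds)
```

In this picture, step e) corresponds to calling `world_to_camera` with the pose of the new reference image, so that the previous structure estimate can seed the next batch, and steps c) and g) correspond to `camera_to_world` followed by stacking.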
-
Specification