IMAGE-BASED LOCALIZATION
First Claim
1. A computer-implemented process for computing the 3D position and 3D orientation of a video camera used to capture sequential image frames of an environment for which a three dimensional (3D) reconstruction has been pre-computed to identify 3D points in the environment that correspond to two dimensional (2D) points in previously-captured images of the environment, comprising:
- using a computer to perform the following process actions;
constructing an indexed database comprising multiple representative database descriptors for each 3D point in the 3D reconstruction, ones of said database descriptors for each 3D point being computed at different scales and from multiple ones of said previously-captured images of the environment;
after the indexed database is constructed, inputting image frames from the video camera as they are captured;
for each consecutive image frame input,
identifying tracked keypoints representing 2D image frame locations in the image frame, said tracked keypoints being either newly added or previously depicted and identified in one or more previously-input image frames,
computing a database descriptor for each newly added tracked keypoint identified in the image frame, said database descriptor being of the same type as computed for the indexed database,
identifying 3D points in the 3D reconstruction of the environment that correspond to the tracked keypoints, said identifying of 3D points comprising, for each tracked keypoint, matching the database descriptor computed for the tracked keypoint to one or more descriptors in the indexed database and determining the 3D point associated with the matched database descriptors in the indexed database, and
estimating the 3D position and 3D orientation of the video camera.
Abstract
Image-based localization technique embodiments are presented which provide a real-time approach for image-based video camera localization within large scenes that have been reconstructed offline using structure from motion or similar techniques. From monocular video, a precise 3D position and 3D orientation of the camera can be estimated on a frame by frame basis using only visual features.
20 Claims
1. A computer-implemented process for computing the 3D position and 3D orientation of a video camera used to capture sequential image frames of an environment for which a three dimensional (3D) reconstruction has been pre-computed to identify 3D points in the environment that correspond to two dimensional (2D) points in previously-captured images of the environment, comprising:
- using a computer to perform the following process actions;
constructing an indexed database comprising multiple representative database descriptors for each 3D point in the 3D reconstruction, ones of said database descriptors for each 3D point being computed at different scales and from multiple ones of said previously-captured images of the environment;
after the indexed database is constructed, inputting image frames from the video camera as they are captured;
for each consecutive image frame input,
identifying tracked keypoints representing 2D image frame locations in the image frame, said tracked keypoints being either newly added or previously depicted and identified in one or more previously-input image frames,
computing a database descriptor for each newly added tracked keypoint identified in the image frame, said database descriptor being of the same type as computed for the indexed database,
identifying 3D points in the 3D reconstruction of the environment that correspond to the tracked keypoints, said identifying of 3D points comprising, for each tracked keypoint, matching the database descriptor computed for the tracked keypoint to one or more descriptors in the indexed database and determining the 3D point associated with the matched database descriptors in the indexed database, and
estimating the 3D position and 3D orientation of the video camera.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
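The database and matching actions recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `DescriptorDatabase`, the `max_distance` cutoff, and the brute-force linear scan (standing in for a real index such as a kd-tree) are all assumptions made for illustration. The only point it shows is that several descriptors may map to the same 3D point, and that a query descriptor is resolved to the 3D point of its closest stored descriptor.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DescriptorDatabase:
    """Hypothetical stand-in for the indexed database of claim 1."""
    descriptors: list = field(default_factory=list)  # stored descriptor vectors
    point_ids: list = field(default_factory=list)    # 3D point id for each descriptor

    def add(self, point_id, descriptor):
        # A 3D point may be added many times, once per scale / source image.
        self.descriptors.append(tuple(descriptor))
        self.point_ids.append(point_id)

    def match(self, query, max_distance):
        """Return the 3D point id of the nearest stored descriptor, or None.

        Brute-force Euclidean search; max_distance is an assumed cutoff,
        not a value taken from the claims.
        """
        best_id, best_dist = None, float("inf")
        for desc, pid in zip(self.descriptors, self.point_ids):
            dist = math.dist(desc, query)
            if dist < best_dist:
                best_id, best_dist = pid, dist
        return best_id if best_dist < max_distance else None


db = DescriptorDatabase()
db.add(7, [0.0, 0.0])   # point 7 seen at one scale
db.add(7, [0.1, 0.0])   # point 7 seen at another scale
db.add(9, [1.0, 1.0])
matched = db.match((0.05, 0.0), max_distance=0.5)
```

A production system would replace the linear scan with an approximate nearest-neighbor index; the linear scan is used here only to keep the sketch self-contained.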
17. A computer-implemented process for computing the 3D position and 3D orientation of a video camera used to capture sequential image frames of an environment for which a three dimensional (3D) reconstruction has been pre-computed to identify 3D points in the environment that correspond to two dimensional (2D) points in previously-captured images of the environment, comprising:
- using a computer to perform the following process actions;
constructing an indexed database comprising multiple representative database descriptors for each 3D point in the 3D reconstruction, ones of said database descriptors for each 3D point being computed at different scales and from multiple ones of said previously-captured images of the environment;
after the indexed database is constructed, inputting image frames from the video camera as they are captured;
for each consecutive image frame input,
identifying tracked keypoints representing 2D image frame locations in the image frame, said tracked keypoints being either newly added or previously depicted and identified in one or more previously-input image frames,
computing a database descriptor for each newly added tracked keypoint identified in the image frame, said database descriptor being of the same type as computed for the indexed database,
identifying 3D points in the 3D reconstruction of the environment that correspond to the tracked keypoints to produce 2D-3D point correspondences each of which identifies the coordinates of a 3D point in the 3D reconstruction that corresponds to the 2D tracked keypoint, said identifying of 3D points comprising, for each tracked keypoint, matching the database descriptor computed for the tracked keypoint to one or more descriptors in the indexed database and determining the 3D point associated with the matched database descriptors in the indexed database,
initially estimating the 3D position and 3D orientation of the video camera using a random sample consensus (RANSAC) procedure with three-point pose estimation to identify a set of inliers among the 2D-3D point correspondences,
refining the initial estimate of the 3D position and 3D orientation of the video camera using a non-linear least squares optimization procedure to produce a current estimate of the video camera 3D position and 3D orientation for the image frame under consideration,
determining whether the number of 2D-3D point correspondences identified as inliers exceeds a prescribed correspondences number, and
whenever it is determined that the number of 2D-3D point correspondences identified as inliers exceeds the prescribed correspondences number, updating the current translational and angular velocities of the video camera using a Kalman filter updating procedure.
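The last two actions of claim 17 (the inlier-count gate followed by a Kalman filter update of the camera velocities) can be sketched as follows. This is a deliberately reduced illustration under stated assumptions: it tracks translational velocity only, uses one independent scalar filter per axis with an identity measurement model, and derives the velocity measurement by finite differences of consecutive position estimates. The names `ScalarKalman` and `maybe_update_velocity` and all noise values are hypothetical; the claim itself does not specify the filter design, and angular velocity is omitted here.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter for a single velocity component."""
    estimate: float = 0.0
    variance: float = 1.0

    def predict(self, process_noise):
        # Constant-velocity model: the estimate is unchanged, uncertainty grows.
        self.variance += process_noise

    def update(self, measurement, measurement_noise):
        gain = self.variance / (self.variance + measurement_noise)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= 1.0 - gain
        return self.estimate


def maybe_update_velocity(filters, prev_position, position, dt,
                          num_inliers, min_inliers,
                          measurement_noise=0.1, process_noise=0.01):
    """Update per-axis velocity filters only if enough inliers were found.

    Returns True when the update ran, False when the gate rejected the frame.
    """
    if num_inliers <= min_inliers:  # gate: count must exceed the prescribed number
        return False
    for kf, p0, p1 in zip(filters, prev_position, position):
        kf.predict(process_noise)
        kf.update((p1 - p0) / dt, measurement_noise)
    return True


filters = [ScalarKalman() for _ in range(3)]
rejected = maybe_update_velocity(filters, (0, 0, 0), (1, 0, 0), 0.5,
                                 num_inliers=5, min_inliers=10)
accepted = maybe_update_velocity(filters, (0, 0, 0), (1, 0, 0), 0.5,
                                 num_inliers=20, min_inliers=10)
```

Skipping the update on frames with too few inliers keeps an unreliable pose estimate from corrupting the velocity state that later frames rely on for prediction.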
18. A computer-implemented process for computing the 3D position and 3D orientation of a video camera used to capture sequential image frames of an environment for which a three dimensional (3D) reconstruction has been pre-computed to identify 3D points in the environment that correspond to two dimensional (2D) points in previously-captured images of the environment, comprising:
- using a computer to perform the following process actions;
constructing an indexed database comprising multiple representative database descriptors for each 3D point in the 3D reconstruction, ones of said database descriptors for each 3D point being computed at different scales and from multiple ones of said previously-captured images of the environment;
after the indexed database is constructed, inputting image frames from the video camera as they are captured;
for each consecutive image frame input,
identifying tracked keypoints representing 2D image frame locations in the image frame, said tracked keypoints being either newly added or previously depicted and identified in one or more previously-input image frames,
computing a database descriptor for each newly added tracked keypoint identified in the image frame, said database descriptor being of the same type as computed for the indexed database,
identifying 3D points in the 3D reconstruction of the environment that correspond to the tracked keypoints to produce 2D-3D point correspondences each of which identifies the coordinates of a 3D point in the 3D reconstruction that corresponds to the 2D tracked keypoint, said identifying of 3D points comprising, for each tracked keypoint, matching the database descriptor computed for the tracked keypoint to one or more descriptors in the indexed database and determining the 3D point associated with the matched database descriptors in the indexed database,
grouping the previously-captured images of the environment used to produce the 3D reconstruction into overlapping clusters based on the 3D position and 3D orientation of the camera used to capture each of the previously-captured images,
assigning a unique group identifier to each cluster of images;
for each cluster of images, identifying the database descriptors associated with each image in the cluster using said indexed database,
tallying the number of 2D-3D point correspondences,
determining if the tallied number exceeds a prescribed 2D-3D point correspondences threshold number;
whenever it is determined that the tallied number does not exceed the prescribed 2D-3D point correspondences threshold number, for each database descriptor computed for a tracked keypoint identified in the current image frame,
performing a search of the indexed database to obtain a listing of nearest neighbor descriptors,
identifying a distance measure between each nearest neighbor descriptor in the listing and the database descriptor under consideration, said distance measure increasing in value the more a nearest neighbor descriptor differs from the database descriptor under consideration,
for each nearest neighbor descriptor, determining if its distance measure is less than a prescribed distance,
whenever the distance measure of a nearest neighbor descriptor in the listing is less than the prescribed distance, identifying the 3D point that corresponds to the nearest neighbor descriptor,
identifying each cluster of images that the identified 3D point belongs to,
assigning a score to each identified cluster that is equal to the prescribed distance divided by the distance measure of the nearest neighbor descriptor under consideration,
ascertaining the highest score assigned to the identified clusters,
computing a minimum score, said minimum score being defined as a prescribed percentage of the ascertained highest score,
identifying those clusters having a score that is equal to or exceeds the minimum score and identifying the database images belonging to the identified clusters,
eliminating those nearest neighbor descriptors that are not associated with the identified database images,
for each 3D point corresponding to one or more remaining nearest neighbor descriptors, computing a matching strength value, said matching strength value being defined as the summation of the scores of the nearest neighbor descriptors associated with the 3D point, and
for the keypoint corresponding to the descriptor under consideration, identifying the 3D point having the greatest matching strength value and the 3D point having the second greatest matching strength value,
determining if the second greatest matching strength value divided by the greatest matching strength value for the 3D points exceeds a prescribed ratio value,
whenever it is determined that the second greatest matching strength value divided by the greatest matching strength value exceeds the prescribed ratio value, assigning the 3D point having the greatest matching strength value to the keypoint to form a 2D-3D point correspondence,
computing an estimate of the current 3D position and 3D orientation of a video camera by applying the last-estimated translational and angular velocities to the last-estimated 3D position and 3D orientation of the video camera for the immediately preceding frame,
performing a geometric verification procedure on each newly formed 2D-3D point correspondence using the last-estimated camera pose and eliminating those newly formed 2D-3D point correspondences that do not match within a prescribed tolerance,
re-tallying the number of 2D-3D point correspondences,
determining if the re-tallied number of 2D-3D point correspondences exceeds said prescribed 2D-3D point correspondences threshold number,
whenever it is determined the re-tallied number of 2D-3D point correspondences exceeds said prescribed 2D-3D point correspondences threshold number, estimating the 3D position and 3D orientation of the video camera using the 2D-3D correspondences between at least some of the identified 3D points and their corresponding 2D tracked keypoints, and
estimating the current translational and angular velocities of the 3D position and 3D orientation of a video camera based on the current estimate of the 3D position and 3D orientation of the video camera.
Dependent claims: 19, 20
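The cluster-scoring and matching-strength actions of claim 18 can be sketched for a single query descriptor as shown below. Several simplifications are assumed and are not drawn from the claims: scores from multiple nearest neighbors are accumulated per cluster, a 3D point is treated as surviving if it belongs to at least one retained cluster (the claim filters by database image instead), and a small `eps` guards against division by a zero distance. The function name `rank_candidate_points` and its parameters are hypothetical. The sketch stops at the ranked matching strengths and leaves the final ratio comparison to the caller.

```python
from collections import defaultdict


def rank_candidate_points(neighbors, point_clusters, max_distance,
                          min_score_fraction, eps=1e-9):
    """Rank candidate 3D points for one query descriptor.

    neighbors:       list of (point_id, distance) nearest-neighbor hits
    point_clusters:  dict mapping point_id -> set of cluster ids
    Returns a list of (point_id, matching_strength), strongest first.
    """
    # Keep only neighbors closer than the prescribed distance and score each
    # one as prescribed distance / distance measure.
    kept = [(pid, max_distance / max(dist, eps))
            for pid, dist in neighbors if dist < max_distance]

    # Accumulate scores on every cluster the neighbor's 3D point belongs to.
    cluster_scores = defaultdict(float)
    for pid, score in kept:
        for cid in point_clusters[pid]:
            cluster_scores[cid] += score
    if not cluster_scores:
        return []

    # Retain clusters scoring at least a fraction of the best cluster score.
    min_score = min_score_fraction * max(cluster_scores.values())
    good = {cid for cid, s in cluster_scores.items() if s >= min_score}

    # Matching strength: sum of scores of surviving neighbors per 3D point.
    strength = defaultdict(float)
    for pid, score in kept:
        if point_clusters[pid] & good:
            strength[pid] += score
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_candidate_points(
    neighbors=[(1, 0.1), (1, 0.2), (2, 0.25), (3, 0.9), (4, 0.25)],
    point_clusters={1: {"A"}, 2: {"A", "B"}, 3: {"C"}, 4: {"B"}},
    max_distance=0.5,
    min_score_fraction=0.5,
)
```

In this example point 3 is dropped by the distance cutoff and point 4 is dropped because its only cluster scores below the minimum, so the ranking contains points 1 and 2 only. Restricting candidates to the best-scoring clusters is what keeps visually similar but spatially distant parts of the scene from competing for the same keypoint.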
Specification