STRUCTURE DEPTHAWARE WEIGHTING IN BUNDLE ADJUSTMENT

First Claim
1. A method for performing photogrammetric 3D model reconstruction, comprising obtaining, by 3D model reconstruction software executing on an electronic device, a set of images of a scene that include an object of interest taken by a camera from different viewpoints;
 automatically selecting keypoints in the set of images;
automatically matching corresponding keypoints that appear in more than one of the set of images;
estimating 3D points in 3D space for features of the scene represented by corresponding keypoints in the images;
performing bundle adjustment operations to simultaneously refine the estimated 3D points and camera parameters for the set of images, the bundle adjustment operations performed as an optimization with a loss function that penalizes reprojection error, wherein a depth-aware weighting is applied to the reprojection error of each 3D point in the optimization;
utilizing the refined estimated 3D positions and camera parameters produced by the bundle adjustment operations in a dense 3D reconstruction to produce a 3D model that includes the object; and
displaying the 3D model, by the 3D model reconstruction software executing on the electronic device, on a display screen.
Abstract
In various embodiments, techniques are provided for photogrammetric 3D model reconstruction that modify the optimization performed in bundle adjustment operations of an automatic SfM stage to apply a depth-aware weighting to reprojection error of each 3D point used in the optimization. The reprojection error of each 3D point may be weighted based on a function of distance, density of a cluster, or a combination of distance and density. A loss function may be scaled to account for the weighting, and normalizations applied. Such weighting may force consideration of 3D points on an object of interest in the foreground and improve convergence of the optimization to global optima. In such manner, accurate and complete 3D models may be reconstructed of even ill-textured or very thin objects in the foreground of a scene with a highly textured background, while not consuming excessive processing and storage resources or requiring tedious workflows.
20 Claims
 1. (Set out in full under “First Claim” above.)  View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
 13. A system for performing photogrammetric three-dimensional (3D) model reconstruction, comprising:
a camera configured to capture a set of images of a scene that include an object of interest taken from different viewpoints; and
one or more electronic devices configured to execute 3D model reconstruction software, the 3D model reconstruction software when executed operable to:
automatically select keypoints in the set of images,
automatically match corresponding keypoints that appear in more than one of the set of images,
estimate 3D points in 3D space for features of the scene represented by corresponding keypoints in the images,
perform bundle adjustment operations to simultaneously refine the estimated 3D points and camera parameters for the set of images, the bundle adjustment operations performed as an optimization with a loss function that penalizes reprojection error, wherein a depth-aware weighting is applied to the reprojection error of each 3D point in the optimization, and
utilize the refined estimated 3D positions and camera parameters produced by the bundle adjustment operations in a dense 3D reconstruction to produce a 3D model that includes the object; and
a display device of the one or more electronic devices operable to display the 3D model.  View Dependent Claims (14, 15, 16)
 17. A non-transitory electronic device-readable medium storing software for execution on one or more processors of one or more electronic devices, the software when executed operable to:
obtain a set of images of a scene that include an object of interest taken by a camera from different viewpoints;
photogrammetrically reconstruct a three-dimensional (3D) model from the set of images utilizing an automatic structure-from-motion (SfM) stage and a dense 3D reconstruction stage, the automatic SfM stage including bundle adjustment operations that simultaneously refine estimated 3D points and camera parameters for the set of images by performing an optimization with a loss function that penalizes reprojection error, wherein a depth-aware weighting is applied to the reprojection error of each 3D point in the optimization, the dense 3D reconstruction stage to utilize the refined estimated 3D positions and camera parameters produced by the bundle adjustment operations to produce a 3D model that includes the object; and
output the 3D model.  View Dependent Claims (18, 19, 20)
Specification
The present application claims priority to EP Application No. 18306437.7 by Nicolas Gros, titled “Structure Depth-Aware Weighting in Bundle Adjustment”, filed on Oct. 31, 2018 with the French Receiving Office, the contents of which are incorporated by reference herein in their entirety.
The present disclosure relates to photogrammetric three-dimensional (3D) model reconstruction, and more specifically to techniques for improved photogrammetric 3D model reconstruction for ill-textured or very thin objects in the foreground.
It is often desirable to create a 3D model of existing infrastructure (e.g., buildings, roads, bridges, telecommunications towers, electrical power networks, etc.). While such 3D models may be created manually utilizing a computer aided design (CAD) application, such process can be quite time consuming. Accordingly, there is increasing interest in automatic model generation software, including photogrammetric 3D model reconstruction software.
Photogrammetric 3D model reconstruction software may receive a set of images (e.g., two-dimensional (2D) photographs) of a scene taken from different viewpoints, analyze those images, and automatically compute the relative poses of the images in 3D space and the 3D geometry of the scene. This process may involve automatically detecting and matching points across images that correspond to the same feature in the scene, and using this information to determine 3D points and camera parameters. The overall operation is often divided into two distinct stages: an automatic structure-from-motion (SfM) stage and a dense 3D reconstruction stage. The automatic SfM stage typically involves SfM techniques that compute camera parameters of each of the images and generate a low-density (i.e. sparse) 3D point cloud.
The 3D reconstruction stage typically involves a dense 3D reconstruction that produces a 3D model from the sparse 3D point cloud and camera parameters. The stage may apply multi-view stereo (MVS) reconstruction techniques to produce a high resolution dataset and utilize photogrammetry algorithms to produce the final 3D model.
While existing photogrammetric 3D model reconstruction software may create accurate, complete 3D models in some use cases, it may struggle in other use cases involving ill-textured or very thin objects in the foreground of a scene and highly textured backgrounds. For example, consider the case of a scene including a piece of infrastructure, such as a telecommunications tower, in the foreground, similar to that shown in
Much of the difficulty existing 3D model reconstruction software has with ill-textured or very thin objects in the foreground results from limitations of the automatic SfM stage. In general, the SfM techniques employed in the stage do not handle well noisy or incomplete data in which the density of signal is not spread homogeneously over the various depth planes. Various measures have been attempted to address this issue, but they have proven ineffective or have introduced other undesirable problems. For example, some attempts have been made to fine-tune image capture and processing parameters. These measures have included increasing the amount of data processed by the 3D model reconstruction software (i.e. capturing more images, extracting and processing more keypoints, etc.), relaxing outlier detection thresholds, and reducing the parameter space via camera internal calibration, among others. However, such measures have not guaranteed improved accuracy and completeness, have often added noise to the source data, and have often increased hardware resource demands (i.e. processing time, storage requirements, etc.). Since photogrammetric 3D model reconstruction is already a hardware resource intensive operation that can burden the processing and storage capabilities of electronic devices, further increasing hardware resource demands may be highly undesirable. Other attempts have relied upon some form of manual user intervention. These measures have required users to manually identify correspondence of keypoints on the object of interest, to pre-instrument the object of interest to enforce a high density of points thereon (e.g., applying specific patterns to the object of interest), and to follow various types of iterative and incremental workflows. Such measures have placed burdens upon the user, making photogrammetric 3D model reconstruction a tedious process, and have proven impractical (or sometimes impossible).
For example, pre-instrumenting a tall telecommunications tower may not be practical, or even possible.
Accordingly, there is a need for techniques for photogrammetric 3D model reconstruction that may produce accurate and complete 3D models of even ill-textured or very thin objects in the foreground, without introducing other undesirable problems (e.g., increased hardware resource demands, tedious workflows, etc.).
In various embodiments described below, techniques are provided for photogrammetric 3D model reconstruction that modify the optimization performed in bundle adjustment operations of an automatic SfM stage to apply a depth-aware weighting to reprojection error of each 3D point used in the optimization. The reprojection error of each 3D point may be weighted based on a function of its distance, density of a cluster to which it belongs, or a combination of distance and density. A loss function of the optimization may be scaled to account for the weighting, and normalizations applied. Such weighting may force consideration of 3D points on an object of interest in the foreground, enabling reconstruction of a more accurate and complete 3D model without increased hardware resource demands, tedious workflows, or other undesirable side effects.
In one specific embodiment, 3D model reconstruction software executing on an electronic device performs a photogrammetric 3D model reconstruction by first obtaining a set of images of a scene that include an object of interest (e.g., a piece of infrastructure, such as a telecommunications tower, in the foreground) taken by a camera (e.g., of the electronic device or of a separate device, such as an aerial drone) from different viewpoints. The 3D model reconstruction software, or backend 3D model reconstruction processing software executing on a server or cloud services, implements an automatic SfM stage that automatically selects keypoints in the set of images, automatically matches corresponding keypoints that appear in more than one of the set of images, estimates 3D points in 3D space for features of the scene represented by corresponding keypoints in the images, and performs bundle adjustment operations to simultaneously refine the estimated 3D points and camera parameters for the set of images. The bundle adjustment operations involve an optimization with a loss function that penalizes reprojection error, wherein a depth-aware weighting is applied to the reprojection error of each 3D point in the optimization. The depth-aware weighting may be a weighting based on a function of distance between the respective 3D point and the camera, such as an inverse of distance between the respective 3D point and the camera; a weighting based on a function of density of a distance-related cluster of the respective 3D point, such as an inverse of a number of points of the cluster of the respective 3D point; or a weighting based on a function that combines distance between the respective 3D point and the camera and density of the cluster of the respective 3D point (with an additional normalization).
Such weighting may operate to increase the number of 3D points retained in bundle adjustment operations on the object of interest (e.g., the piece of infrastructure, such as the telecommunications tower, in the foreground). The 3D model reconstruction software, or backend 3D model reconstruction processing software executing on the server or cloud services, then implements a dense 3D reconstruction stage that utilizes the refined estimated 3D positions and camera parameters from the bundle adjustment operations to produce a 3D model (e.g., a textured 3D mesh) that includes the object. Thereafter, the 3D model reconstruction software may output the 3D model, for example, display the 3D model on a display screen.
It should be understood that a variety of additional features and alternative embodiments may be implemented other than those discussed in this Summary. This Summary is intended simply as a brief introduction to the reader for the further description that follows, and does not indicate or imply that the examples mentioned herein cover all aspects of the disclosure, or are necessary or essential aspects of the disclosure.
The application refers to the accompanying drawings of example embodiments, of which:
The chipset 420 further includes an input/output controller hub 465 coupled to the memory controller hub by an internal bus 467. Among other functions, the input/output controller hub 465 may support a variety of types of peripheral buses, for connecting to other system components. The system components may include one or more I/O devices 470, such as a keyboard, a mouse, a touch sensor, a camera, etc., one or more persistent storage devices 475, such as a hard disk drive, a solid-state drive, or another type of persistent data store, and one or more network interfaces 480, such as an Ethernet interface, a Wi-Fi interface, a Bluetooth interface, etc., among other system components. The network interface(s) 480 may allow communication with other electronic devices over a computer network, such as the Internet, to enable various types of distributed or cloud computing arrangements.
Working together, the components of the electronic device 400 (and other electronic devices in distributed or cloud computing arrangements) may execute software operating upon data that is persistently stored in storage devices, such as storage devices 475, and loaded into memory, such as memory 430, when needed. For example, 3D model reconstruction software 490 may be provided that utilizes a set of multiple images (e.g., 2D photographs) of a scene taken by a camera (either of the electronic device 400, or of another device, such as an aerial drone (not shown)) from different viewpoints to automatically reconstruct a 3D model (e.g., a textured 3D mesh) of the scene. The 3D model reconstruction software 490 may take different forms, in which the processing necessary to create the 3D model is performed in different locations. In one implementation, the 3D model reconstruction software 490 may be a standalone application that performs the necessary processing using the CPU 410 of the electronic device, such as the ContextCapture™ desktop reality modeling application available from Bentley Systems, Inc. In such an implementation, an automatic SfM software process 492 and a 3D reconstruction software process 494 may be executed by the CPU 410 of the electronic device 400. In another implementation, the 3D model reconstruction software 490 may be a client application that provides user interface functionality but offloads necessary processing operations to servers or cloud computing services, such as the ContextCapture™ Console or the ContextCapture™ Mobile reality modeling application available from Bentley Systems, Inc. In such an implementation, the automatic SfM software process 492 and the 3D reconstruction software process 494 may not be part of the 3D model reconstruction software 490, but instead a portion of server-based or cloud-services-based backend 3D model reconstruction processing software (not shown).
In such manner, processor intensive operations may be offloaded from the electronic device 400 to servers or cloud computing services (not shown).
In operation, when the 3D model reconstruction software 490 is supplied with a set of multiple images that were taken from multiple viewpoints, the automatic SfM software process 492 utilizes SfM techniques to compute camera parameters of each of the multiple images and generates a low-density (i.e. sparse) 3D point cloud. Thereafter, the 3D reconstruction software process 494 performs a dense 3D reconstruction that produces a 3D model (e.g., a textured 3D mesh) from the sparse 3D point cloud and camera parameters, utilizing MVS reconstruction techniques and photogrammetry algorithms.
At step 520, the automatic SfM software process 492 (of the 3D model reconstruction software 490 on the electronic device 400 or of backend 3D model reconstruction processing software on a server or cloud services) automatically detects features of interest (i.e. keypoints) in the set of images. Any of a number of known feature detection and matching algorithms may be employed, for example, a scale-invariant feature transform (SIFT) algorithm, a speeded up robust features (SURF) algorithm, a binary robust independent elementary features (BRIEF) algorithm, etc. The keypoints typically are 2D points within the images (e.g., 2D photographs). At least some of the keypoints typically will represent features on the object of interest (e.g., on the object in the foreground, such as a telecommunications tower) while other keypoints may represent features on other portions of the scene (e.g., on the background, such as the ground or sky).
At step 530, the automatic SfM software process 492 (of the 3D model reconstruction software 490 on the electronic device 400 or of backend 3D model reconstruction processing software on a server or cloud services) automatically matches corresponding keypoints that appear in more than one of the set of images. The feature detection and matching algorithm used in step 520 may also perform this operation.
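As an illustrative sketch of such matching, descriptors from two images can be paired by a nearest-neighbour search with a ratio test (a technique commonly paired with SIFT-style features). The `match_keypoints` helper and the plain-tuple descriptors here are hypothetical, not taken from the patent:

```python
def match_keypoints(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with a ratio test: keep a
    match only when the best distance is clearly smaller than the second
    best, discarding ambiguous correspondences.  Descriptors are plain
    tuples of floats for illustration."""
    def dist2(p, q):
        # Squared Euclidean distance between two descriptors.
        return sum((x - y) ** 2 for x, y in zip(p, q))

    matches = []
    for i, da in enumerate(desc_a):
        # Rank candidates in the second image by descriptor distance.
        ranked = sorted(range(len(desc_b)), key=lambda j: dist2(da, desc_b[j]))
        best, second = ranked[0], ranked[1]
        # Ratio test on squared distances (hence ratio squared).
        if dist2(da, desc_b[best]) < (ratio ** 2) * dist2(da, desc_b[second]):
            matches.append((i, best))
    return matches
```

In practice a library matcher would be used, but the ratio-test idea is the same: ambiguous keypoints (e.g., on a repetitive background texture) are filtered out rather than matched incorrectly.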
At step 540, the automatic SfM software process 492 (of the 3D model reconstruction software 490 on the electronic device 400 or of backend 3D model reconstruction processing software on a server or cloud services) estimates 3D points in 3D space for features of the scene (e.g., the feature on the object of interest or background) represented by corresponding keypoints in the images and estimates camera poses.
At step 550, the automatic SfM software process 492 (of the 3D model reconstruction software 490 on the electronic device 400 or of backend 3D model reconstruction processing software on a server or cloud services) performs bundle adjustment operations to simultaneously refine the estimated 3D positions and camera parameters describing changes in camera pose between images and optical characteristics. Bundle adjustment operations may involve an optimization where the loss function of the optimization penalizes reprojection error. In this context, “reprojection error” refers to a discrepancy in the position of a 2D reprojection of a 3D point estimated from one or more keypoints in one or more images, and a keypoint used to estimate the 3D point. Reprojection error may be measured in terms of distance. The loss function of the optimization that penalizes reprojection error may be a nonlinear least squares loss function, or another type of loss function that is more robust to noisy data, for example, a soft L1 loss function, a Huber loss function, etc. Use of a more robust loss function, as well as outlier detection algorithms (e.g., random sample consensus (RANSAC)-based detection of outliers) and other techniques, may prevent a few highly inaccurate 3D points from pulling the optimization significantly towards an incorrect result.
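Such a robust optimization can be sketched with SciPy's `least_squares`, which provides a built-in soft-L1 loss. The toy setup below (a single 3D point seen by two pinhole cameras with identity rotation and unit focal length) is an assumption for illustration, not the patent's pipeline:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(X, observations, centers):
    """Stacked 2D reprojection residuals of a single 3D point X, seen by
    pinhole cameras with identity rotation, unit focal length, and optical
    centers given in `centers` (deliberately simplified toy cameras)."""
    res = []
    for (u, v), c in zip(observations, centers):
        x, y, z = X - c              # point in the camera's frame
        res.extend([x / z - u, y / z - v])
    return res

# Synthetic ground truth: one 3D point observed from two camera positions.
true_X = np.array([0.2, -0.1, 2.0])
centers = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
observations = []
for c in centers:
    p = true_X - c
    observations.append((p[0] / p[2], p[1] / p[2]))

# Robust refinement: the soft-L1 loss limits the pull of outlying residuals.
fit = least_squares(residuals, x0=np.array([0.0, 0.0, 1.0]),
                    args=(observations, centers), loss='soft_l1')
```

A full bundle adjustment would additionally optimize the camera parameters themselves; the point here is only how a robust loss plugs into the least-squares objective.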
The reprojection error may be stated mathematically as:

err_i = dist(x_i − X_i)

where err_i is the reprojection error of the reprojection of 3D point i, dist is the distance formula for the coordinate system being used, x_i is the 2D coordinates of a keypoint in an image used to estimate 3D point i, and X_i is the reprojection of the 3D point i back onto the image.
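This quantity is straightforward to compute once a camera model is fixed. The sketch below assumes a simple pinhole camera expressed in the camera's own frame (focal length f, principal point (cx, cy)); the helper names are illustrative, not from the patent:

```python
import math

def project(f, cx, cy, X):
    """Pinhole projection of camera-frame 3D point X = (x, y, z) with
    focal length f and principal point (cx, cy)."""
    x, y, z = X
    return (f * x / z + cx, f * y / z + cy)

def reprojection_error(f, cx, cy, X, keypoint):
    """err_i = dist(x_i − X_i): Euclidean distance between the observed
    keypoint x_i and the reprojection X_i of the estimated 3D point."""
    u, v = project(f, cx, cy, X)
    return math.hypot(u - keypoint[0], v - keypoint[1])
```
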
The loss function of the optimization performed in the bundle adjustment may be stated mathematically as:

min Σ_{i=1…n} loss(err_i)

where min is a minimization function, loss is a loss function, err_i is the reprojection error of the reprojection of 3D point i, and n is the total number of 3D points. The loss function may take various forms. For example, in the case of nonlinear least squares, the loss function may be loss(z) = z (i.e. identity), where z is the quantity being optimized. Likewise, in the case of soft L1, the loss function may be loss(z) = 2(√(1 + z) − 1). A variety of other loss functions may alternatively be used. The loss function may be applied to the reprojection error in each of the set of images, and then a global optimization across all images determined.
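Both loss shapes mentioned above, and the summed objective, can be sketched directly (the helper functions are illustrative; note that robust least-squares libraries typically apply the loss to the squared residual):

```python
import math

def loss_identity(z):
    """Nonlinear least squares: loss(z) = z (identity)."""
    return z

def loss_soft_l1(z):
    """Soft L1 loss: loss(z) = 2*(sqrt(1 + z) - 1).  Behaves like z for
    small z but grows only like sqrt(z) for large z, so a few grossly
    wrong points cannot dominate the objective."""
    return 2.0 * (math.sqrt(1.0 + z) - 1.0)

def total_loss(errors, loss):
    """The quantity minimized in bundle adjustment: the sum of
    loss(err_i) over all n 3D points."""
    return sum(loss(e) for e in errors)
```
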
Returning to
Finally, at step 570, the 3D model reconstruction software 490 on the electronic device 400 may output the 3D model, for example, display the 3D model in a user interface on a display screen 460, save the 3D model to a storage device 475, etc.
As discussed above, existing photogrammetric 3D model reconstruction software typically has difficulty with ill-textured or very thin objects in the foreground (e.g., ill-textured or very thin infrastructure, such as a telecommunications tower, in the foreground). Much of this difficulty is due to the automatic SfM stage, and in particular how bundle adjustment operations are performed. With conventional bundle adjustment operations, errors and inaccuracy in the estimation of 3D points that are at different distances from the camera do not have the same impact on reprojection error. A translation motion of the camera typically leads to a larger error for a 3D point that is near the camera (in the foreground) and a smaller error for a 3D point that is far from the camera (in the background).
To address these issues, step 550 of the sequence of steps 500 of
Weighting of the reprojection error may be stated mathematically as:

err_i = w(i) * dist(x_i − X_i)

where err_i is the reprojection error of the reprojection of 3D point i, w(i) is a depth-aware weighting function applied to 3D point i, dist is the distance formula for the coordinate system being used, x_i is the 2D coordinates of a keypoint in an image used to estimate 3D point i, and X_i is the reprojection of the 3D point i onto the image.
Scaling of the loss function of the optimization performed by bundle adjustment operations may be stated mathematically as:

loss(z, w) = w^2 loss(z / w^2)

where z is the quantity being optimized and w is the depth-aware weight.
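The two pieces compose as follows (a minimal sketch with illustrative helper names). One useful property is easy to check: for the identity (least-squares) loss the scaling changes nothing, and if z is taken to be the squared weighted error (w·e)^2, the scaled loss evaluates to w^2 · loss(e^2), so a robust loss still sees the unweighted error while each point's contribution is scaled by w^2:

```python
def weighted_error(w, err):
    """Depth-aware weighted reprojection error: err_i = w(i) * dist(x_i − X_i)."""
    return w * err

def scaled_loss(loss, z, w):
    """Scaled loss: loss(z, w) = w^2 * loss(z / w^2).  For loss(z) = z this
    is the identity; for robust losses it preserves their clipping shape
    while weighting each point's contribution by w^2."""
    return (w ** 2) * loss(z / (w ** 2))
```
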
In the first embodiment, where the reprojection error of each 3D point is weighted based on a function of distance, the weight may be represented mathematically as:

w_distance(i) = 1 / d_i

where w_distance is the distance weighting function and d_i is the distance between 3D point i and the camera.
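This weighting is a one-liner in code (the helper name is illustrative): nearby foreground points receive larger weights than distant background points.

```python
def distance_weights(depths):
    """w_distance(i) = 1 / d_i: reprojection errors of far-away
    (background) points are down-weighted relative to nearby
    (foreground) points."""
    return [1.0 / d for d in depths]
```
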
In the second embodiment, where the reprojection error of each 3D point is weighted based on a function of density, a clustering algorithm is first applied to partition the 3D points into a number of clusters based on their distance to the camera. In one implementation, the clustering algorithm may be a 1-dimensional (1D) k-means clustering algorithm that partitions the 3D points into a number of clusters k, for example ten clusters (k=10). However, it should be understood that other 1D clustering algorithms and other numbers of clusters may be utilized.
Weighting in this second embodiment may be represented mathematically as:

w_density(i) = 1 / p_i

where w_density is the density weighting function and p_i is the number of points in the cluster of 3D point i.
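The density weighting can be sketched end-to-end with a simple hand-rolled 1D k-means (in practice a library clustering routine would be used; the evenly spaced center initialization and helper names are assumptions for illustration):

```python
def kmeans_1d(depths, k=10, iters=50):
    """Simple 1D k-means over camera-to-point distances: returns a cluster
    label for each depth.  Centers start evenly spread over the depth range
    (assumes k >= 2)."""
    lo, hi = min(depths), max(depths)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda j: abs(d - centers[j])) for d in depths]
        for j in range(k):
            members = [d for d, lab in zip(depths, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def density_weights(depths, k=10):
    """w_density(i) = 1 / p_i, where p_i is the size of the depth cluster
    containing point i: points in sparse (foreground) clusters get larger
    weights than points in dense (background) clusters."""
    labels = kmeans_1d(depths, k)
    sizes = {j: labels.count(j) for j in set(labels)}
    return [1.0 / sizes[lab] for lab in labels]
```

With two foreground points and eight background points, for example, the foreground points each receive weight 1/2 while the background points each receive weight 1/8, countering the numerical dominance of the textured background.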
In the third embodiment, where reprojection error of each 3D point is weighted based on a function that combines distance and density of clusters and an additional normalization is applied, scaling factors may be selected to balance the impact of both forms of information. Such balancing may ensure 3D points in the foreground have impact, while still allowing proper operation of a robust loss function and outlier detection algorithms. That is, it may still be possible to compare various errors and statistically classify the 3D points at a given distance from the camera.
Weighting based on a combination of distance and density of clusters and additional normalization may be represented mathematically as:

w_combined(i) = a (w_density(i) / S_density) + b (w_distance(i) / S_distance) + c

where w_combined is the combined weighting function, w_density is the density weighting function, w_distance is the distance weighting function, S_density is the sum of w_density(i) for all 3D points i, S_distance is the sum of w_distance(i) for all 3D points i, and a, b and c are scaling factors selected to balance the forms of information, with a + b + c = 1.
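One combination consistent with this description, normalizing each weighting by its sum and blending with scaling factors a + b + c = 1, can be sketched as follows. The precise form of the blend (and the default factor values) is an illustrative assumption, not dictated by the description:

```python
def combined_weights(w_density, w_distance, a=0.4, b=0.4, c=0.2):
    """Blend density and distance weightings, each normalized by its sum
    (S_density, S_distance), with scaling factors a + b + c = 1.  The
    constant term c keeps every point's weight bounded away from zero.
    NOTE: this exact blend is an illustrative assumption."""
    s_den = sum(w_density)   # S_density
    s_dis = sum(w_distance)  # S_distance
    return [a * wd / s_den + b * wi / s_dis + c
            for wd, wi in zip(w_density, w_distance)]
```
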
It should be understood that a variety of further embodiments may also be used that apply depthaware weighting to the reprojection error of each 3D point used in the optimization of bundle adjustment operations.
It should be understood that such improvements may be achieved using depth-aware weighting of reprojection error in bundle adjustment operations without introducing other undesirable problems, such as increased hardware resource demands (i.e. processing time, storage requirements, etc.) and tedious workflows. This may enable the 3D model reconstruction software 490 and/or automatic SfM software process 492 to consume less of the processing and storage capabilities of electronic devices, in comparison to other attempted techniques, improving the functioning of the electronic devices. Likewise, it may enable photogrammetric 3D model reconstruction to be used in more situations, where it was previously impractical or impossible to do so, without burdening the user to manually address issues of accuracy and completeness.
It should be understood that a wide variety of modifications and adaptations may be made to the above-described techniques. Further, many of the above-described techniques may be implemented in software, in hardware, or in a combination thereof. A software implementation may include electronic device-executable instructions stored in a non-transitory electronic device-readable medium, such as a volatile or persistent memory, a hard disk, a compact disk (CD), or other storage medium. A hardware implementation may include specially configured processors, application specific integrated circuits (ASICs), and/or other types of hardware components. Further, a combined software/hardware implementation may include both electronic device-executable instructions stored in a non-transitory electronic device-readable medium, as well as one or more specially configured hardware components. Above all, it should be understood that the above-described techniques are meant to be taken only by way of example.