
Chinese Optics Letters, Vol. 19, Issue 2, 021101 (2021)
Abstract
1. Introduction
There has been a rapid growth in the amount of work dealing with three-dimensional (3D) reconstruction.
Over the years, many triangulation methods have been proposed, in particular for the two-view and three-view cases, for which robust and fast methods have been developed.
To achieve more accurate triangulation, considerable attention has been devoted to estimating optimal 3D scene points, but most of these methods handle only a very small number of views, such as two or three.
To ensure the global optimum, there are also some methods based on convex optimization under the L∞ norm.
Based on these observations, an efficient propagation-based incremental triangulation (INT) method for sequential multiple views is proposed in this Letter.
2. Notations and Preliminaries
The midpoint method for triangulation has a clear geometric description: for two-view triangulation, the estimated 3D location is the midpoint of the common perpendicular to the two observation rays. For more than two views, it generalizes to the point with minimum distances to all of the observation vectors. As shown in Fig. 1, the location of the 3D point is determined by the observation rays cast from all of the camera centers.
Figure 1. Geometric illustration of multi-view triangulation.
Thus, the 3D point can be solved in closed form as the least-squares closest point to all of the observation rays, which gives the midpoint estimate in Eq. (1).
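As a concrete reference for this closed-form solution, the following minimal NumPy sketch computes the least-squares closest point to a set of observation rays; the function name and interface are illustrative and not taken from the Letter.

```python
import numpy as np

def midpoint_triangulate(centers, directions):
    """Classic multi-view midpoint triangulation.

    Returns the 3D point minimizing the sum of squared perpendicular
    distances to the observation rays, i.e., the solution of
    sum_i (I - d_i d_i^T) X = sum_i (I - d_i d_i^T) c_i.

    centers:    (N, 3) camera centers c_i
    directions: (N, 3) observation ray directions d_i
    Requires at least two non-parallel rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(np.asarray(centers, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)           # unit observation vector
        P = np.eye(3) - np.outer(d, d)      # projector onto the plane normal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)
```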
3. Propagation-Based Incremental Triangulation
Although the midpoint method is fast enough for multiple-view triangulation, it is considered inaccurate when the camera views are nearly parallel. Moreover, as depicted in Fig. 1, when the 3D point lies far from the cameras and the observation rays become nearly parallel, the midpoint estimate is biased.
Many methods address the biased estimation of the midpoint method by formulating their optimization problems in terms of the image reprojection error. The angular reprojection error is another popular choice, and it is considered to adapt better to different camera models.
The cost function in Eq. (2) is derived from the angular reprojection error between each observation vector and the candidate 3D point.
Thus, the cost function is unbiased for all of the points lying on the same observation vector, regardless of the depth of the 3D point to be estimated.
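Eq. (2) itself is not reproduced in this excerpt; the sketch below assumes an angular cost of the common sine-of-angle form used in the unbiased-midpoint literature, which has exactly this depth-independence property. The function name and argument layout are illustrative.

```python
import numpy as np

def angular_residuals(X, centers, directions):
    """Per-view angular residuals for a candidate 3D point X.

    Each residual is the sine of the angle between the observed ray
    direction and the vector from the camera center to X. It depends
    only on the direction of X - c, not on the depth of X, so points
    along the same observation vector all score zero.
    """
    res = []
    for c, d in zip(np.asarray(centers, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        v = np.asarray(X, float) - c
        res.append(np.linalg.norm(np.cross(d, v)) / np.linalg.norm(v))
    return np.array(res)
```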
To minimize the unbiased cost function, the estimate is refined starting from the initial midpoint solution in Eq. (1). Thus, the target 3D point obtained from this unbiased temporal triangulation serves as the seed for the subsequent incremental processing.
In most SfM or SLAM applications, it is well known that temporal triangulation is a repeatedly performed task, since new 2D observations are extracted from each incoming image. More precisely, a 3D scene point may be triangulated from the first few images, and these available 3D points are then applied to estimate the pose of a newly arriving image by perspective-n-point (PnP).
To address large-scale and sequential-image situations, an efficient INT scheme is needed for multiple-view reconstruction that builds on the initial 3D point estimate. Fortunately, based on the initial estimation in Eq. (1), the next 3D point can be propagated directly from the previous estimate when a new observation arrives, without re-collecting all of the earlier observations.
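The Letter's propagation acts on its unbiased formulation, which is not reproduced here; as an assumption-laden illustration of the running-sum idea only, the sketch below propagates the plain midpoint estimate, so each new observation is folded into a 3 × 3 matrix and a 3-vector without revisiting earlier views. Class and method names are hypothetical.

```python
import numpy as np

class IncrementalMidpoint:
    """Running-sum propagation of the midpoint estimate.

    Keeps only A = sum_i (I - d_i d_i^T) and b = sum_i (I - d_i d_i^T) c_i,
    so adding a view costs O(1) regardless of the feature-track length.
    """

    def __init__(self):
        self.A = np.zeros((3, 3))
        self.b = np.zeros(3)

    def add_observation(self, center, direction):
        d = np.asarray(direction, float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)
        self.A += P
        self.b += P @ np.asarray(center, float)
        # The system becomes solvable once two non-parallel rays are present.
        if np.linalg.matrix_rank(self.A) == 3:
            return np.linalg.solve(self.A, self.b)
        return None
```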
It has been confirmed that the achievable reconstruction error decays quadratically as more views are added to the system.
In addition to the subsequent INT, the propagation-based scheme is applied whenever more observations of the same 3D point arrive, given a reliable initial estimate. The increasing observational information further refines the existing triangulation results.
4. Experiments
Experimental results are presented to validate the accuracy and efficiency of the proposed methods. Four benchmark multiple-view triangulation methods are compared. The first is the native multiple-view midpoint (MVMP) method; it is currently the fastest method for multiple-view triangulation, though it is known to be inaccurate. The second is the unbiased midpoint method (IRMP), and the remaining two are the NN and GMRE methods.
4.1. Experiments on synthetic data
In the synthetic dataset, the same camera calibration parameters and image size are used for all of the views.
Figure 2. Four types of synthetic triangulation instances. (a) Type A: cameras and 3D points are randomly distributed; (b) Type B: the camera moves along a curved trajectory around 3D points; (c) Type C: the camera moves on a circle while 3D points are located at the center; (d) Type D: the camera moves along a curved trajectory towards the 3D scene.
As illustrated in Fig. 2, 100 cameras and 1000 3D points are generated for the different synthetic scenarios. These 3D points are then projected back to the image planes with the calibrated camera matrices and corrupted by Gaussian noise with a 5 pixel covariance. Based on the initialization from the unbiased temporal triangulation, the INT problem is solved as each successive image arrives, until all of the sequential images have been processed.
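As a rough illustration of this data-generation step, the sketch below projects a 3D point through an assumed pinhole model and perturbs the pixel coordinates with Gaussian noise; K, R, t, the noise level, and the helper name are placeholders rather than quantities taken from the Letter.

```python
import numpy as np

def project_with_noise(K, R, t, X, sigma=5.0, rng=None):
    """Project a 3D point into a calibrated view and add Gaussian pixel noise.

    K: 3x3 intrinsic matrix; (R, t): world-to-camera rotation and translation;
    X: 3D point in world coordinates; sigma: pixel noise level (the text uses
    2, 5, and 8 pixel settings).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_cam = R @ np.asarray(X, float) + t   # point in camera coordinates
    x_img = K @ x_cam
    uv = x_img[:2] / x_img[2]              # perspective division to pixels
    return uv + rng.normal(0.0, sigma, 2)  # corrupted 2D observation
```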
To perform a quantitative evaluation of the proposed propagation-based INT (ININT) method, MVMP, IRMP, NN, and GMRE are each applied to perform incremental multiple-view triangulation. During the INT process, these four compared methods re-estimate the 3D point from scratch whenever a new feature observation arrives, without making use of the previously triangulated results. In contrast, the proposed INT method takes the result of the unbiased temporal triangulation as the seed point and then executes only the propagation step whenever the feature track grows. It is known that the accuracy of a 3D point improves with the number of its observations, and thus the last estimate obtained from all of the observations is deemed the most accurate one. Therefore, these 3D points derived from all of their observations are projected back to the image planes to compute the mean 2D reprojection errors. Moreover, the mean 3D errors, i.e., the Euclidean distances between the generated 3D points and the triangulated ones, are also taken into consideration.
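A minimal sketch of these two error measures, assuming calibrated pinhole views stored as (K, R, t) tuples, is given below; the function names are illustrative.

```python
import numpy as np

def mean_reprojection_error(X_est, cameras, observations):
    """Mean 2D reprojection error (pixels) of an estimated 3D point.

    cameras:      list of (K, R, t) tuples for the views in the feature track
    observations: list of observed 2D pixel coordinates in those views
    """
    errors = []
    for (K, R, t), uv_obs in zip(cameras, observations):
        x = K @ (R @ np.asarray(X_est, float) + t)
        uv = x[:2] / x[2]
        errors.append(np.linalg.norm(uv - np.asarray(uv_obs, float)))
    return float(np.mean(errors))

def mean_3d_error(X_est, X_true):
    """Euclidean distance between the triangulated and the generated 3D point."""
    return float(np.linalg.norm(np.asarray(X_est, float) - np.asarray(X_true, float)))
```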
The experimental results are summarized in Table 1. Because iteration is avoided when new observations are added, the propagation-based INT method is always the fastest one. Moreover, both the 3D and 2D errors of the proposed INT on the different synthetic datasets are superior to those of the classic MVMP method. Although the seed point of the INT is derived from the unbiased temporal triangulation (IRMP), the final 3D results after the recursive propagations remain close to those of the continuously re-run IRMP. Compared to the proposed INT method, the GMRE method obtains slightly better triangulation accuracy, but it requires an enormous amount of runtime. We also find that the accuracies of the IRMP and NN methods in INT scenarios are almost identical, differing only in runtime, which is consistent with our previous convergence analysis.
| Method | Type A: Time (s) | Type A: 3D Error (m) | Type A: 2D Error (pixels) | Type B: Time (s) | Type B: 3D Error (m) | Type B: 2D Error (pixels) | Type C: Time (s) | Type C: 3D Error (m) | Type C: 2D Error (pixels) |
|---|---|---|---|---|---|---|---|---|---|
| MVMP | 1.06 | 0.0192 | 5.1115 | 2.45 | 0.0352 | 5.0493 | 2.14 | 0.0292 | 4.9511 |
| IRMP | 4.01 | 0.0139 | 4.7698 | 11.76 | 0.0303 | 4.9635 | 8.41 | 0.0251 | 4.8888 |
| INT | 0.68 | 0.0146 | 4.7955 | 0.91 | 0.0312 | 4.9649 | 1.08 | 0.0258 | 4.8901 |
| ININT | 0.98 | 0.0145 | 4.7936 | 1.45 | 0.0307 | 4.9537 | 1.62 | 0.0252 | 4.8895 |
| NN | 6.51 | 0.0139 | 4.7698 | 22.60 | 0.0303 | 4.9635 | 14.07 | 0.0251 | 4.8888 |
| GMRE | 7.08 | 0.0124 | 4.7059 | 22.10 | 0.0272 | 4.9621 | 15.61 | 0.0237 | 4.8680 |
Table 1. Comparison of Different Incremental Triangulation Methods on the Type A, B, and C Synthetic Datasets
In addition to INT, the ININT method applies a few refinement iterations after the propagated 3D point is obtained. Its runtime is therefore slightly longer than that of the INT method, with superior 3D accuracy. Both the proposed INT and ININT methods balance runtime and accuracy in large-scale INT. The experimental results also demonstrate the convergence ability of the proposed method in different situations.
For further verification of the robustness to varying noise, different Gaussian noise levels are generated to corrupt the 2D observations. Based on the Type D dataset, covariances of 2, 5, and 8 pixels are simulated while different 3D points are generated, and the performance of the different INT methods is illustrated in Fig. 3. It is easy to see that the proposed INT (ININT) method is faster than all of the other methods while obtaining comparable 3D and 2D accuracy, apart from the inaccurate MVMP method.
Figure 3. Time and accuracy analysis with different noise levels on the Type D synthetic data. (a)–(c) Time, 3D error, and 2D error with a Gaussian covariance of 2 pixels; (d)–(f) the same quantities with 5 pixels; (g)–(i) the same quantities with 8 pixels.
Compared to the synthetic dataset Types A, B, and C, the superior runtime performance is more obvious here because the number of feature tracks during INT is larger in the Type D dataset. The results also illustrate that the proposed INT (ININT) methods provide a practical alternative for large-scale INT.
In addition to the efficiency and accuracy evaluation of the proposed method, the convergence behavior during INT is verified in this section, where the 3D locations of the synthetic points are taken as the ground truth and INT is carried out as new images arrive one by one. The other methods, MVMP, IRMP, NN, and GMRE, are again applied to perform the multiple-view triangulation repeatedly after collecting all of the observation vectors.
Because the length of the feature track varies for every 3D point during INT, the overall convergence of the INT (ININT) method is characterized by the percentage of processed views rather than their absolute number: the full feature track is taken as 100%, and the mean 3D error is computed as the INT proceeds. For the different synthetic datasets, the results of INT are depicted in Fig. 4. It is easy to see that all of the triangulation methods achieve higher 3D accuracy as more feature observations are added, and the achievable 3D error seemingly decays quadratically as more views are added to the system.
Figure 4. Overall convergence of INT with different datasets. (a) Convergence curve of the Type A dataset; (b) convergence curve of the Type B dataset; (c) convergence curve of the Type C dataset; (d) convergence curve of the Type D dataset.
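One way to realize the percentage-based normalization described above is to map every feature track onto a common fraction-of-views grid before averaging the 3D errors, as sketched below; this is an illustrative reconstruction, not necessarily the exact procedure used for Fig. 4.

```python
import numpy as np

def convergence_curve(error_tracks, grid=np.linspace(0.0, 1.0, 101)):
    """Average per-point 3D error curves on a common percentage-of-views axis.

    error_tracks: list of 1D arrays, the 3D error of one point after each
                  added view; tracks may have different lengths.
    """
    curves = []
    for errs in error_tracks:
        errs = np.asarray(errs, float)
        frac = np.linspace(1.0 / len(errs), 1.0, len(errs))  # fraction of views used
        curves.append(np.interp(grid, frac, errs))           # resample onto the grid
    return grid, np.mean(curves, axis=0)
```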
4.2. Experiments on real data
We also evaluate our INT method on publicly available large-scale real datasets, namely the Lund Cathedral, Orebro Castle, Ystad Monastery, and Skansen Kronan scenes shown in Fig. 5. The total runtime and the last mean 3D error of each method are summarized in Table 2.
| Data | MVMP Runtime (s) | IRMP Runtime (s) | INT Runtime (s) | ININT Runtime (s) | NN Runtime (s) | GMRE Runtime (s) | MVMP 3D Error | IRMP 3D Error | INT 3D Error | ININT 3D Error | NN 3D Error | GMRE 3D Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E | 113.40 | 941.01 | 125.16 | 204.96 | 1477.41 | 1386.46 | 0.0148 | 0.0131 | 0.0139 | 0.0134 | 0.0131 | 0.0124 |
| F | 110.15 | 749.16 | 63.84 | 110.93 | 1187.04 | 1195.77 | 0.0020 | 0.0010 | 0.0014 | 0.0011 | 0.0010 | 0.0002 |
| G | 97.99 | 469.11 | 83.55 | 138.63 | 797.93 | 665.38 | 0.0077 | 0.0072 | 0.0075 | 0.0072 | 0.0072 | 0.0067 |
| H | 23.29 | 206.40 | 21.24 | 35.97 | 295.81 | 246.67 | 0.0018 | 0.0009 | 0.0012 | 0.0010 | 0.0009 | 0.0004 |
Table 2. Total Runtime and Last Mean 3D Error of Different Methods on Real Datasets
Figure 5. INT based on real datasets. (a) Lund Cathedral, (b) Orebro Castle, (c) Ystad Monastery, and (d) Skansen Kronan.
5. Conclusion
An efficient INT method based on propagation for sequential multiple views has been proposed. Instead of collecting all of the observations of the 3D point to be located, the proposed INT method only incorporates the observations from the current view, greatly outperforming the currently popular multiple-view triangulation methods in runtime while maintaining comparable accuracy.
References
[7] J. Schönberger, J. Frahm. Structure-from-motion revisited. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 4104(2016).
[10] C. Forster, M. Pizzoli, D. Scaramuzza. SVO: fast semi-direct monocular visual odometry. Proceedings of IEEE International Conference on Robotics and Automation, 15(2014).
[11] S. Lee, J. Civera. Closed-form optimal two-view triangulation based on angular errors. Proceedings of IEEE International Conference on Computer Vision, 2681(2019).
[12] Z. Kukelova, T. Pajdla, M. Bujnak. Fast and stable algebraic solution to L2 three-view triangulation. Proceedings of the International Conference on 3D Vision, 326(2013).
[14] R. I. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision(2004).
[15] R. Hartley, P. Sturm. Triangulation. Comput. Vis. Image Und., 68, 146(1997).
[16] P. Lindstrom. Triangulation made easy. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1554(2010).
[17] H. Stewenius, F. Schaffalitzky, D. Nister. How hard is 3-view triangulation really?. Proceedings of IEEE International Conference on Computer Vision, 686(2005).
[18] R. Hartley, F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. Proceedings of IEEE Computer Vision and Pattern Recognition, 504(2004).
[19] H. Li. A practical algorithm for L∞ triangulation with outliers. Proceedings of IEEE Computer Vision and Pattern Recognition, 1(2007).
[22] S. Recker, M. Hess-Flores, K. Joy. Fury of the swarm: efficient and very accurate triangulation for multi-view scene reconstruction. Proceedings of IEEE International Conference on Computer Vision Workshops, 652(2013).
[23] S. Lee, J. Civera. Triangulation: why optimize?. Proceedings of the British Machine Vision Conference, 1(2019).
[26] V. Lepetit, F. Moreno-Noguer, P. Fua. EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81, 155(2009).
[27] O. Enqvist, F. Kahl, C. Olsson. Non-sequential structure from motion. Proceedings of IEEE International Conference on Computer Vision Workshops, 264(2011).
