
Chinese Optics Letters, Vol. 19, Issue 2, 021101 (2021)
Abstract
1. Introduction
There has been a rapid growth in the amount of work dealing with three-dimensional (3D) reconstruction.
Over the years, many triangulation methods have been proposed, in particular for the two-view and three-view cases, for which robust and fast methods have been developed.
To achieve more accurate triangulation, considerable attention has been devoted to estimating optimal 3D scene points, but most of these methods handle only a very small number of views, such as two or three.
To ensure the global optimum, there are also some methods based on convex optimization under the L∞ norm.
Based on these observations, an efficient propagation-based incremental triangulation (INT) method for sequential multiple views is proposed in this Letter.
2. Notations and Preliminaries
The midpoint method for triangulation has a clear geometric description: for two-view triangulation, the estimated 3D location is the midpoint of the common perpendicular to the two observation rays. For more than two views, it generalizes to the point with minimum distances to all of the observation vectors. As shown in Fig. 1, the location of the 3D point is determined by the observation rays cast from all of the camera centers.
Figure 1. Geometric illustration of multi-view triangulation.
Thus, the 3D point can be solved in closed form as the least-squares closest point to all of the observation rays, which gives the midpoint estimate in Eq. (1).
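As a concrete reference for this closed-form solution, the following minimal NumPy sketch computes the least-squares closest point to a set of observation rays; the function name and interface are illustrative and not taken from the Letter.

```python
import numpy as np

def midpoint_triangulate(centers, directions):
    """Classic multi-view midpoint triangulation.

    Returns the 3D point minimizing the sum of squared perpendicular
    distances to the observation rays, i.e., the solution of
    sum_i (I - d_i d_i^T) X = sum_i (I - d_i d_i^T) c_i.

    centers:    (N, 3) camera centers c_i
    directions: (N, 3) observation ray directions d_i
    Requires at least two non-parallel rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(np.asarray(centers, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)           # unit observation vector
        P = np.eye(3) - np.outer(d, d)      # projector onto the plane normal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)
```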
3. Propagation-Based Incremental Triangulation
Although the midpoint method is fast enough for multiple-view triangulation, it is considered inaccurate when the camera views are nearly parallel. Moreover, as depicted in Fig. 1, when the 3D point lies far from the cameras and the observation rays become nearly parallel, the midpoint estimate is biased.
Many methods address the biased estimation of the midpoint method by formulating their optimization problems in terms of the image reprojection error. The angular reprojection error is another popular choice, and it is considered to adapt better to different camera models.
The cost function in Eq. (2) is derived from the angular reprojection error between each observation vector and the candidate 3D point.
Thus, the cost function is unbiased for all of the points lying on the same observation vector, regardless of the depth of the 3D point to be estimated.
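Eq. (2) itself is not reproduced in this excerpt; the sketch below assumes an angular cost of the common sine-of-angle form used in the unbiased-midpoint literature, which has exactly this depth-independence property. The function name and argument layout are illustrative.

```python
import numpy as np

def angular_residuals(X, centers, directions):
    """Per-view angular residuals for a candidate 3D point X.

    Each residual is the sine of the angle between the observed ray
    direction and the vector from the camera center to X. It depends
    only on the direction of X - c, not on the depth of X, so points
    along the same observation vector all score zero.
    """
    res = []
    for c, d in zip(np.asarray(centers, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        v = np.asarray(X, float) - c
        res.append(np.linalg.norm(np.cross(d, v)) / np.linalg.norm(v))
    return np.array(res)
```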
To minimize the unbiased cost function, the estimate is refined starting from the initial midpoint solution in Eq. (1). Thus, the target 3D point obtained from this unbiased temporal triangulation serves as the seed for the subsequent incremental processing.
In most SfM or SLAM applications, it is well known that temporal triangulation is a repeatedly performed task, since new 2D observations are extracted from each incoming image. More precisely, a 3D scene point may be triangulated from the first few images, and these available 3D points are then applied to estimate the pose of a newly arriving image by perspective-n-point (PnP).
To address large-scale and sequential-image situations, an efficient INT scheme is needed for multiple-view reconstruction that builds on the initial 3D point estimate. Fortunately, based on the initial estimation in Eq. (1), the next 3D point can be propagated directly from the previous estimate when a new observation arrives, without re-collecting all of the earlier observations.
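The Letter's propagation acts on its unbiased formulation, which is not reproduced here; as an assumption-laden illustration of the running-sum idea only, the sketch below propagates the plain midpoint estimate, so each new observation is folded into a 3 × 3 matrix and a 3-vector without revisiting earlier views. Class and method names are hypothetical.

```python
import numpy as np

class IncrementalMidpoint:
    """Running-sum propagation of the midpoint estimate.

    Keeps only A = sum_i (I - d_i d_i^T) and b = sum_i (I - d_i d_i^T) c_i,
    so adding a view costs O(1) regardless of the feature-track length.
    """

    def __init__(self):
        self.A = np.zeros((3, 3))
        self.b = np.zeros(3)

    def add_observation(self, center, direction):
        d = np.asarray(direction, float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)
        self.A += P
        self.b += P @ np.asarray(center, float)
        # The system becomes solvable once two non-parallel rays are present.
        if np.linalg.matrix_rank(self.A) == 3:
            return np.linalg.solve(self.A, self.b)
        return None
```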
It has been confirmed that the achievable reconstruction error decays quadratically as more views are added to the system.
In addition to the subsequent INT, the propagation-based scheme is applied whenever more observations of the same 3D point arrive, given a reliable initial estimate. The increasing observational information further refines the existing triangulation results.
4. Experiments
Experimental results are presented to validate the accuracy and efficiency of the proposed methods. Four benchmark multiple-view triangulation methods are compared. The first is the native multiple-view midpoint (MVMP) method; it is currently the fastest method for multiple-view triangulation, though it is known to be inaccurate. The second is the unbiased midpoint method (IRMP), and the remaining two are the NN and GMRE methods.
4.1. Experiments on synthetic data
In the synthetic dataset, the same camera calibration parameters and image size are used for all of the views.
Figure 2. Four types of synthetic triangulation instances. (a) Type A: cameras and 3D points are randomly distributed; (b) Type B: the camera moves along a curved trajectory around 3D points; (c) Type C: the camera moves on a circle while 3D points are located at the center; (d) Type D: the camera moves along a curved trajectory towards the 3D scene.
As illustrated in Fig. 2, 100 cameras and 1000 3D points are generated for the different synthetic scenarios. These 3D points are then projected back to the image planes with the calibrated camera matrices and corrupted by Gaussian noise with a 5 pixel covariance. Based on the initialization from the unbiased temporal triangulation, the INT problem is solved as each successive image arrives, until all of the sequential images have been processed.
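As a rough illustration of this data-generation step, the sketch below projects a 3D point through an assumed pinhole model and perturbs the pixel coordinates with Gaussian noise; K, R, t, the noise level, and the helper name are placeholders rather than quantities taken from the Letter.

```python
import numpy as np

def project_with_noise(K, R, t, X, sigma=5.0, rng=None):
    """Project a 3D point into a calibrated view and add Gaussian pixel noise.

    K: 3x3 intrinsic matrix; (R, t): world-to-camera rotation and translation;
    X: 3D point in world coordinates; sigma: pixel noise level (the text uses
    2, 5, and 8 pixel settings).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_cam = R @ np.asarray(X, float) + t   # point in camera coordinates
    x_img = K @ x_cam
    uv = x_img[:2] / x_img[2]              # perspective division to pixels
    return uv + rng.normal(0.0, sigma, 2)  # corrupted 2D observation
```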
To perform a quantitative evaluation of the proposed propagation-based INT (ININT) method, MVMP, IRMP, NN, and GMRE are each applied to perform incremental multiple-view triangulation. During the INT process, these four compared methods re-estimate the 3D point from scratch whenever a new feature observation arrives, without making use of the previously triangulated results. In contrast, the proposed INT method takes the result of the unbiased temporal triangulation as the seed point and then executes only the propagation step whenever the feature track grows. It is known that the accuracy of a 3D point improves with the number of its observations, and thus the last estimate obtained from all of the observations is deemed the most accurate one. Therefore, these 3D points derived from all of their observations are projected back to the image planes to compute the mean 2D reprojection errors. Moreover, the mean 3D errors, i.e., the Euclidean distances between the generated 3D points and the triangulated ones, are also taken into consideration.
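A minimal sketch of these two error measures, assuming calibrated pinhole views stored as (K, R, t) tuples, is given below; the function names are illustrative.

```python
import numpy as np

def mean_reprojection_error(X_est, cameras, observations):
    """Mean 2D reprojection error (pixels) of an estimated 3D point.

    cameras:      list of (K, R, t) tuples for the views in the feature track
    observations: list of observed 2D pixel coordinates in those views
    """
    errors = []
    for (K, R, t), uv_obs in zip(cameras, observations):
        x = K @ (R @ np.asarray(X_est, float) + t)
        uv = x[:2] / x[2]
        errors.append(np.linalg.norm(uv - np.asarray(uv_obs, float)))
    return float(np.mean(errors))

def mean_3d_error(X_est, X_true):
    """Euclidean distance between the triangulated and the generated 3D point."""
    return float(np.linalg.norm(np.asarray(X_est, float) - np.asarray(X_true, float)))
```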
The experimental results are summarized in Table 1. Because iteration is avoided when new observations are added, the propagation-based INT method is always the fastest one. Moreover, both the 3D and 2D errors of the proposed INT on the different synthetic datasets are superior to those of the classic MVMP method. Although the seed point of the INT is derived from the unbiased temporal triangulation (IRMP), the final 3D results after the recursive propagations remain close to those of the continuously re-run IRMP. Compared to the proposed INT method, the GMRE method obtains slightly better triangulation accuracy, but it requires an enormous amount of runtime. We also find that the accuracies of the IRMP and NN methods in INT scenarios are almost identical, differing only in runtime, which is consistent with our previous convergence analysis.
| Method | Type A: Time (s) | Type A: 3D Error (m) | Type A: 2D Error (pixels) | Type B: Time (s) | Type B: 3D Error (m) | Type B: 2D Error (pixels) | Type C: Time (s) | Type C: 3D Error (m) | Type C: 2D Error (pixels) |
|---|---|---|---|---|---|---|---|---|---|
| MVMP | 1.06 | 0.0192 | 5.1115 | 2.45 | 0.0352 | 5.0493 | 2.14 | 0.0292 | 4.9511 |
| IRMP | 4.01 | 0.0139 | 4.7698 | 11.76 | 0.0303 | 4.9635 | 8.41 | 0.0251 | 4.8888 |
| INT | 0.68 | 0.0146 | 4.7955 | 0.91 | 0.0312 | 4.9649 | 1.08 | 0.0258 | 4.8901 |
| ININT | 0.98 | 0.0145 | 4.7936 | 1.45 | 0.0307 | 4.9537 | 1.62 | 0.0252 | 4.8895 |
| NN | 6.51 | 0.0139 | 4.7698 | 22.60 | 0.0303 | 4.9635 | 14.07 | 0.0251 | 4.8888 |
| GMRE | 7.08 | 0.0124 | 4.7059 | 22.10 | 0.0272 | 4.9621 | 15.61 | 0.0237 | 4.8680 |
Table 1. Comparison of Different Incremental Triangulation Methods on the Type A, B, and C Synthetic Datasets
In addition to INT, the ININT method applies a few refinement iterations after the propagated 3D point is obtained. Its runtime is therefore slightly longer than that of the INT method, with superior 3D accuracy. Both the proposed INT and ININT methods balance runtime and accuracy in large-scale INT. The experimental results also demonstrate the convergence ability of the proposed method in different situations.
For further verification of the robustness to varying noise, different Gaussian noise levels are generated to corrupt the 2D observations. Based on the Type D dataset, covariances of 2, 5, and 8 pixels are simulated while different 3D points are generated, and the performance of the different INT methods is illustrated in Fig. 3. It is easy to see that the proposed INT (ININT) method is faster than all of the other methods while obtaining comparable 3D and 2D accuracy, apart from the inaccurate MVMP method.
Figure 3. Time and accuracy analysis with different noise levels on the Type D synthetic data. (a)–(c) Time, 3D error, and 2D error with a Gaussian covariance of 2 pixels; (d)–(f) the same quantities with 5 pixels; (g)–(i) the same quantities with 8 pixels.
Compared to the synthetic dataset Types A, B, and C, the superior runtime performance is more obvious here because the number of feature tracks during INT is larger in the Type D dataset. The results also illustrate that the proposed INT (ININT) methods provide a practical alternative for large-scale INT.
In addition to the efficiency and accuracy evaluation of the proposed method, the convergence behavior during INT is verified in this section, where the 3D locations of the synthetic points are taken as the ground truth and INT is carried out as new images arrive one by one. The other methods, MVMP, IRMP, NN, and GMRE, are again applied to perform the multiple-view triangulation repeatedly after collecting all of the observation vectors.
Because the length of the feature track varies for every 3D point during INT, the overall convergence of the INT (ININT) method is characterized by the percentage of processed views rather than their absolute number: the full feature track is taken as 100%, and the mean 3D error is computed as the INT proceeds. For the different synthetic datasets, the results of INT are depicted in Fig. 4. It is easy to see that all of the triangulation methods achieve higher 3D accuracy as more feature observations are added, and the achievable 3D error seemingly decays quadratically as more views are added to the system.
Figure 4. Overall convergence of INT with different datasets. (a) Convergence curve of the Type A dataset; (b) convergence curve of the Type B dataset; (c) convergence curve of the Type C dataset; (d) convergence curve of the Type D dataset.
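One way to realize the percentage-based normalization described above is to map every feature track onto a common fraction-of-views grid before averaging the 3D errors, as sketched below; this is an illustrative reconstruction, not necessarily the exact procedure used for Fig. 4.

```python
import numpy as np

def convergence_curve(error_tracks, grid=np.linspace(0.0, 1.0, 101)):
    """Average per-point 3D error curves on a common percentage-of-views axis.

    error_tracks: list of 1D arrays, the 3D error of one point after each
                  added view; tracks may have different lengths.
    """
    curves = []
    for errs in error_tracks:
        errs = np.asarray(errs, float)
        frac = np.linspace(1.0 / len(errs), 1.0, len(errs))  # fraction of views used
        curves.append(np.interp(grid, frac, errs))           # resample onto the grid
    return grid, np.mean(curves, axis=0)
```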
4.2. Experiments on real data
We also evaluate our INT method on publicly available large-scale real datasets, namely the Lund Cathedral, Orebro Castle, Ystad Monastery, and Skansen Kronan scenes shown in Fig. 5. The total runtime and the last mean 3D error of each method are summarized in Table 2.
| Data | MVMP Runtime (s) | IRMP Runtime (s) | INT Runtime (s) | ININT Runtime (s) | NN Runtime (s) | GMRE Runtime (s) | MVMP 3D Error | IRMP 3D Error | INT 3D Error | ININT 3D Error | NN 3D Error | GMRE 3D Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E | 113.40 | 941.01 | 125.16 | 204.96 | 1477.41 | 1386.46 | 0.0148 | 0.0131 | 0.0139 | 0.0134 | 0.0131 | 0.0124 |
| F | 110.15 | 749.16 | 63.84 | 110.93 | 1187.04 | 1195.77 | 0.0020 | 0.0010 | 0.0014 | 0.0011 | 0.0010 | 0.0002 |
| G | 97.99 | 469.11 | 83.55 | 138.63 | 797.93 | 665.38 | 0.0077 | 0.0072 | 0.0075 | 0.0072 | 0.0072 | 0.0067 |
| H | 23.29 | 206.40 | 21.24 | 35.97 | 295.81 | 246.67 | 0.0018 | 0.0009 | 0.0012 | 0.0010 | 0.0009 | 0.0004 |
Table 2. Total Runtime and Last Mean 3D Error of Different Methods on Real Datasets
Figure 5. INT based on real datasets. (a) Lund Cathedral, (b) Orebro Castle, (c) Ystad Monastery, and (d) Skansen Kronan.
5. Conclusion
An efficient INT method based on propagation for sequential multiple views has been proposed. Instead of collecting all of the observations of the 3D point to be located, the proposed INT method only incorporates the observations from the current view, greatly outperforming the currently popular multiple-view triangulation methods in runtime while maintaining comparable accuracy.
References
[7] J. Schönberger, J. Frahm. Structure-from-motion revisited. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 4104(2016).
[10] C. Forster, M. Pizzoli, D. Scaramuzza. SVO: fast semi-direct monocular visual odometry. Proceedings of IEEE International Conference on Robotics and Automation, 15(2014).
[11] S. Lee, J. Civera. Closed-form optimal two-view triangulation based on angular errors. Proceedings of IEEE International Conference on Computer Vision, 2681(2019).
[12] Z. Kukelova, T. Pajdla, M. Bujnak. Fast and stable algebraic solution to L2 three-view triangulation. Proceedings of the International Conference on 3D Vision, 326(2013).
[14] R. I. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision(2004).
[15] R. Hartley, P. Sturm. Triangulation. Comput. Vis. Image Und., 68, 146(1997).
[16] P. Lindstrom. Triangulation made easy. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1554(2010).
[17] H. Stewenius, F. Schaffalitzky, D. Nister. How hard is 3-view triangulation really?. Proceedings of IEEE International Conference on Computer Vision, 686(2005).
[18] R. Hartley, F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. Proceedings of IEEE Computer Vision and Pattern Recognition, 504(2004).
[19] H. Li. A practical algorithm for L∞ triangulation with outliers. Proceedings of IEEE Computer Vision and Pattern Recognition, 1(2007).
[22] S. Recker, M. Hess-Flores, K. Joy. Fury of the swarm: efficient and very accurate triangulation for multi-view scene reconstruction. Proceedings of IEEE International Conference on Computer Vision Workshops, 652(2013).
[23] S. Lee, J. Civera. Triangulation: why optimize?. Proceedings of the British Machine Vision Conference, 1(2019).
[26] V. Lepetit, F. Moreno-Noguer, P. Fua. EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81, 155(2009).
[27] O. Enqvist, F. Kahl, C. Olsson. Non-sequential structure from motion. Proceedings of IEEE International Conference on Computer Vision Workshops, 264(2011).
