• Optics and Precision Engineering
  • Vol. 32, Issue 24, 3603 (2024)
Qiqi KOU1, Weichen WANG2, Chenggong HAN2, Chen LÜ2, Deqiang CHENG2, and Yucheng JI3,*
Author Affiliations
  • 1School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
  • 2School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
  • 3Big Data Center, Ministry of Emergency Management, Beijing 10001, China
    DOI: 10.37188/OPE.20243224.3603
    Qiqi KOU, Weichen WANG, Chenggong HAN, Chen LÜ, Deqiang CHENG, Yucheng JI. Multi-frame self-supervised monocular depth estimation with multi-scale feature enhancement[J]. Optics and Precision Engineering, 2024, 32(24): 3603
    Fig. 1. Process of self-supervised depth estimation algorithm
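The pipeline in Fig. 1 trains depth without ground truth by warping an adjacent frame into the current view and penalizing the photometric difference. The widely used Monodepth2-style error mixes an SSIM term with an L1 term; the sketch below is a simplification of mine (flat-list inputs, one global SSIM window instead of the usual 3×3 sliding window, hypothetical function names):

```python
def ssim_patch(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Global SSIM over one flat patch (real pipelines slide a 3x3 window).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def photometric_error(target, warped, alpha=0.85):
    # Monodepth2-style mix of structural (SSIM) and absolute (L1) difference.
    l1 = sum(abs(a - b) for a, b in zip(target, warped)) / len(target)
    return alpha * (1 - ssim_patch(target, warped)) / 2 + (1 - alpha) * l1
```

Identical target and warped patches give zero error; training drives the predicted depth and pose to warp the source frame ever closer to the target.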
    Fig. 2. Cost volume construction of multi-frame depth estimation network
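The cost volume of Fig. 2 scores a set of discrete depth hypotheses per pixel: the previous frame's features are warped into the current view once per hypothesis, and the hypothesis with the lowest matching cost indicates the depth. A toy sketch under assumed list-based inputs (the warp itself is taken as already done; names are hypothetical):

```python
def build_cost_volume(target_feat, warped_feats):
    # target_feat[i]: feature vector of pixel i in the current frame
    # warped_feats[d][i]: source feature at pixel i after warping with
    #                     depth hypothesis d (warping assumed done upstream)
    # Returns cost[d][i] = L1 feature distance.
    return [[sum(abs(t - w) for t, w in zip(tf, wf))
             for tf, wf in zip(target_feat, row)]
            for row in warped_feats]

def best_depth_bins(cost):
    # Per pixel, pick the depth hypothesis with the lowest matching cost.
    n_bins, n_pix = len(cost), len(cost[0])
    return [min(range(n_bins), key=lambda d: cost[d][i]) for i in range(n_pix)]
```

In practice the argmin is replaced by a learned decoder that regularizes the raw costs, but the indexing logic is the same.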
    Fig. 3. Our depth estimation network architecture
    Fig. 4. Activation Module Based on Vision Attention (Act-VAN)
    Fig. 5. Large kernel attention
    Fig. 6. Structure enhancement module
    Fig. 7. Structure of dynamic upsampling
    Fig. 8. Comparison of visualization results on the KITTI dataset
    Fig. 9. Comparison of visualization results on the CityScapes dataset
    Fig. 10. Comparison of results for module design on SqRel

| Method | Resolution | AbsRel↓ | SqRel↓ | RMSE↓ | RMSElog↓ | δ<1.25↑ | δ<1.25²↑ | δ<1.25³↑ |
|---|---|---|---|---|---|---|---|---|
| Monodepth2 [8] | 640×192 | 0.115 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981 |
| Johnston et al. [26] | 640×192 | 0.106 | 0.861 | 4.699 | 0.185 | 0.889 | 0.962 | 0.982 |
| Packnet-SFM [27] | 640×192 | 0.111 | 0.785 | 4.601 | 0.189 | 0.878 | 0.960 | 0.982 |
| VA-Depth [28] | 640×192 | 0.112 | 0.864 | 4.804 | 0.190 | 0.877 | 0.959 | 0.982 |
| Zeeshan et al. [29] | 640×192 | 0.113 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981 |
| STDepthFormer [30] | 640×192 | 0.110 | 0.805 | 4.678 | 0.187 | 0.878 | 0.961 | 0.983 |
| Patil et al. [31] | 640×192 | 0.111 | 0.821 | 4.650 | 0.187 | 0.883 | 0.961 | 0.982 |
| Saunders et al. [32] | 640×192 | 0.100 | 0.747 | 4.455 | 0.177 | 0.895 | 0.966 | 0.984 |
| Wang et al. [33] | 640×192 | 0.106 | 0.799 | 4.662 | 0.187 | 0.889 | 0.961 | 0.983 |
| Manydepth [13] | 640×192 | 0.098 | 0.770 | 4.459 | 0.176 | 0.900 | 0.965 | 0.983 |
| Ours | 640×192 | 0.095 | 0.743 | 4.374 | 0.175 | 0.903 | 0.965 | 0.983 |
| Monodepth2 [8] | 1 024×320 | 0.115 | 0.882 | 4.701 | 0.190 | 0.879 | 0.961 | 0.982 |
| Packnet-SFM [27] | 1 280×384 | 0.107 | 0.802 | 4.538 | 0.186 | 0.889 | 0.962 | 0.981 |
| Shu et al. [34] | 1 024×320 | 0.104 | 0.729 | 4.481 | 0.179 | 0.893 | 0.965 | 0.984 |
| Wang et al. [33] | 1 024×320 | 0.106 | 0.773 | 4.491 | 0.185 | 0.890 | 0.962 | 0.982 |
| Manydepth [13] | 1 024×320 | 0.093 | 0.715 | 4.245 | 0.172 | 0.909 | 0.966 | 0.983 |
| Ours | 1 024×320 | 0.090 | 0.703 | 4.213 | 0.170 | 0.913 | 0.966 | 0.983 |

Table 1. Test results on the KITTI dataset (↓ lower is better, ↑ higher is better)
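The columns of Tables 1 and 2 are the standard Eigen-style monocular-depth metrics: mean absolute and squared relative error, RMSE of depth and of log-depth, and the fraction of pixels whose depth ratio falls under 1.25^k. A minimal sketch over flat lists of valid positive depths (function name and input format are my own; published KITTI numbers additionally apply a validity mask, median scaling, and a depth cap):

```python
import math

def depth_metrics(gt, pred):
    # gt, pred: equal-length lists of valid, positive depths (e.g. metres)
    n = len(gt)
    abs_rel = sum(abs(g - p) / g for g, p in zip(gt, pred)) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in zip(gt, pred)) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in zip(gt, pred)) / n)
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2
                             for g, p in zip(gt, pred)) / n)

    # delta accuracy: share of pixels with max(gt/pred, pred/gt) < 1.25**k
    def delta(k):
        return sum(max(g / p, p / g) < 1.25 ** k for g, p in zip(gt, pred)) / n

    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse,
            "RMSElog": rmse_log, "d1": delta(1), "d2": delta(2), "d3": delta(3)}
```

A perfect prediction scores 0 on every error column and 1.0 on every δ column, which is why the tables mark the two groups "lower is better" and "higher is better" respectively.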
| Method | AbsRel↓ | SqRel↓ | RMSE↓ | RMSElog↓ |
|---|---|---|---|---|
| Monodepth2 [8] | 0.129 | 1.569 | 6.876 | 0.187 |
| Li et al. [35] | 0.119 | 1.290 | 6.980 | 0.190 |
| Manydepth [13] | 0.114 | 1.193 | 6.223 | 0.170 |
| Ours | 0.111 | 1.165 | 6.196 | 0.168 |

Table 2. Test results on the CityScapes dataset (↓ lower is better)
| Model | Params/M | FLOPs/G |
|---|---|---|
| Monodepth2 [8] | 14.3 | 8.0 |
| Manydepth [13] | 14.4 | 10.2 |
| Ours | 22.0 | 16.3 |

Table 3. Model parameter count and computational complexity
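Table 3 reports model size in millions of parameters (Params/M) and compute in billions of FLOPs (FLOPs/G). As a rough illustration of how such budgets are tallied, here is a sketch for a single 2-D convolution layer (the function is hypothetical; real counts sum every layer, and FLOP conventions differ by a factor of two depending on whether multiply and add are counted separately):

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    # Parameters and multiply-accumulate count for one k x k 2-D convolution.
    params = c_in * c_out * k * k + (c_out if bias else 0)
    macs = c_in * c_out * k * k * h_out * w_out  # one MAC per weight per output pixel
    return params, macs
```

Dividing by 1e6 and 1e9 gives the Params/M and FLOPs/G units of the table; counting each MAC as two FLOPs would double the second figure.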
| Experiment | Act-VAN | SEM | DySample | AbsRel↓ | SqRel↓ | RMSE↓ | RMSElog↓ | δ<1.25↑ | δ<1.25²↑ | δ<1.25³↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | × | × | × | 0.098 | 0.770 | 4.459 | 0.176 | 0.900 | 0.965 | 0.983 |
| 2 | ✓ | × | × | 0.098 | 0.765 | 4.441 | 0.176 | 0.900 | 0.965 | 0.983 |
| 3 | × | ✓ | × | 0.096 | 0.753 | 4.379 | 0.175 | 0.902 | 0.965 | 0.983 |
| 4 | ✓ | ✓ | × | 0.096 | 0.755 | 4.436 | 0.176 | 0.901 | 0.965 | 0.983 |
| 5 | ✓ | ✓ | ✓ | 0.095 | 0.743 | 4.374 | 0.175 | 0.903 | 0.965 | 0.983 |

Table 4. Ablation experiment of modules (↓ lower is better, ↑ higher is better)
| Experiment | Method | AbsRel↓ | SqRel↓ | RMSE↓ | RMSElog↓ | δ<1.25↑ | δ<1.25²↑ | δ<1.25³↑ |
|---|---|---|---|---|---|---|---|---|
| 1 | Baseline | 0.098 | 0.770 | 4.459 | 0.176 | 0.900 | 0.965 | 0.983 |
| 2 | VAN | 0.097 | 0.765 | 4.495 | 0.176 | 0.901 | 0.965 | 0.983 |
| 3 | Act-VAN | 0.096 | 0.756 | 4.392 | 0.176 | 0.902 | 0.965 | 0.983 |
| 4 | SEM (w/o LKA) | 0.096 | 0.761 | 4.440 | 0.176 | 0.901 | 0.965 | 0.983 |
| 5 | SEM | 0.096 | 0.759 | 4.437 | 0.176 | 0.901 | 0.965 | 0.983 |

Table 5. Ablation experiment of module design (↓ lower is better, ↑ higher is better)