Laser & Optoelectronics Progress, Vol. 62, Issue 8, 0815010 (2025)
Lei Xiao, Peng Hu*, and Junjie Ma
Author Affiliations
  • College of Artificial Intelligence, Anhui University of Science & Technology, Huainan 232001, Anhui, China
    DOI: 10.3788/LOP241870
    Lei Xiao, Peng Hu, Junjie Ma. Self-Supervised Monocular Depth Estimation Model Based on Global Information Correlation Under Influence of Local Attention[J]. Laser & Optoelectronics Progress, 2025, 62(8): 0815010
    References

    [1] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[EB/OL]. https://arxiv.org/abs/1406.2283

    [2] Yang C Z, Xiang S, Deng H P et al. Depth estimation for phase-coding light field based on neural network[J]. Laser & Optoelectronics Progress, 60, 1211002(2023).

    [3] Zhou T H, Brown M, Snavely N et al. Unsupervised learning of depth and ego-motion from video[C], 6612-6619(2017).

    [4] Wang J J, Liu Y, Wu Y H et al. Monocular depth estimation method based on plane coefficient representation with adaptive depth distribution[J]. Acta Optica Sinica, 43, 1415001(2023).

    [5] Casser V, Pirk S, Mahjourian R et al. Unsupervised monocular depth and ego-motion learning with structure and semantics[C], 381-388(2019).

    [6] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C], 6602-6611(2017).

    [7] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C], 770-778(2016).

    [8] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. https://arxiv.org/abs/1409.1556v6

    [9] Lü X Y, Liu L, Wang M M et al. HR-depth: high resolution self-supervised monocular depth estimation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 2294-2301(2021).

    [10] Dosovitskiy A, Beyer L, Kolesnikov A et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. https://arxiv.org/abs/2010.11929

    [11] Bae J, Moon S, Im S. MonoFormer: towards generalization of self-supervised monocular depth estimation with Transformers[EB/OL]. https://arxiv.org/abs/2205.11083v1

    [12] Varma A, Chawla H, Zonooz B et al. Transformers in self-supervised monocular depth estimation with unknown camera intrinsics[EB/OL]. https://arxiv.org/abs/2202.03131

    [13] Lin X, Guo Y, Zhao Y Q et al. Depth estimation method of light field based on attention mechanism of neighborhood pixel[J]. Acta Optica Sinica, 43, 2115003(2023).

    [14] Liu Z, Lin Y T, Cao Y et al. Swin Transformer: hierarchical vision Transformer using shifted windows[EB/OL]. https://arxiv.org/abs/2103.14030v2

    [15] Geiger A, Lenz P, Stiller C et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 32, 1231-1237(2013).

    [16] Liu F Y, Shen C H, Lin G S et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2024-2039(2016).

    [17] Laina I, Rupprecht C, Belagiannis V et al. Deeper depth prediction with fully convolutional residual networks[C], 239-248(2016).

    [18] Fu H, Gong M M, Wang C H et al. Deep ordinal regression network for monocular depth estimation[C], 2002-2011(2018).

    [19] Garg R, Vijay Kumar B G, Carneiro G et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[M]. Computer vision-ECCV 2016, 9912, 740-756(2016).

    [20] Jung H, Park E, Yoo S. Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation[C], 12622-12632(2021).

    [21] Poggi M, Aleotti F, Tosi F et al. On the uncertainty of self-supervised monocular depth estimation[C], 3224-3234(2020).

    [22] Godard C, Mac Aodha O, Firman M et al. Digging into self-supervised monocular depth estimation[C], 3827-3837(2019).

    [23] Yan J X, Zhao H, Bu P H et al. Channel-wise attention-based network for self-supervised monocular depth estimation[C], 464-473(2021).

    [24] Zhou Z K, Fan X N, Shi P F et al. R-MSFM: recurrent multi-scale feature modulation for monocular depth estimation[C], 12757-12766(2021).

    [25] Strudel R, Garcia R, Laptev I et al. Segmenter: transformer for semantic segmentation[C], 7242-7252(2021).

    [26] Ranftl R, Bochkovskiy A, Koltun V. Vision transformers for dense prediction[C], 12159-12168(2021).

    [27] Hendrycks D, Gimpel K. Bridging nonlinearities and stochastic regularizers with Gaussian error linear units[EB/OL]. https://www.semanticscholar.org/paper/Bridging-Nonlinearities-and-Stochastic-Regularizers-Hendrycks-Gimpel/4361e64f2d12d63476fdc88faf72a0f70d9a2ffb

    [28] Ali A, Touvron H, Caron M et al. XCiT: cross-covariance image transformers[C], 20014-20027(2021).

    [29] Wang Z, Bovik A C, Sheikh H R et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 13, 600-612(2004).

    [30] Loshchilov I, Hutter F. Decoupled weight decay regularization[EB/OL]. https://arxiv.org/abs/1711.05101

    [31] Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts[C](2017).

    [32] Klingner M, Termöhlen J A, Mikolajczyk J et al. Self-supervised monocular depth estimation: solving the dynamic object problem by semantic guidance[M]. Computer vision-ECCV 2020, 12365, 582-600(2020).

    [33] Guizilini V, Ambruș R, Pillai S et al. 3D packing for self-supervised monocular depth estimation[C], 2482-2491(2020).

    [34] Choi J, Jung D, Lee D et al. SAFENet: self-supervised monocular depth estimation with semantic-aware feature extraction[EB/OL]. https://arxiv.org/abs/2010.02893v3

    [35] Johnston A, Carneiro G. Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume[C], 4755-4764(2020).

    [36] Zhou H, Greenwood D, Taylor S. Self-supervised monocular depth estimation with internal feature fusion[C], 2-8(2021).
