• Laser & Optoelectronics Progress
  • Vol. 61, Issue 18, 1837015 (2024)
Wen Guo1, Hong Yang1, and Chang Liu2,*
Author Affiliations
  • 1School of Science, Beijing Information Science and Technology University, Beijing 100029, China
  • 2Institute of Applied Mathematics, Beijing Information Science and Technology University, Beijing 100101, China
    DOI: 10.3788/LOP240534
    Wen Guo, Hong Yang, Chang Liu. Semantic Segmentation of Dual-Source Remote Sensing Images Based on Gated Attention and Multiscale Residual Fusion[J]. Laser & Optoelectronics Progress, 2024, 61(18): 1837015

    Abstract

    The semantic segmentation of remote sensing images is a crucial step in geographic-object-based remote sensing image analysis. Combining remote sensing imagery with elevation data effectively enhances feature complementarity, thereby improving pixel-level segmentation accuracy. This study proposes a dual-source remote sensing image semantic segmentation model, STAM-SegNet, which leverages a Swin Transformer backbone to extract multiscale features and integrates an adaptive gated attention mechanism with a multiscale residual fusion strategy. The adaptive gated attention mechanism comprises gated channel attention and gated spatial attention. Gated channel attention strengthens the correlation between dual-source features through a competition/cooperation mechanism, effectively extracting the complementary features of the two sources. Gated spatial attention, in turn, uses spatial contextual information to dynamically filter high-level semantic features and select accurate detail features. The multiscale residual fusion strategy captures multiscale contextual information through multiscale refinement and a residual structure, emphasizing detail features such as shadows and boundaries while improving training speed. Experiments on the Vaihingen and Potsdam datasets show that the proposed model achieves average F1-scores of 89.66% and 92.75%, respectively, surpassing networks such as DeepLabV3+, UperNet, DANet, TransUNet, and Swin-UNet in segmentation accuracy.
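    The paper's exact fusion block is not reproduced on this page, but as a rough illustration of the multiscale-refinement-plus-residual idea described above, here is a minimal PyTorch sketch; the dilation rates, channel counts, and layer ordering are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class MultiscaleResidualFusion(nn.Module):
    """Minimal sketch: refine features at several scales and add them
    back to the input through a residual connection."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate captures a different context size.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 conv fuses the concatenated multiscale responses.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)

    def forward(self, x):
        multiscale = torch.cat([branch(x) for branch in self.branches], dim=1)
        # Residual structure: fused multiscale detail is added to the input.
        return x + self.fuse(multiscale)
```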
    Gated channel attention:

    $\tilde{x} = F\{x_{\mathrm{input}1}, x_{\mathrm{input}2} \mid \alpha, \beta, \gamma\}, \quad \alpha, \beta, \gamma \in \mathbb{R}^{C}$

    $s_c = \alpha \left\| x_{\mathrm{input}} \right\|_2 = \alpha \left[ \sum_{i=1}^{H} \sum_{j=1}^{W} \left( x_{\mathrm{input}}^{i,j} \right)^{2} \right]^{\frac{1}{2}}$

    $\tilde{s}_c = \frac{C^{\frac{1}{2}} s_c}{\left\| s \right\|_2} = \frac{C^{\frac{1}{2}} s_c}{\left[ \sum_{c=1}^{C} \left( s_c \right)^{2} + \varepsilon \right]^{\frac{1}{2}}}$

    $\tilde{x} = x_{\mathrm{input}} \left[ 1 + \tanh\left( \beta \tilde{s}_c + \gamma \right) \right]$
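    These equations follow a gated channel transformation pattern: a per-channel L2 embedding $s_c$ scaled by a learnable $\alpha$, cross-channel normalization, and a $\tanh$ gate with learnable $\beta$, $\gamma$. A minimal single-stream PyTorch sketch follows; how the two input streams are wired through this gate is an assumption left open by the equations as shown:

```python
import torch
import torch.nn as nn

class GatedChannelAttention(nn.Module):
    """Sketch of the gated channel attention equations: per-channel L2
    embedding s_c, cross-channel normalization, and a tanh gate."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps  # small constant for numerical stability

    def forward(self, x):
        # s_c = alpha * ||x_c||_2 over the spatial dimensions (H, W)
        s = self.alpha * x.pow(2).sum(dim=(2, 3), keepdim=True).add(self.eps).sqrt()
        # Channel normalization: s~_c = sqrt(C) * s_c / ||s||_2
        norm = s.pow(2).sum(dim=1, keepdim=True).add(self.eps).sqrt()
        s_norm = (x.size(1) ** 0.5) * s / norm
        # Gate: x~ = x * [1 + tanh(beta * s~ + gamma)]
        return x * (1.0 + torch.tanh(self.beta * s_norm + self.gamma))
```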
    Gated spatial attention:

    $\tilde{x}_o = F\{x_{\mathrm{input}} \mid \mu\}, \quad \mu \in \mathbb{R}^{C}$

    $k_1 = S\left( f^{7 \times 7}\left( \left[ \mathrm{MaxPool}(x_{i1}); \mathrm{AvgPool}(x_{i2}) \right] \right) \right) = S\left( f^{7 \times 7}\left( \left[ x_{\max}; x_{\mathrm{avg}} \right] \right) \right)$

    $k_2 = k_1 \otimes \mu$

    $x_o = S\left( B\left( \mathrm{DSC}^{3 \times 3}(k_2) \right) \right)$

    $x_{o1} = x_o \otimes x_{i1}$

    $x_{o2} = (1 - x_o) \otimes x_{i2}$

    $\tilde{x}_o = x_{o1} + x_{o2}$
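    Reading $S$ as the sigmoid, $f^{7 \times 7}$ as a 7×7 convolution, $B$ as batch normalization, and $\mathrm{DSC}^{3 \times 3}$ as a 3×3 depthwise separable convolution, a minimal PyTorch sketch follows; interpreting MaxPool/AvgPool as channel-wise pooling and giving the DSC a single output channel are assumptions:

```python
import torch
import torch.nn as nn

class GatedSpatialAttention(nn.Module):
    """Sketch of the gated spatial attention equations: a 7x7 conv over
    channel-pooled maps yields k1, a learnable mu rescales it, a 3x3
    depthwise separable conv + BN + sigmoid yields the gate x_o, and the
    two sources are blended as x_o*x_i1 + (1 - x_o)*x_i2."""
    def __init__(self, channels):
        super().__init__()
        self.conv7 = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.mu = nn.Parameter(torch.ones(1, channels, 1, 1))  # mu in R^C
        # DSC^{3x3}: depthwise 3x3 followed by pointwise 1x1, then BN.
        self.dsc = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, 1, 1, bias=False),
            nn.BatchNorm2d(1),
        )

    def forward(self, x_i1, x_i2):
        # Channel-wise max/avg pooling of each source: (B, 1, H, W) each.
        x_max = x_i1.max(dim=1, keepdim=True).values
        x_avg = x_i2.mean(dim=1, keepdim=True)
        k1 = torch.sigmoid(self.conv7(torch.cat([x_max, x_avg], dim=1)))
        k2 = k1 * self.mu                  # broadcast to (B, C, H, W)
        x_o = torch.sigmoid(self.dsc(k2))  # spatial gate in (0, 1)
        # Complementary blending of the two sources.
        return x_o * x_i1 + (1.0 - x_o) * x_i2
```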
    Evaluation metrics:

    $R_{\mathrm{precision}} = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FP}}}$

    $R_{\mathrm{recall}} = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}}$

    $F_1 = \frac{2 \times R_{\mathrm{precision}} \times R_{\mathrm{recall}}}{R_{\mathrm{precision}} + R_{\mathrm{recall}}}$

    $R_{\mathrm{OA}} = \frac{N_{\mathrm{TP}} + N_{\mathrm{TN}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}} + N_{\mathrm{TN}} + N_{\mathrm{FP}}}$
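    These are the standard confusion-matrix metrics; a small NumPy sketch for computing them per class:

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    """Per-class precision, recall, F1, and overall accuracy (OA)
    from integer label maps, via the confusion matrix."""
    cm = np.bincount(
        num_classes * target.ravel() + pred.ravel(),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)  # rows: truth, cols: prediction
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # TP / (TP + FP)
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # TP / (TP + FN)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    oa = tp.sum() / cm.sum()                        # overall accuracy
    return precision, recall, f1, oa
```

    Averaging f1 over the classes yields the average F1-scores quoted in the abstract.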