• Opto-Electronic Engineering
  • Vol. 51, Issue 10, 240174 (2024)
Hongmin Zhang*, Qianqian Tian, Dingding Yan, and Lingyu Bu
Author Affiliations
  • School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
    DOI: 10.12086/oee.2024.240174
    Hongmin Zhang, Qianqian Tian, Dingding Yan, Lingyu Bu. GLCrowd: a weakly supervised global-local attention model for congested crowd counting[J]. Opto-Electronic Engineering, 2024, 51(10): 240174

    Abstract

    To address the challenges of crowd counting in dense scenes, such as complex backgrounds and scale variations, we propose a weakly supervised crowd counting model for dense scenes, named GLCrowd, which integrates global and local attention mechanisms. First, we design a local attention module combined with deep convolution to enhance local features through context weights while leveraging feature weight sharing to capture high-frequency local information. Second, the Vision Transformer (ViT) self-attention mechanism is used to capture low-frequency global information. Finally, the global and local attention mechanisms are effectively fused, and counting is accomplished through a regression token. The model was tested on the Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF_CC_50 datasets, achieving MAE values of 64.884, 8.958, 95.523, and 209.660, and MSE values of 104.411, 16.202, 173.453, and 282.217, respectively. The results demonstrate that the proposed GLCrowd model exhibits strong performance in crowd counting within dense scenes.
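
    To make the described architecture concrete, the sketch below illustrates one way a global-local attention block with a regression token could be assembled: a depthwise-convolution local branch for high-frequency context, a ViT-style self-attention branch for low-frequency global information, and a prepended regression token that yields the count. This is a minimal illustration, not the authors' GLCrowd implementation; the module names, dimensions, gating, additive fusion, and the 24x24 token grid are all assumptions introduced here.

```python
# Minimal sketch (assumptions, not the paper's code) of a global-local
# attention block with a regression token for weakly supervised counting.
import torch
import torch.nn as nn


class LocalAttention(nn.Module):
    """Depthwise convolution over the token grid to capture high-frequency
    local context (illustrative stand-in for the paper's local attention)."""
    def __init__(self, dim, grid_size):
        super().__init__()
        self.grid_size = grid_size
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                      # x: (B, H*W, C) patch tokens
        B, N, C = x.shape
        h = w = self.grid_size
        x = x.transpose(1, 2).reshape(B, C, h, w)
        x = self.dwconv(x) * self.gate(x)      # context-weighted local features
        return x.reshape(B, C, N).transpose(1, 2)


class GlobalLocalBlock(nn.Module):
    """ViT-style self-attention (global, low frequency) fused with the local
    branch; a regression token aggregates features into a crowd count."""
    def __init__(self, dim=256, heads=8, grid_size=24):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = LocalAttention(dim, grid_size)
        self.reg_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, 1)          # regression token -> count

    def forward(self, tokens):                 # tokens: (B, H*W, C)
        B = tokens.size(0)
        x = torch.cat([self.reg_token.expand(B, -1, -1), tokens], dim=1)
        x = self.norm(x)
        g, _ = self.attn(x, x, x)              # global self-attention
        l = self.local(tokens)                 # local branch on patch tokens
        fused = g[:, 1:] + l                   # simple additive fusion (assumption)
        reg = g[:, 0] + fused.mean(dim=1)      # update the regression token
        return self.head(reg).squeeze(-1)      # predicted count per image


# Usage: a 24x24 grid of 256-d patch features from some backbone encoder.
feats = torch.randn(2, 24 * 24, 256)
print(GlobalLocalBlock()(feats).shape)         # torch.Size([2])
```

    Because supervision is weak (image-level counts only), the regression head is trained directly against the ground-truth count rather than a density map, which is why a single token suffices as the counting output in this sketch.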