• Infrared Technology
  • Vol. 47, Issue 4, 468 (2025)
Zhihui YE1, Jian WU1, Xiaozhong ZHAO1, Wenjuan WANG1, and Xinguang SHAO2
Author Affiliations
  • 1China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310008, China
  • 2Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
    Cite this Article
    YE Zhihui, WU Jian, ZHAO Xiaozhong, WANG Wenjuan, SHAO Xinguang. Multimodal Object Detection Based on Feature Interaction and Adaptive Grouping Fusion[J]. Infrared Technology, 2025, 47(4): 468

    Abstract

    To improve object detection in complex scenes, this paper proposes a multimodal object detection model based on feature interaction and adaptive grouping fusion, combining deep learning with multimodal information fusion. First, the model takes infrared and visible object images as input, builds a symmetrical dual-branch feature extraction structure on the PP-LCNet network, and introduces a feature interaction module so that the features of the two modalities exchange complementary information during extraction. Second, a binary grouping attention mechanism is designed: global pooling combined with the sign function groups the output features of the interaction module by object category, and spatial attention enhances the object information within each group. Finally, similar feature groups are extracted from the group-enhanced features at different scales and fused from deep to shallow with adaptive weights, and objects are predicted from the fused features at each scale. Experimental results show that the proposed method significantly improves multimodal feature interaction, key feature enhancement, and multi-scale fusion. The model is also more robust in complex scenes and adapts better to different scenarios.
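    The grouping step in the abstract (global pooling followed by the sign function) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what the sign function is applied to, so the sketch assumes each channel is average-pooled to one scalar and that the sign is taken relative to the mean pooled response. The function names `global_avg_pool`, `sign`, and `binary_group` are hypothetical.

```python
from statistics import mean


def global_avg_pool(feature):
    """Reduce each channel (an H x W nested list) to its average value."""
    return [mean(v for row in channel for v in row) for channel in feature]


def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def binary_group(feature):
    """Split channel indices into two groups.

    Assumption (not stated in the abstract): a channel goes to group +1
    when its pooled response is at or above the mean pooled response,
    and to group -1 otherwise.
    """
    pooled = global_avg_pool(feature)
    centre = mean(pooled)
    groups = {1: [], -1: []}
    for index, value in enumerate(pooled):
        key = 1 if sign(value - centre) >= 0 else -1
        groups[key].append(index)
    return groups


# Four 2 x 2 channels with constant responses 1, 5, 2, and 6.
feature = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[5.0, 5.0], [5.0, 5.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[6.0, 6.0], [6.0, 6.0]],
]
groups = binary_group(feature)
```

    In the described model, each resulting group would then be passed through spatial attention before multi-scale fusion. Those stages are omitted here because the abstract gives no detail on them.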