Multimodal LiDAR Enhancement Algorithm Based on Multiscale Features

Yikai Luo; Linyuan He; Shiping Ma

doi:10.3788/LOP240778

Abstract

LiDAR is widely used in vehicle environment perception to scan the surroundings, acquire measurement data, and construct three-dimensional (3D) point clouds. However, it cannot capture semantic information about the environment, which limits its effectiveness in 3D object detection. In this study, we therefore design a multimodal fusion LiDAR enhancement algorithm based on multiscale features and introduce several improvements within the Transformer framework to improve LiDAR-based 3D object detection in complex environments. In the encoder, multiscale semantic features extracted by a semantic-aware aggregation module are used for cross-modal feature fusion; in the decoder, scale self-attention and proposal-guided initialization make the prediction process more efficient. We also design a triangular loss function that improves regression of the predicted box position by using triangular geometric constraints to restrict the regressed position to lie between the 2D and 3D labels. Experiments on the nuScenes dataset demonstrate the effectiveness and robustness of the proposed model.
