Image data has been widely used in various applications due to advancements in data acquisition platforms such as satellites, drones, and mobile measurement vehicles, as well as sensors like multispectral and hyperspectral cameras, synthetic aperture radar (SAR), and lidar scanners. These advancements enable high-resolution data collection across different wavelengths and perspectives, enhancing applications in remote sensing, real-time localization, and medical diagnostics. However, integrating multimodal image data from various sensors presents challenges due to nonlinear radiation distortion (NRD) and geometric variations such as rotation and scale differences. Traditional multimodal image matching methods often struggle with these complexities: conventional approaches either convert images to a common modality or enhance feature robustness against modality differences. While recent advances in deep learning have improved matching performance, practical applications still face challenges due to the lack of comprehensive datasets, difficulties with complex data, and high computational demands. To address these issues, we propose a nonlinear radiation and geometric invariant matching (NRGM) method. NRGM effectively handles NRD, scale variation, and rotation by using multi-directional and multi-scale filtering to build direction index maps with stable local structures. A robust principal orientation estimation method achieves rotation invariance, and a novel matching framework combining geometric invariants and template matching improves accuracy. This approach significantly enhances multimodal image matching by overcoming both geometric and radiometric distortions.
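The multi-directional, multi-scale filtering underlying NRGM is realized with Log-Gabor filters. The sketch below builds a frequency-domain Log-Gabor filter bank and applies it by masking the image spectrum; the parameter names and default values (`min_wavelength`, `mult`, `sigma_on_f`) are illustrative conventions, not the paper's exact settings.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=4, n_orients=6,
                   min_wavelength=3.0, mult=2.1, sigma_on_f=0.55):
    """Frequency-domain Log-Gabor filter bank (illustrative parameters)."""
    y = (np.arange(rows) - rows // 2)[:, None] / rows
    x = (np.arange(cols) - cols // 2)[None, :] / cols
    radius = np.hypot(x, y)
    radius[rows // 2, cols // 2] = 1.0        # avoid log(0) at the DC bin
    theta = np.arctan2(-y, x)                 # broadcasts to (rows, cols)

    bank = np.empty((n_scales, n_orients, rows, cols))
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)      # centre frequency
        radial = np.exp(-np.log(radius / f0) ** 2 /
                        (2 * np.log(sigma_on_f) ** 2))
        radial[rows // 2, cols // 2] = 0.0           # suppress DC
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # wrapped angular distance to the filter orientation
            dt = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            sigma_theta = np.pi / n_orients
            bank[s, o] = radial * np.exp(-dt ** 2 / (2 * sigma_theta ** 2))
    return bank

def filter_responses(img, bank):
    """Complex per-scale, per-orientation responses via spectrum masking."""
    F = np.fft.fftshift(np.fft.fft2(img))
    return np.fft.ifft2(np.fft.ifftshift(F * bank, axes=(-2, -1)))
```

The complex responses carry both magnitude (local energy) and phase, which is what later stages such as phase congruency and direction indexing consume.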
NRGM adopts a two-stage framework involving feature matching and template matching. In the feature matching stage, images are transformed into the frequency domain using Log-Gabor filters, and key features are detected via phase congruency and weighted moment maps to improve robustness against illumination variation. The principal orientation estimation technique involves extracting a directional index map from Log-Gabor filter responses, summing the index values within a local region, and analyzing the histogram to determine the feature's principal orientation. Feature correspondences are established using nearest-neighbor distance, with outliers removed by the fast sample consensus (FSC) algorithm. In the template matching stage, high-dimensional template features are constructed from Log-Gabor responses, and a three-dimensional phase correlation strategy is employed for precise matching, effectively aligning features despite variations in scale and rotation. NRGM integrates robust feature detection, accurate direction estimation, and precise template matching, delivering high-quality results even in the presence of severe NRD and geometric distortions.
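The directional index map and histogram-based principal orientation described above can be sketched as follows. This is a simplified reading of the paper's scheme: the index map here takes, per pixel, the orientation with maximal filter magnitude accumulated over scales, and the window shape and voting rule are assumptions for illustration.

```python
import numpy as np

def direction_index_map(responses):
    """Per-pixel index of the dominant filter orientation.

    responses: complex array of shape (n_scales, n_orients, H, W)
    (e.g. Log-Gabor responses). Magnitudes are summed over scales,
    then argmax is taken over orientations."""
    energy = np.abs(responses).sum(axis=0)      # (n_orients, H, W)
    return np.argmax(energy, axis=0)            # (H, W) integer indices

def principal_orientation(index_map, cy, cx, radius, n_orients):
    """Histogram the index values in a local window around a keypoint
    and take the dominant bin as the principal orientation
    (hypothetical square window; the paper's exact scheme may differ)."""
    patch = index_map[max(cy - radius, 0):cy + radius + 1,
                      max(cx - radius, 0):cx + radius + 1]
    hist = np.bincount(patch.ravel(), minlength=n_orients)
    return np.argmax(hist) * np.pi / n_orients  # radians
```

Because the index map encodes local structure rather than raw intensity, descriptors built on it stay stable under NRD, and rotating the descriptor by the estimated principal orientation provides the rotation invariance the text refers to.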
A comprehensive evaluation of NRGM is presented, covering parameter settings, qualitative and quantitative comparisons with advanced algorithms (SIFT, RIFT, ASS, GIFT, HOWP, MatchFormer, and SemLA), and robustness testing. A diverse set of multimodal images, including visible-light, infrared, and depth images, is used to assess NRGM's performance. Sensitivity analysis identifies the optimal parameters as scale s=4, orientations o=12, and window size l=84, which yield the highest number of correct matches (NCM) and the lowest root mean square error (RMSE) (Table 2). Qualitatively, Figs. 8–10 show that SIFT performs well on RGB-NIR pairs but struggles with larger modality differences, while RIFT, despite its robustness to NRD, fails to handle scale variations effectively. HOWP, MatchFormer, and SemLA are unstable on multimodal images, and although ASS and GIFT perform reliably overall, they show limitations on Optical-SAR and Optical-IR pairs. In contrast, NRGM correctly matches all 9 image pairs. Tables 3–5 report the corresponding quantitative results across visual, medical, and remote sensing datasets: SIFT again excels only on RGB-NIR pairs; HOWP, MatchFormer, and SemLA show some robustness but deliver inconsistent results across modalities; RIFT, ASS, and GIFT demonstrate higher reliability, with GIFT performing particularly well on Optical-SAR and Optical-IR pairs. NRGM outperforms all competitors, successfully matching every image pair and achieving the best NCM, Precision, Recall, and RMSE. Its feature detection, feature description, and matching enhancement strategies make it both precise and robust across diverse multimodal challenges.
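For reference, the metrics used throughout the evaluation can be computed as below given putative matches and a ground-truth transformation. The definitions (inlier threshold `tol`, Recall denominator `n_gt_correspondences`) are the conventional ones in matching benchmarks; the paper's exact threshold may differ.

```python
import numpy as np

def match_metrics(src_pts, dst_pts, H_gt, n_gt_correspondences, tol=3.0):
    """NCM, Precision, Recall, and RMSE for putative matches under a
    ground-truth homography H_gt (illustrative, conventional definitions)."""
    # Project source points through the ground-truth homography.
    src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])
    proj = src_h @ H_gt.T
    proj = proj[:, :2] / proj[:, 2:3]           # dehomogenise
    err = np.linalg.norm(proj - dst_pts, axis=1)

    correct = err <= tol                        # inliers within tol pixels
    ncm = int(correct.sum())                    # number of correct matches
    precision = ncm / max(len(src_pts), 1)
    recall = ncm / max(n_gt_correspondences, 1)
    rmse = float(np.sqrt(np.mean(err[correct] ** 2))) if ncm else float("inf")
    return ncm, precision, recall, rmse
```

Under these definitions, a higher NCM with lower RMSE indicates both more and geometrically tighter correspondences, which is how the tables rank the competing algorithms.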
In addition, NRGM demonstrates stable performance with periodic accuracy fluctuations during image rotations. Table 6 shows average run times for SIFT, RIFT, HOWP, ASS, MatchFormer, SemLA, GIFT, and NRGM on various datasets. While SIFT, implemented in C++, remains the fastest, SemLA, MatchFormer, and HOWP perform efficiently but struggle on challenging multimodal datasets. RIFT has the lowest efficiency due to its iterative optimization. ASS, GIFT, and NRGM share similar efficiency levels, outperforming RIFT, with NRGM showing advantages in both accuracy and efficiency.
A novel method, NRGM, is introduced for multimodal image matching, designed to effectively handle various image modalities, scales, rotations, and other geometric transformations. NRGM leverages multi-scale and multi-directional Log-Gabor filter responses, providing the algorithm with inherent robustness against noise. The method begins by detecting prominent and highly repetitive feature points on the phase congruency maps. It then utilizes directional index information to describe local image structures and estimate the principal orientation, ensuring that NRGM remains invariant to image rotations. Finally, NRGM enhances matching performance by constructing template features from multi-scale and multi-directional filter results. Extensive qualitative and quantitative experiments across diverse image modalities validate NRGM's effectiveness. Future research will focus on using convolutional neural networks to create more precise orientation index maps, improving the estimation of principal orientation and feature descriptor construction. To mitigate the high computational cost associated with Log-Gabor filters, alternative lightweight filters or methods will be explored to extract multi-scale and multi-directional image information, aiming to improve the algorithm's computational efficiency.