With the advancement of Earth observation technology, the need for remote sensing edge intelligence applications has become increasingly urgent. These applications perform object detection and analysis directly on edge devices such as satellites or drones, thereby conserving transmission bandwidth, processing time, and resource consumption. Deep learning, renowned for its powerful feature extraction capabilities, has been extensively researched and applied to object detection in optical remote sensing images. However, the continuous pursuit of higher detection accuracy has left deep learning detection models grappling with high complexity, large parameter counts, massive scale, and low algorithmic efficiency. Owing to constraints on volume, weight, and power consumption, edge devices often lack large storage and computational resources, which limits the deployment of many high-precision deep learning models on them. The design of intelligent models that are faster, more accurate, and lighter has therefore attracted growing attention in the remote sensing field. We focus on edge intelligence applications and address the lightweight optimization problem in object detection for optical remote sensing images. Paying particular attention to the diverse shapes of objects in remote sensing images, we propose a deformable convolution-based lightweight model (DCBLM), using YOLOv8n as the baseline. By employing deformable convolution for feature extraction, optimizing the multi-scale feature fusion strategy, and introducing the minimum-point-distance-based intersection over union (MPDIoU) loss function to address shortcomings of the original loss function, the model achieves lightweight optimization while enhancing accuracy.
DCBLM reduces the model's parameter count, computational complexity, and memory usage, improving its deployment flexibility in practical applications.
We propose the lightweight model DCBLM, based on deformable convolution, with the lightweight network YOLOv8n as the baseline. The C2f deformable convolution feature extraction (C2f_DCFE) module enables the network to dynamically adapt to varying shapes, sizes, and positions of objects, achieving efficient feature extraction while reducing the number of parameters. The cross-scale feature fusion module (CFFM) effectively integrates multi-level features and addresses feature redundancy in the neck network, thereby improving the efficiency of feature fusion and significantly decreasing the number of parameters. The MPDIoU loss function specifically mitigates the failure of the original loss function when the predicted and ground truth bounding boxes share the same aspect ratio, effectively improving the detection accuracy of the model.
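For reference, MPDIoU subtracts from the IoU the normalized squared distances between the corresponding top-left and bottom-right corners of the two boxes, so that two boxes with the same aspect ratio but different sizes still receive different scores. The sketch below follows the published MPDIoU formulation; the function names are illustrative and not taken from the DCBLM code.

```python
def mpdiou(box_pred, box_gt, img_w, img_h):
    """MPDIoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1p, y1p, x2p, y2p = box_pred
    x1g, y1g, x2g, y2g = box_gt
    # standard IoU
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    union = (x2p - x1p) * (y2p - y1p) + (x2g - x1g) * (y2g - y1g) - inter
    iou = inter / union if union > 0 else 0.0
    # squared distances between matching corners, normalized by the image diagonal
    d1 = (x1p - x1g) ** 2 + (y1p - y1g) ** 2   # top-left corners
    d2 = (x2p - x2g) ** 2 + (y2p - y2g) ** 2   # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm

def mpdiou_loss(box_pred, box_gt, img_w, img_h):
    """Loss used for regression: 1 - MPDIoU."""
    return 1.0 - mpdiou(box_pred, box_gt, img_w, img_h)
```

Note that for a small predicted box nested inside a larger ground truth box of identical aspect ratio, plain aspect-ratio penalty terms vanish, while the corner-distance terms here still produce a nonzero gradient signal.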
As shown in Table 4, DCBLM outperforms other lightweight detection methods on the selected dataset. Compared with the baseline YOLOv8n, DCBLM achieves a 0.8-percentage-point improvement in detection accuracy while reducing the number of parameters, the computational load, and the model size by 39.5%, 22.2%, and 36.5%, respectively. Table 6 shows that DCBLM also excels at drone-based multi-angle livestock detection in grassland environments, with a 0.9-percentage-point increase in detection accuracy and reductions in the number of parameters, computational load, and model size of 39.5%, 22.2%, and 36.5%, respectively. These reductions significantly improve the model's deployment flexibility. This is attributed to the C2f_DCFE module, which enables dynamic adaptation to varying shapes, sizes, and positions of objects, achieving efficient feature extraction with fewer parameters; the CFFM, which effectively integrates multi-level features and further reduces the number of parameters; and the MPDIoU loss function, which enables more accurate object localization and thus higher detection accuracy. Visualization results in Figs. 6 and 7 demonstrate DCBLM's superiority over YOLOv8n across different scenarios, validating the proposed improvements. Furthermore, inference experiments on an unseen dataset show that DCBLM exhibits lower peak graphics processing unit (GPU) utilization than YOLOv8n, indicating reduced computational demands, alleviated computational bottlenecks, and improved inference efficiency. Moreover, DCBLM achieves a mean average precision (mAP) above 60% on all three datasets, an improvement of over 2 percentage points compared with YOLOv8n. These results highlight that DCBLM offers superior detection accuracy and lightweight performance, with enhanced capabilities for detecting densely distributed small objects and morphologically diverse objects.
The model demonstrates excellent applicability for both general and specialized object detection tasks.
In this study, DCBLM, a lightweight model based on deformable convolution, was proposed. The C2f_DCFE module enables the optimized backbone network to dynamically adapt to varying shapes, sizes, and positions of objects, acquiring more accurate feature information while reducing the number of parameters. The CFFM improves the efficiency of feature fusion by uniformly reducing the number of channels in feature maps at different scales, achieving effective integration of multi-level features and further reducing the number of parameters. The MPDIoU loss function specifically addresses the failure of the original loss function when the predicted and ground truth bounding boxes share the same aspect ratio, effectively improving detection accuracy and simplifying computation. Experimental results demonstrate that DCBLM offers superior detection accuracy and lightweight performance, showing excellent applicability for both general and specialized object detection tasks. Future work will involve optimization and validation across multiple domains and scenarios according to practical application requirements, aiming to further improve the performance and adaptability of the model.
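The channel-unification step that CFFM performs before fusion can be sketched as per-scale pointwise (1x1) convolutions projecting every scale to a shared channel count, so that subsequent fusion operates on equal-width maps. This is a minimal reading of the mechanism with random placeholder weights; the actual CFFM design may differ.

```python
import numpy as np

def conv1x1(feat, weight):
    """Pointwise (1x1) convolution: feat (C_in, H, W) -> (C_out, H, W)."""
    c_in, h, w = feat.shape
    return (weight @ feat.reshape(c_in, -1)).reshape(-1, h, w)

def unify_channels(features, c_out, rng):
    """Project multi-scale feature maps to a shared channel count c_out.

    Hypothetical CFFM-style step: each scale gets its own 1x1 projection,
    so later fusion (upsampling plus add/concat) sees equal channel widths,
    and downstream layers need far fewer parameters than at full width.
    """
    unified = []
    for feat in features:
        weight = rng.standard_normal((c_out, feat.shape[0]))  # placeholder weights
        unified.append(conv1x1(feat, weight))
    return unified
```

Reducing, say, 64/128/256-channel maps to a uniform 32 channels shrinks every following fusion convolution proportionally, which is one plausible source of the parameter savings reported above.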