Abstract:
Drastic changes in target scale degrade detection accuracy in complex scenarios, and existing methods typically compensate by increasing the number of parameters at the expense of real-time performance. To address this, a multi-scale inverted residual pyramid feature focusing diffusion detection network, MIRPFFD-Net, is proposed. First, a backbone network is built on the Residual Multi-Scale Feature Extraction (RMSFE) module, which alleviates the problem of small-target feature information being overwhelmed by adjacent large targets and improves feature representation. Second, an Inverted Residual Pyramid Feature Focusing Module (IRP_FFM) is designed as the neck structure to capture rich cross-scale feature information, enabling efficient extraction and fusion of multi-scale features. In addition, a Feature Diffusion Mechanism (FDM) is introduced to optimize the neck network by diffusing rich contextual semantic features to each detection scale, which improves the detection accuracy of multi-scale targets while maintaining real-time performance. Experimental results on the public SIMD, RDD2022 and VisDrone2019 datasets show that the proposed network reaches an average accuracy of 82.7%, 57.5% and 38.5%, respectively, which is 2.9%, 1.9% and 7.7% higher than that of YOLOv8n, demonstrating improved detection performance in complex scenarios.
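
As a quick sanity check on the reported figures, the sketch below recovers the YOLOv8n baseline values implied by the abstract. It assumes the stated gains are absolute percentage-point differences; the abstract does not say whether they are absolute or relative, so this is one reading and not a confirmed one. The dictionary and variable names are illustrative only.

```python
# Reported average accuracy of MIRPFFD-Net and its stated gain over YOLOv8n,
# both taken from the abstract: dataset -> (accuracy %, gain).
reported = {
    "SIMD": (82.7, 2.9),
    "RDD2022": (57.5, 1.9),
    "VisDrone2019": (38.5, 7.7),
}

# Assumption: gains are absolute percentage-point differences,
# so the implied baseline is accuracy minus gain.
implied_baseline = {
    name: round(acc - gain, 1) for name, (acc, gain) in reported.items()
}

for name, base in implied_baseline.items():
    acc, gain = reported[name]
    print(f"{name}: proposed {acc:.1f}%, implied YOLOv8n baseline {base:.1f}% (+{gain:.1f} points)")
```

Under that assumption, the implied YOLOv8n baselines are 79.8%, 55.6% and 30.8% on SIMD, RDD2022 and VisDrone2019.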