• MIRPFFD-Net: Multi-scale inverted residual pyramid feature focusing diffusion detection network

      Abstract: In complex scenes, drastic variation in target scale degrades detection accuracy, and existing methods typically compensate by adding parameters at the cost of real-time performance. To address this, a multi-scale inverted residual pyramid feature focusing diffusion detection network, MIRPFFD-Net, is proposed. First, a backbone network is built on the Residual Multi-Scale Feature Extraction (RMSFE) module, which alleviates the problem of small-target features being overwhelmed by adjacent large targets and improves feature representation. Second, an Inverted Residual Pyramid Feature Focusing Module (IRP_FFM) is designed as the neck structure to capture rich cross-scale feature information and to extract and fuse multi-scale features efficiently. In addition, a Feature Diffusion Mechanism (FDM) is introduced to optimize the neck network by diffusing features rich in contextual semantics to every detection scale, improving detection accuracy for targets at all scales. Experimental results on the public SIMD, RDD2022, and VisDrone2019 datasets show that the proposed network reaches average precision of 82.7%, 57.5%, and 38.5%, respectively, which is 2.9%, 1.9%, and 7.7% higher than YOLOv8n, significantly improving detection performance in complex scenes while maintaining real-time performance.
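The reported gains can be related back to the YOLOv8n baseline with a small calculation. The sketch below assumes the stated improvements are absolute differences in percentage points; the abstract does not say whether they are absolute or relative, so the derived baseline values are an illustration under that assumption and not figures reported by the authors. The variable names are hypothetical.

```python
# Average precision (%) reported in the abstract for MIRPFFD-Net.
reported_ap = {"SIMD": 82.7, "RDD2022": 57.5, "VisDrone2019": 38.5}

# Improvement over YOLOv8n as stated in the abstract.
# ASSUMPTION: read here as absolute percentage points, not relative gains.
gain_over_yolov8n = {"SIMD": 2.9, "RDD2022": 1.9, "VisDrone2019": 7.7}

# Implied YOLOv8n baseline under the absolute-difference assumption.
implied_baseline = {
    name: round(reported_ap[name] - gain_over_yolov8n[name], 1)
    for name in reported_ap
}

for name, baseline in implied_baseline.items():
    print(f"{name}: implied YOLOv8n AP = {baseline}%, MIRPFFD-Net AP = {reported_ap[name]}%")
```

Under this reading, the largest gain falls on VisDrone2019, the dataset with the lowest reported average precision of the three.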

       
