• A flexible quantized convolution engine based on FPGA


      Abstract: A flexible quantized convolution engine based on FPGA is proposed to address the precision loss, limited computation throughput, and inefficient convolution operations encountered when deploying Convolutional Neural Networks (CNNs) on resource-constrained edge devices. The method uses the HA-MPLF quantization strategy to fold the Batch Normalization (BN) layer into the convolution layer and assigns an optimal precision to the filters of each layer, balancing accuracy against computation performance. In addition, a computation method based on convolution decomposition is proposed to efficiently handle convolution kernels of different sizes. On the FPGA platform, the quantized convolution engine adopts a channel-first computation strategy and combines DSP (Digital Signal Processor) packing and cascading techniques to significantly improve resource utilization. Experiments on a ZCU102 FPGA show that the method achieves accuracies of 90.13%, 89.51%, and 93.33% on MobileNet-V2, ResNet18, and ResNet50, respectively, together with a significant throughput improvement, providing an efficient solution for deploying CNNs on edge devices.
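The BN-into-convolution folding mentioned in the abstract follows a standard identity: scaling the convolution weights and bias by the BN statistics makes a single folded convolution reproduce conv-then-BN at inference time. A minimal numerical sketch (all shapes, names, and values here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Per-output-channel BN folding identities (standard, not specific to HA-MPLF):
#   scale = gamma / sqrt(var + eps)
#   W' = W * scale,  b' = (b - mu) * scale + beta
# so that conv(x, W') + b' == BN(conv(x, W) + b) at inference time.
rng = np.random.default_rng(0)
out_ch, in_ch, k = 4, 3, 3
W = rng.standard_normal((out_ch, in_ch, k, k))      # conv weights
b = rng.standard_normal(out_ch)                     # conv bias
gamma = rng.standard_normal(out_ch)                 # BN scale
beta = rng.standard_normal(out_ch)                  # BN shift
mu = rng.standard_normal(out_ch)                    # BN running mean
var = rng.random(out_ch) + 0.1                      # BN running variance
eps = 1e-5

scale = gamma / np.sqrt(var + eps)
W_fold = W * scale[:, None, None, None]
b_fold = (b - mu) * scale + beta

# Check equivalence at a single output pixel (one dot product per output channel).
x = rng.standard_normal((in_ch, k, k))
y_ref = ((W * x).sum(axis=(1, 2, 3)) + b - mu) * scale + beta  # conv, then BN
y_fold = (W_fold * x).sum(axis=(1, 2, 3)) + b_fold             # folded conv only
assert np.allclose(y_ref, y_fold)
```

Folding removes the BN layer from the inference graph entirely, which is what allows per-filter precision to be assigned to a single fused operator.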

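The DSP packing technique the abstract refers to exploits the wide multipliers in FPGA DSP slices to compute two narrow products in one multiplication. A minimal arithmetic sketch, restricted to unsigned operands (real designs must add sign-correction terms, and the 18-bit field offset is an assumption matching a common 27×18 slice, not a detail from the paper):

```python
def packed_mul(a1: int, a0: int, w: int) -> tuple[int, int]:
    """Compute a1*w and a0*w with a single wide multiply.

    a1, a0, w are assumed unsigned 8-bit. Packing a1 into the high field
    gives (a1 << 18 | a0) * w = (a1*w) << 18 + a0*w; since a0*w < 2**16,
    the two products land in disjoint bit fields and can be sliced apart.
    """
    packed = (a1 << 18) | a0        # two activations share one operand
    prod = packed * w               # one multiplication, two products
    p0 = prod & ((1 << 18) - 1)     # low field:  a0 * w
    p1 = prod >> 18                 # high field: a1 * w
    return p1, p0

# Example: both products recovered from one multiply.
assert packed_mul(100, 50, 7) == (700, 350)
```

Doubling the products per DSP slice in this way, combined with the slice cascade paths, is what drives the resource-utilization gains claimed for the engine.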