Abstract:
A flexible quantized convolution engine is proposed to address precision loss, limited computational throughput, and inefficient convolution when deploying Convolutional Neural Networks (CNNs) on resource-constrained edge devices. The method uses the HA-MPLF quantization strategy to fold Batch Normalization (BN) layers into the preceding convolution layers and assigns an optimal precision to each filter, balancing accuracy against computational performance. In addition, a convolution method based on kernel decomposition is proposed to handle convolution kernels of different sizes efficiently. On the FPGA platform, the quantized convolution engine adopts a channel-first computation strategy and combines DSP packing and cascading techniques to significantly improve resource utilization. Experimental validation on the ZCU102 FPGA shows that the method achieves accuracies of 90.13%, 89.51%, and 93.33% for MobileNet-V2, ResNet18, and ResNet50, respectively, while substantially improving throughput, providing an efficient solution for CNN deployment on edge devices.
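The BN-folding step mentioned above can be sketched with the standard identity for absorbing a BatchNorm layer into the preceding convolution's weights and bias. The function name and the use of NumPy below are illustrative assumptions, not the paper's implementation, and the HA-MPLF per-filter precision assignment is not shown:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding convolution.

    W: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,)
    gamma, beta, mean, var: per-output-channel BN parameters, shape (out_ch,)

    Returns (W', b') such that conv(x, W') + b' == BN(conv(x, W) + b),
    so BN costs nothing at inference and the folded weights can then
    be quantized directly.
    """
    scale = gamma / np.sqrt(var + eps)         # per-output-channel scale
    W_folded = W * scale[:, None, None, None]  # scale each output filter
    b_folded = (b - mean) * scale + beta       # fold mean/shift into the bias
    return W_folded, b_folded
```

Folding before quantization matters because the BN scale changes each filter's dynamic range, which in turn drives the per-filter precision choice.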