Abstract:
Surveillance cameras on construction sites are generally installed at high positions, so the collected data contain many occluded and small objects wearing safety helmets. These objects occupy few pixels and their features are easily confused with the background, which makes missed and false detections likely. To solve these problems, this paper proposes the CAH-YOLO algorithm. First, the Contextual Efficient Layer Aggregation Network (C-ELAN) is introduced into the backbone module of YOLOv7-Tiny, using large-kernel convolutions to enhance contextual feature extraction for objects. Second, the Adaptive Space Channel Fusion (ASCF) network is added to the neck module of YOLOv7-Tiny, strengthening the expression of useful information by fully integrating information from different layers. Finally, the downsampling modules of YOLOv7-Tiny are replaced with Haar Wavelet-based Downsampling (HWD) modules and an HWD group module is added, which reduces the number of parameters while retaining more useful information. Experiments show that the CAH-YOLO algorithm improves mAP@0.5 by 2%, mAP@0.75 by 4.5%, and mAP@s for small objects by 3.2%.
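
To make the idea behind Haar wavelet-based downsampling concrete, the following is a minimal sketch and not the paper's HWD module. It applies a single-level 2D Haar transform to one feature map in plain Python. The function names `haar_downsample` and `haar_upsample`, the sub-band names, and the 1/2 scaling are assumptions chosen for illustration, and any learned layer that would follow the transform in a real network is omitted.

```python
def haar_downsample(x):
    """Single-level 2D Haar transform of an H x W map (H and W even).

    Returns four sub-bands of size H/2 x W/2 (names are illustrative):
    approx (local average), d_cols (difference across columns),
    d_rows (difference across rows), d_diag (diagonal difference).
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            # Values of one non-overlapping 2 x 2 block.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip((approx, d_cols, d_rows, d_diag), rows):
            band.append(row)
    return approx, d_cols, d_rows, d_diag


def haar_upsample(approx, d_cols, d_rows, d_diag):
    """Inverse of haar_downsample: rebuild the full-resolution map."""
    out = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, p, q, r = approx[i][j], d_cols[i][j], d_rows[i][j], d_diag[i][j]
            top += [(s + p + q + r) / 2, (s - p + q - r) / 2]
            bottom += [(s + p - q - r) / 2, (s - p - q + r) / 2]
        out += [top, bottom]
    return out
```

Calling `haar_upsample(*haar_downsample(x))` returns the original values of `x`, which illustrates why this kind of downsampling can halve the spatial resolution without discarding information: the detail is moved into three extra sub-bands (in a network, extra channels) rather than dropped.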