Abstract:
Remote sensing images pose three challenges for object detection: targets have low resolution, backgrounds are complex, and high-quality rotated bounding box annotations are costly and time-consuming to obtain. To address these challenges, a multi-scale label optimization method for semi-supervised remote sensing object detection is proposed. The SoftTeacher model leverages large amounts of diverse unlabeled data and also identifies targets left unannotated in the original dataset. The Segment Anything Model (SAM) performs deep learning-based image segmentation, and high-quality labels are generated through mask-based optimization. The proposed method first generates pseudo-labels through semi-supervised learning, then applies multi-scale processing to the pseudo-label bounding boxes before feeding them to SAM for optimization. The optimized labels augment the original dataset, which is then used for fully supervised training. Experimental results show that the selected semi-supervised object detection model, SoftTeacher, outperforms fully supervised detection models, and that the optimized labels are more accurate than the original pseudo-labels. When the augmented dataset is used for fully supervised training, the mean Average Precision (mAP) improves from 51.4% to 53.5%. Comparative experiments with common object detectors in the fully supervised training phase further confirm that the proposed method improves the accuracy of remote sensing object detection when labeled data are insufficient.
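As an illustration of the multi-scale step, the sketch below shows one plausible way to turn a single pseudo-label box into several box prompts for a segmentation model. The abstract does not specify how the scales are chosen or applied, so the function name `multi_scale_boxes`, the axis-aligned `(x_min, y_min, x_max, y_max)` box format, and the default scale factors are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box in pixels: (x_min, y_min, x_max, y_max). Assumed format.
Box = Tuple[float, float, float, float]


def multi_scale_boxes(
    box: Box,
    image_size: Tuple[int, int],
    scales: Sequence[float] = (0.9, 1.0, 1.1),  # hypothetical scale factors
) -> List[Box]:
    """Rescale a pseudo-label box about its center, clipped to the image.

    image_size is (width, height). Each returned box could serve as a
    separate box prompt for a segmentation model such as SAM.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size

    # Center and half-extents of the original box.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    half_w = (x_max - x_min) / 2.0
    half_h = (y_max - y_min) / 2.0

    prompts: List[Box] = []
    for s in scales:
        # Scale the half-extents, then clip to the image bounds.
        prompts.append(
            (
                max(0.0, cx - s * half_w),
                max(0.0, cy - s * half_h),
                min(float(width), cx + s * half_w),
                min(float(height), cy + s * half_h),
            )
        )
    return prompts
```

Under these assumptions, each rescaled box would be passed to the segmentation model as a prompt, and the resulting masks would then be used to refine the label. How the masks from different scales are combined is left open here because the abstract does not describe it.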