Abstract:
Existing Transformer-based methods perform poorly on complex remote sensing scenes: they are prone to artifacts and detail loss, and they are weak at capturing local information and modeling spatial relationships. To address these problems, a Multi-scale Hybrid Attention Network (MsHAN) is proposed for remote sensing image super-resolution reconstruction. The network introduces three components: a Large Kernel Multi-scale Attention mechanism (LKMSA), a Multi-scale Dynamic Window Dilated Attention module (MSDWDA), and a Spatial Feedforward Module (SFM). LKMSA combines large-kernel convolution with a multi-scale mechanism to improve the modeling of long-range dependencies and the recovery of detail. MSDWDA uses dynamic window partitioning and multi-scale dilated convolution to enhance local detail capture and global consistency while suppressing artifact accumulation. SFM restructures the Feed-Forward Network (FFN) to improve spatial information modeling while reducing computational complexity. On the AID, UCMerced, and NWPU-RESISC45 datasets, MsHAN is compared with widely used and recent super-resolution reconstruction methods, including EDSR, RCAN, and MAN, and performs strongly across the evaluation metrics. In terms of PSNR, MsHAN improves on the recent MAN method by 0.05 dB on AID and 0.11 dB on UCMerced. These results indicate that the proposed method has significant advantages in detail recovery and overall image quality.
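For readers less familiar with the PSNR metric cited above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not the evaluation code used for MsHAN: the function name `psnr`, the nested-list image format, and the 8-bit peak value of 255 are assumptions made for illustration, and the abstract does not say which channel or cropping protocol the reported numbers use.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists (rows of pixel intensities). max_value is the
    peak intensity; 255 is an assumption for 8-bit data.
    """
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstructed for p in row]
    if len(ref) != len(rec) or not ref:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a gain of 0.05 dB corresponds to an MSE about 1.1% lower, and a gain of 0.11 dB to an MSE about 2.5% lower; `mse_ratio_for_gain(0.05)` and `mse_ratio_for_gain(0.11)` give the exact ratios.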