Abstract:
The core challenge of document-level neural machine translation (DocNMT) is to capture the global semantics of a document and keep the translation consistent, in particular by avoiding topic drift and discontinuities in contextual meaning. To address this, a semantically consistent DocNMT method based on topic fusion is proposed. First, topic words are introduced to reduce topic drift during translation and to strengthen the logical connection between sentences, thereby improving the consistency and accuracy of the translation. Second, a prompt learning strategy guided by topic-word soft templates is adopted: the Bidirectional Encoder Representations from Transformers (BERT) model encodes the topic words, and a topic-word-aware multi-representation dynamic fusion mechanism fuses this topic information with the source-language information, achieving topic migration. Finally, a semantic consistency loss function based on topic words is proposed to balance the contributions of source-language information and topic information, preventing the model from relying excessively on topic words. Experimental results on four public datasets show that the proposed method outperforms sentence-level models by more than 3 s-BLEU points on average. Compared with existing DocNMT models, it performs well on all metrics; on the News dataset in particular, the s-BLEU and d-BLEU scores increase by 0.66 and 0.28 points respectively. These results verify the effectiveness of the method in improving the quality, consistency, and accuracy of document translation.
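To make the fusion and loss-balancing ideas concrete, the following is a minimal sketch and not the method's actual formulation, which the abstract does not specify. It assumes a scalar sigmoid gate that blends a source vector with a topic vector, and a consistency term defined as one minus the cosine similarity between the fused vector and the topic vector. The names `gated_fusion`, `combined_loss`, `gate_weights`, and `lam` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(source_vec, topic_vec, gate_weights, gate_bias=0.0):
    """Blend a source vector and a topic vector with a scalar gate.

    Assumed form (illustrative only):
        gate  = sigmoid(w . [source; topic] + b)
        fused = gate * source + (1 - gate) * topic
    """
    if len(source_vec) != len(topic_vec):
        raise ValueError("source and topic vectors must have equal length")
    concat = list(source_vec) + list(topic_vec)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)
    fused = [gate * s + (1.0 - gate) * t for s, t in zip(source_vec, topic_vec)]
    return fused, gate


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_loss(translation_loss, fused_vec, topic_vec, lam=0.1):
    """Translation loss plus a weighted topic-consistency penalty.

    Assumed form (illustrative only):
        total = translation_loss + lam * (1 - cos(fused, topic))
    """
    consistency = 1.0 - cosine_similarity(fused_vec, topic_vec)
    return translation_loss + lam * consistency


if __name__ == "__main__":
    source = [1.0, 0.0]
    topic = [0.0, 1.0]
    # All-zero gate weights give a gate of 0.5, i.e. an even blend.
    fused, gate = gated_fusion(source, topic, [0.0, 0.0, 0.0, 0.0])
    print(fused, gate)
    print(combined_loss(2.0, fused, topic, lam=0.1))
```

In this sketch the weight `lam` plays the balancing role: a larger value pulls the fused representation toward the topic words, while a smaller value lets the source-language signal dominate.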