• Semantic consistency document-level neural machine translation based on topic fusion

    • Abstract: The core challenge of document-level neural machine translation (DocNMT) is to capture the global semantics of a document and keep the translation consistent, in particular by avoiding topic drift and breaks in contextual meaning. To address this, a semantic consistency document-level neural machine translation method based on topic fusion is proposed. First, topic words are introduced to reduce topic drift during translation and strengthen the logical connection between sentences, improving the consistency and accuracy of the translation. Second, a prompt learning strategy guided by topic-word soft templates is adopted: the Bidirectional Encoder Representations from Transformers (BERT) model encodes the topic words, and a topic-word-aware multi-representation dynamic fusion mechanism fuses this topic information with the source language information, achieving topic transfer. Finally, a semantic consistency loss function based on topic words is proposed to balance the contributions of source language information and topic information, so that the model does not rely excessively on topic words. Experimental results on four public datasets show that the proposed method improves s-BLEU by more than 3 points on average over sentence-level models. Compared with existing DocNMT models, it performs well on all metrics; on the News dataset in particular, s-BLEU and d-BLEU increase by 0.66 and 0.28, respectively. These results verify the effectiveness of the method in improving the quality, consistency, and accuracy of document translation.
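      The fusion and balancing ideas described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no formulas, so the scalar sigmoid gate, the cosine-based penalty, and every name below (`gated_fusion`, `consistency_penalty`, `gate_weights`, `alpha`) are assumptions chosen only to show how a learned gate might weigh a source representation against a topic representation, and how a loss term might keep the fused vector from leaning too far toward either one.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(source_vec, topic_vec, gate_weights, gate_bias=0.0):
    """Blend a source vector with a topic vector through a scalar gate.

    Hypothetical formulation (not taken from the paper):
        gate  = sigmoid(w . [source; topic] + b)
        fused = gate * source + (1 - gate) * topic
    """
    if len(source_vec) != len(topic_vec):
        raise ValueError("source and topic vectors must have the same size")
    concat = list(source_vec) + list(topic_vec)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated size")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)
    fused = [gate * s + (1.0 - gate) * t for s, t in zip(source_vec, topic_vec)]
    return fused, gate


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_penalty(fused, source_vec, topic_vec, alpha=0.5):
    """Hypothetical balancing term: penalise the fused vector for drifting
    away from the source (weight alpha) or from the topic (weight 1 - alpha)."""
    source_term = 1.0 - cosine(fused, source_vec)
    topic_term = 1.0 - cosine(fused, topic_vec)
    return alpha * source_term + (1.0 - alpha) * topic_term


if __name__ == "__main__":
    source = [1.0, 0.0]
    topic = [0.0, 1.0]
    fused, gate = gated_fusion(source, topic, gate_weights=[0.0] * 4)
    print(fused, gate, consistency_penalty(fused, source, topic))
```

      In this toy setting a larger `alpha` pulls the fused vector toward the source sentence, which is one simple way to express the abstract's goal of not over-relying on topic words; a real system would learn the gate parameters jointly with the translation model.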

       
