• Threat entity recognition method based on locally focused semantic features

    • Abstract: Efficiently extracting cybersecurity-related entities (such as hacker organizations, attack methods, and vulnerability information) from unstructured threat intelligence is crucial to the productivity of security analysts. However, the distinctive character and complexity of threat intelligence make accurate information extraction difficult. We observe that threat entities in such intelligence are locally focused: they tend to concentrate in short spans of a sentence. Most existing named entity recognition (NER) methods for threat intelligence either combine each word with the context of the whole sentence or emphasize the word itself, and so overlook this distributional property of intelligence text. To address this, we propose SNER (Security Named Entity Recognition), a named entity recognition model tailored to threat intelligence text. To handle the dependence of entity types on local context, the model expands each word into a corresponding subsequence, which captures the local semantic information in an intelligence sentence, and then integrates the extracted features to predict entity labels. Experimental results on cybersecurity datasets including DNRTI and Malware DB show that the proposed method achieves superior performance, supporting its effectiveness.
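      The step of expanding each word into a subsequence can be illustrated with a minimal sketch. The abstract does not say how SNER builds or encodes its subsequences, so everything here is an assumption made for illustration: the function name `expand_to_subsequences`, the symmetric window `radius`, the `<pad>` token, and the made-up example sentence. It shows only the windowing idea, not the model itself.

```python
from typing import List


def expand_to_subsequences(
    tokens: List[str], radius: int = 2, pad: str = "<pad>"
) -> List[List[str]]:
    """Return, for each token, the window of 2*radius+1 tokens centred on it.

    Hypothetical helper: the window shape and padding scheme are assumptions,
    not details taken from the SNER paper.
    """
    # Pad both ends so that tokens at the sentence boundary get full windows.
    padded = [pad] * radius + list(tokens) + [pad] * radius
    width = 2 * radius + 1
    return [padded[i:i + width] for i in range(len(tokens))]


# Made-up sentence with placeholder entity names.
tokens = "GroupX exploited CVE-0000-0000 via spearphishing".split()
windows = expand_to_subsequences(tokens, radius=1)

for token, window in zip(tokens, windows):
    print(f"{token!r} -> {window}")
```

      In a full model, each window would presumably be passed to an encoder and the resulting local feature combined with other features before predicting the label of the centre token; that part is not specified in the abstract and is left out here.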

       
