Abstract:
Extracting cybersecurity-related entities (such as hacker organizations, attack methodologies, and vulnerability information) from unstructured threat intelligence is crucial for improving the productivity of security analysts. However, the distinct characteristics and inherent complexity of threat intelligence make accurate information extraction difficult. We observe that threat entities in such intelligence tend to cluster locally. Most existing Named Entity Recognition (NER) methods for threat intelligence either associate words with sentence-level context or emphasize lexical features alone, and they often overlook the underlying data distribution patterns in threat intelligence texts. To address these limitations, we propose SNER (Security Named Entity Recognition), a named entity recognition model tailored for threat intelligence texts. To better capture the local contextual dependencies specific to entity types, the model expands each word into a corresponding subsequence, thereby extracting localized semantic information within intelligence sentences. The extracted features are then integrated to predict entity labels. Experimental results on cybersecurity datasets, including DNRTI and Malware DB, demonstrate that the proposed method achieves superior performance, validating its effectiveness.
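
To illustrate the idea of expanding each word into a local subsequence, the following is a minimal sketch only, not the SNER implementation. It assumes a fixed-radius window centered on each token. The function name `expand_to_subsequences`, the `radius` parameter, the `<pad>` token, and the example sentence (including the entity names in it) are all hypothetical and chosen for illustration. How the model actually builds, encodes, and integrates its subsequences is not specified in this abstract.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding token for positions beyond sentence boundaries


def expand_to_subsequences(tokens: List[str], radius: int = 2) -> List[List[str]]:
    """Return one local window per token, centered on that token.

    Each window holds 2 * radius + 1 items. Positions that fall outside
    the sentence are filled with PAD. This is an assumed windowing scheme,
    not necessarily the one used by SNER.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    padded = [PAD] * radius + list(tokens) + [PAD] * radius
    width = 2 * radius + 1
    # The window starting at index i in the padded list is centered on tokens[i].
    return [padded[i:i + width] for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up sentence with placeholder entity names.
    sentence = ["ExampleGroup", "used", "ExampleRAT", "against", "banks"]
    for token, window in zip(sentence, expand_to_subsequences(sentence, radius=1)):
        print(f"{token!r}: {window}")
```

In a full model, each window would presumably be passed through an encoder and the resulting local features combined with other representations before label prediction. That step is omitted here because the abstract does not describe it.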