Problem
challenge the security of enterprises with sophisticated and rapidly evolving exploits, cyber threat intelligence (CTI) has emerged as a promising way to strengthen resilience against threats. By understanding the adversaries that target an organization's industry and focusing on them, CTI makes efficient use of the limited resources that constrain every Security Operations Center (SOC). While many previous studies have attempted to semantically extract threat intelligence from unstructured data, none of them has been applied to Chinese data sources. Because China is both the largest source and the largest victim of cyberattacks, the lack of visibility into Chinese data sources creates a blind spot in CTI.

Results and Evaluation
To understand the effectiveness of the CTRS, some of the topics it extracted are listed in the table below, with both the original Chinese and the English translation for reference. The results were manually reviewed by security experts. The evaluation shown in Table II indicates that the rate of finding related topics is 75% for Vulnerabilities articles and 85% for Enterprise Security articles.

CTI ANT Architecture
The figure below illustrates the architecture of the proposed system, the Chinese Threat Intelligence ANalysis sysTem (CTI ANT). First, the Article Scraper (AS) automatically retrieves articles and MITRE ATT&CK® [1] techniques from Chinese cybersecurity forums [2]. Then, the Article Preprocessor (AP) inspects these articles and removes irrelevant noise. The processed cybersecurity articles are dispatched to the CSAC and the CTRS.
1. In the CTRS, the articles are vectorized and then decomposed through Vector Decomposition (VD). The CTRS then clusters the technical tokens based on their inter-similarity and extracts the top cybersecurity topics for the two target categories: a) Vulnerabilities and b) Enterprise Security.
2. The CSAC is established through a machine learning data pipeline.
Receiving the vectorized articles from the CTRS, the CSAC generates the term importance of each token and trains a Topic Classifier (TC) to predict the theme of each article. The processed MITRE ATT&CK techniques are dispatched to the MD.
3. In the ATT&CK Classifier, the MD experiments with several classifiers to determine which yields the best ATT&CK detection performance. The classifier with the highest precision is incorporated into the Web API, which provides an interchangeable interface for producing cyberattack detection reports.
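The CTRS steps above (vectorize, decompose, then pick representative keywords per topic) can be sketched in plain Python. This is a minimal, hypothetical sketch rather than CTI ANT's code. It assumes the articles are already segmented into tokens with stop words removed. It also assumes TF-IDF as the vectorizer, since the poster names TF-IDF only for the Term Importance Generator. In place of a library routine such as `numpy.linalg.svd`, it approximates SVD with power iteration plus deflation, and it replaces `numpy.argsort` with a descending `sorted` over indices. The names `tfidf_matrix`, `leading_right_singular_vector`, and `extract_topics` are illustrative only.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix from pre-tokenized documents.

    Returns (vocab, rows), where rows[i][j] is the weight of vocab[j] in docs[i].
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (one common variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        rows.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, rows


def leading_right_singular_vector(matrix, iters=100):
    """Approximate the leading right singular vector by power iteration on A^T A."""
    n_cols = len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]  # A v
        # A^T (A v)
        w = [sum(row[j] * av[i] for i, row in enumerate(matrix)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    # Singular vectors are only defined up to sign: make the largest entry positive.
    pivot = max(range(n_cols), key=lambda j: abs(v[j]))
    return v if v[pivot] >= 0 else [-x for x in v]


def extract_topics(docs, n_topics=2, n_terms=3):
    """Return the top `n_terms` keywords of each of the first `n_topics` SVD components."""
    vocab, matrix = tfidf_matrix(docs)
    topics = []
    for _ in range(n_topics):
        v = leading_right_singular_vector(matrix)
        # Descending argsort of the component loadings (cf. numpy.argsort(v)[::-1]).
        order = sorted(range(len(v)), key=lambda j: v[j], reverse=True)
        topics.append([vocab[j] for j in order[:n_terms]])
        # Deflation: A <- A - (A v) v^T removes this component before the next pass.
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        matrix = [[a - s * vj for a, vj in zip(row, v)] for row, s in zip(matrix, av)]
    return topics


if __name__ == "__main__":
    # Invented toy corpus: 漏洞 = vulnerability, 补丁 = patch, 利用 = exploit,
    # 企业 = enterprise, 安全 = security, 防护 = protection, 员工 = employee.
    docs = [
        ["漏洞", "漏洞", "补丁", "利用"],
        ["漏洞", "补丁", "利用", "利用"],
        ["企业", "安全", "防护", "企业"],
        ["企业", "安全", "员工", "防护"],
    ]
    for topic in extract_topics(docs):
        print(topic)
```

Each SVD component groups tokens that co-occur across articles. On this invented corpus, the first component ranks the vulnerability tokens highest and the second ranks the enterprise-security tokens highest. A larger pipeline would more likely use a sparse TF-IDF matrix and a truncated SVD from a numerical library.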
CTI ANT: Hunting for Chinese Threat Intelligence
Chia-En Tsai, Cheng-Lin Yang, Chong-Kuan Chen
{cl.yang, ck.chen}@cycraft.com

Abstract
In this research, we have constructed a CTI system called CTI ANT. It is the FIRST automatic Chinese CTI analysis framework for extracting and analyzing threat intelligence from unstructured Chinese data sources via Natural Language Processing (NLP) algorithms. CTI ANT consists of three subsystems:
1) "Cyber Security Article Classifier (CSAC)" to determine the topic and theme of articles
2) "Cyber Topic Recommendation System (CTRS)" to cluster cybersecurity keywords based on their inter-similarity and recommend prevalent security subjects
3) "MITRE ATT&CK Detector (MD)" to recognize and label cyberattack descriptions in articles based on the ATT&CK framework

Contribution
Here we highlight the contributions of our study:
1) Cyber Security Article Classifier (CSAC): We have established an automatic classification system that helps security analysts quickly identify the theme of cyber threat data, a significant step towards gathering and updating Chinese CTI.
2) Cyber Topic Recommendation System (CTRS): The CTRS results help threat analysts identify key threat actors and deploy appropriate security controls. Additionally, the results reveal intrinsic connections among seemingly unrelated keywords.
3) MITRE ATT&CK Detector (MD): Automatically recognizing MITRE ATT&CK techniques in Chinese APT reports facilitates the design of better cyber defense mechanisms. By visualizing the MITRE ATT&CK detections as a heatmap, we further uncovered an imbalance in the Chinese MITRE ATT&CK data and proposed adjustment strategies to achieve more efficient results in future Chinese CTI inspection.
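The heatmap in contribution 3) is, in essence, a confusion matrix over ATT&CK technique labels. The sketch below is hypothetical and assumes the MD outputs one technique ID per sample. It counts (true, predicted) pairs and reads per-technique recall off the diagonal. A `Counter` over the true labels exposes the kind of class imbalance the heatmap revealed. The labels are invented toy data, not results from the study. Plotting the row-normalized matrix with a library such as matplotlib (`imshow`) would produce the heatmap itself.

```python
from collections import Counter


def confusion_matrix(labels, y_true, y_pred):
    """matrix[i][j] counts samples whose true label is labels[i] and predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal of the row-normalized matrix: share of each true class predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


if __name__ == "__main__":
    # Invented toy labels (not study results). T1566 = Phishing,
    # T1059 = Command and Scripting Interpreter, T1071 = Application Layer Protocol.
    labels = ["T1566", "T1059", "T1071"]
    y_true = ["T1566"] * 5 + ["T1059"] * 2 + ["T1071"]
    y_pred = ["T1566", "T1566", "T1566", "T1566", "T1059", "T1059", "T1566", "T1566"]
    matrix = confusion_matrix(labels, y_true, y_pred)
    print(Counter(y_true))           # support per technique exposes class imbalance
    print(matrix)                    # [[4, 1, 0], [1, 1, 0], [1, 0, 0]]
    print(per_class_recall(matrix))  # [0.8, 0.5, 0.0]
```

In this toy case, the rarest technique also has the weakest diagonal cell. That is the pattern a heatmap makes easy to spot.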
Cyber Security Article Classifier (CSAC) Results
In our study, we evaluated classification precision and recall for two cybersecurity categories, Vulnerabilities and Enterprise Security, as shown in Figure 1.

MITRE ATT&CK Detector (MD) Results
The results of the MITRE ATT&CK classifier are depicted in Figure 3. To examine these results in more depth, we visualize them as the heatmap listed in the table below. The diagonal cells are lighter in color, which indicates that the classifier makes correct decisions.

Reference
[1] MITRE, Threat Report ATT&CK® Mapping (TRAM), 2019 (accessed October 4, 2020). [Online]. Available: https://github.com/mitre-attack/tram
[2] FreeBuf, 2012 (accessed October 15, 2020). [Online]. Available: https://www.freebuf.com/; VULHUB, 2012 (accessed October 15, 2020). [Online]. Available: http://vulhub.org.cn/attack
[3] Simplified Chinese Stop Word List, 2019 (accessed October 15, 2020). [Online]. Available: https://github.com/goto456/stopwords/blob/master/cn_stopwords.txt

Component Details
• Article Scraper (AS) is a Python crawler designed to retrieve project-related data from Chinese security forums
• Term Importance Generator (TIG) is implemented with TF-IDF (Term Frequency-Inverse Document Frequency)
• Topic Classifier (TC) is implemented with SGD (Stochastic Gradient Descent)
• Vector Decomposition (VD) is implemented with SVD (Singular Value Decomposition)
• Cybersecurity Topic Extractor (CTE) uses the numpy.argsort method to select the representative keywords for each cybersecurity topic
• ATT&CK Classifier (AC) experiments with Decision Tree, SGD, and Naïve Bayes classifiers and uses SGD, which yields the highest precision
• Web API incorporates the ATT&CK Classifier and is implemented with Flask
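Both the Topic Classifier and the ATT&CK Classifier are listed as SGD-based, and the CSAC is evaluated with precision and recall. The following minimal sketch rests on stated assumptions and is not the authors' implementation. It trains a binary linear classifier by plain stochastic gradient descent on the hinge loss, with no regularization. Library implementations such as scikit-learn's `SGDClassifier` default to an L2 penalty instead. The sketch also computes precision and recall for a chosen positive class. The feature vectors, the +1/-1 label encoding (Vulnerabilities vs. Enterprise Security), and the function names are hypothetical. The multi-class ATT&CK case would need one-vs-rest or a similar extension.

```python
import random


def train_sgd_hinge(X, y, epochs=100, lr=0.1, seed=0):
    """Train a linear classifier with hinge loss by plain SGD (no regularization).

    X: list of feature vectors; y: labels in {+1, -1}.
    Returns (w, b) for the decision function f(x) = w . x + b.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:  # hinge loss max(0, 1 - margin) has a non-zero subgradient here
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 if the decision function is non-negative, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def precision_recall(y_true, y_pred, positive):
    """Precision and recall of `positive`, returning 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy features: [vulnerability-keyword count, enterprise-keyword count];
    # +1 = Vulnerabilities, -1 = Enterprise Security (hypothetical encoding).
    X = [[3, 0], [0, 3], [2, 1], [1, 2], [4, 0], [0, 4]]
    y = [1, -1, 1, -1, 1, -1]
    w, b = train_sgd_hinge(X, y)
    y_pred = [predict(w, b, x) for x in X]
    print(precision_recall(y, y_pred, positive=1))  # → (1.0, 1.0)
```

On linearly separable toy data like this, the update rule stops changing the weights once every training sample has a margin of at least 1. Real TF-IDF features are higher-dimensional and rarely separable, which is one reason library versions add regularization and learning-rate schedules.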