CTI ANT Research Poster

Chia-En Tsai
October 30, 2020

As the sophistication and frequency of targeted cyberattacks continue to rise, so does the demand for accurate and actionable Cyber Threat Intelligence (CTI). Previous research has attempted to extract and analyze CTI, but none of it has been applied to Chinese data. As China is both the largest source and victim of cyberattacks, the lack of visibility into Chinese sources creates a major blind spot for CTI. Additionally, highly active Chinese security forums provide fertile sources for intelligence.

In this research, I have constructed a CTI system called CTI ANT. It is the FIRST automatic Chinese CTI analysis framework for extracting and analyzing threat intelligence from unstructured Chinese data sources via Natural Language Processing (NLP) algorithms. CTI ANT consists of three subsystems: a classifier (CSAC) for recognizing the theme of cyber threat data; a recommendation system (CTRS) that identifies trending keywords so analysts can recognize key threat actors; and a MITRE ATT&CK Detector (MD) that labels cyberattack techniques in threat reports.

Evaluation confirmed that CSAC and CTRS achieve accuracies of 93% and 80%, respectively. Moreover, MD presents precise cyberattack detection and ID labeling. I also included security expert reviews for verification. Through precise analysis and intelligence retrieval across massive Chinese CTI sources, CTI ANT has been verified to provide instant, accurate intelligence to security experts.

Transcript

CTI ANT: Hunting for Chinese Threat Intelligence

Chia-En Tsai, Cheng-Lin Yang, Chong-Kuan Chen
[email protected], {cl.yang, ck.chen}@cycraft.com

Abstract

In this research, I have constructed a CTI system called CTI ANT. It is the FIRST automatic Chinese CTI analysis framework for extracting and analyzing threat intelligence from unstructured Chinese data sources via Natural Language Processing (NLP) algorithms. CTI ANT consists of three subsystems:

1) "Cyber Security Article Classifier (CSAC)" to determine the topic and theme of articles
2) "Cyber Topic Recommendation System (CTRS)" to cluster cybersecurity keywords based on inter-similarity and recommend prevalent security subjects
3) "MITRE ATT&CK Detector (MD)" for recognizing and labeling cyberattack descriptions in articles based on the ATT&CK framework

Problem

As cybercriminals continually challenge the security of enterprises with sophisticated and rapidly evolving exploits, cyber threat intelligence (CTI) has emerged as a promising way to enhance resilience against threats. It helps defenders understand and focus on the adversaries that target their industry, making efficient use of the limited resources that constrain every SOC. While many previous research works attempt to semantically extract unstructured threat intelligence data, none of them has been applied to Chinese data sources. As China is both the largest source and victim of cyberattacks, the lack of visibility into Chinese data sources creates a blind spot for CTI.

CTI ANT Architecture

The figure illustrated below demonstrates the architecture of the proposed system: Chinese Threat Intelligence ANalysis sysTem (CTI ANT).

First, the Article Scraper (AS) automatically retrieves articles and MITRE ATT&CK® [1] techniques from Chinese cybersecurity forums [2]. Then, these articles are inspected by the Article Preprocessor (AP) to remove irrelevant noise. The processed cybersecurity articles are dispatched to the CSAC and the CTRS.

1. In the CTRS, the articles are vectorized and decomposed through vector decomposition. Then, the CTRS clusters the technical tokens based on their inter-similarity and extracts the top cybersecurity topics for the target categories: a) Vulnerabilities and b) Enterprise Security.
2. The CSAC is established through a machine learning data pipeline. Receiving the vectorized articles from the CTRS, the CSAC generates term importance for each token and trains a Topic Classifier (TC) to predict the theme of each article.
3. The processed MITRE ATT&CK techniques are dispatched to the MD. In the ATT&CK classifier, the MD experiments with various classifiers to evaluate which one yields the highest ATT&CK detection accuracy. The classifier with the highest precision is incorporated into the web API, creating an interchangeable interface for producing cyberattack detection reports.

Contribution

Here we highlight the contributions from our study:

1) Cyber Security Article Classifier (CSAC): We have established an automatic classification system that helps security analysts quickly identify the theme of cyber threat data, a significant step towards Chinese CTI gathering and updating.
2) Cyber Topic Recommendation System (CTRS): The CTRS results assist threat analysts in identifying key threat actors so they can deploy appropriate security controls. Additionally, the results have revealed intrinsic connections across various seemingly unrelated keywords.
3) MITRE ATT&CK Detector (MD): Automatically recognizing MITRE ATT&CK techniques in Chinese APT reports facilitates the design of better cyber defense mechanisms. Through the visualization of MITRE ATT&CK detections in heatmap format, we further uncovered the imbalance of Chinese MITRE ATT&CK data and proposed adjustment strategies to enable higher-efficiency results for future Chinese CTI inspection.

Results and Evaluation

Cyber Security Article Classifier (CSAC) Results

In our study, we evaluated the classification precision and recall between the following two cybersecurity categories: Vulnerabilities and Enterprise Security, as shown in Figure 1.

MITRE ATT&CK Detector (MD) Results

For the MITRE ATT&CK classifier, the result is depicted in Figure 3. To inspect the result of the MITRE ATT&CK classifier more closely, we visualize it as the heatmap listed in the table below. The colors of diagonal cells are lighter; this indicates the classifier makes the correct decisions.

Cyber Topic Recommendation System (CTRS) Results

To understand the effectiveness of CTRS, some topics extracted by CTRS are listed in the table below, where the original Chinese and translated English are both listed for reference. The results are manually reviewed by security experts. The evaluation shown in Table II indicates that the ratio of finding a related topic is 75% for Vulnerabilities articles and 85% for Enterprise Security articles.

Reference

[1] MITRE, Threat Report ATT&CK® Mapping (TRAM), 2019 (accessed October 4, 2020). [Online]. Available: https://github.com/mitre-attack/tram
[2] FreeBuf, 2012 (accessed October 15, 2020), https://www.freebuf.com/. VULHUB, 2012 (accessed October 15, 2020), http://vulhub.org.cn/attack.
[3] Simplified Chinese Stop Word list, 2019 (accessed October 15, 2020), https://github.com/goto456/stopwords/blob/master/cn_stopwords.txt.

Component Details

• Article Scraper (AS) is a Python crawler designed to retrieve project-related data from Chinese security forums
• Term Importance Generator (TIG) is implemented with TF-IDF (Term Frequency-Inverse Document Frequency)
• Topic Classifier (TC) is implemented with SGD (Stochastic Gradient Descent)
• Vector Decomposition (VD) is implemented with SVD (Singular Value Decomposition)
• Cybersecurity Topic Extractor (CTE) uses the numpy.argsort method to select the representative keywords for each cybersecurity topic
• ATT&CK Classifier (AC) experiments with Decision Tree, SGD, and Naïve Bayes classifiers and uses SGD, which yields the highest precision
• Web API incorporates the ATT&CK Classifier and is implemented with Flask
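
The term-weighting, decomposition, and keyword-selection components listed above (TF-IDF, SVD, and numpy.argsort) can be chained roughly as in the sketch below. This is only an illustration under stated assumptions, not the authors' implementation. The poster names numpy.argsort but no other library, so the sketch uses NumPy alone. The toy corpus, the function names `tfidf` and `top_keywords`, the `+ 1.0` IDF offset, and ranking terms by the absolute value of their component weight are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical toy corpus: pre-tokenized articles (English stand-ins for
# segmented Chinese text). This is NOT data from the poster.
docs = [
    ["sql", "injection", "exploit", "patch"],
    ["exploit", "buffer", "overflow", "patch"],
    ["firewall", "policy", "audit", "compliance"],
    ["audit", "compliance", "policy", "training"],
]


def tfidf(docs):
    """Return (matrix, vocab); rows are documents, columns are terms."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            counts[i, index[tok]] += 1
    # Term frequency: each row is normalized by the document length.
    tf = counts / counts.sum(axis=1, keepdims=True)
    # Document frequency: number of documents containing each term.
    df = (counts > 0).sum(axis=0)
    # Assumed IDF variant: log(N / df) + 1, so shared terms keep some weight.
    idf = np.log(len(docs) / df) + 1.0
    return tf * idf, vocab


def top_keywords(matrix, vocab, n_topics=2, n_words=3):
    """Decompose with SVD, then pick the heaviest terms per component."""
    # Rows of vt are the right singular vectors (one weight per term).
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    topics = []
    for component in vt[:n_topics]:
        # argsort is ascending, so reverse it and keep the first n_words.
        order = np.argsort(np.abs(component))[::-1][:n_words]
        topics.append([vocab[j] for j in order])
    return topics


matrix, vocab = tfidf(docs)
topics = top_keywords(matrix, vocab)
for k, words in enumerate(topics):
    print(f"topic {k}: {words}")
```

Each printed topic is a short list of the terms that weigh most heavily on one SVD component. In a setting like the one the poster describes, such lists would then be reviewed by analysts; real Chinese text would also need word segmentation and stop-word removal before this step.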