• Helping the security team quickly identify articles related to their daily missions
• Setting up a recommendation system of prevalent attack methods
• Recognizing attack techniques in articles and labeling them with MITRE ATT&CK techniques
• Classifying articles from the largest Chinese security website
REMOVING STOP WORDS / TOKENIZING
Classification results example:
Stop words: 可是 (but), 因为 (because)...
Tokenizing: cutting large articles into meaningful word segments
Purpose: determining article categories with specific keywords
Stop words: 一些 (some), 不但 (not only), 而且 (but also)...
Common technical terms in cybersecurity articles: “代码” (code), “項目” (project), “信息” (information)...
Tool: Jieba Chinese tokenizing library
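A minimal sketch of the stop-word step, assuming the article has already been tokenized (for example by Jieba). The stop-word sets below are hypothetical and built only from the examples on the slides; treating the common technical terms as "domain stop words" is an assumption based on where they appear, and a real list would be far longer and loaded from a file.

```python
# Hypothetical stop-word sets taken from the slide examples; a real list
# would be much larger and usually loaded from a text file.
GENERAL_STOP_WORDS = {"可是", "因为", "一些", "不但", "而且"}
# Assumption: frequent but uninformative cybersecurity terms are also dropped.
# Both simplified and traditional forms of "project" are included.
DOMAIN_STOP_WORDS = {"代码", "项目", "項目", "信息"}
STOP_WORDS = GENERAL_STOP_WORDS | DOMAIN_STOP_WORDS


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and whitespace-only tokens from a tokenized article."""
    return [t for t in tokens if t.strip() and t not in stop_words]
```

With Jieba installed, the token list would typically come from `jieba.lcut(article_text)`, and the filtered tokens would then be fed to the vectorizer.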
…a file in a file set
Feature: scales down the impact of general, common tokens across the file set (empirically, these are less informative)
Principle: a word's importance increases the more often it appears in a file, and decreases the more files in the set contain it
Purpose: lets the classifier identify important word tokens and use them as the basis for classification
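The principle above can be sketched in a few lines of pure Python. This assumes the plain textbook definition, tf(t, d) = count of t in d / length of d and idf(t) = log(N / df(t)), which is not necessarily the exact variant used in the project.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    Returns one {token: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights
```

A token that appears in every document (like "代码" below) gets idf = log(1) = 0, which is exactly the "scale down common tokens" effect. Note that scikit-learn's `TfidfVectorizer` differs by default: it uses raw counts, a smoothed idf of ln((1 + n) / (1 + df)) + 1, and L2-normalizes each row.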
• Can filter out common, irrelevant words while retaining the article's important words
Drawbacks:
• Cannot reflect a word's position. When keywords are extracted, words in prominent positions (such as the title, or the beginning or end of an article) should be given a higher weight
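One way to address the position drawback is to weight term frequencies by where a token appears before computing TF-IDF. This is a hypothetical sketch, not a method from the slides: `title_weight`, `edge_weight` and `edge_size` are illustrative values, and "beginning or end" is approximated as the first and last `edge_size` body tokens.

```python
from collections import Counter


def position_weighted_tf(title_tokens, body_tokens,
                         title_weight=3.0, edge_weight=2.0, edge_size=20):
    """Term frequencies where tokens in prominent positions count more.

    Hypothetical weighting: each title token counts title_weight times,
    each token among the first or last edge_size body tokens counts
    edge_weight times, and every other body token counts once.
    Returns frequencies normalized to sum to 1.
    """
    weights = Counter()
    for t in title_tokens:
        weights[t] += title_weight
    n = len(body_tokens)
    for i, t in enumerate(body_tokens):
        in_edge = i < edge_size or i >= n - edge_size
        weights[t] += edge_weight if in_edge else 1.0
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}
```

These weighted frequencies could replace the plain tf term in the TF-IDF formula; the weights themselves would need tuning on real articles.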
(Figure: appears to be an example from scikit-learn)
Function: linearly separates data of many different types into different categories
Feature: uses only one sample at each step when determining the classification boundary → efficient
Purpose: classifying articles as vulnerabilities vs. enterprise security
…being processed by the network for each step
• It is computationally fast, as only one sample is processed at a time
Drawbacks:
• The frequent updates are computationally expensive, because all resources go into processing one training sample at a time
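The "one sample per step" idea can be sketched as a linear classifier trained with stochastic gradient descent. This sketch assumes hinge loss (a linear SVM) with L2 regularization, labels in {-1, +1}, and illustrative values for `lr`, `alpha` and `epochs`; it is not the project's configuration. scikit-learn's `SGDClassifier`, whose default loss is `"hinge"`, implements the same idea with many more options.

```python
import random


def sgd_train(X, y, epochs=50, lr=0.1, alpha=0.001, seed=0):
    """Train a linear classifier with hinge loss using plain SGD.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    Each update uses exactly one sample, visited in shuffled order.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient of the L2 penalty: shrink weights slightly every step
            w = [wj - lr * alpha * wj for wj in w]
            if margin < 1:
                # Hinge loss is active: move the boundary toward classifying xi correctly
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def sgd_predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the article setting, each feature vector would be a TF-IDF row, with +1 and -1 standing for the two categories (for example vulnerability vs. enterprise security).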
…& enterprise security articles
• Helping the security team quickly identify articles related to their daily missions
• Setting up a recommendation system of prevalent cyber topics
• Recognizing attack techniques in articles and labeling them with MITRE ATT&CK techniques
• Classifying articles from the largest Chinese security website
…several component matrices to expose many properties of the original matrix
Example: Japanese research on animal clustering using SVD
https://www.frontiersin.org/articles/10.3389/fpsyt.2018.00087/full
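To make "several component matrices" concrete, here is a pure-Python sketch that extracts the top-k singular triplets (σ, u, v) so that A ≈ Σ σᵢ uᵢ vᵢᵀ. It uses power iteration on AᵀA with deflation, which is a swapped-in teaching technique rather than how production libraries compute SVD (in practice one would call `numpy.linalg.svd`). How the project applied SVD is not stated on the slide; a common use in text mining is reducing a TF-IDF matrix to a few latent "topic" dimensions.

```python
import math


def _matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def _transpose(M):
    return [list(col) for col in zip(*M)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(A, k, iters=500):
    """Return up to k singular triplets (sigma, u, v) of A, largest first.

    Power iteration on A^T A finds the dominant right singular vector v;
    then u = A v / sigma. The rank-1 part sigma * u v^T is subtracted
    (deflation) and the process repeats on what remains.
    """
    A = [row[:] for row in A]  # work on a copy
    triplets = []
    for _ in range(k):
        At = _transpose(A)
        v = [1.0] * len(A[0])  # simple start vector (assumed not orthogonal to the target)
        for _ in range(iters):
            v = _matvec(At, _matvec(A, v))  # v <- A^T A v
            n = _norm(v)
            if n == 0.0:
                return triplets  # remaining matrix is zero: no more components
            v = [x / n for x in v]
        u = _matvec(A, v)
        sigma = _norm(u)
        if sigma == 0.0:
            return triplets
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflation: remove the rank-1 component sigma * u v^T
        A = [[a - sigma * ui * vj for a, vj in zip(row, v)] for row, ui in zip(A, u)]
    return triplets
```

For a 2×2 matrix the two singular values must satisfy σ₁σ₂ = |det A| and σ₁² + σ₂² = (sum of squared entries), which makes a convenient sanity check.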
…and development centers
• ATT&CK is a framework of observed and known adversarial tactics, techniques, and procedures (TTPs) used by cybercriminals
• ATT&CK maps and indexes virtually everything about an intrusion, from both the attack and the defense side
https://medium.com/cycraft/cycraft-classroom-mitre-att-ck-vs-cyber-kill-chain-vs-diamond-model-1cc8fa49a20f
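The simplest form of "labeling articles with MITRE ATT&CK techniques" is keyword matching against technique IDs. This is a hypothetical baseline, not the project's method: the technique IDs and names below are real ATT&CK entries, but the keyword lists are illustrative only, and a usable system would need much richer rules or a trained classifier.

```python
# Hypothetical keyword rules. IDs and names are real ATT&CK techniques;
# the keyword lists are illustrative and far from complete.
TECHNIQUE_KEYWORDS = {
    "T1566": ("Phishing", ["钓鱼", "phishing"]),
    "T1110": ("Brute Force", ["暴力破解", "brute force"]),
    "T1059": ("Command and Scripting Interpreter", ["powershell", "命令行"]),
}


def label_techniques(text):
    """Return the sorted ATT&CK technique IDs whose keywords occur in text."""
    lowered = text.lower()  # case-insensitive match for Latin-script keywords
    return sorted(
        tid
        for tid, (_name, keywords) in TECHNIQUE_KEYWORDS.items()
        if any(kw.lower() in lowered for kw in keywords)
    )
```

Substring matching like this will over- and under-label (for example, "钓鱼" can appear in non-attack contexts), which is one motivation for the learned classifiers listed in this deck.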
…can cause a large change in the structure of the decision tree.
• Decision trees often take longer to train.
Advantages:
• Easy to understand: visually presents all decision alternatives in an easy-to-follow format
• Versatile: a multitude of business problems can be analyzed and solved with decision trees
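A minimal decision tree sketch for token-based articles, assuming binary "token present / absent" features and the Gini impurity split criterion (a CART-style choice; the slides do not say which criterion was used). The tree is a nested tuple `(token, subtree_if_present, subtree_if_absent)` with class labels at the leaves, which mirrors the "visually presents all decision alternatives" advantage: each node is a single yes/no question.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a tree on rows (sets of tokens) by choosing the lowest-Gini split.

    Returns a leaf label, or a tuple (token, subtree_if_present, subtree_if_absent).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in features:
        yes = [l for r, l in zip(rows, labels) if f in r]
        no = [l for r, l in zip(rows, labels) if f not in r]
        if not yes or not no:
            continue  # this token does not split the data
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    yes_idx = [i for i, r in enumerate(rows) if f in r]
    no_idx = [i for i, r in enumerate(rows) if f not in r]
    return (
        f,
        build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx],
                   features, depth + 1, max_depth),
        build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx],
                   features, depth + 1, max_depth),
    )


def tree_predict(tree, row):
    """Follow yes/no branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, present, absent = tree
        tree = present if f in row else absent
    return tree
```

The instability drawback is visible even here: changing one training article can change which token wins the top split, and that reshapes the entire tree below it.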
Methods: CountVectorizer (Countvec), TF-IDF, SGD, Naive Bayes, Decision Tree
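Of the methods listed, Naive Bayes has no sketch elsewhere in this section, so here is a minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing over token counts, i.e. the kind of input a count vectorizer produces. The smoothing value and the choice to ignore unseen tokens at prediction time are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on tokenized documents.

    Returns (log_prior, log_likelihood), where
    log_likelihood[c][t] = log((count of t in class c + alpha) /
                               (total tokens in c + alpha * vocabulary size)).
    """
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

In scikit-learn, one plausible arrangement (an assumption, not confirmed by the slides) would be `Pipeline([("count", CountVectorizer(tokenizer=jieba.lcut)), ("tfidf", TfidfTransformer()), ("clf", SGDClassifier())])`, swapping the final step for `MultinomialNB()` or `DecisionTreeClassifier()` to compare classifiers.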