Malware Detection
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
IEEE Control & System Graduate Research Colloquium 2012 (ICSRGC 2012)
Shah Alam, Malaysia
16th July 2012
- Graduate student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
- My current employer is International Islamic University Malaysia, Kuala Lumpur
- Research area: malware detection, narrowing down to Windows executables
- Software maliciousness is defined by the risk the software exposes the user to.
- Sometimes, when the classification is vague, the term "Potentially Unwanted Program/Application" (PUP/PUA) is used.
Static analysis
In this case we have developed pi-ngaji, an open source, Python-based tool for static malware analysis.

Dynamic analysis
In this case we execute the malware in a Windows environment and dump the API traces into a text file.
- Not every API call is a useful feature, since some are weak, irrelevant features.
- These are considered "noise".
- A feature selection (ranking) method is therefore chosen.
The features

The following are the features:
- Application Programming Interface (API) calls
- XOR'ed strings
- Anti-virtualization/virtual machine detector

Binary entropy is also interesting.
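As an illustration of the binary entropy feature, the following is a minimal sketch of Shannon entropy over the bytes of a file. It is not taken from pi-ngaji; the function name `shannon_entropy` is hypothetical. High values (close to 8 bits per byte) are commonly associated with packed or encrypted content, which is why the measure is of interest here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum of -p * log2(p) over the byte values that actually occur.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # A single repeated byte carries no information; a uniform spread
    # over all 256 byte values reaches the maximum.
    print(shannon_entropy(b"\x00" * 1024))
    print(shannon_entropy(bytes(range(256)) * 4))
```

In practice the function would be applied to the raw bytes of a sample, or to each PE section separately, which is a design choice left open here.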
Binary file structure

Figure: Structure of a PE file [Pietrek, 1994]
Figure: PE components, simplified
API calls

Features are as follows.

Example of features:
- GetSystemTimeAsFileTime
- SetUnhandledExceptionFilte
- GetCurrentProces
- TerminateProcess
- LoadLibraryExW
- GetVersionExW
- GetProcAddress
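One simple way to turn API names into features is a presence/absence vector per sample. The sketch below is an assumption about how this could be done, not a description of pi-ngaji: it searches the raw file bytes for each name, which is a crude stand-in for parsing the PE import table. The names `API_FEATURES` and `api_feature_vector` are hypothetical.

```python
from typing import Dict, Iterable

# A few API names from the example list above; a real feature set would be larger.
API_FEATURES = (
    "GetSystemTimeAsFileTime",
    "TerminateProcess",
    "LoadLibraryExW",
    "GetVersionExW",
    "GetProcAddress",
)


def api_feature_vector(data: bytes, api_names: Iterable[str] = API_FEATURES) -> Dict[str, int]:
    """Map each API name to 1 if its ASCII form occurs in the file bytes, else 0."""
    return {name: int(name.encode("ascii") in data) for name in api_names}


if __name__ == "__main__":
    # Synthetic bytes standing in for the contents of a sample.
    fake_sample = b"\x00\x00GetProcAddress\x00LoadLibraryExW\x00"
    print(api_feature_vector(fake_sample))
```

For a real sample the bytes would come from reading the file in binary mode. A substring search can over-match (one name may be a prefix of another), so an import-table parser would be the more precise option.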
Anti Debugger/AntiVM strings
- IsDebuggerPresent
- VMCheck.dll
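Strings such as these may be hidden by XOR encoding. The following sketch brute-forces single-byte XOR keys to look for them. It assumes a single-byte key, which the slides do not state, and the names `SUSPICIOUS` and `find_xored_strings` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Anti-debugger / anti-VM strings taken from the list above.
SUSPICIOUS = (b"IsDebuggerPresent", b"VMCheck.dll")


def find_xored_strings(data: bytes, needles: Sequence[bytes] = SUSPICIOUS) -> List[Tuple[str, int, int]]:
    """Return (string, key, offset) for the first hit of each needle under each
    single-byte XOR key. Key 0 means the string is present in plain text."""
    hits = []
    for key in range(256):
        for needle in needles:
            encoded = bytes(b ^ key for b in needle)
            offset = data.find(encoded)
            if offset != -1:
                hits.append((needle.decode("ascii"), key, offset))
    return hits


if __name__ == "__main__":
    # Synthetic data: the string is hidden with key 0x5A after four filler bytes.
    hidden = bytes(b ^ 0x5A for b in b"IsDebuggerPresent")
    print(find_xored_strings(b"AAAA" + hidden + b"BBBB"))
```

Multi-byte or rolling keys would need a different search strategy; this sketch only covers the simplest case.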
Sample execution

Analyzing e665297bf9dbb2b2790e4d898d70c9e9
Analyzing registry...
[+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
^G^@Label11^@^A^AÃˇ R^Nreg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ File Execution Options\Rx.exe" /v debugger /t REG_SZ /d %systemrot%\repair\1sass.exe /f^M
....
[+] Malware Seems to be IRC BOT: Verified By String : ADMIN
[+] Malware Seems to be IRC BOT: Verified By String : LIST
[+] Malware Seems to be IRC BOT: Verified By String : QUIT
[+] Malware Seems to be IRC BOT: Verified By String : VERSION
Analyzing interesting calls..
[+] Found an Interesting call to: FindWindow
[+] Found an Interesting call to: LoadLibraryA
[+] Found an Interesting call to: CreateProcess
[+] Found an Interesting call to: GetProcAddress
[+] Found an Interesting call to: CopyFile
[+] Found an Interesting call to: shdocvw
Advantages on the researcher's side
- Malware writers are usually "lazy", so they tend to reuse chunks of earlier code.
- Hence, it is easier to trace a sample back to a previous family based on the commonalities.
Our methods

Roughly, our methods consist of:
1. Feature selection (ranking/pruning)
2. Supervised classification
3. Unsupervised classification

Items 2 and 3 above could also be combined into a method known as "semi-supervised classification".
Information Gain

[Zhang et al., 2007, Altaher et al., 2011, Singhal and Raul, 2012] use the following formula to apply information gain (IG) to malware. The amount by which the entropy of X decreases, reflecting the additional information about X provided by Y, is called information gain, given by

$$\mathrm{IG}(X \mid Y) = H(X) - H(X \mid Y)$$

[Singhal and Raul, 2012] introduced the following adjustment to "correct out" errors in the results:

$$\mathrm{IG}(X) = \mathrm{IG}(X) \pm \frac{\sum_{i=0}^{n} \mathrm{IG}(X_i)}{n}$$
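The entropy-difference form of information gain, H(X) minus H(X|Y), can be sketched as follows. Here X is taken to be the class label of each sample and Y the value of one feature; that mapping, the base-2 logarithm, and the function names are assumptions made for illustration, and the toy labels are invented.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits of the empirical distribution of `values`."""
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(labels: Sequence[Hashable], feature: Sequence[Hashable]) -> float:
    """IG(X|Y) = H(X) - H(X|Y), with X the class labels and Y the feature values."""
    total = len(labels)
    conditional = 0.0
    for y in set(feature):
        subset = [x for x, f in zip(labels, feature) if f == y]
        # H(X|Y) is the sum over y of P(Y = y) * H(X | Y = y).
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = ["malware", "malware", "benign", "benign"]  # toy data
    print(information_gain(labels, [1, 1, 0, 0]))  # feature matches the labels exactly
    print(information_gain(labels, [1, 0, 1, 0]))  # feature is independent of the labels
```

A feature that separates the classes perfectly removes all uncertainty about the label, while an independent feature removes none, which is what makes IG usable as a ranking score.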
Information Gain (cont'd)

From [Jiang et al., 2011]:

$$\mathrm{IG}(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t')\,P(c)}$$
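The double sum from [Jiang et al., 2011] can be estimated from counts and used to rank features, as in the sketch below. The probabilities are plain relative frequencies, the logarithm is taken base 2 (the slide does not fix the base), terms with zero joint probability are skipped, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ig_joint(present: Sequence[int], is_malware: Sequence[int]) -> float:
    """IG(t): sum over feature presence/absence and class/non-class of
    P(t', c) * log2(P(t', c) / (P(t') * P(c))), estimated from counts."""
    n = len(present)
    score = 0.0
    for t in (0, 1):
        p_t = sum(1 for v in present if v == t) / n
        for c in (0, 1):
            p_c = sum(1 for v in is_malware if v == c) / n
            p_tc = sum(1 for v, y in zip(present, is_malware) if v == t and y == c) / n
            if p_tc > 0:  # a zero joint probability contributes nothing
                score += p_tc * math.log2(p_tc / (p_t * p_c))
    return score


def rank_features(samples: Dict[str, Sequence[int]], is_malware: Sequence[int]) -> List[Tuple[str, float]]:
    """Return (feature name, IG) pairs sorted from most to least informative."""
    scores = [(name, ig_joint(column, is_malware)) for name, column in samples.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: one column per feature, one entry per sample.
    features = {"GetProcAddress": [1, 0, 1, 0], "IsDebuggerPresent": [1, 1, 0, 0]}
    print(rank_features(features, is_malware=[1, 1, 0, 0]))
```

Low-ranked features are the candidates for pruning as noise; where to place the cut-off is not specified in the slides.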
For research purposes, the following issues always arise:
- There is no standard dataset, unlike in the Intrusion Detection System (IDS) area.
- Malware samples change at a fast pace, so will the datasets used for the experiment be questioned?
- As a last resort, stick to the existing database and try to stay free of any specific malware family, to make sure the method will (or could) work with incoming, new malware.
Table: Differences between clustering and classification

Classification                  Clustering
Deals with known data           Deals with unknown data
Supervised learning             Unsupervised learning
Popular algorithms include:     Popular algorithms include:
  Random Forest                   K-means
  Neural Networks                 Fuzzy C
  k-Nearest Neighbor              Gaussian
  Decision Trees
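To make the classification column concrete, here is a minimal sketch of k-Nearest Neighbor on binary feature vectors, using Hamming distance. The choice of distance, the value of k, and the toy training set are assumptions for illustration, not the configuration used in this work.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Tuple[Sequence[int], str]], query: Sequence[int], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy labelled vectors; each position stands for one feature being present.
    train = [
        ([1, 1, 1, 0], "malware"), ([1, 1, 0, 0], "malware"), ([1, 0, 1, 0], "malware"),
        ([0, 0, 0, 1], "benign"), ([0, 0, 1, 1], "benign"), ([0, 1, 0, 1], "benign"),
    ]
    print(knn_predict(train, [1, 1, 1, 1]))
```

The labelled training set is what makes this supervised: every prediction is a vote among samples whose class is already known.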
- Classification (supervised) is chosen to deal with a known corpus with incomplete data.
- Clustering (unsupervised) is chosen to deal with new inputs.
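For the unsupervised side, a minimal K-means sketch (Lloyd's algorithm) shows how unlabelled feature vectors can be grouped. Starting the centroids at the first k points is a simplification chosen to keep the example deterministic; the function names and toy vectors are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # simplification: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


if __name__ == "__main__":
    # Toy unlabelled vectors; no class information is given to the algorithm.
    print(kmeans([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]], k=2))
```

No labels are involved at any point, so the resulting groups still have to be interpreted afterwards, for example by inspecting which known families fall into each cluster.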
Some results
- We managed to detect several malware samples by using the existing API traces and other features (bot commands, file/registry deletion).
- Newer, more sophisticated malware such as Stuxnet/Duqu is very platform specific, attacking SCADA systems, hence more reading is needed on how to detect it. Perhaps the most obvious indicator is whether any XOR'ed communication channels are being used.
Ali, A. (2011). Computer Virus Detection Using Features Ranking and Machine Learning. Australian Journal of Basic and Applied Sciences, 5(9):1482-1486.

Jiang, Q., Zhao, X., and Huang, K. (2011). A feature selection method for malware detection. In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890-895.

Pietrek, M. (1994). Peering Inside the PE: A Tour of the Win32 Portable Executable File Format. http://msdn.microsoft.com/en-us/library/ms809762.aspx.

Singhal, P. and Raul, N. (2012). Malware detection module using machine learning algorithms to assist in centralized security in enterprise networks. International Journal of Network Security & Its Applications, 4.

Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007). New malicious code detection based on n-gram analysis and rough set theory. Pages 626-633. Springer-Verlag, Berlin, Heidelberg.