Challenges in High Accuracy of Malware Detection

Problems in detecting Windows-based malware

Transcript

  1. Intro Issues Objectives Methodology Conclusion

    Challenges in High Accuracy of Malware Detection
    Muhammad Najmi Ahmad Zabidi, International Islamic University Malaysia
    IEEE Control & System Graduate Research Colloquium 2012, Shah Alam, Malaysia, 16th July 2012
    Muhammad Najmi Ahmad Zabidi ICSRGC 2012 1/26
  2. About

    I am a graduate research student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia.
    My current employer is International Islamic University Malaysia, Kuala Lumpur.
    Research area: malware detection, focusing on Windows executables.
  3. Malware

    In short, malware is malicious software.
    Maliciousness is defined by the risks to which the user is exposed.
    When the classification is vague, the term "Potentially Unwanted Program/Application" (PUP/PUA) is sometimes used instead.
  4. Methods of detection

    Static analysis: we have developed pi-ngaji, an open source Python-based tool for static malware analysis.
    Dynamic analysis: we execute the malware in a Windows environment and dump the API traces into a text file.
  5. This talk outlines several challenges in the current methods of malware detection.
  6. Analysis of strings

    Important, although not foolproof.
    Find interesting calls first.
    Considered static analysis, since the binary is not executed.
  7. Methods to find interesting strings

    Use the strings command (on *NIX systems).
    Inspect the binary in an editor.
    Check the Import Address Table (IAT).
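    The first method can be approximated in a few lines of Python. This is a minimal sketch of what the strings command does for ASCII text, not the pi-ngaji implementation; the function name `extract_strings` and the default minimum length of 4 are assumptions chosen for illustration.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len bytes long.

    Illustrative helper (hypothetical name); it mimics the default
    behaviour of the *NIX strings command for ASCII only.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical byte blob standing in for the contents of a binary.
    sample = b"\x00\x01GetProcAddress\x00ab\x00LoadLibraryA\xff"
    for text in extract_strings(sample):
        print(text)
```

    In practice the bytes would come from reading the executable in binary mode; wide-character (UTF-16) strings, which are common in Windows binaries, are not covered by this sketch.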
  8. Issues

    Malware numbers are enormous.
    Detection needs to be automated.
    Our proposal: use Machine Learning methods.
  9. Objectives

    Reduce the number of malware API features, since some are weak or irrelevant and are considered "noise".
    Feature selection by a ranking method is chosen.
  10. The features

    Methodology sections: API calls; Anti Debugger/AntiVM strings; Feature Ranking Selection with Information Gain; Classification and Clustering
    The features are as follows:
    Application Programming Interface (API) calls
    XOR'ed strings
    Anti-virtualization/virtual machine detectors
    Binary entropy is also interesting
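    Binary entropy can be computed directly from byte frequencies. The following is a minimal sketch using Shannon entropy in bits per byte; the function name `shannon_entropy` is an assumption, and the slides do not state which entropy measure or granularity (whole file or per section) was used.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (range 0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"AAAAAAAA"))        # a single repeated byte
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

    Values close to 8 bits per byte usually indicate compressed or encrypted content, which is why entropy is often used as a hint that a binary is packed.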
  11. Binary file structure

    Figure: Structure of a PE file [Pietrek, 1994]
  12. Figure: PE components, simplified
  13. API calls

    Examples of features:
    GetSystemTimeAsFileTime
    SetUnhandledExceptionFilter
    GetCurrentProcess
    TerminateProcess
    LoadLibraryExW
    GetVersionExW
    GetProcAddress
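    One way to turn such API names into input for a learning algorithm is a binary presence vector. This is a sketch under assumptions: the names follow the Windows API spelling of the examples on this slide, the function `to_feature_vector` is hypothetical, and the list of imported names is assumed to have been extracted already (for instance from the import table).

```python
# Feature names taken from the examples on the slide (Windows API spelling).
API_FEATURES = [
    "GetSystemTimeAsFileTime",
    "SetUnhandledExceptionFilter",
    "GetCurrentProcess",
    "TerminateProcess",
    "LoadLibraryExW",
    "GetVersionExW",
    "GetProcAddress",
]


def to_feature_vector(imported_names, features=API_FEATURES):
    """Return a list of 0/1 flags, one per feature, marking which APIs appear."""
    present = set(imported_names)
    return [1 if name in present else 0 for name in features]


if __name__ == "__main__":
    # Hypothetical import list for one sample.
    imports = ["GetProcAddress", "TerminateProcess", "CreateFileW"]
    print(to_feature_vector(imports))
```

    Names outside the feature list, such as CreateFileW here, are simply ignored by this encoding.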
  14. Anti Debugger/AntiVM strings

    IsDebuggerPresent
    VMCheck.dll
  15. Anti Debugger/AntiVM strings

    "Red Pill":"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VirtualPc trick":"\x0f\x3f\x07\x0b",
    "VMware trick":"VMXh",
    "VMCheck.dll":"\x45\xC7\x00\x01",
    "VMCheck.dll for VirtualPC":"\x0f\x3f\x07\x0b\xc7\x45\xfc\xff\xff\xff\xff",
    "Xen":"XenVMM", # Or XenVMMXenVMM
    "Bochs & QEmu CPUID Trick":"\x44\x4d\x41\x63",
    "Torpig VMM Trick": "\xE8\xED\xFF\xFF\xFF\x25\x00\x00\x00\xFF\x33\xC9\x3D\x00\x00\x00\x80\x0F\x95\xC1\x8B\xC1\xC3",
    "Torpig (UPX) VMM Trick": "\x51\x51\x0F\x01\x27\x00\xC1\xFB\xB5\xD5\x35\x02\xE2\xC3\xD1\x66\x25\x32\xBD\x83\x7F\xB7\x4E\x3D\x06\x80\x0F\x95\xC1\x8B\xC1\xC3"
    Source: ZeroWine source code
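    Signatures like these can be matched against a binary with a plain byte search. The sketch below copies four of the signatures from this slide; the function name `find_vm_tricks` and the naive substring matching are assumptions made for illustration, not a description of how ZeroWine or pi-ngaji performs the check.

```python
# Subset of the signatures listed on the slide, written as Python bytes.
SIGNATURES = {
    "Red Pill": b"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VirtualPc trick": b"\x0f\x3f\x07\x0b",
    "VMware trick": b"VMXh",
    "Xen": b"XenVMM",
}


def find_vm_tricks(data, signatures=SIGNATURES):
    """Return the names of all signatures whose bytes occur anywhere in data."""
    return [name for name, sig in signatures.items() if sig in data]


if __name__ == "__main__":
    # Hypothetical blob containing two of the signatures.
    blob = b"\x90\x90VMXh\x00\x0f\x3f\x07\x0b\x90"
    print(find_vm_tricks(blob))
```

    Short signatures such as the four-byte ones can occur by chance in large files, so a hit is a hint for further analysis rather than proof of anti-VM behaviour.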
  16. Sample execution

    Analyzing e665297bf9dbb2b2790e4d898d70c9e9
    Analyzing registry...
    [+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
    ^G^@Label11^@^A^AÃˇ R^Nreg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ File Execution Options\Rx.exe" /v debugger /t REG_SZ /d %systemrot%\repair\1sass.exe /f^M
    ....
    [+] Malware Seems to be IRC BOT: Verified By String : ADMIN
    [+] Malware Seems to be IRC BOT: Verified By String : LIST
    [+] Malware Seems to be IRC BOT: Verified By String : QUIT
    [+] Malware Seems to be IRC BOT: Verified By String : VERSION
    Analyzing interesting calls..
    [+] Found an Interesting call to: FindWindow
    [+] Found an Interesting call to: LoadLibraryA
    [+] Found an Interesting call to: CreateProcess
    [+] Found an Interesting call to: GetProcAddress
    [+] Found an Interesting call to: CopyFile
    [+] Found an Interesting call to: shdocvw
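    The kind of check behind this report can be sketched as a lookup of extracted strings against keyword lists. The lists below are taken from the sample output above; the function name `flag_strings`, the substring matching for calls, and the exact matching for IRC commands are assumptions and may differ from what pi-ngaji actually does.

```python
# Keywords as they appear in the sample output on the slide.
INTERESTING_CALLS = ["FindWindow", "LoadLibraryA", "CreateProcess",
                     "GetProcAddress", "CopyFile", "shdocvw"]
IRC_COMMANDS = ["ADMIN", "LIST", "QUIT", "VERSION"]


def flag_strings(strings):
    """Return which interesting calls and IRC commands occur in the strings.

    Calls are matched as substrings (so FindWindowA counts as FindWindow);
    IRC commands must match a whole string to limit false positives.
    """
    calls = [c for c in INTERESTING_CALLS if any(c in s for s in strings)]
    irc = [cmd for cmd in IRC_COMMANDS if cmd in strings]
    return {"calls": calls, "irc": irc}


if __name__ == "__main__":
    # Hypothetical strings extracted from a sample.
    found = flag_strings(["FindWindowA", "QUIT", "hello", "CopyFileW"])
    for call in found["calls"]:
        print("[+] Found an Interesting call to:", call)
    for cmd in found["irc"]:
        print("[+] Malware Seems to be IRC BOT: Verified By String :", cmd)
```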
  17. Advantages on the researcher's side

    Malware writers are usually "lazy", so they tend to reuse chunks of earlier code.
    Hence, it is easier to trace a sample back to a previous family based on the commonalities.
  18. Our methods

    Roughly, our methods consist of:
    1. Feature Selection (Ranking/Pruning)
    2. Supervised Classification
    3. Unsupervised Classification
    Items 2 and 3 above can also be combined into a method known as "Semi-Supervised Classification".
  19. Information Gain

    [Zhang et al., 2007, Altaher et al., 2011, Singhal and Raul, 2012] use the following formula to apply Information Gain (IG) to malware.
    The amount by which the entropy of X decreases, reflecting the additional information about X provided by Y, is called information gain, given by
    $IG(X|Y) = H(X) - H(X|Y)$
    [Singhal and Raul, 2012] introduced the following adjustment to "correct out" errors in the results:
    $IG(X) = IG(X) \pm \frac{\sum_{i=0}^{n} IG(X_i)}{n}$
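    The definition IG(X|Y) = H(X) - H(X|Y) can be computed directly from labelled samples. This is a minimal sketch, assuming X is the class label (for example malware or benign), Y is one feature, and entropy is measured in bits; the function names and the toy data are illustrative and not taken from the cited papers.

```python
import math
from collections import Counter


def entropy(labels):
    """H(X): Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(labels, feature):
    """IG(X|Y) = H(X) - H(X|Y) for class labels X and feature values Y."""
    total = len(labels)
    conditional = 0.0  # H(X|Y), weighted over each value of Y
    for value in set(feature):
        subset = [x for x, y in zip(labels, feature) if y == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical toy data: four samples, two candidate API features.
    labels = ["malware", "malware", "benign", "benign"]
    print(information_gain(labels, [1, 1, 0, 0]))  # feature separates the classes
    print(information_gain(labels, [1, 0, 1, 0]))  # feature carries no information
```

    Ranking features by this value and keeping the top ones is the pruning step that the feature selection stage refers to.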
  20. Information Gain (cont'd)

    From [Jiang et al., 2011]:
    $IG(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t') P(c)}$
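    This form of IG sums over the four combinations of feature present/absent and class c_i/not c_i, so it can be evaluated from a 2x2 table of sample counts. The sketch below is an illustration under assumptions: probabilities are estimated as relative frequencies, the logarithm is taken to base 2 (the slide does not state the base), and the function name and count arguments are hypothetical.

```python
import math


def ig_from_counts(n11, n10, n01, n00):
    """IG(t) from a 2x2 contingency table of sample counts.

    n11: feature present, class c      n10: feature present, not class c
    n01: feature absent, class c       n00: feature absent, not class c
    """
    total = n11 + n10 + n01 + n00
    # Each cell: (joint count, count for this feature value, count for this class)
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    ig = 0.0
    for joint, t_margin, c_margin in cells:
        if joint:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            ig += (joint / total) * math.log2(joint * total / (t_margin * c_margin))
    return ig


if __name__ == "__main__":
    print(ig_from_counts(2, 0, 0, 2))  # feature perfectly predicts the class
    print(ig_from_counts(1, 1, 1, 1))  # feature independent of the class
```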
  21. For research purposes, the following issues always come up:

    There is no standard dataset, unlike in the Intrusion Detection System (IDS) area.
    Malware samples change at a fast pace, so will the datasets used in the experiment be questioned?
    As a last resort, stick to the existing database, and try to avoid dependence on any specific malware family so that the method can work with new, incoming malware.
  31. Table: Differences between clustering and classification

    Classification          | Clustering
    Deals with known data   | Deals with unknown data
    Supervised learning     | Unsupervised learning
    Popular algorithms include: Random Forest, Neural Networks, k-Nearest Neighbor, Decision Trees | Popular algorithms include: K-means, Fuzzy C, Gaussian
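    As an illustration of the classification side, k-Nearest Neighbor can be written in a few lines for binary feature vectors. This is a sketch under assumptions: the Hamming distance, k = 3, the function names, and the toy training data are all chosen for illustration and are not taken from the talk.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to query.

    train is a list of (vector, label) pairs with known labels.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical labelled samples: each vector marks presence of four APIs.
    train = [
        ([1, 1, 0, 1], "malware"), ([1, 1, 1, 1], "malware"), ([1, 0, 1, 1], "malware"),
        ([0, 0, 0, 1], "benign"), ([0, 0, 1, 0], "benign"), ([0, 1, 0, 0], "benign"),
    ]
    print(knn_predict(train, [1, 1, 0, 0]))
    print(knn_predict(train, [0, 0, 0, 0]))
```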
  32. Classification and Clustering

    Classification (supervised) is chosen to deal with a known but incomplete corpus of data.
    Clustering (unsupervised) is chosen to deal with new inputs.
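    For the clustering side, K-means groups unlabeled samples by distance alone. The following is a minimal sketch, assuming numeric feature vectors and a naive initialization from the first k points; the function names, the fixed iteration count, and the toy data are illustrative assumptions rather than the setup used in the talk.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Return (assignment, centroids) after a fixed number of K-means rounds."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for six unlabeled samples.
    points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
              (10.0, 9.0), (1.0, 0.0), (9.0, 10.0)]
    assignment, centroids = kmeans(points, k=2)
    print(assignment)
    print(centroids)
```

    No labels are needed here, which is what makes the approach suitable for new inputs; interpreting what each cluster means is left to the analyst.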
  33. Some results

    We managed to detect several malware samples by using the existing API traces and other features (bot commands, file/registry deletion).
    Newer, more sophisticated malware such as Stuxnet/Duqu is very platform specific (it attacks SCADA systems), hence more reading is needed on how to detect it. Perhaps the most obvious indicator is the use of XOR'ed communication channels, if any.
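    One simple way to look for XOR'ed content is to try every single-byte key against a known keyword. This is a sketch under assumptions: it only covers single-byte XOR keys, the function name `find_xored` and the keyword are hypothetical, and nothing here is specific to Stuxnet or Duqu.

```python
def find_xored(data, keyword):
    """Return (key, offset) pairs where keyword occurs XOR'ed with one byte key.

    Key 0 is skipped because it would match the plain, unencoded keyword.
    """
    hits = []
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in keyword)
        offset = data.find(encoded)
        if offset != -1:
            hits.append((key, offset))
    return hits


if __name__ == "__main__":
    # Hypothetical blob hiding "http://" under the single-byte key 0x5A.
    hidden = bytes(b ^ 0x5A for b in b"http://")
    blob = b"\x00\x01" + hidden + b"\x02"
    print(find_xored(blob, b"http://"))
```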
  34. The flow

    Figure: flow of stages (Feature Selection, Feature Categorization, Clustering, Classification, Visualization) with the tools used (Weka, Octave/Matlab, scipy)
  35. References

    Altaher, A., Ramadass, S., and Ali, A. (2011). Computer Virus Detection Using Features Ranking and Machine Learning. Australian Journal of Basic and Applied Sciences, 5(9):1482-1486.
    Jiang, Q., Zhao, X., and Huang, K. (2011). A feature selection method for malware detection. In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890-895.
    Pietrek, M. (1994). Peering Inside the PE: A Tour of the Win32 Portable Executable File Format. http://msdn.microsoft.com/en-us/library/ms809762.aspx.
    Singhal, P. and Raul, N. (2012). Malware detection module using machine learning algorithms to assist in centralized security in enterprise networks. International Journal of Network Security & Its Applications, 4.
    Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007). New malicious code detection based on n-gram analysis and rough set theory. Pages 626-633. Springer-Verlag, Berlin, Heidelberg.