Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools

Using machine learning to detect a range of cybersecurity adversaries

Transcript

  1. Intro The issues in general Motivation Solution Experiments Tools eof()

    Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools
    Muhammad Najmi Ahmad Zabidi
    International Islamic University Malaysia
    MOSC 2012, Berjaya Times Square, Kuala Lumpur
    9th July 2012
    Muhammad Najmi Ahmad Zabidi MOSC 2012 1/34
  2. About

    • I am a research grad student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
    • My current employer is International Islamic University Malaysia, Kuala Lumpur
    • Research area: malware detection, focusing on Windows executables
    • Since 2003 I have been a Subversion (SVN) committer for the KDE localization project for the Malay language (though I rarely commit now.. need a new intern to take over :) )
  3. Computing world as we knew it

    • Interconnected machines
    • Previously less connected, now "socialized" machines
    • This brought real problems to the cyberworld
  4. Risks

    • Financial loss
    • Company/government level espionage
    • Privacy breach
  5. Types of adversaries

    • Spam
    • Scam
    • Phishing
    • Malware, botnet, rootkit etc
    • Anything else?
  9. Spam

    • Annoying
    • Productivity wasted on unnecessary file deletion
    • In extreme cases, important email becomes difficult to find
  13. Scam

    • Preying on naive victims
    • Sounds too good to be true, but some people still believe it
    • Organized crime syndicates, with mules cooperating
  18. Phishing

    • Similar to a scam, but with a different tactic
    • More sophisticated, but does not need a mule or physical meetup
    • Main purpose is to obtain important details (online banking login name, password) and hence access to the victim's account
    • Safer for the criminal
  22. Malware

    • It is safe to say this covers trojans, viruses, dialers, rabbits, worms and rootkits (bundled together nowadays)
    • Has been infecting computers since the 1980s; the threat became more obvious once the Internet arrived
    • Attacks any operating system: Linux, Windows, Mac... even Android phones
  23. Problems with adversaries detection

    • Some are manually crafted, some automated
    • They react relatively fast and are difficult to trace
    • There are too many (for example, spam), hence too time-consuming for manual work
  24. In house analysis

    • Given enough expertise, in-house analysis could be useful
    • Maintains reputation, with the organization's own group of analysts handling incidents
    • Try to minimize costs; use open source tools whenever possible
  25. Categories: Machine Learning

    • Associated with Artificial Intelligence
    • Mimics human (brain) learning
    • Learns through experience
    • Deals with known and unknown patterns
    • Overlaps with (or to some extent originated from) Data Mining and Pattern Recognition
  36. Categories

    Table 1: Differences between clustering and classification

    Classification                  | Clustering
    --------------------------------+--------------------------------
    Deals with known data           | Deals with unknown data
    Supervised learning             | Unsupervised learning
    Popular algorithms include:     | Popular algorithms include:
      • Random Forest               |   • K-means
      • Neural Networks             |   • Fuzzy C
      • k-Nearest Neighbor          |   • Gaussian
      • Decision Trees              |
    Predictive [Tan et al., 2005]   | Descriptive [Tan et al., 2005]
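
To make the contrast in Table 1 concrete, here is a minimal sketch in plain Python that is not taken from the slides: a k-Nearest Neighbor vote (classification, which uses labels) next to a bare-bones K-means loop (clustering, which ignores labels). The two-dimensional feature values, the "ham"/"spam" labels and the function names `knn_predict` and `kmeans` are assumptions made up for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, k=2, iterations=10):
    """Group unlabelled points into k clusters (centroids start at the first k points)."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment


# Hypothetical feature vectors: (links per mail, suspicious words per mail).
labelled = [
    ((0.0, 1.0), "ham"), ((1.0, 0.0), "ham"), ((1.0, 1.0), "ham"),
    ((8.0, 9.0), "spam"), ((9.0, 8.0), "spam"), ((9.0, 9.0), "spam"),
]
unlabelled = [features for features, _ in labelled]

predicted = knn_predict(labelled, (8.5, 8.5))  # supervised: needs the labels
clusters = kmeans(unlabelled, k=2)             # unsupervised: labels never seen
print(predicted, clusters)
```

The classifier returns a class name it was taught, while the clustering step only returns group numbers; deciding what each group means is left to the analyst.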
  37. What to look for?

    • We look for patterns
    • In some cases, a corpus of spam and phishing mails is already available
    • We call these patterns "features"
  38. Spam/scam

    • The language being used
    • For example, email notifications with wording like "You have won GBP100,000,000"
    • Spam floods the inbox; some may come from genuine businesses, but the volume is too much to handle
    • Scams ask people to bank in money for untruthful reasons
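
As a rough illustration of turning the language of a mail into "features", the sketch below counts a handful of keywords per message. The keyword list, the sample mail text and the function names are hypothetical; in practice the vocabulary would come from a labelled corpus rather than being typed in by hand.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real one would be derived from a labelled corpus.
KEYWORDS = ("won", "prize", "bank", "transfer", "urgent")


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_features(text, keywords=KEYWORDS):
    """Return one count per keyword, in the same order as `keywords`."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in keywords]


# Made-up sample mail in the style of a prize scam.
mail = "You have WON GBP100,000,000! Urgent: bank in the transfer fee to claim the prize."
features = keyword_features(mail)
print(dict(zip(KEYWORDS, features)))
```

Each mail becomes a fixed-length list of numbers, which is the form that tools such as Weka or R expect as input.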
  39. Phishing mails

    • Look for URLs
    • Current efforts, for example PhishTank, rely on public submissions and (I believe) manual verification
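
A minimal sketch of "look for URLs", using only the Python standard library: pull URLs out of a mail body and derive a few simple properties from each. The chosen properties (IP address as host, "@" in the URL, number of dots, HTTPS) are illustrative assumptions, not a list from the slides, and the addresses are reserved documentation examples.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r'https?://[^\s<>"]+')
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_urls(text):
    """Return every http(s) URL found in the text."""
    return URL_PATTERN.findall(text)


def url_features(url):
    """Derive a few simple, hypothetical phishing indicators from one URL."""
    host = urlparse(url).hostname or ""
    return {
        "host_is_ip": bool(IPV4_PATTERN.fullmatch(host)),  # raw IP instead of a name
        "has_at_sign": "@" in url,                         # can hide the real host
        "host_dots": host.count("."),                      # many dots: deep subdomains
        "uses_https": url.startswith("https://"),
    }


# Made-up mail body; 192.0.2.10 is a reserved documentation address.
body = "Please verify your account at http://192.0.2.10/login.php now"
urls = extract_urls(body)
for url in urls:
    print(url, url_features(url))
```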
  40. Malware

    • Researchers tend to look at Application Programming Interface (API) calls; some look at opcodes
    • Analysis is done using either static or dynamic analysis
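
One common way to turn a sequence of API calls into features is to count short runs of consecutive calls (n-grams). The sketch below assumes the trace has already been extracted by some static or dynamic analysis step; the trace itself is made up, and using n-grams here is an illustrative choice rather than the method of any particular paper cited in this talk.

```python
from collections import Counter


def ngrams(sequence, n=2):
    """Return every run of n consecutive items in the sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Hypothetical API-call trace, as a sandbox or disassembler might report it.
trace = [
    "CreateFileA", "WriteFile", "CloseHandle",
    "CreateFileA", "WriteFile", "RegSetValueExA",
]

# Count how often each pair of consecutive calls occurs in the trace.
profile = Counter(ngrams(trace, n=2))
print(profile.most_common(3))
```

The same function works on opcode sequences, since it only needs a list of tokens.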
  41. An example

    Figure 1: Automated classification proposed by [Rieck et al., 2009]
  42. The datasets

    • Spam email research has been around for quite some time compared to the others (phishing)
    • Sample datasets:
        • http://csmining.org/index.php/spam-email-datasets-.html
        • http://archive.ics.uci.edu/ml/datasets/Spambase
    • Scam email is closely associated with spam, since it is unwanted email. It might as well be categorized as "sub-spam"
    • Phishing email samples:
        • Sample dataset:
            • http://phishtank.com
  43. Feature Selection/Extraction

    • When analyzing, we're interested in features
    • What kind of features?
        • Important keywords, strong features
        • Unimportant features are phased out as unnecessary
        • Some features might be redundant
  44. Feature Selection/Extraction (continued)

    • There are algorithms meant for this:
        • Information Gain
        • Support Vector Machine (SVM)
        • others... some may be hybrid algorithms (combining several algorithms together), also known as ensembles
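
Information Gain can be worked through by hand on a tiny example: it is the entropy of the class labels minus the weighted entropy left after splitting on one feature. The sketch below is a plain-Python illustration with made-up data; the feature names `contains_won` and `contains_the` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Made-up corpus of six mails with two binary features.
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
contains_won = [1, 1, 1, 0, 0, 0]  # separates the classes perfectly
contains_the = [1, 0, 1, 1, 0, 1]  # tells us nothing about the class

ig_won = information_gain(contains_won, labels)
ig_the = information_gain(contains_the, labels)
print(round(ig_won, 3), round(ig_the, 3))
```

A feature with gain close to zero, like `contains_the` here, is a candidate for being phased out.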
  49. List of tools

    • Weka
    • R language
    • Octave (as replacement for Matlab)
    • Python SciPy with Matplotlib
  50. Weka

    Figure 2: Weka
  51. Weka

    • Results are presented as numbers and visualizations
    • Need to do some reading on how to interpret them
    • Test with different algorithms to get the best results
  52. R language

    • Not merely a tool, but a language in itself
    • Commonly used by data analysts
  53. R language

    Figure 3: These books use the R language for their analyses
  54. Octave

    • Octave is an open source alternative to Matlab (MATrix LABoratory)
    • Works almost the same way as Matlab
  55. Octave

    Figure 4: Octave also has a GUI, QtOctave (discontinued)
  56. Python Scipy

    #!/usr/bin/env python
    """
    Example: simple line plot.
    Show how to make and save a simple line plot with labels, title and grid
    """
    import numpy
    import pylab

    # Sample a cosine wave at 0.01 s steps over one second.
    t = numpy.arange(0.0, 1.0+0.01, 0.01)
    s = numpy.cos(2*2*numpy.pi*t)
    pylab.plot(t, s)

    # Label the plot, save it to disk, then display it.
    pylab.xlabel('time (s)')
    pylab.ylabel('voltage (mV)')
    pylab.title('About as simple as it gets,folks')
    pylab.grid(True)
    pylab.savefig('simple_plot')
    pylab.show()
  58. The flow

    Flowchart stages: Feature Selection, Feature Categorization, Clustering, Classification, Visualization
    Tool labels on the chart: Weka, Octave, R; scipy, octave, R; Weka, Octave, R; scipy, octave, R
  64. Conclusion

    • Dealing with malicious/unwanted threats from spam, scams, phishing and malware is not easy
    • One sample can perhaps be analyzed by hand, but thousands per day is tedious
    • Machine learning assists in automation
    • Open source provides an alternative (free as in minimal cost) for the analysis
    • In-house analysis helps protect an organization's/enterprise's reputation
  65. Get in touch!

    najmi.zabidi @ gmail.com
    http://mypacketstream.blogspot.com
    These slides were created with LaTeX Beamer
  66. Bibliography

    Rieck, K., Trinius, P., Willems, C., and Holz, T. (2009). Automatic analysis of malware behavior using machine learning. TU, Professoren der Fak. IV.
    Tan, P.-N., Steinbach, M., and Kumar, V. (2005). Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.