Slide 1


Holy BATSense! Deploying TBATS Machine Learning Algorithm to Detect Security Events
Pranshu Bajpai (@amirootyet)
August 16, 2018
AI Village @ DEF CON 26

Slide 2


Agenda
1. Introduction
2. Background
3. Data Collection
4. Data plots
5. The forecasting algorithm: TBATS
6. Methodology
7. Results
8. Investigating Anomalous Events (examples of true positives)
9. Conclusion

Slide 3


Introduction

Slide 4


About us

Pranshu Bajpai
• PhD candidate at Michigan State University
• Previously worked as an independent penetration tester
• Security researcher at the Security Research Group, MSU
• Active speaker at security conferences: DEF CON, GrrCon, ToorCon, BSides, APWG eCrime ...
• http://cse.msu.edu/~bajpaipr/
• https://twitter.com/amirootyet
• https://www.linkedin.com/in/pranshubajpai/

Dr. Richard Enbody
• Associate Professor at Michigan State University
• Teaches computer architecture, security, and programming
• Books: Targeted Cyber Attacks; The Practice of Computing Using Python

Slide 5


Disclaimer! The views expressed in this presentation are based entirely on my research efforts and do not relate to any of my present or previous employers. I assume basic knowledge of anomaly detection and artificial intelligence throughout this presentation.

Slide 6


The problem
• Organizations are constantly targeted by attackers
• Barrage of targeted and general reconnaissance attacks
• Small ratio of security personnel to devices being monitored
• Need for an assisting application that identifies events of interest
• Raise alarms as security events happen while pruning away noise
• Be “intelligent” enough to identify changing patterns based on past data and suppress noise

Slide 7


Challenges
• Overwhelming amounts of information contributed by an amalgam of devices across the organization
• Maintaining a balance between false positives and true positives lest employees experience alert fatigue
• Need for dynamic correlations that constantly adjust according to recent data observed in the environment

Slide 8


Background

Slide 9


Anomalies: Point Anomalies
An individual data point considered anomalous with respect to the rest of the data points.
Example: an exceptionally high number of login attempts.

Slide 10


Anomalies: Contextual Anomalies
A data point considered anomalous in a specific context, but not otherwise.
Example: an exceptionally high number of login attempts at 4 AM.

Slide 11


Anomalies: Collective Anomalies
Related data points that are collectively anomalous with respect to the entire dataset.
Example: an exceptionally high number of expensive purchases.
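As a minimal illustration of the simplest case above (not part of this talk's method), a point anomaly can be flagged with a plain z-score test; the login counts and the 2.0 threshold below are hypothetical:

```python
from statistics import mean, stdev

def point_anomalies(values, z_threshold=2.0):
    """Flag values whose z-score exceeds the threshold (point anomalies)."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Hypothetical hourly login counts; the final spike is a point anomaly.
logins = [12, 15, 11, 14, 13, 12, 90]
print(point_anomalies(logins))  # → [90]
```

Contextual and collective anomalies need more machinery (context features, windowing), which is what motivates the forecasting approach later in the deck.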

Slide 12


Related Work
Anomaly detection approaches can be classified into:
• Statistical
• Static rule-based
• Model-based
• ...
Specifically, the following forecasting methods have been popular in anomaly detection:
• ARIMA
• Holt-Winters
• Neural networks

Slide 13


Related Work
We encountered the following issues while adapting these approaches to our environments:
• high false positive rates causing alert fatigue
• excessive resource consumption
• required prior knowledge of attack patterns
• diversity of network traffic
Can we write a simple script that takes advantage of the seasonalities inherent in the data to effectively detect coarse anomalies across thousands of devices?

Slide 14


Data Collection

Slide 15


The SIEM system
• A SIEM accumulates security event-related logs from critical sources within the organization
• Examples: QRadar, LogRhythm, AlienVault, Splunk, ArcSight
• A SIEM system:
  • enables central storage and interpretation of logs
  • allows real-time analysis for rapid incident handling
  • facilitates trend analysis and rapid reporting

Slide 16


Understanding events per second
• SIEM allows statistical analysis of the accumulated log data
• We calculate events per second (EPS) as follows:
  EPS = (number of events recorded) / (time period)
• We consider the following 5 types of devices during this presentation:
  • Firewalls
  • Mail servers
  • Business critical infrastructure
  • Wireless services
  • Active Directory
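The EPS calculation above is a straight ratio; a one-line sketch with hypothetical numbers:

```python
def events_per_second(event_count, period_seconds):
    """EPS = number of events recorded / time period (in seconds)."""
    return event_count / period_seconds

# Hypothetical example: 54,000,000 firewall events logged in one hour.
print(events_per_second(54_000_000, 3600))  # → 15000.0
```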

Slide 17


Examples of events

Device Type — Example Events
• Firewall: connection attempts; admin login success or failure; alert messages
• Mail server: successful user logins; failed login attempts; rejections due to spam; invalid users
• Business critical infrastructure: invalid attempts at logical access; initialization and removal of system-level objects; requested action requires root privileges
• Wireless services: new client joins an access point (AP); AP detects device packet-flood attempts; administrator logged in
• Microsoft Active Directory: all authentication attempts

Slide 18


Data plots

Slide 19


About the plots...
• Exact dates and times of events or incidents have been redacted throughout this document for operational security
• The y-axis represents the average events per second in an hour
• Granularity of the collected EPS data is 60 minutes
• Hourly EPS data exhibits multiple seasonal patterns
• EPS data is expected to be of higher magnitude during hours of peak usage

Slide 20


Hourly EPS data for border firewall
[Figure: hourly EPS time series over roughly 300 hours; x-axis: Hours, y-axis: Events per Second (Border Firewall), roughly 10,000–25,000]

Slide 21


Hourly EPS data for mail services
[Figure: hourly EPS time series over roughly 300 hours; x-axis: Hours, y-axis: Events per Second (Mail Services), roughly 400–1,400]

Slide 22


Hourly EPS data for active directory
[Figure: hourly EPS time series over roughly 300 hours; x-axis: Hours, y-axis: Events per Second (Microsoft Active Directory), roughly 100–250]

Slide 23


Hourly EPS data for wireless services
[Figure: hourly EPS time series over roughly 300 hours; x-axis: Hours, y-axis: Events per Second (Wireless Services), roughly 200–800]

Slide 24


Hourly EPS data for business critical infrastructure
[Figure: hourly EPS time series over roughly 300 hours; x-axis: Hours, y-axis: Events per Second (Business Critical Infrastructure), roughly 40–120]

Slide 25


Exploratory Analysis of EPS Data

Slide 26


The forecasting algorithm: TBATS

Slide 27


What were we looking for? The algorithm should:
• handle complex seasonal patterns in the data: intra-daily and intra-weekly
• complete predictive modeling within the time constraints of a production server
• accurately predict the next 24 hours based on limited past data (2 to 4 weeks)

Slide 28


TBATS
• Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components
• An exponential smoothing state space model with Box-Cox transformation
• Developed by Hyndman et al. and available in the R forecast package
• Handles complex seasonalities well
• Fast in generating predictions for our data
• Scalable

Slide 29


Sample forecast from TBATS
[Figure: sample TBATS forecast]

Slide 30


Methodology

Slide 31


Algorithm 1: Forecasting
Input: 336 EPS values (one for each hour of the last 14 days)
Output: 24 EPS predictions (one for each hour of the next day)
1: for every 24 hours do
2:     data = read(hourly EPS data for the last 14 days)
3:     if a data value is missing then
4:         set the missing value to 0
5:     end if
6:     seasonal_periods = create_seasons(data, seasonal periods = 24, 168)
7:     model = tbats(seasonal_periods)
8:     forecasted_values = forecast(model, next 24 values)
9:     output(forecasted_values)
10: end for
11: return
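The loop above can be sketched in Python. The deployment described here used the R forecast package's tbats(); as a stand-in, the sketch below substitutes a naive seasonal-mean forecaster (hypothetical, for illustration only) that captures only the daily seasonality, while keeping the same shape: 336 hourly values in, 24 predictions out.

```python
from statistics import mean

HOURS_PER_DAY = 24

def clean(series):
    """Steps 3-5: replace missing EPS values (None) with 0."""
    return [0 if v is None else v for v in series]

def forecast_next_day(last_14_days):
    """Steps 2-9, with a naive seasonal-mean model standing in for
    tbats(): predict each hour of the next day as the mean of the
    same hour across the past 14 days (daily seasonality only)."""
    data = clean(last_14_days)
    assert len(data) == 14 * HOURS_PER_DAY  # 336 hourly EPS values
    return [mean(data[h::HOURS_PER_DAY]) for h in range(HOURS_PER_DAY)]

# A perfectly repeating daily pattern is forecast exactly.
history = list(range(HOURS_PER_DAY)) * 14
print(forecast_next_day(history) == list(range(HOURS_PER_DAY)))  # → True
```

The real TBATS model would additionally fit the weekly (168-hour) seasonality, trend, and ARMA error structure that this stand-in ignores.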

Slide 32


Algorithm 2: Alerting
Input: Forecasted EPS values from Algorithm 1
Output: Alerts
1: for every hour of the day do
2:     read(forecasted EPS value for that hour)
3:     calculate thresholds on the forecasted EPS value for that hour
4:     if observed EPS value > upper threshold on forecasted value then
5:         alert_personnel()
6:     end if
7:     if observed EPS value < lower threshold on forecasted value then
8:         alert_personnel()
9:     end if
10: end for
11: return
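A minimal Python sketch of the hourly check, assuming the thresholds are a symmetric percentage band around the forecast (the actual threshold function is environment-specific, as the next slide explains; the ±30% tolerance is hypothetical):

```python
def check_hour(observed_eps, forecast_eps, tolerance=0.3):
    """Steps 3-9: alert when the observed EPS falls outside the
    threshold band around the forecasted EPS (here ±30%)."""
    upper = forecast_eps * (1 + tolerance)
    lower = forecast_eps * (1 - tolerance)
    if observed_eps > upper:
        return "alert: above upper threshold"
    if observed_eps < lower:
        return "alert: below lower threshold"
    return "ok"

print(check_hour(observed_eps=2600, forecast_eps=1000))  # → alert: above upper threshold
print(check_hour(observed_eps=950, forecast_eps=1000))   # → ok
```

Note that a drop below the lower threshold is also alert-worthy: as the true-positive examples later show, it can indicate a failed log source rather than an attack.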

Slide 33


Setting the alert threshold
• Forecasted values are never perfect
• The alert threshold keeps false positives in check (and avoids alert fatigue)
• A function of the environment’s tolerance for the number of alerts per day
• Set by security personnel familiar with the environment
• Tradeoff between false positives and false negatives depending on the threshold
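To make the tradeoff concrete with hypothetical numbers: tightening the band around the forecast raises the alert count (more false positives), while widening it suppresses alerts (more false negatives).

```python
def alert_count(observed, forecast, tolerance):
    """Count hours whose observed EPS falls outside the ±tolerance band."""
    return sum(
        1 for o, f in zip(observed, forecast)
        if o > f * (1 + tolerance) or o < f * (1 - tolerance)
    )

observed = [100, 115, 250, 98, 40, 102]  # hypothetical hourly EPS
forecast = [100] * 6                     # flat forecast, for illustration
for tol in (0.1, 0.5, 1.0):
    print(f"±{tol:.0%} band: {alert_count(observed, forecast, tol)} alerts")
```

At ±10% three hours alert, at ±50% only the large spike and the dropout do, and at ±100% only the spike remains; choosing among these is exactly the per-environment judgment call described above.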

Slide 34


Results

Slide 35


Actual observations against predictions: border firewall
[Figure: actual vs. predicted EPS over roughly 1,200 hours; x-axis: Time (Hours), y-axis: Events per Second (Border Firewall), roughly 10,000–30,000; legend: Actual, Predicted; a highlighted region marks a case investigation]

Slide 36


Actual observations against predictions: critical infrastructure
[Figure: actual vs. predicted EPS over roughly 1,200 hours; x-axis: Time (Hours), y-axis: Events per Second (Business Critical Infrastructure), roughly 0–3,000; legend: Actual, Predicted]

Slide 37


Actual observations against predictions: mail services
[Figure: actual vs. predicted EPS over roughly 1,200 hours; x-axis: Time (Hours), y-axis: Events per Second (Mail Services), roughly 500–3,000; legend: Actual, Predicted]

Slide 38


Actual observations against predictions: wireless services
[Figure: actual vs. predicted EPS over roughly 1,200 hours; x-axis: Time (Hours), y-axis: Events per Second (Wireless Services), roughly 0–800; legend: Actual, Predicted]

Slide 39


Actual observations against predictions: active directory
[Figure: actual vs. predicted EPS over roughly 1,200 hours; x-axis: Time (Hours), y-axis: Events per Second (Microsoft Active Directory), roughly 200–700; legend: Actual, Predicted; a highlighted region marks a case investigation]

Slide 40


Result
• TBATS was able to accurately model our data with two inherent seasonalities
• Forecasts were fairly accurate except under anomalous conditions that needed investigation
Let us dive into these anomalous events!

Slide 41


Investigating Anomalous Events

Slide 42


False positives
• Scheduled activity that occurs every month
  • example: a massive spike in EPS data pertaining to critical infrastructure
  • alerts can be suppressed for such days and times
  • sophisticated attackers could focus on such days and times to disguise their attacks

Slide 43


False negatives
What did we miss?
• Events have to be noisy enough to cause a spike at the macro level
• This methodology will miss clever, quiet attacks!

Slide 44


True positives
We discovered a lot of interesting events!
• These would otherwise be lost in the wires
• The following anomalous activities count as true positives:
  • Errors, changes, failures, and performance issues
  • Special events or abnormal use (e.g. a football game on campus causing flash crowds)
  • Measurement issues (problems with data collection)
  • Security incidents
Each of these requires a response!

Slide 45


Examples of true positives: Border Firewall
• Notice the anomaly in the highlighted region of the border firewall results plot
• DDoS attack: malicious packets sent to overload the border firewall’s session table
• The firewall denies the traffic and logs the events

Slide 46


Examples of true positives: Business Critical Infrastructure
• The logging software suffered an error (see the business critical infrastructure results plot)
• It relentlessly contributed the same error message until restarted
• Alerts sent by our model drew attention to the logging failure within an hour

Slide 47


Examples of true positives: Active Directory
• Observed EPS values dropped significantly and fell below the lower threshold, causing an alert (see the active directory results plot)
• One of the log agents in the SIEM architecture had stopped contributing logs

Slide 48


Examples of true positives: Wireless Services
• Flash crowds during a children’s camp on campus (see the wireless services results plot)
• Unexpected peaks in EPS values

Slide 49


Conclusion

Slide 50


Conclusion
Some AI is better than no AI!

Slide 51


Conclusion
• We deployed an available R forecasting algorithm to predict hourly security event trends
• Deviations, if found, were reported to personnel automatically every hour
• TBATS worked well for the multiple seasonalities inherent in our data
• Adjusting thresholds allowed fine-tuning alerts and controlling false positives
• Alarms were raised within 60 minutes of the first sign of malicious activity
• The scalability of this approach allowed us to cover 16 categories of devices across campus
• We plan to introduce more complex seasonalities (e.g. month of the year)

Slide 52


Thank you!
• AI Village organizers, for the support!
• Michigan State Infosec Team: Tyler Olsen, Nicholas Oas, Seth Edgar, Rob McCurdy
Questions: @amirootyet