
Detecting Hacks: Anomaly Detection on Networking Data

See https://medium.com/@jamessirota for a series of blog entries that goes with this deck...

Defense in Depth for Big Data
Network Anomaly Detection Overview
Volume Anomaly Detection
Feature Anomaly Detection
Model Architecture
Deployment on OpenSOC Platform
Questions

James Sirota

June 16, 2015

Transcript

  1. 1 © 2010 Cisco and/or its affiliates. All rights reserved.

    Detecting Hacks: Anomaly Detection on Networking Data
    James Sirota (@JamesSirota), Lead Data Scientist – Managed Threat Defense
    Chester Parrott (@ParrottSquawk), Data Scientist – Managed Threat Defense
    June 2015
  2.

    In the next few minutes…
    •  Defense in Depth for Big Data
    •  Network Anomaly Detection Overview
    •  Volume Anomaly Detection
    •  Feature Anomaly Detection
    •  Model Architecture
    •  Deployment on OpenSOC Platform
    •  Questions
  3.

    Who are we? Big Data Security Analytics Open Source Managed Service
  4.

    The New Defense-In-Depth Defense Strategy Static Sandboxing Threat Intel Feeds Rules Engines Volume-Based Feature-Based NLP-Based Token Clustering User Profiling Asset Profiling Interaction Profiling Dynamic Sandboxing Malware Classifiers Script Classifiers Perimeter Monitoring Web Scraping Soc. Media Analytics Model Validators Training Set Generation Signature Matching Rules-Based Matching Network Anomaly Detection Log Anomaly Detection Behavioral Anomaly Detection Malware Family Script Family Scraping Honeypots Misuse Detection Intrusion Detection Supervised Class. Look-Ahead Analytics Legacy Mindset Generic Threats Targeted Threats Future Threats
  5.

    Network Anomaly Detection
    Volume-Based Feature-Based Statistical Process Control Frequency Domain Time Series Forecasting Information Theory Principal Component Analysis Sketch-Based 3-sigma algorithms Exponential Smoothing ARIMA Fast Fourier Transform Wavelets Entropy Subspace Heavy Hitters Set Cardinality Probability Models Markov Models Bayes Nets Unsupervised ML Clustering Density Proximity Anomalous Traffic Patterns Interrelationships between Features
  6.

    Volume-Based vs. Feature-Based
    Telemetry                        Volume-Based   Feature-Based
    Encrypted Traffic (Raw Packet)   YES            NO
    Raw Packet + Header Metadata     YES            YES
    Machine Exhaust Data             YES (online)   NO
    DPI Metadata                     NO             YES
    Netflow                          YES            YES
    Enrichment Metadata              YES            YES
    Application Logs                 YES            YES
    Other Alerts                     NO*            YES
  7.

    Anomaly Detection: 3-Phase Process Unstructured Data Identify Anomaly Classify Alert Examine + Reinforce Training Set Historical Context
  8.

    Phase 1: Identify Unstructured Data Understanding of Normal Anomaly A Anomaly B Anomaly C Anomaly (N)
  9.

    Phase 2: Classify
    Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome Volume Anomaly Entropy Anomaly Feature (x) Heavy Hitters Anomaly Volume Anomaly Cardinality Anomaly Feature (x) Protocol Anomaly Feature (x) Anomaly (A) Anomaly (B) Anomaly (N) Class Label x x x x x x x Port Scan x x x x x False Positive x x x x Network Scan x x x x Port Scan x x x x False Positive x x x x x x DDoS
  10.

    Phase 3: Examine + Reinforce
    Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome Volume Anomaly Entropy Anomaly Feature (x) Heavy Hitters Anomaly Volume Anomaly Cardinality Anomaly Feature (x) Protocol Anomaly Feature (x) Anomaly (A) Anomaly (B) Anomaly (N) Class Label x x x x x x x Port Scan x x x x x False Positive x x x x Network Scan x x x x False Positive x x x x x x DDoS x x x x x x False Positive x x x x x x False Positive x x x x False Positive x x x x x x DDoS
  11.

    Basic Anomalies
    Anomaly               Definition
    Alpha Flows           Large volume point-to-point flows
    DoS                   Denial of service (distributed or single source)
    Flash Crowd           Large volume of traffic to a single destination from a large number of sources
    Port Scan             Probe to many destination ports on a small number of destination addresses
    Network Scan          Probe to many destination addresses on a small number of destination ports
    Outage Events         Traffic shifts because of equipment failures or maintenance
    Plateau Behavior      Behavior caused by traffic reaching environmental limits
    Point-to-Multipoint   Traffic from a single source to many destinations, e.g., content distribution
    Worms                 Scanning by worms for vulnerable hosts, which is a special case of network scan
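The port scan and network scan definitions above can be turned into a rough per-source check. The following is only a sketch: the function name `classify_scans`, the `(src_ip, dst_ip, dst_port)` flow tuple, and the `many` / `few` thresholds are all assumptions made for illustration and do not come from the deck.

```python
from collections import defaultdict


def classify_scans(flows, many=100, few=5):
    """Label each source using the slide's scan definitions.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples (hypothetical layout).
    many / few: illustrative thresholds, not values from the deck.
    """
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src_ip, dst_ip, dst_port in flows:
        dst_ips[src_ip].add(dst_ip)
        dst_ports[src_ip].add(dst_port)

    labels = {}
    for src_ip in dst_ips:
        n_ips = len(dst_ips[src_ip])
        n_ports = len(dst_ports[src_ip])
        if n_ports >= many and n_ips <= few:
            # Many destination ports on a small number of addresses.
            labels[src_ip] = "port scan"
        elif n_ips >= many and n_ports <= few:
            # Many destination addresses on a small number of ports.
            labels[src_ip] = "network scan"
        else:
            labels[src_ip] = None
    return labels


flows = (
    [("10.0.0.1", "10.0.0.9", port) for port in range(200)]
    + [("10.0.0.2", f"10.0.1.{i}", 22) for i in range(150)]
    + [("10.0.0.3", "10.0.0.9", 443)]
)
labels = classify_scans(flows)
```

A fixed threshold like this is far cruder than the models discussed in the rest of the deck; it only makes the two definitions concrete.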
  12.

    Implementation
    Diagram labels: Time Series DB; MAP (x3), Key: assetID-metricID-Bin; RED (x3), assetID-metricID-Bin : 5pt; Telemetry; Anomaly?
    Table Name: Metric ID (Cumulative Volume)
    Asset        Bin   Value
    Server 1     15    5pt *
    Server 2     15    5pt *
    Server (N)   15    5pt *
    * 5-point summary (5pt):
    1.  the sample minimum (smallest observation)
    2.  the lower quartile or first quartile
    3.  the median (middle value)
    4.  the upper quartile or third quartile
    5.  the sample maximum (largest observation)
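As a minimal sketch of the 5-point summary described on this slide, the snippet below computes the five values for one key and compares a new observation against them. The key string `server1-volume-15` follows the slide's `assetID-metricID-Bin` pattern but is a made-up example, and the interquartile-range fence used in `is_anomalous` is an assumption; the deck does not say how telemetry is compared with the summary.

```python
import statistics


def five_point_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def is_anomalous(value, summary, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not from the deck)."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


# Hypothetical history for one asset, metric and time bin.
history = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summaries = {"server1-volume-15": five_point_summary(history)}
summary = summaries["server1-volume-15"]
```

In a MapReduce setting the map step would emit values under the `assetID-metricID-Bin` key and the reduce step would call something like `five_point_summary` on each group.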
  13.

    Batch Analytics Forecasting Models Forecast Forecasting Algorithm (ARIMA/Holt-Winters, …)
  14.

    Implementation
    Diagram labels: Time Series DB; MAP (x3), Key: assetID-metricID-Bin; RED (x3), Key: assetID-metricID-Bin: [Expected | STD]; Telemetry; Anomaly?
    Table Name: Metric ID (Cumulative Volume)
    Asset        Bin   Value
    Server 1     15    EX | STD
    Server 2     15    EX | STD
    Server (N)   15    EX | STD
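The `[Expected | STD]` pair stored per key lends itself to a 3-sigma style check. The sketch below assumes that rule, and it uses the historical mean as a stand-in for the expected value, where the deck names forecasting algorithms such as ARIMA or Holt-Winters. The function names, the key, and the history values are hypothetical.

```python
import statistics


def expected_and_std(history):
    """Stand-in for a forecasting model: historical mean and population std."""
    return statistics.mean(history), statistics.pstdev(history)


def is_volume_anomaly(observed, expected, std, n_sigma=3.0):
    """True when the observation is more than n_sigma deviations from expected."""
    return abs(observed - expected) > n_sigma * std


# Hypothetical reference cache keyed by assetID-metricID-Bin.
reference_cache = {
    "server1-volume-15": expected_and_std([100, 110, 120, 130, 140]),
}
expected, std = reference_cache["server1-volume-15"]
```

Precomputing the pair in batch keeps the online step to a dictionary lookup and one comparison per observation.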
  15.

    Time Series DB Batch Model Deployment Step 1: Bootstrap: Stream Data Unstructured Data OpenSOC OpenSOC JSON Step 2: Pre-Compute Expected Values (Batch) Timestamp HIVE Time Series DB MR/Spark MR/Spark MR/Spark Step 3: Generate Alerts (Online) Unstructured Data OpenSOC Expected Values Reference Cache Time Series DB OpenSOC JSON Timestamp HIVE Alert ES Expected Values Reference Cache
  16.

    Online Analytics Data Preparation Deseasonalizer AV CMA RAT UF RF DV
  17.

    Online Analytics Other things to check for Trend: Seasonal Variability: Evolution of Regularities:
  18.

    Online Processing 3-Sigma Algorithms Micro Forecasting Histogram Bins
  19.

    Frequency Domain
    High
    •  Trendless
    •  Noise
    •  Spikes represent Anomalies
    Medium
    •  Flatter
    •  Finer-grained Trends
    Low
    •  Seasonal & ‘Peaky’
    •  Weekly/Daily Trends
  20.

    Frequency Domain – Wavelet Separation
  21.

    Online Model Deployment Time Series DB Step 1: Bootstrap: Stream Data Unstructured Data OpenSOC OpenSOC JSON Step 2: Generate Adjuster Timestamp HIVE Time Series DB MR/Spark Adjuster / Decomposer Step 3: Generate Alerts (Online) Unstructured Data OpenSOC Time Series DB OpenSOC JSON Timestamp HIVE Alert ES Adjuster Decomposer MR/Spark MR/Spark
  22.

    Feature-Based Anomaly Detection
    Continuous Numeric Features*
    •  Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
    •  Normalization - adjusting values measured on different scales to a notionally common scale
    1.  Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
    2.  Clustering. Example: K-Means
    3.  Density-Based
    Sample Anomalies Detected
    MPS Anomaly   KBps Anomaly   Possible Explanation
    TOO HIGH      TOO LOW        Port Scan, Network Scan
    TOO HIGH      TOO HIGH       DDoS
    TOO LOW       TOO HIGH       Control Traffic Anomaly
    OK            OK             No Anomaly
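To illustrate how normalization and a proximity-based technique fit together, here is a small sketch that z-scores two continuous features and scores a point by its mean distance to its k nearest baseline neighbors. The feature pair (MPS, KBps) comes from the slide's table; the baseline numbers, the choice of z-score scaling, and the helper names `fit_scaler`, `scale` and `knn_score` are assumptions for illustration.

```python
import math
import statistics


def fit_scaler(rows):
    """Per-column mean and population std, used for z-score normalization."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard against zero std
    return means, stds


def scale(row, means, stds):
    """Put one row onto the common (z-score) scale."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))


def knn_score(point, reference, k=3):
    """Mean Euclidean distance to the k nearest reference points."""
    distances = sorted(math.dist(point, ref) for ref in reference)
    return statistics.mean(distances[:k])


# Hypothetical baseline of (MPS, KBps) observations.
baseline = [(100, 500), (110, 520), (90, 480), (105, 510), (95, 490)]
means, stds = fit_scaler(baseline)
reference = [scale(row, means, stds) for row in baseline]

normal_score = knn_score(scale((102, 505), means, stds), reference)
outlier_score = knn_score(scale((400, 50), means, stds), reference)
```

Without the scaling step the KBps column would dominate the distance simply because its raw values are larger, which is the reason the slide lists normalization first.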
  23.

    Feature-Based Anomaly Detection
    Categorical Features *
    •  Categorical Features - can take on one of a limited, and usually fixed, number of possible values
    •  Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
    Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
    Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
    Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
    Diagram labels: Time Series DB; Categorical Data; CM Sketch; Heavy Hitters; MR; Unstructured Data; CM Sketch; Alert
    Table Name: Protocol
    Asset        Bin   Value
    Server 1     15    HH
    Server 2     15    HH
    Server (N)   15    HH
    Expected: {HTTP, UDP, FTP, DNS}
    ACTUAL: {DNS, ICMP, HTP, FTP}
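A Count-Min sketch can be written in a few lines, which may help make the heavy-hitters idea concrete. This is a generic textbook-style sketch, not the OpenSOC implementation: the class name, the `width` and `depth` defaults, the use of `hashlib.blake2b` for the row hashes, and the sample stream are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts in fixed memory; never undercounts."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0

    def _index(self, item, row):
        # One deterministic hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def heavy_hitters(self, candidates, fraction=0.1):
        """Candidates whose estimated count is at least `fraction` of the stream."""
        return {c for c in candidates if self.estimate(c) >= fraction * self.total}


sketch = CountMinSketch()
for protocol, count in [("HTTP", 500), ("DNS", 300), ("FTP", 5)]:
    sketch.add(protocol, count)

expected = {"HTTP", "UDP", "FTP", "DNS"}
actual = sketch.heavy_hitters({"HTTP", "DNS", "FTP", "ICMP"})
unexpected = actual - expected  # heavy hitters that the baseline did not contain
```

The alerting step then reduces to a set comparison between the expected heavy hitters for a bin and the ones actually observed.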
  24.

    Feature-Based Anomaly Detection
    Feature Ratios
    HyperLogLog: approximating the number of distinct elements in a multiset
    Useful Ratio: # distinct elements / total elements [0-1]
    •  Digest - structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
    Diagram labels: Unstructured Data; HyperLogLog; Distinct (Src_port, Dst_port, Src_ip, Dst_ip); Storm Bolt; Total (Src_port, Dst_port, Src_ip, Dst_ip); Ack; Ratios; Digest *; Alert
    FEATURE    DT RATIO   Anomaly Possible Reason
    SRC_IP     ~1/~0      Flash Crowd/DDoS
    SRC_PORT   ~1/~0      Failure Probing/App Hijack
    DST_IP     ~1/~0      Network Scan/DDoS
    DST_PORT   ~1/~0      Port Scan/Footprinting
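The "useful ratio" on this slide is easy to show with exact counting. Note the swap: the sketch below uses a Python `set` to count distinct values exactly, where the deck uses HyperLogLog to approximate the same number in bounded memory. The function name and the sample port lists are made up for illustration.

```python
def distinct_ratio(values):
    """Distinct elements divided by total elements, in [0, 1].

    Exact set-based stand-in for the HyperLogLog estimate named on the slide.
    """
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical destination ports seen in one time bin.
scan_like = list(range(1000))            # every packet hits a different port
steady = [80] * 990 + [443] * 10         # traffic concentrated on two ports

scan_ratio = distinct_ratio(scan_like)
steady_ratio = distinct_ratio(steady)
```

A ratio near 1 or near 0 for a feature such as DST_PORT corresponds to the `~1/~0` column in the table above; what counts as "near" would have to be learned from each asset's history.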
  25.

    Feature-Based Anomaly Detection
    Correlation - Information Theory
    •  Information Theory - study of fundamental limits on signal processing, compression, and storage
    •  Entropy - a measure of unpredictability of information content
    Diagram labels: Unstructured Data; Anomaly-Free Training Set; Entropy Summarizer; Entropy (Src_port, Dst_port, Src_ip, Dst_ip); MR; Alert
    Time Bin (n)
               SRC_IP   SRC_PORT   DST_IP   DST_PORT
    SRC_IP     -        .95        .85      .75
    SRC_PORT   -        -          .97      .76
    DST_IP     -        -          -        .98
    DST_PORT   -        -          -        -
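Entropy as "a measure of unpredictability" can be computed per feature and per time bin from value counts. The sketch below shows plain Shannon entropy in bits; it does not reproduce the pairwise matrix on the slide, whose exact construction the deck does not spell out. The function name and sample values are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical destination ports observed in one time bin.
concentrated = [80] * 8              # fully predictable: entropy 0
dispersed = [22, 80, 443, 8080]      # four equally likely values: entropy 2 bits

low = shannon_entropy(concentrated)
high = shannon_entropy(dispersed)
```

Comparing a bin's entropy with the value learned from an anomaly-free training set is one way a sudden concentration or dispersion of a feature could be surfaced.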
  26.

    Principal Component Analysis (PCA)
    •  Feature Selection Algorithm
    •  Dimensionality Reduction
    •  E.g. 4 features
        •  ServerA (A)
        •  ServerB (B)
        •  ServerC (C)
        •  Cumulative = A + B + C
  27.

    PCA – Component Construction
    Each component is a weighted sum of the four features (feature X loading):
    Feature           PC1          PC2          PC3          PC4
    ServerA Traffic   -0.5052803    0.2801275    0.6867089   -0.4411929
    ServerB Traffic   -0.4990556    0.4611079   -0.6988557   -0.2234362
    ServerC Traffic   -0.4816276   -0.8395562   -0.1441834   -0.2058916
    Cumulative        -0.5134882    0.0636666    0.138718     0.8444132
    σ                  0.0135       0.5773       0.5773       0.5773
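The component construction on this slide is a weighted sum of the four features, which the sketch below reproduces using the loadings as printed. It assumes the inputs have already been centered and scaled the same way as the data the loadings were fitted on; the deck does not state the preprocessing. The names `LOADINGS` and `component_score` and the sample observation are hypothetical.

```python
# Loadings copied from the slide; order is (ServerA, ServerB, ServerC, Cumulative).
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}


def component_score(features, loadings):
    """Project one (already normalized) observation onto one component."""
    return sum(value * weight for value, weight in zip(features, loadings))


# Hypothetical normalized observation: every feature one unit above its mean.
observation = (1.0, 1.0, 1.0, 1.0)
scores = {name: component_score(observation, w) for name, w in LOADINGS.items()}
```

Scores on the components that normally carry little variance are the ones a subspace method would watch, since a large value there means the usual relationship between the features no longer holds.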
  28.

    Putting it All Together: OpenSOC RAW Transform Enrich Alert (Rules-Based) Enriched Filter Aggregators Router Model 1 Scorer HIVE + Hbase Long-Term Data Store Flume Kafka Storm Model 2 Model n OpenSOC-Streaming OpenSOC-Aggregation OpenSOC-ML SOC Alert Consumers UI UI UI UI UI Web Services Secure Gateway Services External Alert Consumers Big Data Stores Elastic Search Real-Time Index and Search Hbase OpenTSDB Titan Graph Alerts ES/HIVE Alerts Store Remedy Ticketing System
  29.

    We are hiring…
    •  Data Scientists (Security)
    •  Aspiring Data Scientists
    •  Security/Networking Experience Required
    •  Software Engineering Experience Required
    •  PhD not required
    •  Background in stats or ML not required
    •  Security Researchers
    *Please contact us via LinkedIn with your profile
  30.

    Book idea… Security Analytics on Hadoop
    •  Anomaly Detection
    •  Targeted Models
    •  Deployment Best Practices
    •  Alerts
    •  Visualization Techniques
    •  Etc…
    If interested in contributing please contact James Sirota on LinkedIn
  31.

    OpenSOC Resources (@ProjectOpenSOC)
    GitHub Repo
    •  https://github.com/OpenSOC/opensoc
    Slides
    •  http://www.slideshare.net/JamesSirota
    •  https://speakerdeck.com/jsirota
    Corporate Blogs
    •  http://blogs.cisco.com/author/jamessirota
    •  http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security
    Contributor Blogs
    •  https://medium.com/@jamessirota
    •  parrottsquawk.com