Slide 1

Slide 1 text

Machine Learning for monitoring
Detecting anomalies in the monitoring system
Shin, Jongho; LINE+ Graylab
2019.12.12

Slide 2

Slide 2 text

Contents
01 Anomaly detection using ML
02 ML system design
03 Results and Future works
04 Summary

Slide 3

Slide 3 text

Anomaly?
Anomalies are everywhere

Slide 4

Slide 4 text

Anomaly in monitoring?
- Not just outliers, but malicious activities: fraud detection, network intrusion detection, etc.
- Unknown unknowns: we don’t know what an anomaly will look like until we find it.

Slide 5

Slide 5 text

Challenges
- ML is not good at finding subtle differences. Moreover, deep learning is good at finding or generating similar things.
- Hard to explain the result. The result may not be intuitive. Signal fatigue.
- Adversarial ML. Poisoning attacks, evasion attacks.
Source: Stanford CS259D lecture slides (https://web.stanford.edu/class/cs259d/)

Slide 6

Slide 6 text

How are others doing?
Google Security Monitoring Tools Group
Disclaimer: this diagram is old data!
Source: Stanford CS259D lecture slides (https://web.stanford.edu/class/cs259d/)

Slide 7

Slide 7 text

How are we doing?
Finding anomalous access to critical assets
- Multiple (heterogeneous) models for each user
- Time-series prediction on overall traffic
Diagram: Access log, Logstash, Elasticsearch, User-level detection, Overall detection, Alarm system

Slide 8

Slide 8 text

Clustering
- Grouping the elements based on similarity
- Anomalous elements won’t be grouped with normal elements (hopefully)

Slide 9

Slide 9 text

Clustering
We work with the security community. Hackers are always welcome in LINE.

Slide 10

Slide 10 text

Clustering
HDBSCAN
- Hierarchical DBSCAN
- Can handle groups of different density

Slide 11

Slide 11 text

Clustering
HDBSCAN
- Hierarchical DBSCAN
- Can handle groups of different density
Source: https://www.groundai.com/project/hierarchical-clustering-that-takes-advantage-of-both-density-peak-and-density-connectivity/
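To make the idea of density-based clustering concrete, here is a minimal sketch of plain DBSCAN, the non-hierarchical algorithm that HDBSCAN builds on. It is not HDBSCAN itself and not the code used in this talk: it uses a single density threshold, whereas HDBSCAN is designed to cope with groups of different density. The function names, the `eps` and `min_pts` values, and the sample points are all assumptions chosen for illustration.

```python
import math

NOISE = -1  # label for points that do not belong to any cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, NOISE for unclustered points."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (2.5, 9.0)]
labels = dbscan(points, eps=0.3, min_pts=3)
anomalies = [p for p, label in zip(points, labels) if label == NOISE]
print(labels)
print(anomalies)
```

The point that ends up with the noise label is the anomaly candidate: it is the element that could not be grouped with anything else.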

Slide 12

Slide 12 text

Anomaly scoring
There are methods for anomaly scoring other than clustering.
https://scikit-learn.org/stable/auto_examples/plot_anomaly_comparison.html

Slide 13

Slide 13 text

Anomaly scoring
Isolation Forest
- Partitioning the space until we can isolate the point
- Fewer partitions means the point is more anomalous
Susto, Gian Antonio et al. “Anomaly Detection through on-line Isolation Forest: An application to plasma etching.”

Slide 14

Slide 14 text

Anomaly scoring
Isolation Forest
- Partitioning the space until we can isolate the point
- Fewer partitions means the point is more anomalous
Susto, Gian Antonio et al. “Anomaly Detection through on-line Isolation Forest: An application to plasma etching.”
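The partition-counting idea can be sketched in a few lines of standard-library Python. This is an illustrative toy, not the implementation used in this system: it builds random axis-parallel trees and turns the average isolation depth into a score with the usual normalisation 2^(-E[h]/c(n)), where c(n) is the average path length of an unsuccessful binary search tree lookup. All function names, parameter defaults, and the sample data are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, plus c(size) for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1); values closer to 1 mean the point is isolated sooner."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c(sample_size)))
    return scores


# Hypothetical data: 60 points around the origin plus one far-away point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
outlier = (10.0, 10.0)
scores = anomaly_scores(normal + [outlier])
print(round(scores[-1], 3), round(max(scores[:-1]), 3))
```

The far-away point needs very few cuts to be separated from the rest, so its average depth is small and its score is the highest in the set.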

Slide 15

Slide 15 text

Anomaly scoring
Extended Isolation Forest
- Random slicing
Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest."
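As a rough illustration of what "random slicing" means, the sketch below performs a single branching step with a randomly oriented hyperplane instead of an axis-parallel cut: a random normal vector gives the slope, a random point inside the data's bounding box gives the intercept, and each point goes left or right depending on the sign of (x - p) · n. This is a simplified reading of the cited paper, not its reference implementation; `random_slice` and the sample points are hypothetical.

```python
import random


def random_slice(points, rng):
    """Split points with a random hyperplane rather than an axis-parallel cut."""
    dims = len(points[0])
    # Random slope: each component of the normal vector is drawn from N(0, 1).
    normal = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    # Random intercept: a point drawn uniformly inside the bounding box.
    intercept = [rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
                 for d in range(dims)]
    left, right = [], []
    for p in points:
        side = sum((p[d] - intercept[d]) * normal[d] for d in range(dims))
        (left if side <= 0 else right).append(p)
    return normal, intercept, left, right


# Hypothetical data: one branching step on a handful of 2-D points.
rng = random.Random(7)
points = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (2.0, 2.0), (9.0, 9.0)]
normal, intercept, left, right = random_slice(points, rng)
print(len(left), len(right))
```

Applying this step recursively in place of the axis-parallel split gives the extended variant of the tree; scoring by isolation depth stays the same.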

Slide 16

Slide 16 text

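Time-series prediction
Additive model
- Trend, seasonal, and holiday
- FB’s Prophet library
y(t) = g(t) + s(t) + h(t) + ε_t
where g(t) is the trend, s(t) the seasonal component, h(t) the holiday effect, and ε_t the error term.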
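The additive model can be illustrated without Prophet. The sketch below is a deliberately simplified stand-in, not Prophet's algorithm: it fits g(t) as a least-squares line and s(t) as per-weekday means of the detrended series, leaves out the holiday term h(t), and flags points whose residual exceeds k standard deviations. The function names, the weekly period, the threshold k, and the synthetic traffic series are all assumptions.

```python
import statistics


def fit_additive(y, period=7):
    """Fit y[t] ~ g(t) + s(t): a least-squares line plus per-phase seasonal means."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    detrended = [v - (intercept + slope * t) for t, v in enumerate(y)]
    seasonal = [statistics.fmean(detrended[phase::period]) for phase in range(period)]

    def predict(t):
        return intercept + slope * t + seasonal[t % period]

    return predict


def find_anomalies(y, period=7, k=3.0):
    """Indices whose residual from the fitted model exceeds k standard deviations."""
    predict = fit_additive(y, period)
    residuals = [v - predict(t) for t, v in enumerate(y)]
    sigma = statistics.pstdev(residuals)
    return [t for t, r in enumerate(residuals) if abs(r) > k * sigma]


# Hypothetical daily traffic: rising trend, weekly pattern, one injected spike.
weekly = [0, 5, 10, 5, 0, -20, -25]
traffic = [100 + 0.5 * t + weekly[t % 7] for t in range(56)]
traffic[30] += 80
anomalous_days = find_anomalies(traffic)
print(anomalous_days)
```

With Prophet itself, the same check is typically done by comparing observed values against the forecast's uncertainty interval rather than a fixed multiple of the residual spread.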

Slide 17

Slide 17 text

Result
Overall filtering
- Reduced more than 90%
- Rough filtering
- FP >> FN
- No FP/FN ratio analysis :(

Slide 18

Slide 18 text

What’s next?
- Deep learning
- Hyperparameter optimization
- Better result explanation
- Adversarial attacks

Slide 19

Slide 19 text

Summary
In a nutshell,
- Anomaly detection is difficult even with ML
- Still, it’s better than manual detection
- There’s no silver bullet solution
- Open subject
- Using more models is more robust, but they are hard to harmonize

Slide 20

Slide 20 text

References
- https://web.stanford.edu/class/cs259d/
- Campello, Ricardo J. G. B., Davoud Moulavi, and Jörg Sander. "Density-based clustering based on hierarchical density estimates." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin, Heidelberg, 2013.
- Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest." arXiv preprint arXiv:1811.02141 (2018).
- Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.

Slide 21

Slide 21 text

Thank you