Machine Learning for monitoring: Detecting anomalies in the monitoring system

By Shin, Jongho at BECKS#5 https://becks.kktix.cc/events/twbecks5

LINE Developers Taiwan

December 12, 2019

Transcript

  1. Anomaly in monitoring?

    - Not just outliers, but malicious activities: fraud detection, network intrusion detection, etc.
    - Unknown unknowns: we don’t know what an anomaly looks like until we find it.
  2. Challenges

    - ML is not good at finding subtle differences: deep learning is mostly good at finding or generating similar things.
    - The result is hard to explain: it may not be intuitive, and unexplained alerts lead to signal fatigue.
    - Adversarial ML: poisoning attacks, evasion attacks.
    Source: Stanford CS259D lecture slides (https://web.stanford.edu/class/cs259d/)
  3. How are others doing?

    - Diagram: Google Security Monitoring Tools Group
    - Disclaimer: this diagram is based on old data!
    Source: Stanford CS259D lecture slides (https://web.stanford.edu/class/cs259d/)
  4. How are we doing?

    Finding anomalous access to critical assets:
    - Multiple (heterogeneous) models for each user
    - Time-series prediction on overall traffic
    Pipeline (diagram): access log → Logstash → Elasticsearch → user-level detection and overall detection → alarm system
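The deck does not say which models run for each user. The sketch below covers only the user-level branch. It assumes each user's history has been reduced to daily access counts on critical assets (a hypothetical feature). It also uses one simple statistical model per user instead of several heterogeneous ones. Under those assumptions, a day is flagged when its count is far from that user's own mean. The function names, the unseen-user rule, and the threshold `k=3.0` are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_user_baselines(history):
    """history: iterable of (user, daily_access_count) pairs (hypothetical schema)."""
    per_user = defaultdict(list)
    for user, count in history:
        per_user[user].append(count)
    # One independent model per user: here just that user's mean and std.
    return {user: (mean(c), pstdev(c)) for user, c in per_user.items()}

def is_anomalous(baselines, user, count, k=3.0):
    """Flag a count more than k standard deviations away from the user's own mean."""
    if user not in baselines:
        return True  # assumption: an unseen user touching a critical asset is suspicious
    mu, sigma = baselines[user]
    if sigma == 0:
        return count != mu  # constant history: any change is unusual
    return abs(count - mu) > k * sigma

history = [("alice", c) for c in (10, 12, 11, 9, 10, 12)] + \
          [("bob", c) for c in (1, 0, 2, 1, 1, 0)]
baselines = build_user_baselines(history)
print(is_anomalous(baselines, "alice", 11), is_anomalous(baselines, "bob", 40))
```

With one baseline per user, the same count can be normal for a heavy user and anomalous for a light one. The overall-traffic detector would run alongside this.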
  5. Clustering

    - Group elements based on similarity
    - Anomalous elements won’t be grouped with normal elements (hopefully)
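Below is a minimal sketch of clustering-based detection. It uses scikit-learn's `DBSCAN` as a stand-in. The deck's references point to HDBSCAN (Campello et al.), a hierarchical variant; recent scikit-learn versions also provide `sklearn.cluster.HDBSCAN`. DBSCAN labels points that join no dense cluster as noise (`-1`), which matches the slide's hope that anomalies stay out of the normal groups. The synthetic data and the `eps` and `min_samples` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense groups of "normal" feature vectors (indices 0-99) plus two isolated points.
normal = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])
outliers = np.array([[2.5, 2.5], [10.0, -3.0]])  # indices 100 and 101
X = np.vstack([normal, outliers])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
anomaly_idx = np.flatnonzero(labels == -1)  # points that were not grouped with anyone
print(anomaly_idx)
```

The "hopefully" on the slide is real. If `eps` is too large, anomalies are absorbed into a cluster. If it is too small, sparse but normal points are reported as noise.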
  6. Anomaly scoring

    - There are other methods for anomaly scoring besides clustering: https://scikit-learn.org/stable/auto_examples/plot_anomaly_comparison.html
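One of the scorers in the linked scikit-learn comparison is Local Outlier Factor (LOF). LOF scores a point by how much sparser its neighborhood is than the neighborhoods of its neighbors. Here is a minimal sketch on synthetic data. The choice `n_neighbors=20` and the two injected points are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(scale=0.5, size=(200, 2)),   # normal behavior, indices 0-199
    np.array([[4.0, 4.0], [-4.0, 3.5]]),    # two injected anomalies, indices 200-201
])

lof = LocalOutlierFactor(n_neighbors=20)
pred = lof.fit_predict(X)                    # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_       # larger = more anomalous
top2 = np.argsort(scores)[-2:]               # indices of the two highest scores
print(sorted(top2.tolist()), pred[200], pred[201])
```

LOF produces a continuous score instead of a cluster assignment. That makes it easier to rank events and choose an alerting threshold.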
  7. Anomaly scoring: Isolation Forest

    - Partition the space until the point is isolated
    - The fewer partitions needed, the more anomalous the point
    Source: Susto, Gian Antonio, et al. “Anomaly Detection through on-line Isolation Forest: An application to plasma etching.”
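scikit-learn ships a batch Isolation Forest. The on-line variant by Susto et al. is a different algorithm and is not shown here. In this minimal sketch, each tree splits on a random feature at a random threshold. `score_samples` returns the negated anomaly score, so a lower value means the point was isolated after fewer splits on average. The data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 3))            # hypothetical normal feature vectors
X_new = np.array([
    [0.1, -0.2, 0.0],                           # looks like the training data
    [6.0, 6.0, -6.0],                           # far from everything seen so far
])

forest = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
scores = forest.score_samples(X_new)            # lower = shorter average path = more anomalous
labels = forest.predict(X_new)                  # -1 = anomaly, 1 = normal
print(scores, labels)
```

Isolation Forest never estimates density or distances. It only counts splits, which keeps it cheap on high-volume access logs.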
  9. Anomaly scoring: Extended Isolation Forest

    - Random slicing: cuts with random slopes instead of only axis-parallel ones
    Source: Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest."
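Here is what "random slicing" means concretely. In the Extended Isolation Forest, each split draws a random normal vector `n` and a random intercept `p`, and sends `x` left when `(x - p)·n <= 0`. The standard Isolation Forest is the special case where `n` is a coordinate axis. The sketch below is a simplified from-scratch illustration under several assumptions:

- full extension level, so every component of `n` comes from a standard normal;
- the intercept is drawn uniformly inside the node's bounding box;
- the usual `2^(-E[h]/c(psi))` score is used.

The function names are hypothetical, and this is not the authors' reference implementation.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0

def build_tree(X, rng, depth, max_depth):
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}
    normal = rng.normal(size=X.shape[1])                   # random slope for the cut
    intercept = rng.uniform(X.min(axis=0), X.max(axis=0))  # random point in the bounding box
    go_left = (X - intercept) @ normal <= 0
    return {
        "normal": normal,
        "intercept": intercept,
        "left": build_tree(X[go_left], rng, depth + 1, max_depth),
        "right": build_tree(X[~go_left], rng, depth + 1, max_depth),
    }

def path_length(x, node, depth=0):
    if "size" in node:
        return depth + c(node["size"])  # unresolved leaves add their expected remaining depth
    side = "left" if (x - node["intercept"]) @ node["normal"] <= 0 else "right"
    return path_length(x, node[side], depth + 1)

def eif_scores(X_train, X_test, n_trees=100, sample_size=256, seed=0):
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X_train))
    max_depth = math.ceil(math.log2(psi))
    trees = [
        build_tree(X_train[rng.choice(len(X_train), psi, replace=False)], rng, 0, max_depth)
        for _ in range(n_trees)
    ]
    mean_depth = np.array([np.mean([path_length(x, t) for t in trees]) for x in X_test])
    return 2.0 ** (-mean_depth / c(psi))  # closer to 1 = more anomalous

data_rng = np.random.default_rng(3)
X_train = data_rng.normal(size=(500, 2))
scores = eif_scores(X_train, np.array([[0.0, 0.0], [5.0, 5.0]]))
print(scores)
```

Sloped cuts avoid the axis-aligned "ghost" regions of low score that the standard Isolation Forest produces. That is the motivation given by Hariri et al.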
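  10. Time-series prediction

    - Additive model: trend, seasonal, and holiday components
    - Facebook’s Prophet library
    $y(t) = g(t) + s(t) + h(t) + \varepsilon_t$, where $g(t)$ is the trend, $s(t)$ the seasonality, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term.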
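The sketch below does not use Prophet. As a dependency-light stand-in, it fits the same additive shape with ordinary least squares: a linear trend $g(t)$, a weekly Fourier seasonality $s(t)$, and no holiday term $h(t)$. Points whose residual exceeds `k` residual standard deviations are flagged. The daily series, `period=7`, `order=3`, and `k=3` are illustrative assumptions, not the deck's settings.

```python
import numpy as np

def design_matrix(t, period=7.0, order=3):
    """Columns: intercept and slope for g(t), then sin/cos Fourier terms for s(t)."""
    cols = [np.ones_like(t), t]
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

def additive_anomalies(y, k=3.0, period=7.0, order=3):
    t = np.arange(len(y), dtype=float)
    A = design_matrix(t, period, order)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit of g(t) + s(t)
    residual = y - A @ coef                         # estimate of the error term
    sigma = residual.std()
    return np.flatnonzero(np.abs(residual) > k * sigma)

rng = np.random.default_rng(4)
t = np.arange(120, dtype=float)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=1.0, size=t.size)
y[90] += 25  # injected traffic spike
print(additive_anomalies(y))
```

Prophet adds changepoints in the trend, holiday effects, and uncertainty intervals on top of this idea. In practice, a point outside the forecast interval plays the role of the large residual here.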
  11. Result

    - Overall filtering: reduced by more than 90%
    - Rough filtering: false positives far outnumber false negatives (FP >> FN)
    - No FP/FN ratio analysis :(
  12. Summary

    In a nutshell:
    - Anomaly detection is difficult, even with ML
    - Still, it’s better than manual detection
    - There’s no silver-bullet solution; it remains an open subject
    - Using more models is more robust, but their results are hard to harmonize
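Heterogeneous detectors are hard to harmonize partly because their scores live on different scales. One common approach, though not the only one and not necessarily the deck's, is to convert each model's scores to ranks and average them. This minimal sketch assumes every model outputs one score per event, with higher meaning more anomalous. The score values are made up for illustration.

```python
import numpy as np

def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank so differently scaled models are comparable."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()           # 0 = least anomalous; ties broken arbitrarily
    return ranks / max(len(scores) - 1, 1)

def combine(*model_scores):
    """Average the rank-normalized scores of several detectors (higher = more anomalous)."""
    return np.mean([rank_normalize(s) for s in model_scores], axis=0)

# Hypothetical scores from three models for the same five events.
lof_scores = [1.0, 1.1, 0.9, 8.5, 1.2]
iforest_scores = [0.40, 0.42, 0.38, 0.75, 0.45]
zscore_scores = [0.2, 3.9, 0.1, 4.5, 0.3]
combined = combine(lof_scores, iforest_scores, zscore_scores)
print(combined, combined.argmax())
```

Rank averaging is robust to scale differences, but it discards how extreme a score is. Event 3 gets the same top rank from LOF whether its score is 8.5 or 85, which is one reason harmonizing models stays hard.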
  13. References

    - Stanford CS259D course materials: https://web.stanford.edu/class/cs259d/
    - Campello, Ricardo J. G. B., Davoud Moulavi, and Jörg Sander. "Density-based clustering based on hierarchical density estimates." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin, Heidelberg, 2013.
    - Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest." arXiv preprint arXiv:1811.02141 (2018).
    - Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.