Anomaly Detection for a Water Treatment System Using Unsupervised Machine Learning

Yoriyuki Yamagata

May 28, 2018

Transcript

  1. Anomaly Detection for a Water Treatment System Using Unsupervised Machine Learning
    Nov 18, 2017, DMCIS
    Jun Inoue¹, Yoriyuki Yamagata¹, Yuqi Chen², Christopher M. Poskitt² and Jun Sun²
    ¹ National Institute of Advanced Industrial Science and Technology (AIST), Japan
    ² Singapore University of Technology and Design (SUTD), Singapore
  2. SWaT test bed in SUTD
    • Scaled down but fully operational water treatment system
    • For security research on social infrastructures and CPSs
  3. Dataset
    • 26 actuators, each with 3 positions, and 25 sensors
    • Data for normal behavior and for behavior under attack
    • Subjected to 36 network attacks across all subsystems P1-P6
    • Each network attack changes transmitted sensor values and actuator commands in a different way
  4. Application of DNN and SVM
    • Unsupervised machine learning
    • DNN and SVM are applied to normal data to learn normal behavior
    • Attack data are used only for hyper-parameter tuning and evaluation
  5. DNN
    [Figure: network diagram; labels include LSTM, a₁, v₁ to v₅, μ₁, σ₁, O]
    • Novel DNN architecture
    • Feed-forward layers on LSTM
    • Compute outlier factors
  6. Training error and F1 score
    • Training error decreases smoothly
    • F1 score does not improve as the number of epochs increases
    • This may suggest a problem in the neural architecture
  7. One-class SVM (RBF kernel)
    • Create fixed-size vectors using sliding windows of width 4
    • Learn the characteristics of the data using normal data
    • Hyper-parameter tuning using attack data
    • Logarithmic grid search
    • Randomized search
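
The windowing and logarithmic grid steps on this slide can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function names `sliding_windows` and `log_grid`, the toy log, and the exponent range are all hypothetical. The resulting vectors would then be handed to a one-class SVM with an RBF kernel (for example `sklearn.svm.OneClassSVM`, which is an assumption here, since the slides do not name a library).

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[Sequence[float]], width: int = 4) -> List[List[float]]:
    """Flatten every run of `width` consecutive log entries into one fixed-size vector."""
    if width <= 0:
        raise ValueError("width must be positive")
    vectors = []
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        # Concatenate the readings of the entries in this window.
        vectors.append([value for entry in window for value in entry])
    return vectors


def log_grid(low_exp: int, high_exp: int, base: float = 10.0) -> List[float]:
    """Candidate hyper-parameter values spaced evenly on a logarithmic scale."""
    return [base ** exponent for exponent in range(low_exp, high_exp + 1)]


# Hypothetical toy log: 6 entries, each holding 2 readings.
log = [[1.0, 0.0], [1.1, 0.0], [1.2, 1.0], [1.3, 1.0], [1.4, 1.0], [1.5, 0.0]]
vectors = sliding_windows(log, width=4)   # 3 windows, each with 4 * 2 = 8 values
gammas = log_grid(-3, 1)                  # hypothetical range of RBF kernel widths
print(len(vectors), len(vectors[0]), len(gammas))
```

A randomized search would draw candidates at random from such a range instead of enumerating every grid point.
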
  8. DNN and SVM outputs

  9. Performance (overall)

    Method   Precision  Recall   F1
    DNN      0.98295    0.67847  0.80281
    SVM      0.92500    0.69901  0.79628
    MHD      0.84767    0.64473  0.73240
    Range    0.12829    0.93803  0.22571
    Trivial  0.11980    1.00000  0.21397
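
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it for several rows of this slide as a consistency check; the helper name `f1_score` is an assumption made for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the slide.
reported = {
    "DNN": (0.98295, 0.67847, 0.80281),
    "SVM": (0.92500, 0.69901, 0.79628),
    "Range": (0.12829, 0.93803, 0.22571),
    "Trivial": (0.11980, 1.00000, 0.21397),
}

for name, (p, r, f1) in reported.items():
    # The recomputed value should agree with the reported one up to rounding.
    print(name, round(f1_score(p, r), 5), f1)
```
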
  10. SVM false positives

  11. Attack Classification
    A. Override control
    B. Unnatural sensor values
    C. Gradual change of sensor values
  12. Override control

    ID     DNN      SVM      MHD
    1      0.00000  0.00000  0.00000
    2      0.00000  0.00000  0.11061
    4      0.00000  0.03571  0.00000
    13     0.00000  0.00000  0.00000
    14     0.00000  0.00000  0.00000
    17     0.00000  0.00000  0.01813
    21(#)  0.00000  0.01667  0.00000
    22(#)  0.99792  1.00000  0.00000
    23(#)  0.87639  0.87500  0.97561
    24     0.00000  0.00000  0.00000
  13. Override control

    ID  DNN      SVM      MHD
    ?   0.00000  0.00909  0.00000
    ?   0.00000  0.00000  0.03391
    ?   0.00000  0.00000  0.00000
    ?   0.87639  0.93570  0.93443
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.00333  0.00000
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.00000  0.03119
    ?   1.00000  1.00000  0.15565
  14. Unnatural sensor values

    ID  DNN      SVM      MHD
    ?   0.71667  0.72083  0.00000
    ?   0.00000  0.88800  0.00000
    ?   0.92708  0.88810  0.49170
    ?   1.00000  0.43333  0.06832
    ?   0.97833  1.00000  0.99822
    ?   0.12333  0.13000  0.18919
    ?   0.84524  0.84762  0.94177
    ?   0.00000  0.01667  0.00000
    ?   0.99792  1.00000  0.00000
    ?   0.87500  0.87639  0.97561
    ?   0.00000  0.00909  0.00000
  15. Unnatural sensor values

    ID  DNN      SVM      MHD
    ?   0.00000  0.00000  0.03391
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.00333  0.00000
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.90455  0.00000
    ?   0.00000  0.00000  0.11036
    ?   0.00000  0.11852  0.00000
    ?   1.00000  1.00000  0.15565
    ?   0.92333  0.92667  0.50178
    ?   0.94048  0.00000  0.00000
    ?   0.93333  0.92667  0.84512
  16. Gradual change of sensor values

    ID  DNN      SVM      MHD
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.00000  0.00000
    ?   0.00000  0.35679  0.00000
  17. Observation
    • Attacks that produce unnatural sensor values are the easiest to detect
    • Attacks that override control or change sensor values gradually are difficult to detect
  18. Conclusion
    • DNN and SVM have almost the same performance
    • SVM is more sensitive, but prone to generating false positives
    • Recall varies widely across types of attacks
  19. Threats to validity
    • We use only the SWaT dataset
    • Attacks are all artificial
    • Recall and precision are measured by the number of log entries
    • Duration of attacks can affect the outcome
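
The last two points can be illustrated with a small sketch. When precision and recall are counted per log entry, a long attack contributes more entries than a short one and therefore weighs more in the score. The function name `precision_recall` and the toy labels below are hypothetical and are not taken from the SWaT data.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], predictions: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall counted per log entry (1 = attack, 0 = normal)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical log: a long attack (8 entries, fully detected)
# followed by a short attack (2 entries, completely missed).
labels = [0, 0] + [1] * 8 + [0, 0] + [1] * 2 + [0, 0]
predictions = [0, 0] + [1] * 8 + [0, 0] + [0] * 2 + [0, 0]

precision, recall = precision_recall(labels, predictions)
# Per-entry recall is 8 / 10, although only one of the two attacks was detected.
print(precision, recall)
```
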
  20. Future work
    • Improve recall rates
    • DNN: better neural architecture
    • SVM: feature engineering
    • Comparison with other methods
    • Experiments on systems other than SWaT