
Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

Presentation at Software Engineering Symposium 2020

Yoriyuki Yamagata

September 12, 2020

Transcript

  1. Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning. Y. Yamagata (AIST), S. Liu (Tianjin University), T. Akazaki (Fujitsu Laboratories), Y. Duan (Tianjin University), and J. Hao (Tianjin University). Software Engineering Symposium 2020.
  2. Falsification. A deterministic CPS model maps an input (throttle, brake) to an output (car speed). Specification: the car speed must stay below 200 km/h. Goal: find an input (a counter-example) that violates the specification.
  3. Robustness-guided falsification. Robustness of a trace, where $v_t$ is the car speed at time $t$ and $T$ is the time horizon: $r = \min_{t=1}^{T} (200 - v_t)$.
    • Find a counter-example by minimizing robustness
    • Cast the falsification problem as a numerical optimization problem (a minimal sketch of the robustness computation follows)
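    A minimal sketch (an editorial illustration, not from the slides) of computing this robustness for a sampled speed trace, assuming the trace is given as a NumPy array of speeds in km/h:

      import numpy as np

      def robustness(speeds, limit=200.0):
          # Robustness r = min_t (limit - v_t) of the safety specification "speed < limit".
          # r <= 0 means the trace violates the specification, i.e. it is a counter-example.
          return float(np.min(limit - np.asarray(speeds)))

      # Example: a trace that stays below 200 km/h has positive robustness.
      print(robustness([120.0, 180.0, 195.0]))  # 5.0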
  4. Previously proposed methods
    • Nelder-Mead, genetic programming [Donzé, 2010]
    • Simulated annealing, cross-entropy method [Annpureddy et al., 2011]
    • Monte-Carlo tree search [Zhang et al., 2018]
    • aLVTS [Ernst et al., 2019]
    • Stochastic optimization with adaptive restart [Mathesen et al., 2020]
    • Gradient descent [Bennani et al., 2020]
    • Surrogate models [Menghi, 2020]
  5. Our contribution
    • Recast robustness-guided falsification as a reinforcement learning problem
    • Implemented the proposed method using a deep reinforcement learning framework
    • Performed a comparison with S-TaLiRo (a widely used robustness-guided falsification tool)
  6. Reinforcement learning problem. An agent interacts with an environment: it takes actions and receives states and rewards. The goal is to maximize the discounted return $R = \sum_{t=1}^{T} \gamma^t r_t$. Condition: the law of the environment is unknown to the agent (a small worked example follows).
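    A minimal sketch (editorial, not from the slides) of the discounted return the agent maximizes, assuming the rewards r_1, ..., r_T are given as a Python list:

      def discounted_return(rewards, gamma=0.99):
          # R = sum_{t=1}^{T} gamma^t * r_t for a reward sequence r_1, ..., r_T.
          return sum(gamma ** t * r for t, r in enumerate(rewards, start=1))

      print(discounted_return([0.0, 0.1, 1.0], gamma=0.9))  # 0.9*0 + 0.81*0.1 + 0.729*1.0 = 0.81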
  7. Recasting falsification into reinforcement learning. We want to find an input that minimizes $\min_{t=1}^{T} (200 - v_t)$. Using the log-sum-exp approximation $\min_{t=1}^{T} (200 - v_t) \sim -\log \sum_{t=1}^{T} \exp[-(200 - v_t)]$, it suffices to maximize $\sum_{t=1}^{T} \exp[-(200 - v_t)]$. We can solve this optimization problem by reinforcement learning with the per-step reward $\exp[-(200 - v_t)]$ (see the sketch below).
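    A minimal sketch (editorial, not from the slides) showing that maximizing the sum of the per-step rewards is equivalent to minimizing a log-sum-exp surrogate of the robustness:

      import numpy as np

      def step_reward(speed, limit=200.0):
          # Per-step reward used to recast falsification as reinforcement learning.
          return np.exp(-(limit - speed))

      def soft_min_robustness(speeds, limit=200.0):
          # -log sum_t exp(-(limit - v_t)) is a smooth surrogate for min_t (limit - v_t).
          return -np.log(np.sum(step_reward(np.asarray(speeds), limit)))

      speeds = np.array([120.0, 180.0, 195.0])
      print(soft_min_robustness(speeds))  # ~5.0, close to the exact robustness
      print(np.min(200.0 - speeds))       # 5.0, the exact robustness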
  8. Deep reinforcement learning
    • We use deep reinforcement learning algorithms, i.e. reinforcement learning algorithms that use deep neural networks
    • Versatile: they can adapt to non-linear system dynamics
    • In particular, we use two algorithms:
      • DDQN (a Q-learning approach)
      • A3C (an actor-critic approach)
  9. Implementation. Inside a Simulink model, the falsifier (A3C or DDQN, built on ChainerRL) generates the system input, the system model (a Simulink subsystem) produces the system output, and the robustness monitor (Taliro-Monitor) computes the robustness from the output.
  10. Implementation (cont.)
    • Falsifier (an illustrative sketch follows)
      • A custom Simulink block implemented in MATLAB
      • The reinforcement learning part is implemented in Python
      • Uses the Python library ChainerRL (now PFRL)
    • Robustness monitor
      • Reuses the monitor from S-TaLiRo
    • System model (target model)
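    As an illustration only (none of the following code is taken from the paper's implementation), the interaction between the falsifier and the system model could be wrapped as a gym-style environment and handed to an RL library such as ChainerRL/PFRL; `simulate_step` and `reset_model` are hypothetical placeholders for the actual MATLAB/Simulink interface:

      import gym
      import numpy as np
      from gym import spaces

      class FalsificationEnv(gym.Env):
          # One episode corresponds to one simulation of the system model.
          # `simulate_step(action)` and `reset_model()` are placeholders for the real
          # Simulink interface; they advance/reset the model and return the output (speed).

          def __init__(self, simulate_step, reset_model, horizon=100, limit=200.0):
              self.simulate_step = simulate_step
              self.reset_model = reset_model
              self.horizon = horizon
              self.limit = limit
              self.t = 0
              # Assumed input/output spaces: throttle and brake in [0, 1]; speed observation.
              self.action_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
              self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)

          def reset(self):
              self.t = 0
              speed = self.reset_model()
              return np.array([speed], dtype=np.float32)

          def step(self, action):
              speed = self.simulate_step(action)
              self.t += 1
              reward = float(np.exp(-(self.limit - speed)))  # per-step reward from slide 7
              done = self.t >= self.horizon or speed >= self.limit
              return np.array([speed], dtype=np.float32), reward, done, {}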
  11. Experiment
    • Three models are used: Chasing Cars, Automatic Transmission, and Power Train Control
    • The falsifier is allowed to run 200 simulations per trial to falsify a specification
    • 100 trials are repeated for each model and property, because the result varies due to the stochastic nature of the agent
    • No pre-training, no hyper-parameter tuning
    • At the start of each trial, the "memory" of the agent is reset; the memory is kept between simulations within a trial
  12. Evaluation metrics
    • We use the number of simulations required to falsify
    • Reason: execution time depends on
      • implementation details (the combination of Python and MATLAB slows down simulation)
      • scheduling (we run experiments concurrently on a single machine)
    • We also find the time required for the reinforcement learning part to be insignificant
  13. Statistical analysis. We need to compare two random variables X and Y whose distributions are unknown and highly skewed. Therefore, instead of averages, we use the relative effect size $p = P(X < Y) + \tfrac{1}{2} P(X = Y)$; $p > 0.5$ means that X tends to be smaller than Y. We perform a non-parametric statistical test with the null hypothesis $p = 0.5$ (a sketch of the estimator follows).
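    A minimal sketch (editorial, not from the paper) of estimating the relative effect size from two samples, with SciPy's Brunner-Munzel test shown as one possible non-parametric test of the null hypothesis p = 0.5 (the paper's exact test may differ):

      import numpy as np
      from scipy.stats import brunnermunzel

      def relative_effect_size(x, y):
          # Estimate p = P(X < Y) + 0.5 * P(X = Y) over all pairs of samples.
          x, y = np.asarray(x), np.asarray(y)
          less = (x[:, None] < y[None, :]).mean()
          ties = (x[:, None] == y[None, :]).mean()
          return less + 0.5 * ties

      # Hypothetical numbers of simulations needed by two algorithms over repeated trials.
      x = np.array([12, 30, 45, 200, 18])
      y = np.array([90, 120, 200, 200, 60])
      print(relative_effect_size(x, y))  # 0.84 > 0.5: X tends to need fewer simulations
      print(brunnermunzel(x, y))         # non-parametric test statistic and p-value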
  14. Chasing Cars model. [Block diagram: Car 1 takes Throttle and Brake as inputs; Cars 2-5 are chained, each taking the preceding car's output y_out as its input y_in.]
  15. Power Train Control. [Block diagram of the fuel control system verification and validation stub system: inputs Pedal Angle and Engine Speed; outputs A/F, A/F ref, verification measurement, and Mode.] [Deshmukh et al. 2014]
  16. Results: Chasing Cars. [Box plots of the number of simulations (0-200) per property ϕ1-ϕ5 for the algorithms A3C, DDQN, RAND, CE, and SA.]
  17. Results: Chasing Cars (annotated). [Same plot; smaller is better. A3C and DDQN are the proposed methods; RAND, CE, and SA are the baselines.]
  18. The measure p (defined on slide 13) between proposed methods and baselines. [Table not reproduced in the transcript.] Smaller values mean the proposed methods are better; bold/italic entries indicate a statistically significant difference.
  19. Results: Automatic Transmission. [Box plots of the number of simulations (0-200) per property ϕ1-ϕ9 for the algorithms A3C, DDQN, RAND, CE, and SA.]
  20. Results: Power Train Control. [Box plots of the number of simulations (0-200) per property ϕ26, ϕ27, and ϕ30-ϕ34 (the ϕ34 panel is labeled "ptc_fml34_sensorfail") for the algorithms A3C, DDQN, RAND, CE, and SA.]
  21. Summary of experiments
    • Chasing Cars: the proposed methods almost always outperform the baselines, except for ϕ2, in which RAND outperforms all methods
    • Automatic Transmission: A3C either outperforms the baselines or shows equal performance; the performance of DDQN is unstable
    • Power Train Control: the proposed methods underperform the baselines
  22. Observations and conclusion
    • The proposed methods often outperform the baselines, but not always
    • However, whenever the proposed methods underperform the baselines, RAND outperforms or performs equally to the other methods
    • In conclusion, a combination of reinforcement learning and uniform random inputs could be a good approach
  23. Future work
    • Investigate the causes of the performance differences
    • Support properties other than safety properties
    • Vary the time-step automatically
    • Hyper-parameter tuning (but how?)
    • Compare different reinforcement learning algorithms
    • Improve usability
  24. More info
    • Paper: Y. Yamagata, S. Liu, T. Akazaki, Y. Duan, and J. Hao, "Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning," IEEE Transactions on Software Engineering, 2020
    • Implementation: https://github.com/yoriyuki-aist/Falsify
    • Comparison to other tools: ARCH-COMP 2019 Category Report: Falsification, EasyChair, 2019