# Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

Presentation at Software Engineering Symposium 2020

## Yoriyuki Yamagata

September 12, 2020

## Transcript

1. ### Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

   Y. Yamagata¹, S. Liu², T. Akazaki³, Y. Duan² and J. Hao² (1: AIST, 2: Tianjin University, 3: Fujitsu Laboratories)
   Software Engineering Symposium 2020
2. ### Falsification

   Input (Throttle, Brake) → Deterministic CPS model → Output (Car speed)
   Specification: Car speed must be < 200 km/h
   Goal: Find an input (a counter-example) which violates the specification
3. ### Robustness guided falsification

   Robustness (v: speed, t: time): $r = \min_{t=1}^{T} (200 - v_t)$
   • Find a counter-example by minimizing robustness
   • Cast the falsification problem into a numerical optimization problem
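The robustness computation above can be sketched in a few lines of Python (a minimal illustration, not the paper's code; function and variable names are mine):

```python
# Robustness of a speed trace with respect to "speed < limit":
# r = min_t (limit - v_t). A negative r means the specification
# is violated somewhere in the trace, i.e. a counter-example.

def robustness(speeds, limit=200.0):
    """Smallest margin by which the trace stays under the limit."""
    return min(limit - v for v in speeds)

safe_trace = [80.0, 120.0, 150.0]  # always well below 200 km/h
bad_trace = [80.0, 190.0, 205.0]   # exceeds 200 km/h at the last step

print(robustness(safe_trace))  # 50.0: positive, specification holds
print(robustness(bad_trace))   # -5.0: negative, counter-example found
```

A falsifier searches over inputs to drive this value below zero.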
4. ### Proposed methods

   • Nelder-Mead, genetic programming [Donzé, 2010]
   • Simulated annealing, cross-entropy method [Annpureddy et al., 2011]
   • Monte-Carlo tree search [Zhang et al., 2018]
   • aLVTS [Ernst et al., 2019]
   • Stochastic optimization with adaptive restart [Mathesen et al., 2020]
   • Gradient descent [Bennani et al., 2020]
   • Surrogate model [Menghi, 2020]
5. ### Our contribution

   • Recast robustness guided falsification as a reinforcement learning problem
   • Implemented the proposed method using a deep reinforcement learning framework
   • Performed a comparison with S-Taliro (a widely used robustness guided falsification tool)
6. ### Reinforcement learning problem

   Agent ⇄ Environment (Action, State, Reward)
   Maximize the discounted return $R = \sum_{t=1}^{T} \gamma^t r_t$
   Condition: the law of the environment is unknown to the agent
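The discounted return the agent maximizes can be computed directly (a minimal sketch; the function name and the value of gamma are illustrative, not from the paper):

```python
# Discounted return R = sum_{t=1..T} gamma^t * r_t, with rewards
# indexed from t = 1 as on the slide.

def discounted_return(rewards, gamma=0.9):
    return sum(gamma ** t * r for t, r in enumerate(rewards, start=1))

rewards = [1.0, 0.0, 1.0]
print(discounted_return(rewards))  # 0.9 + 0.0 + 0.729 ≈ 1.629
```

The discount factor gamma < 1 weights early rewards more heavily than late ones.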
7. ### Recasting falsification into reinforcement learning

   Want to find an input which minimizes $\min_{t=1}^{T} (200 - v_t) \sim -\log \sum_{t=1}^{T} \exp[-(200 - v_t)]$
   Therefore, we need to maximize $\sum_{t=1}^{T} \exp[-(200 - v_t)]$
   We can solve this optimization problem using reinforcement learning with the reward $\exp[-(200 - v_t)]$
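A quick numerical check (my own illustration, not the paper's code) shows why this log-sum-exp "soft minimum" is a good stand-in for the true robustness: the term with the smallest margin dominates the sum, so maximizing the per-step reward exp[−(200 − v_t)] pushes the robustness down:

```python
import math

# True robustness: min_t (limit - v_t)
def true_robustness(speeds, limit=200.0):
    return min(limit - v for v in speeds)

# Soft robustness: -log(sum_t exp(-(limit - v_t))), a smooth
# approximation of the minimum that decomposes into per-step
# rewards exp(-(limit - v_t)).
def soft_robustness(speeds, limit=200.0):
    return -math.log(sum(math.exp(-(limit - v)) for v in speeds))

trace = [180.0, 195.0, 199.0]      # margins 20, 5 and 1
print(true_robustness(trace))      # 1.0
print(soft_robustness(trace))      # close to 1.0: the smallest margin dominates
```

Because the sum of per-step rewards is exactly what reinforcement learning maximizes, the falsification objective becomes an ordinary (undiscounted) return.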
8. ### Deep reinforcement learning

   • We use deep reinforcement learning algorithms, i.e., reinforcement learning algorithms using deep learning
   • Versatile; can adapt to non-linear system dynamics
   • In particular, we use two algorithms:
   • DDQN (Q-learning approach)
   • A3C (actor-critic approach)
9. ### Implementation

   [Block diagram: the Falsifier (A3C/DDQN via ChainerRL) feeds the System Input to the System Model (a Simulink subsystem); the System Output goes to the Robustness Monitor (Taliro-Monitor), which returns the Robustness. The whole pipeline is a Simulink model.]
10. ### Implementation (cont.)

    • Falsifier
    • A custom Simulink block, implemented in MATLAB
    • The reinforcement learning part is implemented in Python
    • Uses the Python library ChainerRL (now PFRL)
    • Robustness monitor
    • Reuses the monitor in S-Taliro
    • System model (target model)
11. ### Experiment

    Use 3 models (Chasing Cars, Automatic Transmission, Power Train Control)
    The falsifier is allowed to run 200 simulations to falsify a specification in each trial
    100 trials are repeated for each model and property, because the result varies due to the stochastic nature of the agent
    No pre-training, no hyper-parameter tuning
    At the start of each trial, the "memory" of the agent is reset
    The memory is kept between simulations
12. ### Evaluation metric

    Use the number of simulations required to falsify
    Reason: execution time depends on
    • implementation details (the combination of Python and MATLAB slows down simulation)
    • scheduling (we run experiments concurrently on a single machine)
    We also find that the time required for the reinforcement learning part is insignificant
13. ### Baselines

    • RAND: uniform random input
    • CE: cross-entropy method
    • SA: simulated annealing
14. ### Statistical analysis

    We need to compare two random variables X and Y whose distributions are unknown and highly skewed
    Therefore, we do not use averages etc. but the relative effect size measure $p = P(X < Y) + \frac{1}{2} P(X = Y)$
    $p > 0.5$ means X tends to be smaller than Y
    We perform non-parametric statistical testing with the null hypothesis $p = 0.5$
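From two samples, this effect size can be estimated by counting pairwise comparisons (a sketch with made-up sample values; in the experiments X and Y would be simulation counts from two methods):

```python
# Estimate p = P(X < Y) + (1/2) P(X = Y) from samples xs and ys
# by comparing every pair; ties count half.

def effect_size(xs, ys):
    wins = sum(1 for x in xs for y in ys if x < y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (wins + 0.5 * ties) / (len(xs) * len(ys))

x = [3, 5, 7]    # e.g. simulations needed by a proposed method
y = [6, 8, 10]   # e.g. simulations needed by a baseline

print(effect_size(x, y))  # ≈ 0.89: > 0.5, so X tends to be smaller than Y
```

This statistic makes no distributional assumptions, which suits the skewed, unknown distributions of the trial results.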
15. ### Chasing Cars model

    [Block diagram: Car 1 is driven by Throttle and Brake; Cars 2-5 each follow the preceding car, with each car's y_out feeding the next car's y_in.]
16. ### Chasing Cars model, falsified properties

    Properties are artificial and gradually become more complex from φ1 to φ5

19. ### Power Train Control

    Fuel Control System Verification and Validation stub system [Deshmukh et al. 2014]
    Inputs: Pedal Angle, Engine Speed; outputs: A/F, A/F ref, Verification measurement, Mode

21. ### Results: Chasing Cars

    [Chart: number of simulations (0-200) per property φ1-φ5 for A3C, DDQN, RAND, CE and SA.]
22. ### Results: Chasing Cars

    [Same chart, annotated: smaller is better; A3C and DDQN are the proposed methods, RAND, CE and SA the baselines.]
23. ### p between proposed methods and baselines

    Smaller p means the proposed methods are better
    Bold/italic entries indicate a statistically significant difference
24. ### Result: Automatic Transmission

    [Chart: number of simulations (0-200) per property φ1-φ9 for A3C, DDQN, RAND, CE and SA.]

26. ### Result: Power Train Control

    [Chart: number of simulations (0-200) per property (φ26, φ27, φ30-φ33, and φ34 / ptc_fml34_sensorfail) for A3C, DDQN, RAND, CE and SA.]

28. ### Summary of experiments

    Chasing Cars: Proposed methods almost always outperform baselines, except on φ2, where RAND outperforms all methods
    Automatic Transmission: A3C either outperforms baselines or shows equal performance; the performance of DDQN is unstable
    Power Train Control: Proposed methods underperform baselines
29. ### Observations and conclusion

    The proposed methods often outperform baselines, but not always
    However, whenever the proposed methods underperform baselines, RAND outperforms or performs equally to the other methods
    In conclusion, a combination of reinforcement learning and uniform random inputs could be a good approach
30. ### Future work

    • Investigate the causes of the performance differences
    • Support properties other than safety properties
    • Vary the time-step automatically
    • Hyper-parameter tuning (but how?)
    • Compare different reinforcement learning algorithms
    • Improve usability
31. ### More info

    • Paper: Y. Yamagata, S. Liu, T. Akazaki, Y. Duan and J. Hao, "Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning," IEEE Transactions on Software Engineering, 2020
    • Implementation: https://github.com/yoriyuki-aist/Falsify
    • Comparison to other tools: ARCH-COMP 2019 Category Report: Falsification, EasyChair, 2019