Robustness: r = min_{t=1..T} (200 − v_t) • Find a counter-example by minimizing robustness • Cast the falsification problem into a numerical optimization problem
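To make this formulation concrete, here is a minimal sketch that computes the robustness of a speed trace against the specification "the speed stays below 200"; the example trace and variable names are hypothetical, only the min-based robustness follows the slide.

```python
import numpy as np

def robustness(speed_trace):
    """Robustness r = min_t (200 - v_t) of the spec 'speed stays below 200'.
    A negative value means the trace violates (falsifies) the specification."""
    return np.min(200.0 - np.asarray(speed_trace))

# Hypothetical trace; a real trace would come from simulating the system model.
v = [0.0, 80.0, 150.0, 195.0, 201.0]
print(robustness(v))  # -1.0 -> this input falsifies the specification
```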
• Cast the falsification problem into a reinforcement learning problem • Implemented the proposed method using a deep reinforcement learning framework • Performed a comparison with S-TaLiRo (a widely used robustness-guided falsification tool)
min_{t=1..T} (200 − v_t) ≈ − log Σ_{t=1..T} exp[−(200 − v_t)] • We want to find an input which minimizes min_{t=1..T} (200 − v_t) • Therefore, we need to maximize Σ_{t=1..T} exp[−(200 − v_t)] • We can solve this optimization problem using reinforcement learning with the per-step reward exp[−(200 − v_t)]
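A minimal sketch of this reward shaping, assuming a scalar speed observation v_t: the per-step reward is exp[−(200 − v_t)], and summing it over a trace gives the log-sum-exp surrogate that approximates the true (min-based) robustness.

```python
import numpy as np

def step_reward(v_t):
    # Per-step reward from the slide: exp[-(200 - v_t)].
    # It grows as the speed approaches (or exceeds) the 200 threshold.
    return np.exp(-(200.0 - v_t))

def smoothed_robustness(speed_trace):
    # Soft-min surrogate: -log sum_t exp[-(200 - v_t)] approximates
    # min_t (200 - v_t); maximizing the summed reward minimizes this surrogate.
    v = np.asarray(speed_trace)
    return -np.log(np.sum(np.exp(-(200.0 - v))))

v = [120.0, 180.0, 198.0, 199.5]                # hypothetical speed trace
print(sum(step_reward(x) for x in v))            # return the RL agent maximizes
print(smoothed_robustness(v), min(200.0 - x for x in v))  # surrogate vs. true robustness
```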
• Reinforcement learning algorithms using deep learning • Versatile: can adapt to non-linear system dynamics • In particular, we use two algorithms • DDQN (Q-learning approach) • A3C (Actor-Critic approach)
• System model (target model) is simulated in MATLAB • Reinforcement learning part is implemented in Python • Uses the Python library ChainerRL (now PFRL) (illustrated below) • Robustness monitor: reuse the monitor from S-TaLiRo
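As an illustration of the ChainerRL/PFRL side, here is a minimal sketch of setting up a Double DQN (DDQN) agent with PFRL; the network sizes, hyper-parameters, and observation/action dimensions are assumptions for illustration, not the settings used in the paper.

```python
import numpy as np
import torch
import torch.nn as nn
import pfrl

class QFunction(nn.Module):
    """Small fully connected Q-network (sizes are illustrative)."""
    def __init__(self, obs_size, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return pfrl.action_value.DiscreteActionValue(self.net(x))

obs_size, n_actions = 4, 5          # assumed observation/action dimensions
q_func = QFunction(obs_size, n_actions)
optimizer = torch.optim.Adam(q_func.parameters(), eps=1e-2)

agent = pfrl.agents.DoubleDQN(
    q_func,
    optimizer,
    replay_buffer=pfrl.replay_buffers.ReplayBuffer(capacity=10 ** 5),
    gamma=0.99,
    explorer=pfrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.1, random_action_func=lambda: np.random.randint(n_actions)),
    replay_start_size=500,
    target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False),
)
# In a falsification loop, agent.act(obs) would pick the next input segment and
# agent.observe(obs, reward, done, reset) would feed back the robustness-based reward.
```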
• Benchmark models include Automatic Transmission and Power Train Control • The falsifier is allowed to run 200 simulations to falsify a specification in each trial • 100 trials are repeated for each model and property, because the result varies due to the stochastic nature of the agent • No pre-training, no hyper-parameter tuning • At the start of each trial, the “memory” of the agent is reset; the memory is kept between simulations within a trial (see the sketch below)
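To make the trial structure concrete, here is a schematic sketch of the protocol above; make_agent and run_one_simulation are hypothetical placeholders for the agent construction and for one MATLAB simulation plus robustness evaluation.

```python
import random

N_TRIALS, MAX_SIMULATIONS = 100, 200

def make_agent():
    # Hypothetical: build a fresh agent, i.e. reset its "memory"
    # (replay buffer and network weights).
    return object()

def run_one_simulation(agent):
    # Hypothetical: let the agent choose the input signal, simulate the model,
    # and return the robustness of the trace (negative = specification falsified).
    return random.uniform(-1.0, 10.0)

results = []
for trial in range(N_TRIALS):
    agent = make_agent()                      # memory reset at the start of each trial
    falsified_at = None
    for sim in range(1, MAX_SIMULATIONS + 1):
        if run_one_simulation(agent) < 0:     # memory kept across simulations in a trial
            falsified_at = sim
            break
    results.append(falsified_at)              # None: the 200-simulation budget was exhausted

print(sum(r is not None for r in results), "of", N_TRIALS, "trials falsified the specification")
```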
Execution time is not used as the comparison metric. Reasons: • Execution time depends on • implementation details (the combination of Python and MATLAB slows down simulation) • scheduling (we run experiments concurrently on a single machine) • We find the time required for the reinforcement learning part is insignificant
• We compare two samples X and Y, whose distributions are unknown and highly skewed • Therefore, we do not use the average etc. but the relative effect size measure p = P(X < Y) + (1/2)·P(X = Y) • p > 0.5 means X tends to be smaller than Y • We perform non-parametric statistical testing with the null hypothesis p = 0.5 (see the sketch below)
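A small sketch of this comparison, assuming the samples are, for example, the number of simulations needed per trial for two methods; the slide does not name the specific test, so SciPy's Mann-Whitney U test is used here as one standard non-parametric choice, and the effect size p = P(X < Y) + ½·P(X = Y) is estimated directly over all pairs.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def relative_effect_size(x, y):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) over all pairs.
    Values above 0.5 indicate that X tends to be smaller than Y."""
    x, y = np.asarray(x), np.asarray(y)
    less = (x[:, None] < y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    return less + 0.5 * ties

# Hypothetical samples: simulations-to-falsification per trial for two methods.
x = [12, 30, 18, 200, 25, 40]    # e.g. proposed method
y = [50, 200, 90, 200, 120, 60]  # e.g. baseline
print(relative_effect_size(x, y))                   # effect size estimate
print(mannwhitneyu(x, y, alternative="two-sided"))  # non-parametric test (null: no tendency either way)
```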
• …baselines, except φ2, in which RAND outperforms all methods • Automatic Transmission: A3C either outperforms baselines or shows equal performance; the performance of DDQN is unstable • Power Train Control: the proposed methods underperform baselines
• The proposed methods outperform the baselines, but not always • However, whenever the proposed methods underperform the baselines, RAND outperforms or performs equally to the other methods • In conclusion, a combination of reinforcement learning and uniform random inputs could be a good approach
• Paper: Y. Yamagata, S. Liu, T. Akazaki, Y. Duan and J. Hao, "Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning," IEEE Transactions on Software Engineering, 2020 • Implementation: https://github.com/yoriyuki-aist/Falsify • Comparison to other tools: ARCH-COMP 2019 Category Report: Falsification, EasyChair, 2019