Robustness r = min_{t=1,…,T} (200 − v_t) • Find a counter-example by minimizing robustness • Cast a falsification problem into a numerical optimization problem
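As a minimal sketch (the threshold 200 comes from the slide; the function name and the example traces are hypothetical), the robustness of "the speed stays below 200" can be computed directly from a simulated speed trace, with a negative value signaling a counter-example:

```python
def robustness(speeds):
    """Robustness of 'speed stays below 200' over a finite trace:
    r = min_t (200 - v_t). Negative r means the spec is violated."""
    return min(200 - v for v in speeds)

# Hypothetical traces:
print(robustness([50, 120, 180]))  # 20  -> specification satisfied
print(robustness([50, 120, 210]))  # -10 -> specification falsified
```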
• Cast the optimization problem into a reinforcement learning problem • Implemented the proposed method using a deep reinforcement learning framework • Perform a comparison with S-Taliro (a widely used robustness-guided falsification tool)
min_{t=1,…,T} (200 − v_t) ≈ − log ∑_{t=1}^{T} exp[−(200 − v_t)] • Want to find an input which minimizes robustness • Therefore, we need to maximize ∑_{t=1}^{T} exp[−(200 − v_t)] • We can solve this optimization problem using reinforcement learning with the reward exp[−(200 − v_t)] at each step t
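The step above uses a log-sum-exp smoothing of the minimum, which turns the trace-level robustness into a sum of per-step rewards. A small sketch (function names hypothetical, values illustrative) showing that the smoothed value stays close to, and never exceeds, the true robustness:

```python
import math

def robustness(speeds):
    # Exact robustness: r = min_t (200 - v_t)
    return min(200 - v for v in speeds)

def softmin_robustness(speeds):
    # Smooth lower bound: -log sum_t exp[-(200 - v_t)]
    return -math.log(sum(math.exp(-(200 - v)) for v in speeds))

def step_reward(v):
    # Per-step reward; maximizing the episode's total reward
    # minimizes the smoothed robustness above.
    return math.exp(-(200 - v))

speeds = [180.0, 190.0, 195.0]
print(robustness(speeds))          # exact minimum: 5.0
print(softmin_robustness(speeds))  # close to 5.0, slightly below
```

Because the total reward ∑ exp[−(200 − v_t)] accumulates step by step, it fits the standard RL objective of maximizing cumulative reward, which is why an off-the-shelf RL agent can drive the search for a falsifying input.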
• Reinforcement learning algorithms using deep learning • Versatile: can adapt to non-linear system dynamics • In particular, we use two algorithms • DDQN (Q-learning approach) • A3C (Actor-Critic approach)
• System model (target model) is simulated by MATLAB • Reinforcement learning part is implemented in Python • Use the Python library ChainerRL (now PFRL) • Robustness monitor: reuse the monitor in S-Taliro
• Benchmarks: Automatic Transmission and Power Train Control • The falsifier is allowed to run 200 simulations to falsify a specification in each trial • 100 trials are repeated for each model and property, because the result varies due to the stochastic nature of the agent • No pre-training, no hyper-parameter tuning • At the start of each trial, the "memory" of the agent is reset • The memory is kept between simulations
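The trial protocol can be sketched as a short loop, assuming a hypothetical `run_simulation` callback that runs one simulation and returns its robustness (all names here are illustrative, not the tool's actual API):

```python
def run_trial(run_simulation, max_simulations=200):
    """One trial: the agent's memory is reset at the start of the
    trial and kept across the (up to) 200 simulations within it.
    Returns the simulation index that falsified the spec, or None."""
    memory = []  # hypothetical agent state; reset per trial, reused below
    for sim in range(1, max_simulations + 1):
        r = run_simulation(memory)
        if r < 0:          # negative robustness = counter-example found
            return sim
    return None            # trial failed to falsify within the budget

# Hypothetical system whose robustness shrinks as the agent "learns":
def toy_simulation(memory):
    memory.append(None)
    return 5.0 - len(memory)

print(run_trial(toy_simulation))  # 6
```

Repeating `run_trial` 100 times, each with a fresh `memory`, mirrors the 100 independent trials used to average out the agent's stochastic behavior.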
Reason: • Execution time depends on • implementation details (the combination of Python and MATLAB slows down simulation) • scheduling (we run experiments concurrently on a single machine) • We find that the time required for the reinforcement learning part is insignificant
• Compare two samples X and Y, whose distributions are unknown and highly skewed • Therefore, we do not use averages etc. but the relative effect size measure p = P(X < Y) + ½ P(X = Y) • p > 0.5 means X tends to be smaller than Y • We perform non-parametric statistical testing with the null hypothesis p = 0.5
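The effect size p = P(X < Y) + ½ P(X = Y) can be estimated over all pairs of observations. A minimal sketch (function name and sample values are illustrative; the slide does not fix an estimator):

```python
def effect_size(xs, ys):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) over all pairs.
    p > 0.5 means X tends to be smaller than Y
    (here: fewer simulations needed, i.e. better)."""
    pairs = [(x, y) for x in xs for y in ys]
    wins = sum(x < y for x, y in pairs)
    ties = sum(x == y for x, y in pairs)
    return (wins + 0.5 * ties) / len(pairs)

# Hypothetical simulation counts from two methods:
print(effect_size([3, 5, 7], [6, 8, 10]))  # 8/9: method X is usually faster
```

This pairwise estimator needs no distributional assumptions, which is why it suits the skewed, unknown distributions of simulation counts better than comparing means.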
• Automatic Transmission: A3C either outperforms baselines or shows equal performance; the performance of DDQN is unstable • Power Train Control: proposed methods underperform baselines, except φ2, in which RAND outperforms all methods
• The proposed methods often outperform baselines, but not always • However, whenever the proposed methods underperform baselines, RAND outperforms or performs equally to the other methods • As a conclusion, a combination of reinforcement learning and uniform random inputs could be a good approach
Y. Duan and J. Hao, "Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning," in IEEE Transactions on Software Engineering, 2020 • Implementation: https://github.com/yoriyuki-aist/Falsify • Comparison to other tools: ARCH-COMP 2019 Category Report: Falsification, EasyChair, 2019