Slide 1

Evaluating and designing automated verification methods for critical software systems

Zoltán Micskei
https://home.mit.bme.hu/~micskeiz
Habilitation presentation

Slide 2

Critical software systems

• Increasing role of software systems
• Safety-, mission-, business critical
• Correct, reliable operation is essential
• Many verification & validation methods

Slide 3

Development and testing process (example)

[Figure] Development phases: Requirements, System design, Architecture design, Module design, Module implementation. Testing levels: Module testing, Integration testing, System testing, Acceptance testing. Verification methods annotated on the process: review, formal verification, model checking, model-based testing, code-based test generation.

Slide 4

Development and testing process (example)

[Figure] The same development and testing process as on the previous slide, with the annotated verification methods.

Goal: Increasing the applicability of advanced model-based and automated verification methods in engineering practice

Slide 5

Goal: Increasing the applicability of advanced model-based and automated verification methods in engineering practice

• Human factor. Research Question 1: Do engineers and tool developers interpret the semantics of system models and generated tests consistently?
• Limitations. Research Question 2: What are the limitations of current automated verification tools that hinder widespread adoption?
• Methods. Research Question 3: What new verification methods and tools can solve the identified challenges?

Slide 6

New scientific results

Slide 7

Thesis 1: Understanding the semantics of models and tests

Slide 8

UML/SysML-based system models

• Structure and (discrete) behavior
• Standards define model semantics with informal descriptions
• What traces are possible? Concurrency, non-deterministic choices (see the sketch below)
• Do all engineers interpret the model in the same way?
• Do all tools (simulator, *generator) interpret the model in the same way?
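
To make the trace question concrete, the following minimal C# sketch (illustrative, not part of the presentation) enumerates the interleavings of the event sequences of two orthogonal regions; each interleaving is one possible trace, and it is exactly this set of traces on which engineers and tools can disagree.

using System;
using System.Collections.Generic;
using System.Linq;

static class Interleavings
{
    // All interleavings of two sequences, preserving each sequence's order.
    // For orthogonal regions of a state machine, each interleaving is one
    // possible execution trace.
    public static IEnumerable<string> Of(string[] a, string[] b)
    {
        if (a.Length == 0) return new[] { string.Join(" ", b) };
        if (b.Length == 0) return new[] { string.Join(" ", a) };
        var withA = Of(a.Skip(1).ToArray(), b).Select(t => a[0] + " " + t);
        var withB = Of(a, b.Skip(1).ToArray()).Select(t => b[0] + " " + t);
        return withA.Concat(withB);
    }

    public static void Main()
    {
        // Region 1 fires t1 then t2; region 2 fires u1. Three traces result:
        // "t1 t2 u1", "t1 u1 t2", "u1 t1 t2".
        foreach (var trace in Of(new[] { "t1", "t2" }, new[] { "u1" }))
            Console.WriteLine(trace);
    }
}

Even for this tiny machine there are three possible traces; with more regions and non-deterministic choices the set grows combinatorially, which is why informal semantic descriptions leave room for divergent interpretations.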

Slide 9

Understanding the semantics of modeling languages

• Detailed studies: UML PSSM, UML doActivity
• Semantics can be misinterpreted easily
• Inconsistencies even in the standard

Slide 10

Code-based test generation

/// Calculates the sum of a given number of
/// elements from an index in an array.
int CalculateSum(int start, int number, int[] a)
{
    if (start + number > a.Length || a.Length <= 1)
        throw new ArgumentException();
    int sum = 0;
    // Note: the loop bound is start + number - 1, so one element fewer is
    // summed than the documentation promises (an off-by-one defect).
    for (int i = start; i < start + number - 1; i++)
        sum += a[i];
    return sum;
}

[TestMethod]
public void CalculateSumTest284()
{
    int[] ints = new int[5] { 4, 5, 6, 7, 8 };
    int i = CalculateSum(0, 0, ints);
    Assert.AreEqual(0, i);
}

[TestMethod]
public void CalculateSumTest647()
{
    int[] ints = new int[5] { 4, 5, 6, 7, 8 };
    int i = CalculateSum(0, 4, ints);
    // 4 + 5 + 6 = 15: the assertion encodes the observed behavior, not the
    // documented one (4 + 5 + 6 + 7 = 22).
    Assert.AreEqual(15, i);
}

Test generator: select test inputs → observe behavior → generate test code

Can we detect if a generated test captures a failure?
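
As a rough illustration of how a symbolic-execution-based generator arrives at these two tests (my reconstruction, not actual tool output), each test corresponds to one satisfiable path condition through CalculateSum:

// Path explored by CalculateSumTest284 (loop body never entered):
//   start + number <= a.Length && a.Length > 1   // guard passes
//   !(start < start + number - 1)                // loop exits immediately
//   solved by: start = 0, number = 0, a.Length = 5
//
// Path explored by CalculateSumTest647 (loop body runs three times):
//   start + number <= a.Length && a.Length > 1
//   number - 1 == 3                              // exactly three iterations
//   solved by: start = 0, number = 4, a.Length = 5
// The expected values (0 and 15) are then taken from the observed behavior.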

Slide 11

Understanding the semantics of generated tests

• Typical research evaluation setup: run the generated tests against both a faulty and a correct implementation, and label each test "OK" or "Bug!" mechanically
• Real-world scenario: a single implementation (we do not know whether it is faulty or correct), so each test is only "OK?" or "Bug?"
• Study with human participants: can they detect when a test captures a bug? (see the sketch below)
• Interpretation of generated tests is not perfect: this must be considered in evaluations!
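
A minimal C# sketch of the two setups (illustrative names; the correct version sums number elements, while the faulty one reproduces the off-by-one from the previous slide). In the research setup both versions are available, so labeling is mechanical; in the real-world scenario a human must judge whether Assert.AreEqual(15, i) encodes intended or buggy behavior.

using System;

static class EvaluationSetup
{
    // The two versions compared in the typical research setup.
    static int CorrectSum(int start, int number, int[] a)
    {
        int sum = 0;
        for (int i = start; i < start + number; i++) sum += a[i];
        return sum;
    }

    static int FaultySum(int start, int number, int[] a)
    {
        int sum = 0;
        for (int i = start; i < start + number - 1; i++) sum += a[i]; // off-by-one
        return sum;
    }

    // A generated test, reduced to a predicate over one implementation.
    static bool Test647(Func<int, int, int[], int> sum) =>
        sum(0, 4, new[] { 4, 5, 6, 7, 8 }) == 15;

    public static void Main()
    {
        bool onCorrect = Test647(CorrectSum); // false: 22 != 15
        bool onFaulty = Test647(FaultySum);   // true:  15 == 15
        // The generated test encodes the buggy behavior: it fails on the
        // correct version, so "fixing" the code would break this test.
        Console.WriteLine(onCorrect ? "passes on correct" : "asserts buggy behavior");
        Console.WriteLine(onFaulty ? "passes on faulty" : "fails on faulty");
    }
}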

Slide 12

Thesis 1 and publications

Thesis 1. Through analysis and studies, I have shown that semantic inconsistencies in the interpretation of modelling languages and of generated tests reduce the number of errors that the verification methods using them can detect.

1.1. There are interpretational differences in the sets of possible traces considered by modelers and by developers of simulators and verification tools for behavioral modelling languages. I identified the types of discrepancies through an examination of the PSSM specification [j1].

1.2. When evaluating tests generated by code-based test generation tools, people identify fewer errors than would be expected from an evaluation comparing the incorrect and correct versions. I designed an experiment to measure the performance of human evaluation of generated tests and, based on the results, proposed a classification framework [j2].

[j1] M. Elekes, V. Molnár, and Z. Micskei. "Assessing the specification of modelling language semantics: a study on UML PSSM". In: Software Quality Journal 31.2 (2023).
[j2] D. Honfi and Z. Micskei. "Classifying generated white-box tests: an exploratory study". In: Software Quality Journal 27.3 (2019), pp. 1339–1380.
M. Elekes, V. Molnár, and Z. Micskei. "To Do or Not to Do: Semantics and Patterns for Do Activities in UML PSSM State Machines". In: IEEE Transactions on Software Engineering (2024), pp. 2124–2141.

Slide 13

Thesis 2: Evaluating verification tools

Slide 14

Code-based test generator tools

Question: Detailed, language feature-level evaluation of test generator tools?
• Significant differences between tools
• Identifying "hard" features (see the sketch below)
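
For illustration, here is a made-up C# snippet in the style of a feature-level benchmark (not taken from the actual suite of [j3]): covering the branch below requires reasoning about nonlinear integer arithmetic, a feature that constraint-solver-based generators often handle poorly.

public static class SnippetNonlinear
{
    // The "return 1" branch is taken only for x = 42 or x = -42; finding
    // such an input requires solving the nonlinear constraint x * x == 1764.
    public static int Classify(int x)
    {
        if (x * x == 1764)
            return 1;
        return 0;
    }
}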

Slide 15

Model checking algorithms

Question: Comparing low-level algorithm variants of model checkers?
• CEGAR-based model checking (a schematic loop is sketched below)
• Designing the experiment setup
• Basis for:
  – New algorithms
  – Portfolio design
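
For readers unfamiliar with the approach, the following compilable C# skeleton shows the CEGAR loop (the types are illustrative stubs, not the architecture of any evaluated tool; predicate and explicit abstraction, the variants compared in [j4], differ in how Check and Refine are realized).

using System;

// Illustrative stubs; a real checker implements these with predicate or
// explicit abstraction.
interface IAbstraction
{
    CheckResult Check(string property);
    IAbstraction Refine(Counterexample spurious);
}

sealed class Counterexample { public string[] Steps = Array.Empty<string>(); }

sealed class CheckResult
{
    public bool IsSafe;
    public Counterexample Counterexample; // set when IsSafe is false
}

static class Cegar
{
    // The CEGAR loop: check the abstraction and return "safe" if the
    // property holds there; otherwise replay the abstract counterexample
    // on the concrete model, and either report a real violation or refine
    // the abstraction to exclude the spurious trace and try again.
    public static bool Verify(IAbstraction abstraction, string property,
                              Func<Counterexample, bool> isFeasibleConcretely)
    {
        while (true)
        {
            CheckResult result = abstraction.Check(property);
            if (result.IsSafe)
                return true;
            if (isFeasibleConcretely(result.Counterexample))
                return false;
            abstraction = abstraction.Refine(result.Counterexample);
        }
    }
}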

Slide 16

Mutation testing of embedded software

Question: How can mutation testing be applied to embedded software?
• Industrial collaboration (safety-critical software), C code base
• Designing a new experiment setup:
  – Evaluating a code-based test generator (MC/DC coverage)
  – Evaluating the test suites of 15 real-world modules
• Result: mutation testing can be successfully applied to safety-critical embedded software (a typical mutant is sketched below)
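
To illustrate what a mutant is (shown in C# for consistency with the other sketches, although the studied code base is C): a mutation operator makes one small syntactic change, and the test suite is judged by whether any test distinguishes the mutant from the original.

public static class LimitCheck
{
    // Original condition.
    public static bool IsWithinLimit(int value, int limit)
        => value <= limit;

    // Mutant produced by relational operator replacement (<= becomes <).
    // A test suite kills this mutant only if some test exercises the
    // boundary, i.e. calls IsWithinLimit(limit, limit) and checks the result.
    public static bool IsWithinLimit_Mutant(int value, int limit)
        => value < limit;
}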

Slide 17

Thesis 2 and publications

Thesis 2. I designed experiments that systematically evaluated different testing and verification methods and software tools, and identified limitations in the current applicability and scalability of the tools.

2.1. I proposed a method based on language feature coverage using compact test code snippets, and an experiment to compare source code-based test generation tools. By evaluating the data, I identified typical limitations of the tools under investigation, confirming the theoretical and practical limitations of test generation algorithms [j3; c8].

2.2. I proposed a series of experiments to evaluate CEGAR-based model checking algorithm variants using predicate and explicit abstraction on software and hardware models. The results identified which configurations are efficient on which input model types [j4].

2.3. I proposed a series of experiments to investigate the applicability of mutation testing in embedded safety-critical software environments. The results show that mutation testing can find shortcomings both in a test generator targeting MC/DC coverage and in a test suite produced by testers meeting safety standards [j5].

[j3] L. Cseppentő and Z. Micskei. "Evaluating code-based test input generator tools". In: Software Testing, Verification and Reliability 27.6 (2017), pp. 1–24.
[j4] Á. Hajdu and Z. Micskei. "Efficient Strategies for CEGAR-Based Model Checking". In: Journal of Automated Reasoning 64 (2020), pp. 1051–1091.
[j5] A. A. Serban and Z. Micskei. "Application of Mutation testing in Safety-critical Embedded Systems: A Case Study". In: Acta Polytechnica Hungarica 21.8 (2024).

Slide 18

Thesis 3: New verification methods and tools

Slide 19

Verification of SysML models

Question: Semantically correct formal verification scaling to industrial models?
• SysML models with state machines and detailed activities
• Selecting a language subset ("pragmatic subset"):
  – Expressive power usable in practice
  – Semantic constraints
  – Efficient verification is possible
• Successful verification even for a large industrial model (a toy state-space exploration is sketched below)
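
As background on what verification means here (a toy sketch of my own; the actual toolchain of [j6] is far more elaborate): once a model stays inside a well-defined subset, its behavior can be captured as an explicit transition relation and its state space explored exhaustively against a safety property.

using System;
using System.Collections.Generic;

static class TinyChecker
{
    // State: (mode, counter). Transitions: tick increments the counter in
    // "Active"; "stop" moves to "Stopped" at any time.
    static IEnumerable<(string Mode, int Counter)> Next((string Mode, int Counter) s)
    {
        if (s.Mode == "Active" && s.Counter < 3) yield return ("Active", s.Counter + 1);
        if (s.Mode == "Active") yield return ("Stopped", s.Counter);
    }

    public static void Main()
    {
        var seen = new HashSet<(string, int)>();
        var frontier = new Stack<(string, int)>();
        frontier.Push(("Active", 0));
        while (frontier.Count > 0)
        {
            var s = frontier.Pop();
            if (!seen.Add(s)) continue;
            // Safety property: the counter never exceeds 3.
            if (s.Item2 > 3) { Console.WriteLine("Safety violated!"); return; }
            foreach (var t in Next(s)) frontier.Push(t);
        }
        Console.WriteLine($"Safe: explored {seen.Count} states."); // 8 states
    }
}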

Slide 20

Model-based regression testing

Question: Efficient retesting after changes in domain-specific languages?
• Approach independent of the modeling languages
• Automated evaluation of change impact (the selection step is sketched below)
• Industrial study: testing search & rescue robots
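
The core selection step, reduced to a sketch (the names and the flat trace map are my simplification of the metamodel in [c9]): each test case is traced to the model elements it covers, and after a change only the tests whose traced elements are affected are selected for re-execution.

using System;
using System.Collections.Generic;
using System.Linq;

static class RegressionSelection
{
    // Select every test that covers at least one changed model element.
    public static IEnumerable<string> SelectTests(
        IReadOnlyDictionary<string, string[]> traceability, // test -> elements
        ISet<string> changedElements)
        => traceability
           .Where(t => t.Value.Any(changedElements.Contains))
           .Select(t => t.Key);

    public static void Main()
    {
        var traces = new Dictionary<string, string[]>
        {
            ["testNavigate"] = new[] { "Route", "Obstacle" },
            ["testGrip"]     = new[] { "Arm" },
        };
        var changed = new HashSet<string> { "Obstacle" };
        // Only testNavigate is selected for re-execution.
        Console.WriteLine(string.Join(", ", SelectTests(traces, changed)));
    }
}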

Slide 21

Supporting code-based test generation

Question: How can symbolic-execution-based test generation be made more effective?
• Supporting the tester's interpretation of the generation
• Automated isolation of dependencies (sketched below)
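
A minimal before/after sketch of isolation (illustrative; [j7] performs such transformations automatically on real code): the environment dependency is turned into a parameter, so the test generator can synthesize the data directly instead of reasoning about the file system.

using System;
using System.IO;
using System.Linq;

// Before isolation: the unit reads the file system, which a symbolic
// executor cannot explore symbolically.
static class Original
{
    public static bool HasHeader(string path)
        => File.ReadLines(path).First().StartsWith("#");
}

// After isolation: the dependency is a parameter, so the generator can
// cover both branches simply by synthesizing two different strings.
static class Isolated
{
    public static bool HasHeader(Func<string> readFirstLine)
        => readFirstLine().StartsWith("#");
}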

Slide 22

Thesis 3 and publications

Thesis 3. I have developed new tools and methods that make the verification of system models and test generation more efficient by overcoming problems that arise in engineering practice.

3.1. I have selected a consistent subset of the SysML system modeling language's element set and its semantic variants (the "pragmatic subset") that allows the verification of industry-scale models conforming to the subset. The subset is defined in the related paper [j6].

3.2. I developed a method for the model-based support of regression testing, based on mapping elements of the input modelling languages to a general regression test selection metamodel. The method has the advantage of being independent of the input modelling languages [c9].

3.3. I have proposed a concept to support symbolic execution-based test generation using i) visualization to aid the tester's interpretation of the generation and ii) source code transformations that automatically isolate dependencies [j7; c10].

[j6] B. Horváth, V. Molnár, B. Graics, Á. Hajdu, I. Ráth, Á. Horváth, R. Karban, G. Trancho, and Z. Micskei. "Pragmatic Verification and Validation of Industrial Executable SysML Models". In: Systems Engineering 26.6 (2023), pp. 693–714.
[c9] D. Honfi, G. Molnár, Z. Micskei, and I. Majzik. "Model-Based Regression Testing of Autonomous Robots". In: SDL 2017: Model-Driven Engineering for Future Internet. Springer, 2017, pp. 119–135.
[j7] D. Honfi and Z. Micskei. "Automated Isolation for White-box Test Generation". In: Information and Software Technology 125 (2020), pp. 1–16.

Slide 23

Impact of the results and conclusion

Slide 24

Impact and exploitation

• International R&D projects: H2020 / ITEA projects; new modeling languages and V&V methods
• Standardization: Object Management Group (OMG); feedback for the PSSM standard
• Industrial collaborations: thyssenkrupp; IncQuery Group and NASA JPL; Knorr-Bremse

Slide 25

Conclusion