Evaluating and designing automated verification methods for critical software systems

Habilitation presentation of Zoltán Micskei summarizing research on increasing the applicability of advanced model-based and automated verification methods in engineering practice. Contributions include (i) understanding the semantics of models and generated tests, (ii) evaluating verification tools, and (iii) new verification methods and tools.

Transcript

  1. Evaluating and designing automated verification methods for critical software systems

    Zoltán Micskei https://home.mit.bme.hu/~micskeiz Habilitation presentation
  2. Habilitation presentation 2: Critical software systems
    • Increasing role of software systems
    • Safety-, mission-, and business-critical
    • Correct, reliable operation is essential
    • Many verification & validation methods
  3. Habilitation presentation 3: Development and testing process (example)
    [Figure: development and testing process with phases Requirements, System design, Architecture design, Module design, Module implementation, Module testing, Integration testing, System testing, Acceptance testing, annotated with verification methods such as review, formal verification, model checking, model-based testing, and code-based test generation]
  4. Habilitation presentation 4: Development and testing process (example)
    [Same development and testing process figure as on the previous slide]
    Goal: Increasing the applicability of advanced model-based and automated verification methods in engineering practice
  5. Habilitation presentation 5
    Goal: Increasing the applicability of advanced model-based and automated verification methods in engineering practice
    • Research Question 1 (Human factor): Do engineers and tool developers interpret the semantics of system models and generated tests consistently?
    • Research Question 2 (Limitations): What are the limitations of current automated verification tools that hinder widespread adoption?
    • Research Question 3 (Methods): What new verification methods and tools can solve the identified challenges?
  6. Habilitation presentation 8: UML/SysML based system models
    • Structure and (discrete) behavior
    • Standards: model semantics given as informal descriptions
    • What traces are possible? Concurrency, non-deterministic choices (a small sketch follows below)
    • Do all engineers interpret the model in the same way?
    • Do all tools (simulator, *generator) interpret the model in the same way?
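    A small sketch of why the trace question matters (my own illustration, not from the slides): if two orthogonal regions of a state machine each perform one action (A and B) and a subsequent step is a non-deterministic choice between C1 and C2, a conforming simulator or test generator may legally produce any of the four traces enumerated below, and all engineers and tools have to agree that every one of them is valid.

      // Enumerates the possible traces of a tiny concurrent, non-deterministic model.
      using System;
      using System.Collections.Generic;

      class TraceDemo
      {
          static void Main()
          {
              var interleavings = new[] { new[] { "A", "B" }, new[] { "B", "A" } }; // two orthogonal regions
              var choices = new[] { "C1", "C2" };                                   // non-deterministic choice

              var traces = new List<string>();
              foreach (var order in interleavings)
                  foreach (var choice in choices)
                      traces.Add(string.Join(" -> ", order) + " -> " + choice);

              traces.ForEach(Console.WriteLine); // A -> B -> C1, A -> B -> C2, B -> A -> C1, B -> A -> C2
          }
      }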
  7. Habilitation presentation 9: Understanding the semantics of modeling languages
    • Detailed studies: UML PSSM, UML doActivity
    • Semantics can be misinterpreted easily
    • Inconsistencies even in the standard
  8. Habilitation presentation 10: Code-based test generation
    Test generator: select test inputs, observe behavior, generate test code
    Can we detect if a generated test captures a failure?

      /// <summary>Calculates the sum of given number of
      /// elements from an index in an array.</summary>
      int CalculateSum(int start, int number, int[] a) {
          if (start + number > a.Length || a.Length <= 1)
              throw new ArgumentException();
          int sum = 0;
          for (int i = start; i < start + number - 1; i++)
              sum += a[i];
          return sum;
      }

      [TestMethod]
      public void CalculateSumTest284() {
          int[] ints = new int[5] { 4, 5, 6, 7, 8 };
          int i = CalculateSum(0, 0, ints);
          Assert.AreEqual<int>(0, i);
      }

      [TestMethod]
      public void CalculateSumTest647() {
          int[] ints = new int[5] { 4, 5, 6, 7, 8 };
          int i = CalculateSum(0, 4, ints);
          Assert.AreEqual<int>(15, i);
      }
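    A short aside, not on the slides: the loop above runs only while i < start + number - 1, so it sums just number - 1 elements. CalculateSumTest647 therefore asserts the observed value 15 (4 + 5 + 6) instead of the 22 (4 + 5 + 6 + 7) implied by the documented behavior; the generated test encodes the current, possibly faulty, behavior, and a human reviewer has to decide whether the assertion reveals a bug or the intended result. A hypothetical check (assuming CalculateSum from the slide is available in the test project):

      int[] ints = { 4, 5, 6, 7, 8 };
      int documented = 4 + 5 + 6 + 7;           // 22: sum of 4 elements from index 0, as the <summary> suggests
      int observed = CalculateSum(0, 4, ints);  // 15: the implementation adds only number - 1 elements
      Console.WriteLine($"documented={documented}, observed={observed}"); // the mismatch is what the reviewer must notice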
  9. Habilitation presentation 11: Understanding the semantics of generated tests
    • Typical research evaluation setup: the generated tests are run against a known correct and a known faulty implementation, so each outcome is clearly "OK" or "Bug!"
    • Real-world scenario: we do not know whether the implementation is faulty or correct, so a generated test only yields "OK?" or "Bug?"
    • Study with human participants: can they detect when a test captures a bug?
    • Interpretation of generated tests is not perfect; this must be considered in evaluations!
  10. Habilitation presentation 12: Thesis 1
    Through analysis and studies, I have shown that semantic inconsistencies in the interpretation of modelling languages and generated tests negatively affect the number of errors that the verification methods using them can detect.
    1.1. For behavioural modelling languages, there are differences in how modellers and the developers of simulator and verification tools interpret the set of possible traces. I identified the types of discrepancies through an examination of the PSSM specification [j1].
    1.2. When evaluating tests generated by code-based test generation tools, people identify fewer errors than an evaluation comparing the incorrect and correct versions would suggest. I designed an experiment to measure how well humans evaluate generated tests and, based on the results, proposed a classification framework [j2].
    Thesis 1 and related publications:
    [j1] M. Elekes, V. Molnár, and Z. Micskei. "Assessing the specification of modelling language semantics: a study on UML PSSM". In: Software Quality Journal 31.2 (2023).
    [j2] D. Honfi and Z. Micskei. "Classifying generated white-box tests: an exploratory study". In: Software Quality Journal 27.3 (2019), pp. 1339–1380.
    M. Elekes, V. Molnár, and Z. Micskei. "To Do or Not to Do: Semantics and Patterns for Do Activities in UML PSSM State Machines". In: IEEE Transactions on Software Engineering (2024), pp. 2124–2141.
  11. Habilitation presentation 14: Code-based test generator tools
    Detailed, language feature-level evaluation of test generator tools?
    • Significant differences between tools
    • Identifying "hard" language features (an illustrative snippet follows below)
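    As an illustration of what a compact, feature-level snippet might look like (this example is mine, not taken from the evaluated benchmark): covering both branches below requires a generator to reason about non-linear arithmetic and a tight floating-point comparison, features that typically separate the tools.

      // Hypothetical benchmark-style snippets: full branch coverage requires inputs
      // satisfying a non-linear constraint and a precise floating-point comparison.
      using System;

      public static class HardFeatures
      {
          public static int NonLinear(int x, int y)
          {
              if (x > 1 && x * x == y)      // non-linear arithmetic over the inputs
                  return 1;
              return 0;
          }

          public static int FloatBranch(double d)
          {
              if (Math.Abs(d - 0.1) < 1e-9) // requires precise floating-point reasoning
                  return 1;
              return 0;
          }
      }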
  12. Habilitation presentation 15: Model checking algorithms
    Comparing low-level algorithm variants for model checkers?
    • CEGAR-based model checking
    • Designing the experiment setup
    • Basis for:
      – New algorithms
      – Portfolio design (a minimal sketch follows below)
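    To make the portfolio idea concrete, a minimal sketch (my own illustration, not the actual tool): several checker configurations are started in parallel on the same model and the first verdict wins, so on every input the portfolio is roughly as fast as its best member.

      // Portfolio sketch: run stubbed model-checker configurations concurrently
      // and report whichever verdict arrives first.
      using System;
      using System.Threading;
      using System.Threading.Tasks;

      class PortfolioDemo
      {
          // Stand-ins for real configurations (e.g., explicit vs. predicate abstraction).
          static string ExplicitAbstraction(string model) { Thread.Sleep(300); return "SAFE (explicit)"; }
          static string PredicateAbstraction(string model) { Thread.Sleep(100); return "SAFE (predicate)"; }

          static async Task Main()
          {
              var configs = new Func<string, string>[] { ExplicitAbstraction, PredicateAbstraction };
              var tasks = Array.ConvertAll(configs, c => Task.Run(() => c("example-model")));

              var first = await Task.WhenAny(tasks); // a real portfolio would also cancel the losers
              Console.WriteLine($"Portfolio verdict: {first.Result}");
          }
      }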
  13. Habilitation presentation 16: Mutation testing of embedded software
    How can mutation testing be applied to embedded software?
    • Industrial collaboration (safety-critical software), C code base
    • Design of a new experiment setup:
      – Evaluating a code-based test generator (MC/DC coverage)
      – Evaluating the test suites of 15 real-world modules
    • Mutation testing can be successfully applied to safety-critical embedded software (a toy mutant is sketched below)
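    A toy illustration of the underlying idea (my own C# sketch; the study itself targeted a C code base): a mutant flips a relational operator, and the test suite kills it only if some test distinguishes the original from the mutant, which here requires exercising the boundary value.

      // Mutation-testing sketch: the mutant changes ">=" to ">"; only a test with
      // speed == limit observes a difference and therefore kills the mutant.
      using System;

      class MutationDemo
      {
          static bool OverLimitOriginal(int speed, int limit) => speed >= limit;
          static bool OverLimitMutant(int speed, int limit) => speed > limit; // mutated operator

          static void Main()
          {
              int limit = 50;
              foreach (var speed in new[] { 40, 80, 50 }) // only the last input is a boundary test
              {
                  bool killed = OverLimitOriginal(speed, limit) != OverLimitMutant(speed, limit);
                  Console.WriteLine($"speed={speed}: mutant {(killed ? "killed" : "survived")}");
              }
          }
      }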
  14. Habilitation presentation 17: Thesis 2
    I designed experiments that systematically evaluated different testing and verification methods and software tools, and identified limitations in the current applicability and scalability of the tools.
    2.1. I proposed a method based on language feature coverage using compact test code snippets, and an experiment to compare source code-based test generation tools. By evaluating the data, I identified typical limitations of the investigated tools, confirming the theoretical and practical limitations of test generation algorithms [j3; c8].
    2.2. I proposed a series of experiments to evaluate CEGAR-based model checking algorithm variants using predicate and explicit abstraction on software and hardware models. The results identified which configurations are efficient on which types of input models [j4].
    2.3. I proposed a series of experiments to investigate the applicability of mutation testing in embedded safety-critical software environments. The results show that mutation testing can find shortcomings both in a test generator targeting MC/DC coverage and in a test suite produced by testers meeting safety standards [j5].
    Thesis 2 and related publications:
    [j3] L. Cseppentő and Z. Micskei. "Evaluating code-based test input generator tools". In: Software Testing, Verification and Reliability 27.6 (2017), pp. 1–24.
    [j4] Á. Hajdu and Z. Micskei. "Efficient Strategies for CEGAR-Based Model Checking". In: Journal of Automated Reasoning 64 (2020), pp. 1051–1091.
    [j5] A. A. Serban and Z. Micskei. "Application of Mutation Testing in Safety-critical Embedded Systems: A Case Study". In: Acta Polytechnica Hungarica 21.8 (2024).
  15. Habilitation presentation 19: Verification of SysML models
    Semantically correct formal verification scaling to industrial models?
    • SysML models with state machines and detailed activities
    • Selecting a language subset ("pragmatic subset"):
      – Expressive power usable in practice
      – Semantic constraints
      – Efficient verification is possible
    • Successful verification even for a large industrial model
  16. Habilitation presentation 20: Model-based regression testing
    Efficient retesting after changes in domain-specific languages?
    • Approach independent of the modeling languages
    • Automated evaluation of change impact (a minimal sketch follows below)
    • Industrial study: testing search & rescue robots
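    A minimal sketch of the change-impact idea, using invented element and test names: each test is traced to the model elements it covers, and after a model change only the tests whose covered elements intersect the changed set are selected for re-execution.

      // Regression test selection sketch: tests are traced to the model elements they
      // cover; a change set is mapped to the subset of tests that must be re-run.
      using System;
      using System.Collections.Generic;
      using System.Linq;

      class RegressionSelectionDemo
      {
          static void Main()
          {
              // Hypothetical traceability: test name -> covered model elements.
              var coverage = new Dictionary<string, string[]>
              {
                  ["T_navigation"] = new[] { "MoveToWaypoint", "ObstacleAvoidance" },
                  ["T_communication"] = new[] { "SendStatus" },
                  ["T_rescue"] = new[] { "DetectVictim", "MoveToWaypoint" },
              };

              var changedElements = new HashSet<string> { "MoveToWaypoint" }; // result of comparing model versions

              var selected = coverage
                  .Where(kv => kv.Value.Any(changedElements.Contains))
                  .Select(kv => kv.Key);

              Console.WriteLine("Tests to re-run: " + string.Join(", ", selected)); // prints T_navigation and T_rescue
          }
      }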
  17. Habilitation presentation 21: Supporting code-based test generation
    How can symbolic execution-based test generation be more effective?
    • Supporting the tester's interpretation of the generation
    • Automated isolation of dependencies (a simplified sketch follows below)
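    To illustrate what isolating a dependency means here (my own simplified example, not the tool's actual transformation): a call into the environment, such as reading a file, is routed through a replaceable delegate, so a symbolic execution engine or a generated test can explore the caller without touching the real file system.

      // Dependency-isolation sketch: the file-system call sits behind a delegate that a
      // test generator (or a manually written test) can replace with a controllable fake.
      using System;
      using System.IO;

      class ConfigReader
      {
          // Production code uses the real file system; tests can override this member.
          public Func<string, string> ReadAllText { get; set; } = File.ReadAllText;

          public int GetRetryCount(string path)
          {
              string text = ReadAllText(path);
              return int.TryParse(text, out int value) && value > 0 ? value : 1; // default to 1 retry
          }
      }

      class IsolationDemo
      {
          static void Main()
          {
              var reader = new ConfigReader { ReadAllText = _ => "5" }; // isolated: no real file needed
              Console.WriteLine(reader.GetRetryCount("retries.cfg"));   // prints 5
          }
      }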
  18. Habilitation presentation 22: Thesis 3
    I have developed new tools and methods that make the verification of system models and test generation more efficient by overcoming problems that arise in engineering practice.
    3.1. I have selected a consistent subset of the elements and semantic variants of the SysML system modelling language (a "pragmatic subset") that allows the verification of industry-scale models conforming to the subset. The subset is defined in the related paper [j6].
    3.2. I developed a method for model-based support of regression testing based on mapping the elements of the input modelling languages to a general regression test selection metamodel. The advantage of the method is that it is independent of the input modelling languages [c9].
    3.3. I have proposed a concept to support symbolic execution-based test generation using (i) visualization to help testers interpret the generation and (ii) source code transformations that automatically isolate dependencies [j7; c10].
    Thesis 3 and related publications:
    [j6] B. Horváth, V. Molnár, B. Graics, Á. Hajdu, I. Ráth, Á. Horváth, R. Karban, G. Trancho, and Z. Micskei. "Pragmatic Verification and Validation of Industrial Executable SysML Models". In: Systems Engineering 26.6 (2023), pp. 693–714.
    [c9] D. Honfi, G. Molnár, Z. Micskei, and I. Majzik. "Model-Based Regression Testing of Autonomous Robots". In: SDL 2017: Model-Driven Engineering for Future Internet. Springer, 2017, pp. 119–135.
    [j7] D. Honfi and Z. Micskei. "Automated Isolation for White-box Test Generation". In: Information and Software Technology 125 (2020), pp. 1–16.
  19. Habilitation presentation 24: Impact and exploitation
    • International R&D projects: H2020 / ITEA projects; new modeling languages and V&V methods
    • Standardization: Object Management Group (OMG); feedback for the PSSM standard
    • Industrial collaborations: thyssenkrupp, IncQuery Group and NASA JPL, Knorr-Bremse