Test Suite Evaluation and Optimisation

VW-Seminar “Code-Analyse and beyond”, 2020-12-11
Prof. Dr. Stefan Wagner

You can copy, share and change, film and photograph, blog, live-blog and tweet this presentation, given that you attribute it to its author and respect the rights and licences of its parts.
Based on slides by @SMEasterbrook and @ethanwhite

Budget for quality assurance decreases
[Chart: proportion of total budget allocated to QA and testing in %, 2013–2019. Values (in extraction order, not matched to years): 23, 26, 26, 31, 35, 26, 23]
Data from: Capgemini, Microfocus. World Quality Report 2019–20

Factors that increase costs
▪ Increased amount of developments and releases
▪ Shift to Agile/DevOps causing more test iteration cycles
▪ Increased challenges with test environments
▪ Business demands higher IT quality
▪ Increased inefficiency of test activities
[Chart: mean on a scale from 1 to 7, where 7 is "most important". Values (in extraction order, not matched to factors): 4.85, 4.95, 5.08, 5.39, 5.7]
Data from: Capgemini, Microfocus. World Quality Report 2019–20

How effective is my test suite in detecting faults?
Executing code to detect faults
[Diagram: a test suite feeds Test Input 1, Test Input 2 and Test Input 3 into the Software Under Test, which produces Test Output 1, Test Output 2 and Test Output 3]

Pseudo-Tested Code

Code Coverage
Measures what code has been executed, but not whether the test suite is good at detecting faults.

Code Coverage – Example

    public class Calculation {
        private int value;

        public Calculation() {
            this.value = 0;
        }

        public void add(int x) {
            this.value += x;
        }

        public boolean isEven() {
            return this.value % 2 == 0;
        }
    }

    @Test
    public void testCalculation() {
        Calculation calc = new Calculation();
        calc.add(6);
        assertTrue(calc.isEven());
    }

Mutation Testing – Idea
Injecting faults and observing whether they are detected

Mutation Testing – Process
▪ Initial analysis for determining the relationship between code and test cases
▪ For every covered, mutatable code piece:
  ▪ Mutation of a piece of code
  ▪ Execution of relevant test cases
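
The process above can be sketched in plain Java. This is only an illustration under simplifying assumptions: the mutants are hand-written variants of the isEven check from the coverage example rather than generated by a tool, and the names MutationSketch, testPasses and mutationScore are hypothetical.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Minimal sketch of mutation testing: run the same test against the
// original code and against hand-written mutants, then count the mutants
// the test detects ("kills"). All names here are hypothetical.
class MutationSketch {

    // Original implementation of the even check.
    static final IntPredicate ORIGINAL = value -> value % 2 == 0;

    // Hand-written mutants, each with one small injected fault.
    static final List<IntPredicate> MUTANTS = List.of(
            value -> value % 2 != 0,  // relational operator replaced
            value -> value % 3 == 0,  // constant replaced
            value -> true             // condition replaced by a constant
    );

    // The "test case": adds 6 to an initial value of 0 and expects "even".
    static boolean testPasses(IntPredicate isEven) {
        int value = 0;
        value += 6;
        return isEven.test(value);
    }

    // Fraction of mutants for which the test fails, i.e. killed mutants.
    static double mutationScore() {
        long killed = MUTANTS.stream().filter(m -> !testPasses(m)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println("Test passes on original: " + testPasses(ORIGINAL));
        System.out.println("Mutation score: " + mutationScore());
    }
}
```

With the single input 6, only the first mutant is detected. The other two survive, which suggests the test would need a further input, for example an odd value.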

Mutation Operators

Will my tests be able to tell?
Idea: Remove whole method implementation
R. Niedermayr, E. Juergens, S. Wagner. CSED, 2016

      public int computeFactorial(int n) {
    -     int result = 1;
    -     for (int i = n; i > 1; i--) {
    -         result = result * i;
    -     }
    -     return result;
    +     return 0;
      }
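
Applied to the Calculation class from the coverage example, the idea might look as follows. This is a hand-written sketch, not the output of any tool: the subclasses AddRemoved and IsEvenRemoved are hypothetical stand-ins for automatically generated variants, and only "return false" is tried as the replacement body of isEven.

```java
// Sketch of "remove the whole method implementation" applied to the
// Calculation example. The subclasses are hand-written stand-ins for
// what a tool would generate; all names here are hypothetical.
class PseudoTestedSketch {

    static class Calculation {
        protected int value = 0;

        void add(int x) {
            this.value += x;
        }

        boolean isEven() {
            return this.value % 2 == 0;
        }
    }

    // Variant 1: body of add() removed.
    static class AddRemoved extends Calculation {
        @Override
        void add(int x) {
            // body removed
        }
    }

    // Variant 2: body of isEven() replaced by "return false".
    static class IsEvenRemoved extends Calculation {
        @Override
        boolean isEven() {
            return false;
        }
    }

    // Same steps as the test in the example: add 6, expect an even value.
    static boolean testPasses(Calculation calc) {
        calc.add(6);
        return calc.isEven();
    }

    public static void main(String[] args) {
        System.out.println("original:       " + testPasses(new Calculation()));
        System.out.println("add removed:    " + testPasses(new AddRemoved()));
        System.out.println("isEven removed: " + testPasses(new IsEvenRemoved()));
    }
}
```

The test still passes when the body of add() is removed, because the initial value 0 is already even. In this sketch add() is therefore executed by the test but its effect is never checked, whereas the change to isEven() is detected.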

Implementation of a tool to do that: https://github.com/STAMP-project/pitest-descartes

Pseudo-tested code in popular open source
Between 6% and 53%

Advantages of pseudo-tested code detection
▪ Much faster to compute than mutations
▪ No equivalent mutant problem
▪ More valid than code coverage

Too Trivial to Test

Idea: Inverse Defect Prediction (IDP)
Identify code regions with low fault risk
▪ (Traditional) Defect Prediction: FAULTY vs. NON-FAULTY
▪ Inverse Defect Prediction: LOW FAULT RISK vs. OTHER
R. Niedermayr, T. Röhm, S. Wagner. PeerJ Computer Science, 2019

Association Rule Mining
▪ Identify rules in a large dataset of transactions
▪ Rules describe implications between items
  { antecedent } → { consequent }
▪ Properties of rules
  ▪ support (significance)
  ▪ confidence (precision)
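
The two rule properties can be computed as in the following sketch, which uses the standard definitions: support is the share of all transactions that contain antecedent and consequent together, and confidence is the share of antecedent-matching transactions that also contain the consequent. The five transactions are made up for illustration, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: support and confidence of a rule { antecedent } -> { consequent }
// over a tiny, made-up set of transactions (one set of items per method).
class RuleMetricsSketch {

    // Made-up example data, not taken from any study.
    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("NoMethodInvocations", "SlocLessThan4", "NotFaulty"),
            Set.of("NoMethodInvocations", "SlocLessThan4", "NotFaulty"),
            Set.of("NoMethodInvocations", "SlocLessThan4", "NotFaulty"),
            Set.of("NoMethodInvocations", "SlocLessThan4", "Faulty"),
            Set.of("NoMethodInvocations", "Faulty"));

    // Number of transactions that contain all the given items.
    static long count(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    // Union of antecedent and consequent items.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all;
    }

    // support = transactions containing antecedent and consequent / all transactions
    static double support(List<Set<String>> transactions,
                          Set<String> antecedent, Set<String> consequent) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        return (double) count(transactions, union(antecedent, consequent)) / transactions.size();
    }

    // confidence = transactions containing both / transactions containing the antecedent
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        long matching = count(transactions, antecedent);
        if (matching == 0) {
            return 0.0;
        }
        return (double) count(transactions, union(antecedent, consequent)) / matching;
    }

    public static void main(String[] args) {
        Set<String> antecedent = Set.of("NoMethodInvocations", "SlocLessThan4");
        Set<String> consequent = Set.of("NotFaulty");
        System.out.println("support:    " + support(TRANSACTIONS, antecedent, consequent));
        System.out.println("confidence: " + confidence(TRANSACTIONS, antecedent, consequent));
    }
}
```

In this made-up dataset, three of five transactions contain all items of the rule (support 3/5), and three of the four transactions matching the antecedent are not faulty (confidence 3/4).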

# | Rule | Support | Confidence
1 | { NoMethodInvocations, SlocLessThan4, NoArithmeticOperations, NoNullLiterals } → { NotFaulty } | 10.43% | 96.76%
2 | { NoMethodInvocations, SlocLessThan4, NoArithmeticOperations } → { NotFaulty } | 11.03% | 96.09%
3 | { NoMethodInvocations, SlocLessThan4, NoCastExpressions, NoNullLiterals } → { NotFaulty } | 10.43% | 95.43%
4 | { NoMethodInvocations, SlocLessThan4, NoCastExpressions, NoInstantiations } → { NotFaulty } | 10.13% | 95.31%
5 | { NoMethodInvocations, SlocLessThan4, NoCastExpressions } → { NotFaulty } | 10.03% | 94.85%
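
How such a rule could be applied to a single method is sketched below, assuming a method counts as a low-fault-risk candidate when it satisfies the antecedent of a rule. The MethodMetrics class, its fields and the example values are hypothetical, and collecting the metrics from source code is not shown.

```java
// Sketch: checking whether a method matches the antecedent of rule 1.
// MethodMetrics and its fields are hypothetical; the metric values in
// main() are illustrative, not measured.
class LowFaultRiskSketch {

    static final class MethodMetrics {
        final int sloc;
        final int methodInvocations;
        final int arithmeticOperations;
        final int nullLiterals;

        MethodMetrics(int sloc, int methodInvocations,
                      int arithmeticOperations, int nullLiterals) {
            this.sloc = sloc;
            this.methodInvocations = methodInvocations;
            this.arithmeticOperations = arithmeticOperations;
            this.nullLiterals = nullLiterals;
        }
    }

    // Rule 1: { NoMethodInvocations, SlocLessThan4,
    //           NoArithmeticOperations, NoNullLiterals } -> { NotFaulty }
    static boolean matchesRule1(MethodMetrics m) {
        return m.methodInvocations == 0
                && m.sloc < 4
                && m.arithmeticOperations == 0
                && m.nullLiterals == 0;
    }

    public static void main(String[] args) {
        // A simple getter: short, no calls, no arithmetic, no null literals.
        MethodMetrics getter = new MethodMetrics(3, 0, 0, 0);
        // A loop-based computation: longer and with arithmetic operations.
        MethodMetrics computation = new MethodMetrics(7, 0, 2, 0);

        System.out.println("getter matches rule 1:      " + matchesRule1(getter));
        System.out.println("computation matches rule 1: " + matchesRule1(computation));
    }
}
```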

Proportion of low-fault-risk methods
[Charts: proportion (0 % to 100 %) of low-fault-risk code, by Methods and by SLOC, for the projects Chart, Closure, Lang, Math, Mockito and Time]
Low-fault-risk methods: only 0.3% contain a fault. They are 5.7 times less likely to contain a fault.

Trace-Based Test Selection

What to test in continuous integration?
What if we do not have access to the code?
[Figure 1. Illustration of ECUs used in Chassis Control Subsystem.]
"[…] scheme as verification-use. For a failure occurrence, the set of verification-use references represents those shared data on which the failure behavior of the system can be observed."

Trace-Based Test Selection Process
[Figure 1: Trace-Based Test Selection Process]
"[…] identify these ECUs by tracing the input signals sent during a test run through the Function Web. The whole process is depicted in Figure 1. We divide it into two parts: Preparation and Selection."
"Step 2. We generate the Traces for each Keyword. They are common to the entire test suite and represent a certain user interaction with the system, such as Activate ACC. Each of these Keywords is linked to a concrete implementation, which consists of the signals and the respective […]"
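
A rough sketch of the two parts, Preparation and Selection, is given below. It assumes the Function Web can be modelled as a directed graph and that a test case is selected when one of its Keyword traces reaches a changed ECU. The graph, the signal, function and ECU names, the second keyword and the test cases are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of trace-based test selection. All data below is hypothetical.
class TraceSelectionSketch {

    // Function Web as a directed graph: signal or function -> successors.
    static final Map<String, List<String>> FUNCTION_WEB = Map.of(
            "accButtonSignal", List.of("accControl"),
            "accControl", List.of("engineTorqueRequest", "brakeRequest"),
            "engineTorqueRequest", List.of(),
            "brakeRequest", List.of(),
            "wiperSwitchSignal", List.of("wiperControl"),
            "wiperControl", List.of());

    // ECU that hosts each function (input signals have no entry).
    static final Map<String, String> ECU_OF = Map.of(
            "accControl", "ACC ECU",
            "engineTorqueRequest", "Engine ECU",
            "brakeRequest", "Brake ECU",
            "wiperControl", "Body ECU");

    // Input signals sent by each Keyword's implementation.
    static final Map<String, List<String>> KEYWORD_SIGNALS = Map.of(
            "Activate ACC", List.of("accButtonSignal"),
            "Activate Wiper", List.of("wiperSwitchSignal"));

    // Keywords used by each test case.
    static final Map<String, List<String>> TEST_KEYWORDS = Map.of(
            "TC1", List.of("Activate ACC"),
            "TC2", List.of("Activate Wiper"),
            "TC3", List.of("Activate Wiper", "Activate ACC"));

    // Preparation: trace a Keyword's input signals through the graph
    // (breadth-first) and collect every ECU that is reached.
    static Set<String> traceKeyword(String keyword) {
        Set<String> visited = new HashSet<>();
        Set<String> ecus = new TreeSet<>();
        Deque<String> queue =
                new ArrayDeque<>(KEYWORD_SIGNALS.getOrDefault(keyword, List.of()));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (!visited.add(node)) {
                continue; // already traced
            }
            String ecu = ECU_OF.get(node);
            if (ecu != null) {
                ecus.add(ecu);
            }
            queue.addAll(FUNCTION_WEB.getOrDefault(node, List.of()));
        }
        return ecus;
    }

    // Selection: keep the test cases with a Keyword trace reaching the changed ECU.
    static Set<String> selectTests(String changedEcu) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, List<String>> test : TEST_KEYWORDS.entrySet()) {
            for (String keyword : test.getValue()) {
                if (traceKeyword(keyword).contains(changedEcu)) {
                    selected.add(test.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("Trace of Activate ACC: " + traceKeyword("Activate ACC"));
        System.out.println("Tests for Brake ECU:   " + selectTests("Brake ECU"));
    }
}
```

Because traces are computed per Keyword rather than per test case, they could be generated once and reused across the whole test suite; the sketch recomputes them for brevity.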

ACC Function Web Structure
[Figure 2: ACC Function Web Structure]

Test Suite Reduction Rate per ECU

Data-Flow-Based Coverage Criteria for Black-Box Integration

When do we stop integration testing?
What if we do not have access to the code?
[Figure 1. Illustration of ECUs used in Chassis Control Subsystem.]
"[…] scheme as verification-use. For a failure occurrence, the set of verification-use references represents those shared data on which the failure behavior of the system can be observed."

Exemplary Data Flow Profile of a Test Case
Table II: Exemplary data-flow profile of a functional test case
Data Flow Usage | Shared Data | Involved ECU
Precondition | V VEHICLE | Brake Control System
Precondition | COND PBRK | Parking Brake
Precondition | GEAR SELECT | Gear Box Controller
Stimulation | V VEHICLE | Brake Control System
Verification | V VEHICLE | Brake Control System
Verification | HYD BRK TRQ | Brake Control System
Verification | V VEHICLE | Brake Control System
Verification | COND PBRK | Parking Brake
Verification | REQ GEAR SELECT | Brake Control System

Example Coverage Criterion
Verification-Data-Use: A Verification-Data-Use occurs if, for a shared data item d, at least one test case exists that contains a reference to d for the purpose of behavior verification.
Using this criterion, untested data flow can be revealed.
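
The criterion can be evaluated mechanically, as in the following sketch. It reuses the references of the exemplary test case in Table II and assumes, purely for illustration, that the five shared data items in that table are all the shared data of the system. The class, enum and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: evaluating the Verification-Data-Use criterion for a test suite.
class VerificationDataUseSketch {

    enum Usage { PRECONDITION, STIMULATION, VERIFICATION }

    // One reference of a test case to a shared data item.
    static final class Reference {
        final Usage usage;
        final String sharedData;

        Reference(Usage usage, String sharedData) {
            this.usage = usage;
            this.sharedData = sharedData;
        }
    }

    // Assumption: these five items are all shared data of the system.
    static final Set<String> ALL_SHARED_DATA = Set.of(
            "V VEHICLE", "COND PBRK", "GEAR SELECT", "HYD BRK TRQ", "REQ GEAR SELECT");

    // Data-flow profile of the exemplary test case (duplicate row omitted).
    static final List<Reference> EXAMPLE_TEST_CASE = List.of(
            new Reference(Usage.PRECONDITION, "V VEHICLE"),
            new Reference(Usage.PRECONDITION, "COND PBRK"),
            new Reference(Usage.PRECONDITION, "GEAR SELECT"),
            new Reference(Usage.STIMULATION, "V VEHICLE"),
            new Reference(Usage.VERIFICATION, "V VEHICLE"),
            new Reference(Usage.VERIFICATION, "HYD BRK TRQ"),
            new Reference(Usage.VERIFICATION, "COND PBRK"),
            new Reference(Usage.VERIFICATION, "REQ GEAR SELECT"));

    // Shared data that no test case references for verification.
    static Set<String> uncovered(Set<String> allSharedData, List<List<Reference>> testCases) {
        Set<String> remaining = new TreeSet<>(allSharedData);
        for (List<Reference> testCase : testCases) {
            for (Reference reference : testCase) {
                if (reference.usage == Usage.VERIFICATION) {
                    remaining.remove(reference.sharedData);
                }
            }
        }
        return remaining;
    }

    // Share of shared data with at least one verification reference.
    static double coverage(Set<String> allSharedData, List<List<Reference>> testCases) {
        if (allSharedData.isEmpty()) {
            return 1.0;
        }
        int missing = uncovered(allSharedData, testCases).size();
        return (double) (allSharedData.size() - missing) / allSharedData.size();
    }

    public static void main(String[] args) {
        List<List<Reference>> suite = List.of(EXAMPLE_TEST_CASE);
        System.out.println("Uncovered: " + uncovered(ALL_SHARED_DATA, suite));
        System.out.println("Coverage:  " + coverage(ALL_SHARED_DATA, suite));
    }
}
```

Under these assumptions, GEAR SELECT is used only as a precondition and never for verification, so it is reported as untested data flow.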

Similarity of Test Cases to Detected and Undetected Failures
Figure 4. Similarity of test-case to undetected failures
Usage | No Match | Match
Precondition | 641/2805 | 2164/2805
Stimulation | 1806/1984 | 178/1984
Verification | 2018/2156 | 138/2156
Figure 2. Similarity of test cases to detected failures
Usage | No Match | Subsequent Match | Match
Precondition | 5/238 | 23/238 | 210/238
Stimulation | 1/116 | 22/116 | 93/116
Verification | 18/98 | 5/98 | 75/98
"[…] focusing the u[…] of data flow for verification purpose […] amount of matching, subsequent matching or non-matching references to shared data are shown in table 2. It can be seen that the highest amount of non-matching data-flow references occur for the verification usage of the data-flow. Further, data flow referenced for the precondition and stimulation usage are subject to subsequent effects. Over all purposes of shared data usage, a relative high amount of matches could be identified."

Conclusions
Glass Box:
▪ Detection of pseudo-tested code
▪ Identification of code too trivial to test
Black Box:
▪ Identify test cases to be executed based on traces
▪ Stop testing/find new test cases based on data-flow coverage

Prof. Dr. Stefan Wagner
Institute of Software Engineering
e-mail [email protected]
phone +49 (0) 711 685-88455
WWW www.iste.uni-stuttgart.de/ese
Twitter prof_wagnerst
ORCID 0000-0002-5256-8429
Slides are available at www.stefan-wagner.biz.

Pictures used in this slide deck
Max paraboloid by IkamusumeFan under CC BY-SA 4.0 (https://commons.wikimedia.org/wiki/File:Max_paraboloid.svg)
