Slide 1

Evaluating and Improving White-Box Test Generation
Dávid Honfi
Advisor: Zoltán Micskei, PhD
Public PhD defense, http://hdl.handle.net/10890/15132
Critical Systems Research Group (ftsrg)
Department of Measurement and Information Systems
Budapest University of Technology and Economics

Slide 2

Scope and Motivation
• Software testing is crucial for quality
• Thorough testing requires time and effort
• Automated techniques have been proposed to reduce this effort
• White-box test generation is one of them
• Applying such techniques in practice is non-trivial

Goal: Extend the empirical evidence on white-box test generation techniques, and propose novel approaches that facilitate their use in practice.

Slide 3

White-Box Test Generation

    public int ClassifyNum(int n) {
        if (n > 0) {
            if (n % 2 == 0)
                return 0;
            else
                return 1;
        }
        else
            return 2;
    }

    ID   Input [n]   Observed output
    T1   0           2
    T2   1           1
    T3   2           0

Seems to be easy on simple code snippets, but what about real code?
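For a method like this, a white-box test generator explores each feasible execution path and emits one unit test per path. A minimal sketch of what the three tests in the table could look like in MSTest (the ClassifyNumTests class, the test names, and the Program wrapper class are illustrative, not actual tool output):

    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class ClassifyNumTests {
        // Assumes ClassifyNum is declared in a hypothetical Program class.
        [TestMethod]
        public void ClassifyNumTest1() {
            // Path: n <= 0, observed output 2 (T1)
            Assert.AreEqual(2, new Program().ClassifyNum(0));
        }

        [TestMethod]
        public void ClassifyNumTest2() {
            // Path: n > 0 and n odd, observed output 1 (T2)
            Assert.AreEqual(1, new Program().ClassifyNum(1));
        }

        [TestMethod]
        public void ClassifyNumTest3() {
            // Path: n > 0 and n even, observed output 0 (T3)
            Assert.AreEqual(0, new Program().ClassifyNum(2));
        }
    }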

Slide 4

Research Method and Challenges

Empirical studies:
• Study 1: Test generation during development
• Study 2: Classification of generated white-box tests

Identified challenges:
• C1: Understanding
• C2: Lack of trust
• C3: Low coverage
• C4: Complex programs

Slide 5

Thesis 1: Empirical Investigation of Practical White-Box Test Generation

Slide 6

Background
• Several existing studies target white-box test generation
• Only some of them employ human participants
• Problem: limited conclusions for practical settings
• Possible solutions:

Replications (Study 1):
• Increase confidence in results
• Build a body of knowledge
• Manage validity concerns

New setting (Study 2):
• Provides novel insights
• Gives freedom in study design
• Yields limited validity

Slide 7

T1.1 Study 1: Test Generation in Development
• Original study by Rojas et al. [1]
• Goal: "Empirically evaluate the effects of using an automated test generation tool during development"
• External differentiated replication with changes including:
  – Test generator tool: EvoSuite → Microsoft Pex/IntelliTest
  – Programming language: Java → C#
  – Participants' knowledge: students only → professionals and students

[1] José Miguel Rojas, Gordon Fraser, and Andrea Arcuri. Automated unit test generation during software development: a controlled experiment and think-aloud observations. In: Proceedings of ISSTA 2015, pp. 338–349. ACM, 2015. doi: 10.1145/2771783.2771801.

Slide 8

T1.1 Study 1: Test Generation in Development
Summary of results (30 participants):
• Coverage can be higher with test generation
• Test generation reduces the amount of user activity required
• Spending more time with test generation improves quality

Slide 9

T1.2 Study 2: Classification of White-Box Tests

Motivation:
• Question: Are these tests correct?
  ◦ OK: correct w.r.t. the specification
  ◦ WRONG: contradicts the specification
• Not considered in previous empirical studies
• Issues caused in practice:
  ◦ Real efficiency can be worse
  ◦ Classification can be an effort-intensive task

    [TestMethod]
    public void CalculateSumTest284() {
        int[] ints = new int[5] { 4, 5, 6, 7, 8 };
        int i = CalculateSum(0, 0, ints);
        Assert.AreEqual(0, i);
    }

    [TestMethod]
    public void CalculateSumTest647() {
        int[] ints = new int[5] { 4, 5, 6, 7, 8 };
        int i = CalculateSum(0, 4, ints);
        Assert.AreEqual(15, i);
    }

Goal: How do developers who use test generator tools perform in deciding whether the generated tests encode expected or unexpected behavior?

Slide 10

T1.2 Study 2: Classification of White-Box Tests
RQ1: How do developers perform in the classification of generated tests?
RQ2: How much time do developers spend on the classification?

Subjects:
• Students only
• Source: V&V course
• Basic experience
• Applied voluntarily

Objects:
• Source: GitHub
• 5 methods in 4 repos
• Artificial faults (ODC)
• 3 tests per method

Context:
• 15-minute tutorial
• Experiment portal
• Visual Studio
• Test runs and debugging

Task:
• Classify all 15 tests
• At most 60 minutes
• Activities recorded
• Data is logged

Slide 11

T1.2 Study 2: Classification of White-Box Tests
Summary of results (106 participants):
• RQ1: "… yield only a moderate classification performance …"
• RQ2: "… more than one minute is usually required …"

Slide 12

Thesis 1 – Summary
Design and execution of empirical studies to investigate the challenges of white-box test generation in practical settings.

1.1 I designed and conducted a replication of an existing empirical study with 30 human participants on using white-box test generation during development to gain further insights into the topic.

1.2 I designed and conducted an empirical study and its replication with 106 individuals altogether. The studies addressed the white-box test classification performance of the participants.

Σ I analyzed the use of white-box test generation in two separate empirical studies. Based on the results, I identified and quantified new challenges that may hinder practical white-box test generation, and strengthened the evidence for already known challenges.

Publication: SQJ'19

Slide 13

Thesis 2: Automated Isolation of Dependencies in White-Box Test Generation

Slide 14

Background
Motivation: challenges of white-box test generators
• C3: Low code coverage
• C4: Large, complex programs
• One of the common root causes: environment interaction (database, service, network)

    bool TransferMoney(Token userToken, long amount, Account destination) {
        if (amount <= 0)
            throw new Exception("Invalid amount to transfer");
        int balance = DB.RunQuery("GetBalance", userToken);      // database
        if (balance < amount)
            throw new Exception("Not enough balance");
        TransferProcessor tp = new TransferProcessor(userToken); // service, network
        ProcessedTransfer pt = tp.Process(amount, destination);
        return pt.IsSuccess;
    }

Slide 15

T2.1 Design of the Approach
Users can also define behavior with input-effect pairs.
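To illustrate the approach on the TransferMoney example from the previous slide: the isolation transformation detaches the unit under test from its dependencies and generates a parameterized sandbox around it, so the test generator can treat a dependency's effect as an ordinary input. A minimal sketch under these assumptions (the IsolatedDB type and its members are hypothetical, not the tool's actual output):

    using System;

    // Hypothetical sandbox generated for the DB dependency (illustrative names).
    public static class IsolatedDB {
        // The dependency's effect is exposed as replaceable behavior: the test
        // generator can parameterize the returned balance to cover all branches,
        // and users can override it with input-effect pairs.
        public static Func<string, Token, int> RunQueryEffect = (query, token) => 0;

        public static int RunQuery(string query, Token token) {
            return RunQueryEffect(query, token);
        }
    }

    // In the transformed unit under test, the original call
    //   int balance = DB.RunQuery("GetBalance", userToken);
    // is rewritten to
    //   int balance = IsolatedDB.RunQuery("GetBalance", userToken);
    // so the "Not enough balance" branch becomes reachable without a real database.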

Slide 16

T2.2 Adaptation and Evaluation
Experimental evaluation on 44 kLoC of open-source projects:
• RQ1: How does the approach improve statement and branch coverage?
• RQ2: How much does it increase the time spent on test generation?

Slide 17

Thesis 2 – Summary
Design and evaluation of an approach that automatically isolates the code under test from its dependencies for test generation via code transformation.

2.1 I designed an approach to automatically isolate the unit under test from its dependencies for test generation purposes. The approach also generates a parameterized sandbox around the isolated unit that can be utilized by the test generator. I mapped the automated isolation approach to the domain of a concrete white-box test generator (Microsoft Pex/IntelliTest) to demonstrate its feasibility.

2.2 I conducted a large-scale quantitative evaluation to assess the performance of the approach via the implemented tool. The tool was able to improve the coverage reached by generated tests by around 50-60% in problematic cases.

Σ I designed a novel, automated, code-transformation-based approach for alleviating the external dependency problem in white-box test generation.

Publications: IST'20, DiB'20, PP'17, BME'16, BME'17

Slide 18

Thesis 3: Visually Aiding the Use and Problem Resolution of Symbolic Execution

Slide 19

Background
Motivation: challenges of white-box test generators
• C1: Difficulty of understanding generated white-box tests
• C2: Low trust in white-box test generation techniques and tools

Idea: visualizing symbolic-execution-based test generation
• The visualization should be clear, traceable, and detailed
• It should represent all necessary information
  – To help in grasping an overview and understanding of the process
  – To support precise problem identification

Slide 20

T3 Design of the Approach
• Main concept: symbolic execution tree
• Elements of visualization
  – Nodes: shape, color, border, label
  – Edges: color
• Additional data attached to nodes
  – Source code mapping
  – Path conditions
• Use cases
  – Engineering
  – Education
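A rough sketch of the data each visualized node could carry (the type and field names are illustrative, not the actual data model of the implementation):

    using System.Collections.Generic;

    // Illustrative model of a node in the visualized symbolic execution tree.
    public class SeTreeNode {
        public int Id;                        // shown as the node label
        public string NodeKind;               // rendered via shape and color
        public bool BelongsToGeneratedTest;   // rendered via border style
        public string SourceLocation;         // source code mapping (file and line)
        public string PathCondition;          // constraints accumulated on the path
        public List<SeTreeNode> Children = new List<SeTreeNode>();
    }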

Slide 21

T3 Adaptation
• Mapping of the technique to the domain of Microsoft Pex/IntelliTest
• Implementation as an IDE extension

Slide 22

T3 Multi-Domain Metrics
• Identification of additional metrics from related domains using survey papers and intuition
  – Static code metrics (SC)
  – Symbolic execution related metrics (SE)
  – Test code metrics (GT)
  – Generic graph metrics (GG)
• Attachment of metrics to the tree: node (N), path (P), execution (E)
• Mapping of metrics to problems in SE based on experience and intuition
  – Constraint solver (CS)
  – State space exploration (SSE)
  – Object creation (OC)
  – Environment interaction (EI)

Slide 23

Thesis 3 – Summary
Design of a novel visualization technique for symbolic execution to support understanding and problem identification.

I designed a generic visual representation of symbolic execution trees that handles additional data related to the path conditions, constraints, and generated test cases, for easier understanding and issue identification.

I adapted the generic representation to a concrete white-box test generator (Microsoft Pex/IntelliTest) by precisely mapping each generic concept to the concrete domain.

Based on the analysis of multiple related domains, I identified and organized several metrics that alleviate the problem identification process during test generation based on symbolic execution.

Σ I proposed a visual representation of symbolic execution trees that can handle additional metadata as well. I implemented the technique for Microsoft Pex/IntelliTest, an advanced symbolic-execution-based test generator.

Publications: ICST'15, BME'18

Slide 24

Summary

Slide 25

Publications

Related to theses:
                Thesis 1    Thesis 2            Thesis 3
    Journal     SQJ         IST, DIB, PerPol    –
    Conference  –           –                   ICST
    Workshop    –           2x BME              BME

Highlights:
• 8 publications: 4 journal (incl. IST, SQJ), 1 conference (ICST), 3 workshop (BME)
• 8 independent citations (peer-reviewed)

Slide 26

Applications
• Tools implemented
  – T2 (AutoIsolator): automated isolation for Microsoft Pex/IntelliTest
    ◦ IDE extension for immediate dynamic analysis
    ◦ Website: https://ftsrg.github.io/autoisolator
  – T3 (SEViz): visualizing the test generation of Microsoft Pex/IntelliTest
    ◦ Website: https://ftsrg.github.io/seviz
    ◦ Cao et al. [2] built a tool on the basis of SEViz
• Education
  – T2: served as the basis of a B.Sc. thesis
  – T3: the implemented SEViz tool is used
    ◦ Lab in the SWSV course at BME
    ◦ Basis of two B.Sc. theses

Slide 27

Conclusions

Thesis 1: Evaluated white-box test generators in practical settings (Study 1, Study 2, new challenges)
• Future work:
  – Further replications
  – New studies for the identified challenges

Thesis 2: Designed and evaluated an approach (and tool) for automatically isolating dependencies
• Future work:
  – User-oriented empirical study
  – Object state tracking and use

Thesis 3: Designed an approach (and tool) for visualization of symbolic execution, and proposed use cases
• Future work:
  – User-oriented empirical study
  – Mapping the CFG to the SE tree