The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation

Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, Harald Gall

Why?

    @Test
    public void test0() throws Throwable {
        Option option0 = new Option("aaabbb", true, "aaabbb");
        Option option1 = new Option("aaabbb", true, "aaabbb");
        boolean boolean0 = option1.equals((Object) option0);
        assertEquals("arg", option1.getArgName());
        assertTrue(option0.hasArg());
        assertTrue(boolean0);
    }

    @Test
    public void test1() throws Throwable {
        Option option0 = new Option("aaabbb", true, "aaabbb");
        Option option1 = new Option("aaabbb", true, "aaabbb");
        option0.setLongOpt("adafv");
        option1.setLongOpt("adafv");
        boolean boolean0 = option1.equals((Object) option0);
        assertEquals("arg", option1.getArgName());
        assertTrue(option0.hasArg());
        assertTrue(boolean0);
    }

Class Name: Option.java
Library: Apache Commons-Cli

Questions a developer has to answer about the two generated tests (test0 and test1):
Q1: What are the main differences?
Q2: Do they cover different parts of the code?

Candidate Assertions
Q3: Are these assertions correct?

Test Code Comprehension

Answering these questions requires reading both the generated tests and the production code.

Production Code:

    public class Options implements Serializable {
        private static final long serialVersionUID = 1L;

        /** a map of the options with the character key */
        private Map shortOpts = new HashMap();

        /** a map of the options with the long key */
        private Map longOpts = new HashMap();

        /** a map of the required options */
        private List requiredOpts = new ArrayList();
        ...
    }

Earl T. Barr et al., "The Oracle Problem in Software Testing: A Survey". IEEE Transactions on Software Engineering, 2015.

Are Generated Tests Helpful?

Generated tests do not lead to the detection of more faults.

[Figure: chart with labels "Testing", "Comprehension", and "Testing time"; scale 0% to 100%, with 75% marked]

G. Fraser et al., "Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study". TOSEM 2015.

Our Solution
[Figure label: Test Case]

TestDescriber

Input: Option.java
1. Test Suite Generation
2. Test Coverage Analysis (Cobertura)
3. Summary Generation

Example of generated tests:

    @Test
    public void testProva() throws Throwable {
        Option option0 = new Option("aaa", true, "aaa");
        Option option1 = new Option("aaa", true, "aaa");
        boolean boolean0 = option1.equals((Object) option0);
        assertEquals("arg", option1.getArgName());
        assertTrue(option0.hasArg());
        assertTrue(boolean0);
    }

    @Test
    public void testProva2() throws Throwable {
        Option option0 = new Option("aaa", true, "aaa");
        Option option1 = new Option("aaa", true, "aaa");
        option0.setLongOpt("adafv");
        option1.setLongOpt("adafv");
        boolean boolean0 = option1.equals((Object) option0);
        assertEquals("arg", option1.getArgName());
        assertTrue(option0.hasArg());
        assertTrue(boolean0);
    }

Summary Generator

Software Word Usage Model (SWUM): deriving <actions>, <themes>, and <secondary arguments> from class, method, attribute, and variable identifiers.

E. Hill et al., "Automatically capturing source code context of NL-queries for software maintenance and reuse". ICSE 2009.

Summary Generator
SWUM in TestDescriber: Covered Code

    public class Option {
        public Option(String opt, String longOpt, boolean hasArg, String descr)
                throws IllegalArgumentException {
            OptionValidator.validateOption(opt);
            this.opt = opt;
            this.longOpt = longOpt;
            if (hasArg) {
                this.numberOfArgs = 1;
            }
            this.description = descr;
        }
        ...
    }

SWUM in TestDescriber, step 1: Select the covered statements

    public class Option {
        public Option(String opt, String longOpt, boolean hasArg, String descr)
                throws IllegalArgumentException {
            OptionValidator.validateOption(opt);
            this.opt = opt;
            this.longOpt = longOpt;
            if (hasArg) { // FALSE
                this.numberOfArgs = 1;
            }
            this.description = descr;
        }
        ...
    }
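Step 1 can be illustrated with a minimal sketch. It assumes the coverage tool's report has already been reduced to a set of 1-based line numbers executed by the test; the class name `CoveredStatementSelector` and this input format are hypothetical and are not taken from TestDescriber or Cobertura.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: keeps only the source lines that a test executed. */
class CoveredStatementSelector {

    /**
     * Returns the trimmed source lines whose 1-based line number
     * appears in coveredLines, in source order.
     */
    static List<String> selectCovered(List<String> sourceLines, Set<Integer> coveredLines) {
        List<String> covered = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (coveredLines.contains(i + 1)) {
                covered.add(sourceLines.get(i).trim());
            }
        }
        return covered;
    }
}
```

With the constructor body above and a test that passes `hasArg = false`, the line `this.numberOfArgs = 1;` would be absent from the covered set and therefore dropped from the text that is summarised.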

SWUM in TestDescriber, step 2: Filter out Java keywords, etc.

    public class Option {
        public Option(String opt, String longOpt, boolean hasArg, String descr)
                throws IllegalArgumentException {
            OptionValidator.validateOption(opt);
            this opt = opt;
            this longOpt = longOpt;
            if (hasArg) { false }
            this description = descr;
        }
        ...
    }

SWUM in TestDescriber, step 3: Identifier Splitting (Camel case)

    public class Option {
        public Option(String opt, String long Opt, boolean has Arg, String descr)
                throws IllegalArgumentException {
            Option Validator.validate Option(opt);
            this opt = opt;
            this long Opt = long Opt;
            if (has Arg) { false }
            this description = descr;
        }
        ...
    }
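Camel-case splitting, as in step 3, can be sketched with a single regular expression. This is an illustrative implementation only; the class name `IdentifierSplitter` is hypothetical and the slides do not say how TestDescriber implements the split.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a camel-case identifier into its words. */
class IdentifierSplitter {

    static List<String> splitCamelCase(String identifier) {
        List<String> words = new ArrayList<>();
        // Split between a lower-case letter or digit and an upper-case letter
        // ("longOpt" -> "long", "Opt"), and before the last capital of an
        // acronym that is followed by a lower-case letter ("XMLParser" -> "XML", "Parser").
        String boundary = "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }
}
```

For example, `splitCamelCase("OptionValidator")` yields the two words `Option` and `Validator`, matching the split shown on the slide.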

SWUM in TestDescriber, step 4: Abbreviation Expansion (using external vocabularies)

    public class Option {
        public Option(String option, String long Option, boolean has Argument, String description)
                throws IllegalArgumentException {
            Option Validator.validate Option(option);
            this option = option;
            this long Option = long Option;
            if (has Argument) { false }
            this description = description;
        }
        ...
    }
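Abbreviation expansion, as in step 4, amounts to a dictionary lookup. The sketch below assumes the external vocabulary is available as a simple map from lower-case abbreviations to full words; the class `AbbreviationExpander` and that map format are hypothetical, and only the three expansions visible on the slide (opt, arg, descr) are used as sample entries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: expands abbreviations using a supplied vocabulary. */
class AbbreviationExpander {

    private final Map<String, String> vocabulary;

    AbbreviationExpander(Map<String, String> vocabulary) {
        this.vocabulary = new HashMap<>(vocabulary);
    }

    /** Returns the expansion of word, or word itself if it is not in the vocabulary. */
    String expand(String word) {
        if (word.isEmpty()) {
            return word;
        }
        String expansion = vocabulary.get(word.toLowerCase(Locale.ROOT));
        if (expansion == null) {
            return word;
        }
        // Preserve a leading capital so that "Opt" becomes "Option".
        if (Character.isUpperCase(word.charAt(0))) {
            return Character.toUpperCase(expansion.charAt(0)) + expansion.substring(1);
        }
        return expansion;
    }
}
```

A real vocabulary would need to handle ambiguous abbreviations; this sketch simply returns the single mapped word.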

SWUM in TestDescriber, step 5: Part-of-Speech tagger

<actions> = verbs
<themes> = nouns / subjects
<secondary arguments> = nouns / objects, adjectives, etc.

Covered code after steps 1 to 4:

    public class Option {
        Option(String option, String long Option, boolean has Argument, String description)
                throws IllegalArgumentException
            Option Validator.validate Option(option);
            this option = option;
            this long Option = long Option;
            if (has Argument false }
            this description = description;
    }

Part-of-speech tags assigned to the remaining words (in slide order):
NOUN NOUN NOUN ADJ NOUN NOUN VERB NOUN NOUN NOUN NOUN VERB NOUN NOUN ADJ ADJ ADJ ADJ NOUN NOUN NOUN VERB ADJ NOUN CON NOUN ADJ

Summary Generator: from Parsed Code to Natural Language Sentences

The test case instantiates an "Option" with:
- option equal to "..."
- long option equal to "..."
- it has no argument
- description equal to "..."
An option validator validates it.
The test exercises the following condition:
- "Option" has no argument
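The sentence layout above suggests a template filled with the words extracted in the previous steps. The following sketch shows one way such a template could be filled; the class `SummaryTemplate`, its method, and the hard-coded article "an" are assumptions made for illustration, not TestDescriber's actual generator.

```java
import java.util.List;

/** Hypothetical helper: fills a fixed sentence template for an instantiation. */
class SummaryTemplate {

    /**
     * Builds a description of the form
     *   The test case instantiates an "ClassName" with:
     *   - property 1
     *   - property 2
     */
    static String describeInstantiation(String className, List<String> properties) {
        StringBuilder summary = new StringBuilder();
        summary.append("The test case instantiates an \"")
               .append(className)
               .append("\" with:\n");
        for (String property : properties) {
            summary.append("- ").append(property).append('\n');
        }
        return summary.toString();
    }
}
```

A fuller generator would also choose the article and emit the branch-level sentence ("The test exercises the following condition"), which this sketch leaves out.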

Summarisation Levels
- Class Level
- Method Level
- Statement Level
- Branch Level

Do Test Summaries Improve Test Readability?
Do Test Summaries Help Developers?

Case Study: Bug Fixing Tasks Involving 30 Developers

Context

Objects: two Java classes, ArrayIntList.java and Rational.java, from Apache Commons Primitives and Math4J, which have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015].

Subjects: 30 participants (23 researchers and 7 developers).

Study Procedure

Bug Fixing Tasks
Group 1: ArrayIntList.java, Rational.java
Group 2: ArrayIntList.java, Rational.java

[Figure: Rational.java test code, with the comments added by TestDescriber highlighted]

Bug Fixing Tasks

The experiment was conducted offline via a survey platform. Each participant received an experiment package consisting of:
1. A pre-test questionnaire
2. Instructions and materials to perform the experiment
3. A post-test questionnaire

We did not reveal the goal of the study.
Each task was limited to 45 minutes.

RQ1: How do test case summaries impact the number of bugs fixed by developers?

Participants WITHOUT TestDescriber summaries fixed 40% of the injected bugs. None of them was able to fix all bugs.

Participants WITH TestDescriber summaries fixed 60%-80% of the injected bugs. 31% of them fixed all the bugs.

With summaries, participants fixed 50% to 100% more bugs (from 40% to 60%-80% of the injected bugs, i.e. up to twice as many) in the same time window (45 minutes).
The differences are statistically significant (Wilcoxon test, p-value < 0.05).
The Vargha-Delaney A12 effect size is always large.
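For readers unfamiliar with the A12 statistic, the sketch below computes it for two samples: it estimates the probability that a value drawn from the first sample exceeds one drawn from the second, counting ties as one half. The class name `VarghaDelaney`, the sample values in the usage note, and the magnitude thresholds (0.56, 0.64, 0.71, the commonly cited ones) are assumptions for illustration; the slides report only that the effect size was large.

```java
/** Illustrative computation of the Vargha-Delaney A12 effect size. */
class VarghaDelaney {

    /** Probability that a treatment value exceeds a control value (ties count 0.5). */
    static double a12(double[] treatment, double[] control) {
        if (treatment.length == 0 || control.length == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        double score = 0.0;
        for (double t : treatment) {
            for (double c : control) {
                if (t > c) {
                    score += 1.0;
                } else if (t == c) {
                    score += 0.5;
                }
            }
        }
        return score / ((double) treatment.length * control.length);
    }

    /** Labels the magnitude using commonly cited thresholds (an assumption here). */
    static String magnitude(double a12) {
        // Fold values below 0.5 so that both directions are treated alike.
        double folded = Math.max(a12, 1.0 - a12);
        if (folded >= 0.71) return "large";
        if (folded >= 0.64) return "medium";
        if (folded >= 0.56) return "small";
        return "negligible";
    }
}
```

As a made-up usage example, `a12(new double[] {3, 4, 5}, new double[] {1, 2, 3})` counts 8 wins and 1 tie out of 9 pairs, giving 8.5 / 9, which the thresholds above label as large. These numbers are hypothetical and are not the study's data.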

Results are not influenced by developers' experience:
(i) the number of bugs fixed is not significantly influenced by programming experience;
(ii) there is no significant interaction between programming experience and the presence of test case summaries.

Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.

RQ2: How do test case summaries impact the way developers change test cases, in terms of structural and mutation coverage?

Only for Rational is there an improvement in the mutation score (+10%) when tests are enriched with summaries.
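As a reminder of what the mutation score measures, the sketch below uses the common definition: the fraction of generated mutants that the test suite kills. The class `MutationScore` and the mutant counts in the usage note are hypothetical; the slides do not state how many mutants were generated or whether equivalent mutants were excluded.

```java
/** Illustrative mutation score: killed mutants divided by all mutants. */
class MutationScore {

    static double of(int killedMutants, int totalMutants) {
        if (totalMutants <= 0 || killedMutants < 0 || killedMutants > totalMutants) {
            throw new IllegalArgumentException("expected 0 <= killed <= total and total > 0");
        }
        return (double) killedMutants / totalMutants;
    }
}
```

With made-up counts, a suite that kills 30 of 50 mutants scores 0.6, and one that kills 35 of 50 scores 0.7, a difference of 10 percentage points.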

Summary: Test case summaries do not influence how developers manage the test cases in terms of structural coverage.

Test Case Summaries and Comprehension

[Figure: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, on a five-point scale: Very Low, Low, Medium, High, Very High]

WITH summaries:
(i) 46% of participants considered the test cases "easy to understand".
(ii) Only 18% of participants considered the test cases incomprehensible.

WITHOUT summaries:
(i) Only 15% of participants considered the test cases "easy to understand".
(ii) 40% of participants considered the test cases incomprehensible.

Summary: Test summaries significantly improve the comprehensibility of automatically generated test cases, according to human judgments.

Quality of TestDescriber's Summaries

Expressiveness (shares shown on the slide: 30%, 70%)
- Is easy to read and understand
- Is somewhat readable and understandable
- Is hard to read and understand

Conciseness (shares shown on the slide: 10%, 52%, 38%)
- Has no unnecessary information
- Has some unnecessary information
- Has a lot of unnecessary information

Content adequacy (shares shown on the slide: 13%, 37%, 50%)
- Is not missing any information
- Missing some information
- Missing some very important information

Conclusion
1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
2) Test case summaries do not influence how developers manage the test cases in terms of structural coverage.
3) Test summaries significantly improve the comprehensibility of automatically generated test cases, according to human judgments.

Panichella et al., "The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation". ICSE 2016.
